Class Meetings:
Backup slot: please also hold Thursdays 5-6:30PM. See the schedule below for the dates identified so far.
Addresses:
Announcements, questions: the class Piazza site, which you sign up for here.
Feel free to email the instructor at vern@berkeley.edu with any questions or comments you'd like to raise privately.
Course Description:
Summary: This class explores techniques and considerations for conducting computer science research rooted in empirical observation. Topics include measurement methodology, meta-data, use of external datasets, assessing data quality, calibration, sampling, statistical summaries, visualization techniques, goodness-of-fit, hypothesis testing, periodicities, non-stationarities, change-points, structuring the analysis process, and presentation of results. Ultimately, the goal is to foster analysis of empirical data that is both sound and illuminating.
Prerequisites:
Basic probability/statistics;
basic familiarity with networking helpful;
active in research;
complete the student survey.
Auditors and undergraduates require instructor approval.
Expectations/Grading: Students will "bring their own data" for exploration (via presentation) during class meetings. Ideally, this will come from their current or previous empirical research efforts; if not, students can instead (or in addition) select empirical studies from the literature or publicly available datasets for presentation and analysis. The number of presentations will depend on the class size, but will not exceed 2 or 3. The course will also include occasional reading and/or data analysis assignments.
1.5 hours of lecture per week; 1-2 units. Grading for 1 unit of credit is based on class presentations, participation/engagement, and writeups for the occasional assignments. For 2 units, students also submit a substantial report (due 2PM Mon Dec 15) detailing a research effort that includes significant empirical analysis; the report is graded on the correctness, thoroughness, and clarity of its data presentation, and on the soundness of its analysis.
Note: this is the first offering of the course, and as such its structure is experimental and subject to change. Feedback on what to improve (and/or what's going well) is appreciated and highly helpful!
Students potentially interested in this course might also want to consider the INFO 290: Exploratory Data Analysis seminar, INFO 271B: Quantitative Research Methods for Information Systems and Management, CS 294-103: Mathematical Foundations of Data Science, and/or CS 194-16: Introduction to Data Science.
Assignments:
Homework #2: due 5PM Mon Sep 15.
Class Presentation: topic and preferred dates due Fri Sep 26.
Homework #3: due 4PM Wed Nov 19.
Schedule:
Tuesday meetings are in 320 Soda.
The location for Thursday meetings is as noted in the schedule.
Date | Room | Topic | Notes |
---|---|---|---|
Thu 8/28 | 320 Soda | Organizational Meeting | |
Thu 9/4 | 320 Soda | Data Characterization: Keystrokes | Slides |
Tue 9/11 | | No lecture | |
Tue 9/16 | 320 Soda | Data Characterization: Keystrokes, cont'd | Slides |
Tue 9/23 | 320 Soda | Data Quality: Route Measurements | Slides |
Tue 9/30 | 320 Soda | Data Quality: Route Measurements, cont'd | Slides |
Tue 10/7 | 320 Soda | Shankari (bicycle usage); Pablo (sensor indications of stress) | Shankari's slides; Pablo's slides |
Tue 10/14 | 320 Soda | Brad (malware assessments); Jethro (Heartbleed) | Brad's slides; Jethro's slides |
Thu 10/23 | 380 Soda | Paul (click fraud) | (Contact Paul for slides.) |
Tue 10/28 | 320 Soda | Joao (Amazon reviews); Frank (Twitter compromise) | Joao's slides; Frank's slides |
Tue 11/4 | | No lecture | |
Thu 11/13 | 606 Soda | Kristin (edX and CS169); Zack (edX and CS169) | Kristin's slides; Zack's slides |
Thu 11/20 | 531 Cory | Neeraja (Hadoop workloads); lecture on tweet automation analysis | (Contact Neeraja for her slides.) Lecture slides |
Tue 11/25 | 320 Soda | Peter (loan performance); lecture on tweet automation analysis, cont'd | Peter's slides; corrected slides; lecture slides |
Tue 12/2 | 320 Soda | Allon (drug mechanisms); Sara (meta-data exploration) | (Contact Allon for his slides.) Sara's slides |
Tue 12/9 | 320 Soda | Sarah (large-scale scraping); Kaifei (WiFi localization) | Sarah's slides; Kaifei's slides |
Thu 12/11 | 320 Soda | Mangpo (edX demographics); Grant (ad injection ecosystem) | HKN evaluations (last 10 minutes) |