CS 294-105: Homework #3 - Due 4PM Wed Nov 19
Complete this assignment by 4PM Wednesday Nov 19.
Turn in your answers via email to vern@cs.berkeley.edu
with the term "Homework" in the Subject line.
This assignment is meant to lay the groundwork for some class discussion
about analysis techniques. It is fine for you to spend only a modest
amount of time on it. Those of you either (1) presenting on Nov 20, or
(2) putting together draft materials to post on Piazza on Nov 18 for your
presentation the following week, can skip this assignment if you wish,
given that I didn't provide much advance notice that it would be forthcoming.
(Please send me a note if you're doing this so I can track it correctly.)
- Download the Tweet timings dataset.
This dataset consists of the timestamps of tweets sent by about
500 Twitter users, organized per-user.
- Briefly characterize the dataset and assess quality issues.
- The analysis question we would like to go after with this
dataset is: To what degree can we confidently state whether some
of the accounts reflect automated posting, rather than a user
tweeting their thoughts as they compose them? (This question arose
in the broader context of how much of Twitter's population is
in fact robots rather than humans.)
Briefly write up your thoughts on what sort of analyses you could
undertake to assess this question. (Feel free to scope the effort
as you see fit.) Include discussion of potential difficulties you
foresee, and what other data/meta-data you wish the dataset included,
if any. Be prepared to talk about your approach at the Nov 20
class meeting.
- Optional: Present results (as far as you wish to pursue
them) and conclusions developed by your analyses.
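As one possible starting point for the characterization and analysis items above, here is a minimal sketch in Python. It rests on assumptions the assignment does not confirm: that each user's tweets can be loaded as a list of Unix timestamps in seconds, and that regular inter-tweet gaps or clustering on one second of the minute hint at scheduled posting. The function names quality_report and timing_features are hypothetical, and neither feature is conclusive by itself, since people also use scheduling tools.

```python
from collections import Counter
from statistics import mean, pstdev


def quality_report(timestamps):
    """Basic quality checks for one user's timestamps (assumed Unix seconds)."""
    ordered = sorted(timestamps)
    return {
        "count": len(timestamps),
        # Exact repeats may indicate collection artifacts rather than real tweets.
        "duplicates": len(timestamps) - len(set(timestamps)),
        # Adjacent pairs where the later entry has an earlier timestamp.
        "out_of_order": sum(1 for a, b in zip(timestamps, timestamps[1:]) if b < a),
        "span_seconds": ordered[-1] - ordered[0] if ordered else 0,
    }


def timing_features(timestamps):
    """Regularity features from inter-tweet gaps; None if fewer than 3 distinct tweets."""
    ordered = sorted(set(timestamps))
    if len(ordered) < 3:
        return None
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    # Coefficient of variation: 0 for a fixed posting interval,
    # roughly 1 for memoryless (Poisson-like) arrivals.
    gap_cv = pstdev(gaps) / mean(gaps)
    # Fraction of tweets that fall on the single most common second of the minute.
    seconds = Counter(int(t) % 60 for t in ordered)
    top_second_share = seconds.most_common(1)[0][1] / len(ordered)
    return {"gap_cv": gap_cv, "top_second_share": top_second_share}
```

Running both functions per user and looking at the distribution of gap_cv across all ~500 accounts is one way to see whether any accounts stand apart. Where to draw a threshold is a judgment call that the data itself should inform.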
(As always, I welcome feedback on this assignment.)