Introduction
Google Scholar is a great resource, but it lacks an
API. Until there is one,
scholar.py
is a Python module that implements a querier and parser for
Google Scholar's output. Its classes can be used
independently, but it can also be invoked as a command-line
tool. It could definitely use a few more features, such as
detailed author extraction and multi-page crawling. If
you're interested in adding features, do send patches!
(Thanks to those of you who have—you know who you are.)
Features
- Can extract publication title, main online URL, number
of citations, number of online versions, link to Google
Scholar's main cluster for the work, and Google Scholar's
cluster of all works referencing the publication.
- Can print entries in CSV format or plain text.
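To make the feature list concrete, here is a minimal sketch of a record holding the six extracted fields, rendered once as plain text and once as CSV. The `Entry` class, its field names, and the comma delimiter are assumptions made for illustration; they are not taken from scholar.py, whose own classes and output layout may differ.

```python
import csv
import io
from dataclasses import dataclass, asdict


@dataclass
class Entry:
    # Hypothetical field names mirroring the feature list, not scholar.py's own.
    title: str
    url: str
    num_citations: int
    num_versions: int
    url_citations: str
    url_versions: str


def as_txt(entry):
    """Render one 'name value' line per field."""
    return "\n".join(f"{name} {value}" for name, value in asdict(entry).items())


def as_csv(entry):
    """Render a single CSV row; the csv module quotes fields that need it."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(asdict(entry).values())
    return buf.getvalue().rstrip("\n")


example = Entry(
    title="Physics and reality",
    url="https://www.sciencedirect.com/science/article/pii/S0016003236910475",
    num_citations=322,
    num_versions=5,
    url_citations="https://scholar.google.com/scholar?cites=6799563874330167610&as_sdt=2005&sciodt=1,5&hl=en",
    url_versions="https://scholar.google.com/scholar?cluster=6799563874330167610&hl=en&as_sdt=1,5&as_subj=eng",
)

print(as_txt(example))
print(as_csv(example))
```

Going through the csv module rather than joining strings by hand matters here because Google Scholar URLs can contain commas, so those fields have to be quoted.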
Example
Try scholar.py --help for all available options. A simple example:
$ scholar.py -c 1 --txt --author einstein quantum
Title Physics and reality
URL https://www.sciencedirect.com/science/article/pii/S0016003236910475
Citations 322
Versions 5
Citations list https://scholar.google.com/scholar?cites=6799563874330167610&as_sdt=2005&sciodt=1,5&hl=en
Versions list https://scholar.google.com/scholar?cluster=6799563874330167610&hl=en&as_sdt=1,5&as_subj=eng
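If you want to post-process the plain-text output in another script, a small parser is enough. The sketch below assumes each line is a field label followed by whitespace and the value, as in the sample above; `parse_txt_entry` and `LABELS` are illustrative names, not part of scholar.py.

```python
# Field labels as they appear in the sample output. Longer labels come
# first so "Citations list" is not mistaken for "Citations".
LABELS = ["Citations list", "Versions list", "Citations", "Versions", "Title", "URL"]


def parse_txt_entry(text):
    """Turn one block of label/value lines into a dict keyed by label."""
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        for label in LABELS:
            if line.startswith(label + " "):
                entry[label] = line[len(label):].strip()
                break
    return entry


sample = """\
Title Physics and reality
URL https://www.sciencedirect.com/science/article/pii/S0016003236910475
Citations 322
Versions 5
Citations list https://scholar.google.com/scholar?cites=6799563874330167610&as_sdt=2005&sciodt=1,5&hl=en
Versions list https://scholar.google.com/scholar?cluster=6799563874330167610&hl=en&as_sdt=1,5&as_subj=eng
"""

entry = parse_txt_entry(sample)
print(entry["Title"])
print(int(entry["Citations"]))
```

Lines that match no known label are skipped, so stray blank lines or extra fields do not break the parse.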
Download
The code used to be available here, but since November
2013 it has lived on GitHub,
ready for all your fork and pull request needs. Thanks!