Introduction
The Guardian newspaper (and its sister paper The Observer) is available digitally via Newspaper Direct in an "as printed" form. Reading an as-printed form online doesn't make much sense, but what would be nifty is to automatically get the latest copy and transfer it to an e-reader. You could then sync that reader automatically overnight and have the latest paper ready to go in the morning.
Unfortunately, Newspaper Direct don't make it easy to get the latest copy as a PDF or other format, so you have to connect to the site and download each section manually, which pretty much defeats the point.
This script goes through the login process and scrapes the HTML pages to find each available section. It then downloads a PDF of each section to a directory of your choice, structured by date. It also creates a symlink to the latest edition of each paper. If a section already exists, it is not downloaded again, so it's safe to run multiple times overnight.
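The storage behaviour described above (dated directories, skipping sections that already exist, and a symlink to the latest edition) can be sketched roughly as follows. This is only an illustration in Python, not guardiangrab's actual code: the function names, the destdir/paper/YYYY-MM-DD/section.pdf layout and the link name "latest" are all assumptions, and the login and scraping steps are replaced by a caller-supplied fetch function.

```python
import os
from datetime import date
from pathlib import Path
from typing import Callable


def store_section(destdir: Path, paper: str, edition: date, section: str,
                  fetch: Callable[[], bytes]) -> bool:
    """Save one section PDF. Returns True if it was fetched, False if it
    was already on disk. The directory layout here is an assumption."""
    edition_dir = destdir / paper / edition.isoformat()
    edition_dir.mkdir(parents=True, exist_ok=True)
    target = edition_dir / f"{section}.pdf"
    if target.exists():
        # Already downloaded on an earlier run, so do nothing.
        return False
    # Write to a temporary name first so that an interrupted download
    # is never mistaken for a complete section on the next run.
    tmp = target.with_name(target.name + ".part")
    tmp.write_bytes(fetch())
    os.replace(tmp, target)
    return True


def point_latest(destdir: Path, paper: str, edition: date) -> Path:
    """Make destdir/paper/latest a symlink to the given edition directory.
    The link name "latest" is hypothetical."""
    link = destdir / paper / "latest"
    tmp_link = link.with_name("latest.tmp")
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()
    # Relative target, so the tree can be moved or mounted elsewhere.
    tmp_link.symlink_to(edition.isoformat())
    # Renaming over the old link swaps it in one step on POSIX systems.
    os.replace(tmp_link, link)
    return link
```

Because store_section returns early when the file is present, calling it repeatedly for the same edition only costs a file-existence check, which is what makes repeated overnight runs harmless.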
guardiangrab is an enhancement of work done by Ladislav Snizek in guardianpdf.
Install
- Grab guardiangrab-0.1.tar.gz
- Unpack
- make install
In your home directory create .guardiangrab containing:
# Your Newspaper Direct login
login=
# Your Newspaper Direct password
password=
# Base directory for stored files
destdir=/media/paper
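For illustration, a file in this key=value format with # comment lines could be read along the following lines. This is a minimal Python sketch under stated assumptions, not how guardiangrab itself parses its configuration: parse_config and load_config are hypothetical names, and treating all three keys as required is a guess.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_config(text: str) -> Dict[str, str]:
    """Parse key=value lines, ignoring blank lines and # comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first "=" only, so a password may contain "=".
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got: {raw!r}")
        settings[key.strip()] = value.strip()
    return settings


def load_config(path: Optional[Path] = None) -> Dict[str, str]:
    """Read ~/.guardiangrab (or the given path) and check the keys are set."""
    if path is None:
        path = Path.home() / ".guardiangrab"
    settings = parse_config(path.read_text())
    # Assumption: login, password and destdir are all mandatory.
    missing = [k for k in ("login", "password", "destdir") if not settings.get(k)]
    if missing:
        raise ValueError(f"{path}: missing value for " + ", ".join(missing))
    return settings
```

With the example file above, load_config would reject the configuration until login and password have been filled in.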
Using
Just run guardiangrab.
