PyThumbnail
PyThumbnail is a Python script that uses Gecko (Firefox's rendering engine) to generate a thumbnail of a web page. It is headless, meaning you can call it from the command-line or from another program and it doesn't require a running X server. It is based on Ross Burton's screenshot-tng.py, which itself derives from previous work by Matt Biddulph et al.
My main additions are the automatic launching of a VNC server, for headless / batch usage, and a guardian process to kill the main process after a timeout, so that the script won't wait forever in the case of network problems or other nuisances.
I have also written a helper script to call many instances of PyThumbnail in parallel, reusing the VNC servers.
VNC
PyThumbnail launches its own VNC server when the DISPLAY environment variable is missing.
This is a great way to generate a thumbnail from cron scripts or other non-interactive facilities. VNC server creation and destruction are guarded by a lockfile, to prevent concurrent calls to the system command vncserver from two instances of PyThumbnail. This is needed because vncserver does not lock its own files and directories and will fail miserably if called more than once at the same time.
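The lockfile idea can be sketched as follows. This is only an illustration, not the code in pythumbnail.py: it assumes an advisory lock taken with fcntl.flock, and the names run_locked and needs_own_display are hypothetical.

```python
import fcntl
import os
import subprocess


def needs_own_display(environ=os.environ):
    """Return True when DISPLAY is missing or empty, i.e. no X display is available."""
    return not environ.get("DISPLAY")


def run_locked(command, lock_path):
    """Run `command` while holding an exclusive lock on `lock_path`.

    Any other process calling this function with the same lock_path
    waits until the lock is released, so the commands never overlap.
    """
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            return subprocess.call(command)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

With this helper, starting a server could look like run_locked(["vncserver", ":42"], "/tmp/pythumbnail.lock"), where both the display number and the lock path are made-up values.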
Guardian process
PyThumbnail launches a guardian process before creating the GTK window. This process will sleep for a given time, waiting to be killed by the main process as soon as the thumbnail has been generated. If the guardian returns from the sleep before the main process has completed its business, it means that something is taking more time than allowed: an error message is printed on stderr and everything is killed.
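A minimal sketch of such a guardian, assuming it is created with os.fork and that killing the main process is enough to stop everything. The function names are hypothetical and the real script may be organised differently.

```python
import os
import signal
import sys
import time


def start_guardian(timeout):
    """Fork a guardian that kills the calling process after `timeout` seconds.

    Returns the guardian's pid to the main process.
    """
    main_pid = os.getpid()
    pid = os.fork()
    if pid != 0:
        return pid  # main process: remember who to kill later

    # Guardian (child) process from here on.
    signal.signal(signal.SIGTERM, signal.SIG_DFL)
    time.sleep(timeout)
    # Still alive after the sleep: the main process is too slow.
    sys.stderr.write("timeout after %s seconds, giving up\n" % timeout)
    sys.stderr.flush()
    try:
        os.kill(main_pid, signal.SIGKILL)
    except OSError:
        pass  # the main process is already gone
    os._exit(1)


def stop_guardian(guardian_pid):
    """Kill the guardian once the work is done and return its wait status."""
    os.kill(guardian_pid, signal.SIGTERM)
    _, status = os.waitpid(guardian_pid, 0)
    return status
```

The main process would call start_guardian before opening the GTK window and stop_guardian right after the thumbnail has been written.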
Usage
    python pythumbnail.py
    Usage: pythumbnail.py [-w WIDTH] [-h HEIGHT] [-o OUTPUT_FILE] URL
    Will launch its own VNC server if DISPLAY environment variable is missing
    Will write to standard output if the -o option is missing
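Calling the script from another Python program could look like the sketch below. It uses only the options listed in the usage message; the helper names are hypothetical, and it assumes pythumbnail.py sits in the current directory and that python is on the PATH.

```python
import subprocess


def thumbnail_command(url, width=None, height=None, output=None,
                      script="pythumbnail.py"):
    """Build the argument list for one pythumbnail.py call."""
    command = ["python", script]
    if width is not None:
        command += ["-w", str(width)]
    if height is not None:
        command += ["-h", str(height)]
    if output is not None:
        command += ["-o", output]
    command.append(url)
    return command


def thumbnail_bytes(url, **options):
    """Run the script without -o and return whatever it writes to stdout."""
    return subprocess.check_output(thumbnail_command(url, **options))
```

thumbnail_bytes raises subprocess.CalledProcessError if the script exits with a non-zero status, assuming the script reports failures that way.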
PyThumbnail Launcher
Doing some batch thumbnail generation, I noticed that (unsurprisingly) half of the time was spent creating and destroying VNC servers. So I wrote PyThumbnail Launcher, another Python script that creates a pool of VNC servers and uses them to launch many concurrent PyThumbnails, to work away at a list of URLs.
On my test system, this launcher can generate 100 thumbnails of remote websites in under 1 minute, with just 10 threads.
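The pool can be pictured as one worker thread per VNC display, all pulling URLs from a shared queue. The sketch below (Python 3) shows only that scheduling; run_pool and the render callback are hypothetical names, and the launcher itself may be structured differently.

```python
import queue
import threading


def run_pool(urls, displays, render):
    """Render every URL, using one worker thread per display.

    `render(url, display)` stands for whatever starts one PyThumbnail
    with DISPLAY set to `display`. Returns a dict mapping each URL to
    the value returned by render, or to the exception it raised.
    """
    pending = queue.Queue()
    for url in urls:
        pending.put(url)

    results = {}
    results_lock = threading.Lock()

    def worker(display):
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            try:
                outcome = render(url, display)
            except Exception as error:
                outcome = error
            with results_lock:
                results[url] = outcome

    threads = [threading.Thread(target=worker, args=(d,)) for d in displays]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

A render function could run pythumbnail.py through subprocess with env=dict(os.environ, DISPLAY=display), so that the script finds a display and does not start a VNC server of its own.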
Usage
    python pythumbnail-launcher.py
    Usage: pythumbnail-launcher.py [-v] [-d TARGET_DIR] [-n N_THREADS] URLS
    URLS can be provided either in the commandline or on standard input
More threads mean more parallelization, but also more resource usage, as each thread starts its own VNC server. The default is 10.
Download
pythumbnail.py (size: 4.8K; license: BSD-like; last updated on 8 Oct 2008)
pythumbnail-launcher.py (size: 3.1K; license: BSD-like; last updated on 8 Oct 2008)
Requirements
You will need a POSIX system with Python, a VNC server and PyGtkMoz.
Instructions for installing the requirements on a Debian Sid system:
- sudo apt-get install vnc4server python-gtk2-dev libxul-dev
- Run vncserver in a terminal the first time, under the userid you will use to generate the thumbnails (you might want to create a system user just for that purpose). Enter a password when requested: PyThumbnail does not use it, but vncserver requires one.
- Kill the VNC server you just started, with vncserver -kill DISPLAY (use the display number printed to the screen in the previous step.)
- Remove ~/.vnc/xstartup for the PyThumbnail user and put a symbolic link to /bin/true in its place: this will strip auxiliary processes such as window managers from that user's VNC servers.
- Download and extract pygtkmoz-0.1.tar.gz
- Edit setup.py and Makefile, changing every occurrence of mozilla-gtkmozembed into xulrunner-gtkmozembed
- make
- sudo python setup.py install
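The edit to setup.py and Makefile can be done by hand or scripted. Here is a small sketch of the scripted variant; replace_in_file is a hypothetical helper, not part of PyGtkMoz or PyThumbnail.

```python
import io


def replace_in_file(path, old, new):
    """Replace every occurrence of `old` with `new` in the file at `path`.

    Returns the number of occurrences replaced; the file is left
    untouched when there is nothing to replace.
    """
    with io.open(path, encoding="utf-8") as handle:
        text = handle.read()
    count = text.count(old)
    if count:
        with io.open(path, "w", encoding="utf-8") as handle:
            handle.write(text.replace(old, new))
    return count
```

Run from the extracted pygtkmoz directory, it would be called once for each of the two files, as replace_in_file(name, "mozilla-gtkmozembed", "xulrunner-gtkmozembed") with name set to "setup.py" and then "Makefile".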