~drscream

PyPI Mirror on Ubuntu machine

This post contains some details on how to create a PyPI mirror on an Ubuntu machine. I wrote it because I ran into a problem with the cronjob setup.

Setup the mirror

You should use the bandersnatch scripts to sync your mirror from the master server. It works much better than the pep381run script. I followed the official guide from the PyPI site.

Use virtualenv:

I recommend using a Python virtualenv to keep your system clean.

# Setup virtualenv to /opt
virtualenv /opt/mirror-pypi

# Switch to virtualenv
source /opt/mirror-pypi/bin/activate

Install via pip:

Simply install bandersnatch via pip, which also handles the dependencies for you.

pip install bandersnatch

Issue with cronjob

The default shell for cronjobs on Ubuntu is /bin/sh, which is a symlink to /bin/dash. The bandersnatch documentation contains a howto for setting up a cronjob.

The problem is that it uses the bash-only |& operator, which pipes both stdout and stderr, to send the output to logger:

# Will not work in dash or sh
bandersnatch mirror |& logger -t bandersnatch[mirror]

To fix the problem, switch to bash as the default cronjob shell. Modify /etc/crontab and change the SHELL variable:

# Replace SHELL from /bin/sh to /bin/bash
SHELL=/bin/bash
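
If changing cron's shell is not an option, the same pipeline can be written in a form that dash understands. In bash, |& is shorthand for 2>&1 |, so spelling it out sends both stdout and stderr into the pipe in any POSIX shell. A minimal sketch, in which emit_both is a hypothetical stand-in for bandersnatch mirror and cat stands in for logger:

```shell
#!/bin/sh
# Hypothetical stand-in for `bandersnatch mirror`: writes one line to
# stdout and one line to stderr.
emit_both() {
	echo "to stdout"
	echo "to stderr" >&2
}

# Portable form of `emit_both |& cat`: redirect stderr to stdout first,
# then pipe. This works in dash, sh and bash alike.
captured=$(emit_both 2>&1 | cat)

echo "${captured}"
```

In the crontab entry the command part would then end in bandersnatch mirror 2>&1 | logger -t bandersnatch[mirror].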

Now it’s time to set up the cronjob, which uses the virtualenv:

# Modify crontab
crontab -e

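# Add the following lines to run every hour. A per-user crontab needs its
# own SHELL line, because the one in /etc/crontab only applies to that file.
SHELL=/bin/bash
0 * * * * /opt/mirror-pypi/bin/python /opt/mirror-pypi/bin/bandersnatch mirror |& logger -t bandersnatch[mirror]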

Monitoring mirror status

Because of the cronjob issue I created a monitoring script for Nagios. The script checks the last-modified file created by bandersnatch. It’s a simple bash script that converts both the local and the remote date to a UTC unix timestamp to check the age of the mirror.

#!/bin/bash
# Thomas Merkel <tm@core.io>
# Check PyPI mirror with nagios

PROGPATH=$(echo ${0} | sed -e 's,[\\/][^\\/][^\\/]*$,,')
REVISION="1.1"

source ${PROGPATH}/utils.sh

# Function to print help
function help() {
	print_revision ${0} ${REVISION}
	echo
	echo "${0} -u <url/last-modified> -w <warning seconds> -c <critical seconds>"
	echo
	echo "OPTIONS:"
	echo " -u <url/last-modified>:  URL to the last-modified file"
	echo " -w <warning seconds>:    Difference in seconds for warning (1800)"
	echo " -c <critical seconds>:   Difference in seconds for critical (3600)"
	exit ${STATE_UNKNOWN}
}

# Parse all options
WARN=1800
CRIT=3600
while getopts "h?u:w:c:" opt; do
	case "$opt" in
		h|\?)
			help
			;;
		u)
			URL=${OPTARG}
			;;
		w)
			WARN=${OPTARG}
			;;
		c)
			CRIT=${OPTARG}
			;;
	esac
done

if [ $# -eq 0 ]; then
	help
fi

shift $((OPTIND-1))

# Remember the local date, then download the last-modified file
l_date=$(date -u)
r_curl=$(curl -sq "${URL}")

if [ ${?} -ne 0 ]; then
	echo "CRIT: failed to download last-modified file"
	exit ${STATE_CRITICAL}
fi

# Convert remote date to utc timestamp, because the file contains
# another date format we need to convert it
r_date=$(echo "${r_curl} UTC" | sed "s:T: :")

# Convert to unix timestamp
l_unixtime=$(date -d "${l_date}" +%s)
r_unixtime=$(date -d "${r_date}" +%s)

# Check difference: critical first, then warning, otherwise OK.
# (Testing for OK first would hide the warning state, because an age
# between WARN and CRIT is still below CRIT.)
if [ $((${r_unixtime}+${CRIT})) -lt ${l_unixtime} ]; then
	echo "CRIT: mirror out of sync [remote ${r_date}]"
	exit ${STATE_CRITICAL}
fi
if [ $((${r_unixtime}+${WARN})) -lt ${l_unixtime} ]; then
	echo "WARN: mirror out of sync [remote ${r_date}]"
	exit ${STATE_WARNING}
fi
echo "OK: mirror is up-to-date [remote ${r_date}]"
exit ${STATE_OK}

It’s built for Nagios and uses the utils.sh script that ships with it. Move the file to /usr/lib/nagios/plugins or any other plugins folder that contains utils.sh.
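
The core of the check, converting the timestamp and comparing its age against both thresholds, can be tried in isolation. The following is a minimal sketch rather than part of the plugin: the function names to_epoch and classify_age are made up for illustration, the sample timestamp assumes the last-modified file holds a value such as 20150811T17:32:00, and date -d requires GNU date as shipped with Ubuntu.

```shell
#!/bin/sh
# Convert a timestamp like 20150811T17:32:00 (assumed format, in UTC)
# to a unix timestamp. Requires GNU date for the -d option.
to_epoch() {
	date -u -d "$(echo "${1} UTC" | sed 's:T: :')" +%s
}

# Print OK, WARN or CRIT for a mirror age given in seconds.
# Arguments: <age> <warning seconds> <critical seconds>
classify_age() {
	if [ "${1}" -gt "${3}" ]; then
		echo "CRIT"
	elif [ "${1}" -gt "${2}" ]; then
		echo "WARN"
	else
		echo "OK"
	fi
}

remote=$(to_epoch "20150811T17:32:00")
now=$((remote + 2000))   # pretend the check runs 2000 seconds later

# Classify the resulting age of 2000 seconds against the default thresholds
classify_age $((now - remote)) 1800 3600
```

For the plugin itself, an invocation would look like check_pypi_mirror -u http://pypi.example.org/last-modified -w 1800 -c 3600, where both the script name and the host are placeholders.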


  1. Jason says:

    Tue 11/08/15, 5:32 pm

    I came across your Nagios script for monitoring Bandersnatch to ensure your mirror is staying updated. That’s perfect for what I need. Can you tell me what url you point your script to? I know it’s “URL to the last-modified file”, but I’m not sure what URL that would be. Just pick a specific package URL to monitor?

  2. Jason says:

    Tue 11/08/15, 6:18 pm

    Ah nm, figured it out. I didn’t realize there was an actual file called last-modified in the web root :) Great script – The github link to it was dead, but you should add it back there.

  3. drscream says:

    Tue 11/08/15, 10:13 pm

    I will also do my best to update the GitHub link as soon as possible. I hope the script is working for you.
