Scratchpad


Sunday, April 11 2010

Python project from scratch with distutils and friends

Python has several more or less standard tools to properly create and/or distribute Python code:

  • virtualenv lets you create a self-contained Python environment, with any dependencies you want, without impacting the system installation. Extremely useful for developing, testing or even running any Python code.
  • distutils provides the basics of Python package management. It's now quite old, so for anything beyond the basics you need one of its extensions. Distutils2 is around the corner.
  • setuptools extends distutils; it's currently one of the most widespread libraries for Python packages. It includes easy_install, which makes installing Python packages easy.
  • distribute is a fork of setuptools. It evolves regularly (unlike setuptools) and is maintained by the creators of distutils2.
  • pip is a replacement for easy_install.

We're going to use Distribute and pip here, as they look like they are becoming the standard.

The Distribute guide has more complete and accurate documentation. What I describe below is really the minimal set to get started with a basic Python project.

Let's assume that your project is called $PROJECT. We'll store it in $BASEDIR, typically something like hello-0.1. $BASEDIR is the current directory.

  • First, let's prepare a Python environment to play safely. Get virtualenv; that's probably the easiest way:
    wget 'http://bitbucket.org/ianb/virtualenv/raw/tip/virtualenv.py'
    
  • On my machine, there's a small issue with multiarch mode, so an extra symlink is needed. Most of the time it should not be necessary:
    mkdir -p env/include/multiarch-x86_64-linux
    ln -s ../python2.6 \
      env/include/multiarch-x86_64-linux/python2.6
    
  • And then create the Python environment:
    python virtualenv.py --no-site-packages --distribute env
    

    It will create a subdirectory env containing the virtual environment.

  • Activate the Python environment:
    source env/bin/activate
    

    From now on, every time you invoke python or similar commands, the self-contained environment is used. That includes running a setup.py, which is pretty useful.

  • Let's now create our basic source. Edit the file $BASEDIR/$PROJECT/main.py, which will contain the main function of your application. We put the file in a subdirectory named after your project; this way, all your code is part of the Python package $PROJECT, which is much cleaner. The content of the file is something like this:
     def main():
       print 'Hello, World!'
    

    This is not a usual self-contained script: the packaging system takes care of creating the executable, which will call your function.

  • To make sure that Python knows that your subdirectory is an actual package, you need to add an empty file named __init__.py next to your main.py.
  • Now, let's create setup.py, which describes the package.
     from setuptools import setup, find_packages
     setup(
       name='hello',
       version='0.1',
       entry_points = {
         'console_scripts': ['hello = hello.main:main']
       },
       packages=find_packages(exclude=['ez_setup']),
       install_requires=[
         'nose'
       ],
     )
    

    Many other things can be added here (including details such as the author name and so on), but I've kept the minimum. The install_requires list describes the packages your software needs. Here, I've requested only nose, which is a test framework. You can list whatever you want here, including things like Twisted. You can find all Python packages on PyPI.

    The console_scripts entry describes the executables that your package provides. In this case, we provide one executable called hello, which will call the function main from the module hello.main.
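To demystify the console_scripts line, here is a minimal sketch (plain Python, works on 2.7 and 3) of roughly what the generated hello executable does: split the "name = module:function" spec, import the module, and fetch the function. parse_entry_point and resolve_entry_point are hypothetical helper names; the real wrapper generated by setuptools/distribute goes through pkg_resources and does more bookkeeping.

```python
import importlib


def parse_entry_point(spec):
    """Split 'name = package.module:function' into (name, module, function)."""
    name, target = spec.split('=', 1)
    module_name, func_name = target.split(':', 1)
    return name.strip(), module_name.strip(), func_name.strip()


def resolve_entry_point(spec):
    """Import the target module and return the callable the script would run."""
    _, module_name, func_name = parse_entry_point(spec)
    module = importlib.import_module(module_name)
    return getattr(module, func_name)
```

Once the package is installed (or set up with setup.py develop), calling resolve_entry_point('hello = hello.main:main')() has the same effect as running hello.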

From here, we're pretty much set. Run python setup.py to get more information on how to manipulate your package. Remember to always do that from within your virtual Python environment.
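Since everything here assumes you are inside the virtual environment, a small guard can catch the case where you forgot to activate it. This is a minimal sketch, not part of virtualenv itself: it relies on the classic virtualenv setting sys.real_prefix, and on the newer convention (stdlib venv, Python 3.3+) where sys.prefix differs from sys.base_prefix.

```python
import sys


def in_virtualenv():
    """Best-effort check that the running interpreter belongs to a virtualenv."""
    # Classic virtualenv (the version used in this post) adds sys.real_prefix.
    if hasattr(sys, 'real_prefix'):
        return True
    # The stdlib venv module and newer virtualenv releases instead leave
    # sys.base_prefix pointing at the system installation.
    return sys.prefix != getattr(sys, 'base_prefix', sys.prefix)
```

You could call it at the top of a helper script and abort with a reminder to run source env/bin/activate when it returns False.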
  • To work on your project easily, run the following command:
    python setup.py develop
    

    It prepares your package exactly as if you were installing it, but points directly at your Python source instead of copying it. This way, you can easily test your code while working on it, without having to reinstall every time.

  • To distribute your package, simply run the following command:
    python setup.py sdist
    

    It will create a .tar.gz in the dist/ subdirectory.

There are many other possibilities; running python setup.py gives you a lot more information. The default sdist tarball doesn't include a virtualenv script, which is not practical. It's possible to create a custom version of virtualenv to distribute along with your package, allowing extremely easy installation.

Sunday, January 31 2010

Migrating from an IMAP server to GMail

I started configuring Google Apps for my domain and I want to cleanly migrate the existing mail from my server. The wonderful imapsync does all the hard work. However, to get a clean migration to GMail, you need the right flags. Here is what I'm using:

imapsync \
    --host1 imap.example.com \
    --user1 myuser \
    --passfile1 passwd1 \
    --port1 993 \
    --ssl1 \
    --authmech1 LOGIN \
    --prefix1 INBOX. \
    --host2 imap.gmail.com \
    --port2 993 \
    --ssl2 \
    --user2 myuser@mygoogleappsdomain \
    --passfile2 passwd-gmail \
    --prefix2 '' \
    --regextrans2 's/Drafts/[Gmail]\/Brouillons/' \
    --regextrans2 's/Sent/[Gmail]\/Messages envoy&AOk-s/' \
    --regextrans2 's/Spam/[Gmail]\/Spam/' \
    --regextrans2 's/Trash/[Gmail]\/Corbeille/' \
    --authmech2 LOGIN \
    --syncinternaldates \
    --skipsize \
    --useheader 'Message-Id' \
    --useheader 'Date' \
    --exclude 'INBOX.Trash' \
    --exclude 'INBOX.Spam' \
    --expunge \
    --delete2

Those parameters work perfectly for migrating from a Courier-IMAP server to a GMail account configured in French. It should work with GMail in other languages too; you just have to change the regextrans2 options to match your locale.
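The odd-looking "envoy&AOk-s" in the Sent mapping is "envoyés" written in IMAP's modified UTF-7 (RFC 3501, section 5.1.3), which is how non-ASCII mailbox names travel over IMAP. If your GMail locale uses accented folder names, a small sketch like the one below can compute the target for regextrans2. The function name imap_utf7_encode is my own; it only encodes and does no validation.

```python
import base64


def imap_utf7_encode(name):
    """Encode a mailbox name in IMAP modified UTF-7 (RFC 3501, 5.1.3)."""
    out = []
    pending = []  # run of non-printable-ASCII characters waiting to be encoded

    def flush():
        if pending:
            raw = ''.join(pending).encode('utf-16-be')
            b64 = base64.b64encode(raw).decode('ascii').rstrip('=')
            # Modified base64 uses ',' instead of '/' and drops padding.
            out.append('&' + b64.replace('/', ',') + '-')
            del pending[:]

    for ch in name:
        if 0x20 <= ord(ch) <= 0x7e:
            flush()
            # '&' is the shift character, so a literal '&' becomes '&-'.
            out.append('&-' if ch == '&' else ch)
        else:
            pending.append(ch)
    flush()
    return ''.join(out)
```

For example, imap_utf7_encode('Messages envoyés') returns 'Messages envoy&AOk-s', the name used in the command above, while plain ASCII names such as 'Corbeille' come back unchanged.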

A few details about the flags I'm using:

  • Folders renaming:
    --regextrans2 's/Drafts/[Gmail]\/Brouillons/' \
    --regextrans2 's/Sent/[Gmail]\/Messages envoy&AOk-s/' \
    --regextrans2 's/Spam/[Gmail]\/Spam/' \
    --regextrans2 's/Trash/[Gmail]\/Corbeille/' \

Those regexps make sure that your "special" folders (sent mail, spam, trash, drafts) are correctly mapped in GMail. Be careful: the target names depend on your GMail locale.

  • Folder exclusion:
    --exclude 'INBOX.Trash' \
    --exclude 'INBOX.Spam' \

There's probably no need to sync up your trash and even less your spam box.

  • Message matching:
    --skipsize \
    --useheader 'Message-Id' \
    --useheader 'Date' \

imapsync is a synchronization tool: you can re-run it several times to keep the copy up to date. To do that, however, it needs to be able to identify duplicated messages. That's a harder job than one might think, as the semantics of IMAP servers can differ quite a lot. The best combination I've found to move to GMail is to ignore the size and rely on the Message-Id and Date headers. Message-Id is supposed to be unique, which should be enough. However, some messages have no Message-Id; in my case, it was a bunch of sent mail from an old Outlook Express. To avoid retransferring those messages at each sync, adding the 'Date' header helps a lot.

  • Message deletion
    --delete2 \

Be careful with this flag. It deletes any message on the target mailbox which is no longer on the source mailbox. It's really useful when you do a multi-step synchronization for a migration, as it takes into account messages removed between syncs.

  • Message cleaning:
    --expunge \

This flag expunges messages on the source server. It avoids (slowly) copying messages already marked as \Deleted on the source. It should be safe, but still keep an eye on what you're doing. In practice, IMAP deletion works in two steps: messages are first marked as \Deleted, and those deleted messages are only really removed during an expunge. Depending on your server configuration, messages may or may not be expunged automatically. In my case, I had a few unexpunged messages dating from when my mail server was not doing automatic expunges.
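To make the message matching and --delete2 behaviour described above concrete, here is a minimal Python sketch of the idea: messages are keyed on their Message-Id and Date headers (size ignored, as with --skipsize), source messages whose key is missing on the target get copied, and target messages whose key is missing on the source are what --delete2 removes. This is an illustration under those assumptions, not imapsync's actual algorithm; dedup_key and plan_sync are hypothetical names, and messages are plain RFC 822 strings rather than IMAP fetches.

```python
from email import message_from_string


def dedup_key(raw_message):
    """Identify a message by its Message-Id and Date headers (size ignored)."""
    msg = message_from_string(raw_message)
    # Header lookup is case-insensitive, so 'Message-ID' matches too.
    return (msg.get('Message-Id', '').strip(), msg.get('Date', '').strip())


def plan_sync(source_msgs, target_msgs):
    """Return (messages to copy to the target, target messages --delete2 would remove)."""
    source_keys = set(dedup_key(m) for m in source_msgs)
    target_keys = set(dedup_key(m) for m in target_msgs)
    to_copy = [m for m in source_msgs if dedup_key(m) not in target_keys]
    to_delete = [m for m in target_msgs if dedup_key(m) not in source_keys]
    return to_copy, to_delete
```

A message without Message-Id still gets a usable key thanks to its Date, which is why the old Outlook Express mails stop being retransferred.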

Friday, January 29 2010

Perl virtual environment

While Python has the extremely useful virtualenv tool, I haven't found anything similar for Perl. What I want is quite simple: being able to test random Perl tools. They usually depend on a varied set of libraries, some of them either too old on my system or not even present. Most of the time, I just want to test the tool, so installing random Perl modules system-wide is not an option. Moreover, I really don't like installing something system-wide that doesn't come as a package for my distribution. The solution is a virtual environment where I can install all the libraries I want, somewhere in my user directory. This also has the advantage of allowing different versions of a library for different Perl tools.

While I haven't found an all-in-one tool for Perl, there's a CPAN module called local::lib (distributed as local-lib) which helps a lot. You can find its documentation here. Here is how to use it to create a virtual Perl environment:

  • ${PERLENV} represents the directory where you want to have your perl virtual env. E.g.:
export PERLENV="$HOME"/perl5
  • Create the directory and go into it to make our life easier, then download and unpack local-lib (${TARBALL_URL} is the URL of the local-lib tarball on CPAN):
mkdir -p "${PERLENV}"
cd "${PERLENV}"
wget -O local-lib.tar.gz "${TARBALL_URL}"
tar xvf local-lib.tar.gz
cd local-lib-*
  • Bootstrap it:
  perl Makefile.PL --bootstrap="${PERLENV}"
  • If this is the first time you use CPAN, you will get a prompt like the following, which you probably want to confirm:
   Would you like me to configure as much as possible automatically? [yes]
  • Then install local-lib:
   make test && make install

The environment is now more or less set up. However, to use it, you need to set a few environment variables. Here are 2 helpers for that:

  • A wrapper script for perl. Anytime this particular perl wrapper will be called, it will behave like a normal perl, but using your environment:
cat > $PERLENV/perl << EOF
#!/bin/sh
PERLENV="${PERLENV}"
perl -I\$PERLENV/lib/perl5 -Mlocal::lib=\$PERLENV -MCPAN "\$@"
EOF

chmod 755 $PERLENV/perl
  • If you need to use other programs without an easy way to override the perl binary, you can create an activate script which sets up the environment in your current shell. Once it is created, you just need to run source "${PERLENV}"/activate, and then anything that needs perl in the current shell will use your custom environment.
cat > "${PERLENV}"/activate << EOF
PERLENV="${PERLENV}"
eval "\$(perl -I\$PERLENV/lib/perl5 -Mlocal::lib=\$PERLENV)"
EOF

Tadam, your local perl is now usable. You can start installing the CPAN modules you want, and they will be installed in the environment:

"${PERLENV}"/perl -MCPAN -e 'CPAN::install(POE::Filter::IRCD)'

Or the CPAN shell:

"${PERLENV}"/perl -MCPAN -e shell

And if you want to run your favorite Perl tool:

source "${PERLENV}"/activate
cd myfavoritetool
./runit

Various notes:

  • I hardly know anything about Perl, so I might have missed some things. However, this approach works and seems reasonably clean.
  • This is not a fully isolated environment; system libraries will be available too. That's usually more an advantage than an issue though.
  • The main issue is that it still relies on your ~/.cpan configuration directory, so you cannot have CPAN parameters specific to your environment. AFAIK, there's no easy way around that.

Saturday, April 4 2009

Booting on a virtio disk with qemu

Recent versions of qemu, like version 0.10.1, are able to use virtio disks if your guest kernel supports them.

While kvm has an option boot=on for -drive descriptions, qemu doesn't have it yet, and so is not able to boot from virtio disks by default.

To overcome this limitation, I'm using the same disk image both as a virtio disk and as a regular hda:

qemu -drive file=vm.img,snapshot=on -drive file=vm.img,if=virtio -boot c [...]

I add snapshot=on to the "fake" drive. Theoretically, as the Linux guest is configured to boot on /dev/vda1, it should never try to modify /dev/hda. However, this trick is error-prone, and having the same image written through 2 different drives would probably be really bad.

So, qemu boots from drive hda and the boot loader (grub in my case) loads the kernel from it too, but then, because I pass root=/dev/vda1 to the kernel, the system properly uses the virtio disk, while avoiding complicated setups with a dedicated boot partition.

Saturday, February 21 2009

Redirecting a port to another one with iptables, including local packets.

For some reason, I have an SMTP server running entirely as a regular user, which cannot open port 25; it listens on port 2225 instead. So, I want to redirect port 25 to port 2225 using iptables.

The standard redirection is simple and can be easily found on your favorite search engine:

iptables -t nat -A PREROUTING -p tcp --dport 25 -j REDIRECT --to-port 2225

However, due to the way iptables works, this redirection won't work if you're trying to connect to port 25 from the machine itself, through localhost or through one of the IPs of the machine.

The problem is that there's no INPUT chain in the nat table, so we cannot modify the packets when they arrive on the host. A classic option is to use the CONNMARK target to mark appropriate packets where you can detect them, and act on them later just by matching the mark. In this case, it would mean marking the packet in the INPUT chain of the filter table and modifying it in the OUTPUT chain of the nat table. However, for packets issued by the machine itself, the INPUT chain is obviously traversed after the OUTPUT chain, so the mark would come too late.

The solution I've kept is to simply match on the IPs of the machine. It basically means one rule per IP of the machine, which is not that great but probably OK as long as your IPs don't change often.

In my case, this machine has 2 IPs plus, of course, the localhost interface. In practice, for the two IPs:

iptables -t nat -A OUTPUT -p tcp -d 1.2.3.4 --dport 25 -j REDIRECT --to-port 2225
iptables -t nat -A OUTPUT -p tcp -d 5.6.7.8 --dport 25 -j REDIRECT --to-port 2225

And for localhost:

iptables -t nat -A OUTPUT -p tcp -d 127.0.0.0/8 --dport 25 -j REDIRECT --to-port 2225
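Since the whole setup boils down to one OUTPUT rule per local address plus the usual PREROUTING rule, it is easy to generate. Here is a minimal Python sketch, assuming you list the machine's addresses yourself (1.2.3.4 and 5.6.7.8 are the placeholder IPs from this post, and redirect_rules is a hypothetical helper); it only builds the iptables command lines, it does not run them.

```python
def redirect_rules(local_ips, from_port=25, to_port=2225):
    """Return iptables commands redirecting from_port to to_port.

    The PREROUTING rule covers connections coming from other hosts; the
    OUTPUT rules cover connections made from the machine itself, one per
    local address plus the whole loopback range.
    """
    target = '--dport %d -j REDIRECT --to-port %d' % (from_port, to_port)
    rules = ['iptables -t nat -A PREROUTING -p tcp %s' % target]
    for dest in list(local_ips) + ['127.0.0.0/8']:
        rules.append('iptables -t nat -A OUTPUT -p tcp -d %s %s' % (dest, target))
    return rules


if __name__ == '__main__':
    for rule in redirect_rules(['1.2.3.4', '5.6.7.8']):
        print(rule)
```

The printed lines are exactly the four commands above, so when an IP changes you only have to edit the list and regenerate.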

Wednesday, February 11 2009

Simple generator of relay recipient map for basic secondary MX

A common problem with a secondary MX server for simple mail domains is backscatter spam. The spammer sends a mail to the secondary MX of a domain, using a non-existent username. If the secondary MX is configured to accept everything for the domain (which is usually the case), it will accept the mail and then try to transmit it to the primary. The primary will see that the username doesn't exist and bounce the message back to the spoofed FROM address, hence spamming it.

The solution is to have the secondary MX check that the username exists. Since the purpose of the secondary is to receive mail when the primary is down, it cannot query the primary dynamically, and so must have some kind of static copy. For small domains (a typical Unix box where usernames are the local users plus a few aliases), there's no existing protocol to share this information.

Since I rarely modify my aliases, I've chosen to simply generate a file with all aliases, copy it to the secondary and make the local postfix use it.

This dead simple script generates a map file suitable for postfix from the local users listed in /etc/passwd and the local aliases in /etc/aliases. You just need to give it the domain name as a parameter (e.g., palats.com) and it will generate the list. It will match your actual configuration only if local delivery is based on aliases and local Unix users. You can check that by looking for the following line in the output of postconf:

local_recipient_maps = proxy:unix:passwd.byname $alias_maps
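The original script is linked rather than reproduced here, so the following is only a hypothetical Python reimplementation of the same idea: take the login names from /etc/passwd and the alias names from /etc/aliases, and emit one "name@domain OK" line per unique name. The right-hand value is not meaningful here (postfix only needs the lookup to succeed); OK is just a common choice. The parser is deliberately simple: it skips comments and indented continuation lines of /etc/aliases and does not handle :include: files.

```python
def passwd_users(passwd_text):
    """Login names from /etc/passwd content (first colon-separated field)."""
    users = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):
            users.append(line.split(':', 1)[0])
    return users


def alias_names(aliases_text):
    """Alias names from /etc/aliases content, skipping comments and continuations."""
    names = []
    for line in aliases_text.splitlines():
        if not line.strip() or line.lstrip().startswith('#'):
            continue
        if line[0].isspace() or ':' not in line:
            continue  # continuation of the previous alias, or not an alias line
        names.append(line.split(':', 1)[0].strip())
    return names


def relay_map(domain, passwd_text, aliases_text):
    """Build a postfix map with one 'name@domain OK' line per unique name."""
    seen = set()
    lines = []
    for name in passwd_users(passwd_text) + alias_names(aliases_text):
        if name and name not in seen:
            seen.add(name)
            lines.append('%s@%s OK' % (name, domain))
    return '\n'.join(lines) + '\n'
```

On the primary, writing relay_map('palats.com', open('/etc/passwd').read(), open('/etc/aliases').read()) to a file named relay_palats_com gives a map in the shape used below.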

Once you have this list, generated on the primary, you can copy it to the secondary, for example in /etc/postfix. To make postfix use it, you need to:

  • Run postmap /etc/postfix/relay_palats_com to generate the .db file corresponding to the map file. In this case, the file is named relay_palats_com.
  • Add your map file to relay_recipient_maps (in the postfix main.cf file). For example:
 relay_recipient_maps = hash:/etc/postfix/relay_palats_com
  • Restart postfix to apply the changes.
  • And probably test that it works as expected, by connecting to the secondary and sending mail.

Be careful: such a change can easily bounce some mail if you get it wrong. It's common, even for legitimate mail between working servers, to have mail flowing through a secondary MX.

Monday, December 15 2008

Automatically creating a disk image with partitions and bootloader.

I'm often playing with tools to manipulate full system images for virtual machines and so I often need to create disk images.

The script that you can find here creates a disk image of an arbitrary size with partitions and a working grub bootloader. This kind of script can be a bit dangerous, so I put it there just as an example; be careful. It uses qemu-img to create the disk image (which can easily be replaced by dd), sfdisk to create the partitions, and grub to install a boot loader.

There are a couple of rarely documented issues behind this.

Disk size

The PC partition table still uses the wonderful Cylinder/Head/Sector (a.k.a. CHS) scheme to address the disk. Of course, those values no longer correspond to any physical reality, but they are still here to annoy you. The idea with the PC partition table is that partitions can start and stop only at CHS boundaries. Typically, you count 255 heads and 63 sectors per track, and a number of cylinders depending on the disk size. One sector is usually 512 bytes, so the math is easily done.

It is not a problem to have a disk size that is not strictly aligned on a CHS boundary. At worst, you lose the unused end of the disk, less than one cylinder (about 8 MB with this geometry), which is not an issue. However, when it comes to creating a disk image from a host, you can run into the following problem:

  • You create the image:
qemu-img create -f raw "${IMGFILE}" ${IMGSIZE}
  • Then you create the partition table. Here, I'm using sfdisk's -D option, which moves the first partition a bit forward (one head further). In practice, I do that to match the usual partition layout of tools such as fdisk. The following command creates a single Linux partition taking the whole disk, and marks it as bootable.
sfdisk -D $IMGFILE <<EOF
,,L,*
;
;
;
EOF
  • And then, you want to format the partition. The partition does not start at the beginning of the file, so you first need some trickery with Linux loopback devices. The offset 32256 comes from 63 * 512, with 63 being the number of sectors per track and 512 the size of one sector.
losetup -o 32256 /dev/loop0 "${IMGFILE}"
  • Now you have a block device, /dev/loop0, which corresponds to the partition, so you can format it. Be careful, there's no confirmation here:
mkfs.ext3 /dev/loop0

And now, if you look carefully, you have a problem. By default, mkfs determines the filesystem size from the block device size. But if your disk size was not aligned on CHS boundaries, you have more space on the disk image, hence on /dev/loop0, than you really have in the partition. In other words, the end of the filesystem will be after the end of the partition. And of course, fsck won't be too thrilled about that.

There are a couple of solutions:

  • Explicitly set the number of blocks on the mkfs command line, e.g., mkfs.ext3 -b 4096 /dev/loop0 <number of blocks>. You can compute the number of blocks from the disk geometry.
  • Set the size of the disk to fit the geometry exactly, so you don't have to tell mkfs about the size. That's the approach I use in the script, because sfdisk output is a pain to parse to obtain the geometry.
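Both solutions only need a bit of arithmetic on the geometry described above. Here is a minimal Python sketch, assuming 255 heads, 63 sectors per track, 512-byte sectors, a single partition starting one track in (as with sfdisk -D) and ending on the last full cylinder, and 4096-byte ext3 blocks; the function names are my own, not taken from the script.

```python
# Geometry assumed throughout: 255 heads, 63 sectors per track, 512-byte sectors.
SECTOR_SIZE = 512
HEADS = 255
SECTORS_PER_TRACK = 63
CYLINDER_BYTES = HEADS * SECTORS_PER_TRACK * SECTOR_SIZE


def aligned_image_size(requested_bytes):
    """Round a requested image size down to a whole number of cylinders."""
    return (requested_bytes // CYLINDER_BYTES) * CYLINDER_BYTES


def partition_offset():
    """Byte offset of the first partition when it starts one track in (sfdisk -D)."""
    return SECTORS_PER_TRACK * SECTOR_SIZE


def fs_block_count(image_bytes, block_size=4096):
    """Filesystem blocks fitting in a single partition that ends on the last full cylinder."""
    partition_end = (image_bytes // CYLINDER_BYTES) * CYLINDER_BYTES
    return (partition_end - partition_offset()) // block_size
```

For example, asking for 1 GiB gives an aligned image of 1069286400 bytes (130 cylinders), and fs_block_count returns 261048 blocks of 4 KiB to pass to mkfs.ext3 -b 4096. partition_offset() is the 32256 used with losetup.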

Configuring grub

Grub can easily be installed on a disk image. In its default setup, it needs a few files on the partition to be able to boot and show a menu. At first, I was simply counting on the files provided by the Linux distribution I was putting on the disk. However, they can sometimes be incompatible with the grub version you're using on the host machine, so you need to put the grub files from the host machine there first. I'm doing something along these lines after having mounted the partition on $INSTDIR:

mkdir -p $INSTDIR/local/grub
cp /boot/grub/{stage1,stage2,e2fs_stage1_5} $INSTDIR/local/grub
ln -s /boot/grub/menu.lst $INSTDIR/local/grub/menu.lst

I'm not putting the host files in the usual /boot/grub directory. The idea is that I'm going to put a full distribution image there, and I don't want to overwrite the real grub files. Moreover, this way, if I reinstall grub from the virtual machine afterwards, it will probably do the right thing by using the files from /boot/grub.

Now, I just need to set up grub using the following commands:

grub --batch <<EOF
device (hd0) ${IMGFILE}
root (hd0,0)
setup --prefix=/local/grub (hd0)
quit
EOF

There are two things here. First, I use the device command to specify that hd0 is in fact my disk image, to avoid writing to the real disk. Then, I use --prefix in the setup command to make sure that grub will use the files I copied from the host.

Sunday, December 7 2008

Configuration of my network through an ALIX

My Media Center is located in the living room. Until now, to have networking on it, there was an old-fashioned network cable between the office and the living room, which prevented the door from being closed. So, I've configured one of my ALIX boxes to serve as a kind of wifi bridge for the media center.

The PC Engines ALIX is a small box with an x86 Geode processor, a wifi card and a network plug (some models have several network plugs).

There's no hard disk on it, but a flash card. I've installed Voyage Linux on it. While I usually don't like installing niche Linux distributions, having a flash card as disk means modifying a lot of things to avoid disk writes. Voyage Linux has the advantage of being based on a regular Debian, so there's still access to all the classical packages and configuration system. So far, I'm happy with it. Everything is read-only by default and you can easily remount the disk read/write when needed.

My first aim was to have the ALIX act as a pure layer-2 bridge, so the media center would have been able to talk directly with the DHCP server and so on. However, my wifi router is most probably crappy, and it was not possible; packets were discarded at its level. I suspect that it didn't like seeing several MAC addresses on a WPA-authenticated connection.

To circumvent this problem, I've chosen (well, not much choice :) to have the ALIX act as a router. But to make access to the media center possible and transparent from the main network, the ALIX box does 1:1 NAT between the IP of the media center on the media center network and a "visible" IP on the main network. In practice, the ALIX has:

  • One public IP, 192.168.1.3, on the main network, just to be able to access it.
  • One IP, 192.168.2.1, on the private network, to be able to act as a router.
  • And one extra IP, 192.168.1.42, which will correspond to the public IP of the media center.

So, here is the /etc/network/interfaces on the alix:

# Because we always need a loop back :)
auto lo
iface lo inet loopback
 
# The network interface on the private part, which acts
# as a router. Nothing fancy here, static IP.
auto eth0
iface eth0 inet static
        address 192.168.2.1
        netmask 255.255.255.0
        broadcast 192.168.2.255
 
# The wifi interface, which is an atheros card, hence
# the name.
auto ath0
# It absolutely needs to be in manual mode to have wpa-* 
# stanzas working. It is possible to still have dhcp on top 
# of that with a default interface, but I don't need it
# here.
iface ath0 inet manual
        # Atheros network interfaces need to be 
        # instantiated from the generic wifi0 card. We want
        # to be in managed mode (aka client of an access 
        # point), so wlanmode is 'sta'.
        pre-up wlanconfig ath0 create wlandev wifi0 \
                        wlanmode sta
        # We now configure a regular wpa_supplicant with 
        # the following two stanzas. That's an atheros 
        # card, so the driver is madwifi. All wpa 
        # configuration (essid, passphrase and so on) is 
        # in the wpa_supplicant.conf file.
        wpa-driver madwifi
        wpa-roam /etc/wpa_supplicant.conf
        # On a 'up' event (see 'man interfaces'), assign a 
        # static address. That's the 'public' address of the
        # ALIX, the one I use to connect by ssh on it.
        up ifconfig ath0 192.168.1.3
        # Since I'm in manual mode, I add the default 
        # gateway, which is my wifi router.
        up route add -net default gw 192.168.1.1 ath0
        # And when trying to ifdown this interface, clean
        # everything.
        down route del -net default gw 192.168.1.1 ath0
        post-down wlanconfig ath0 destroy
 
 
# And now, create a virtual interface on the wifi side, 
# which will be the visible IP of the media center.
auto ath0:42
iface ath0:42 inet static
        # Regular and boring static configuration of this 
        # IP.
        address 192.168.1.42
        netmask 255.255.255.0
        broadcast 192.168.1.255
        # And now, the interesting part. Those 4 commands 
        # tell iptables to forward everything that is 
        # coming for .1.42 to .2.42 and vice versa. So, at
        # the IP level, access to the media center is 
        # completely transparent, as if it was on the main 
        # network. Since it's 1:1 NAT, we're using iptables 
        # in stateless mode, as we don't care about 
        # connection tracking.
        # For that to work, you need to have
        # /proc/sys/net/ipv4/ip_forward set to 1; it's the
        # default on voyage linux, but ymmv.
        post-up iptables -t nat -A PREROUTING \
                          -d 192.168.2.42 -j DNAT \
                          --to-destination 192.168.1.42
        post-up iptables -t nat -A PREROUTING \
                           -d 192.168.1.42 -j DNAT \
                           --to-destination 192.168.2.42
        post-up iptables -t nat -A POSTROUTING \
                           -s 192.168.1.42 -j SNAT \
                           --to-source 192.168.2.42
        post-up iptables -t nat -A POSTROUTING \
                           -s 192.168.2.42 -j SNAT \
                           --to-source 192.168.1.42

And that's it. I've just configured the media center to use 192.168.2.42 with 192.168.2.1 as gateway and everything went well.

Monday, December 1 2008

Better integration of qemu and screen

I've updated the script to launch qemu in screen that I described in this post. You can get the new version here.

It now takes care of creating a new screen session if needed, and if launched from an existing screen, it simply adds new windows for the serial port and the monitor. Its usage has changed too: you now need to specify the qemu binary to launch, since the wrapper just adds the necessary options to the qemu command line. It also no longer closes the screen, so you can look at what qemu wrote on its output.

Basic usage: qemu-wrapper qemu -hda vm.img -nographic

Sunday, November 30 2008

How to boot a debian netinst over serial in a qemu without display

The Debian netinst can work over serial, but if a graphics card is detected, the bootloader won't be sent over serial, which makes appending the options to display over serial a bit tricky. And of course, booting qemu with -nographic doesn't remove the graphics card.

The trick here is to use the monitor to send keys to the bootloader. You just have to send the keys that tell the kernel and the installer to boot and use ttyS0. That means writing the following in the monitor:

sendkey i
sendkey n
sendkey s
sendkey t
sendkey a
sendkey l
sendkey l
sendkey spc
sendkey c
sendkey o
sendkey n
sendkey s
sendkey o
sendkey l
sendkey e
sendkey equal
sendkey t
sendkey t
sendkey y
sendkey shift-s
sendkey 0
sendkey ret

That will start the image install with argument console=ttyS0.
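Typing those commands one by one gets old quickly, and the sequence is mechanical enough to generate. Here is a small Python sketch that turns a string into sendkey commands for the qemu monitor. It only knows the key names used above (lowercase letters, digits, spc, equal, ret, and shift- for capitals); sendkey_commands is a hypothetical helper, and you still have to paste its output into the monitor.

```python
import string

# qemu monitor key names for the non-alphanumeric characters used above.
SPECIAL_KEYS = {' ': 'spc', '=': 'equal', '\n': 'ret'}


def sendkey_commands(text):
    """Translate text into qemu monitor 'sendkey' commands, one per character."""
    commands = []
    for ch in text:
        if ch in SPECIAL_KEYS:
            key = SPECIAL_KEYS[ch]
        elif ch in string.ascii_lowercase or ch in string.digits:
            key = ch
        elif ch in string.ascii_uppercase:
            key = 'shift-' + ch.lower()
        else:
            raise ValueError('no key mapping for %r' % ch)
        commands.append('sendkey ' + key)
    return commands


if __name__ == '__main__':
    print('\n'.join(sendkey_commands('install console=ttyS0\n')))
```

Running it prints the same 22 commands as the list above, ending with sendkey ret.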

Sunday, November 9 2008

Automatic detection of mails moved from/to the spam folder

I'm currently using dspam to filter my mail. However, as I'm using IMAP, spam filtering is done server-side. So, to report false negatives (FN) and false positives (FP), I cannot rely on a built-in feature of my mail clients (I have several); I need to communicate with the server. Until recently, I was using the classic approach: when I got a FN or FP, I redirected the mail (with full headers) to a special address, which sent it to dspam, telling it that it was a misclassification.

The problem with this approach in practice is that to mark a FP/FN I need to retransmit the mail and move it to the correct folder, which is redundant. Of course, most mail clients can help with that given some configuration, but still, that's several operations where one should be enough. Moreover, in the case of a FN, it means sending a spam through SMTP, which can sometimes be a problem.

So, I've made a script which watches the content of the spam folder and detects mails which are added and removed. This way, to mark a FN as spam, I just need to move it to the Spam folder: the script will detect that a mail has been added, and will re-train dspam with the signature of the email. For FP, it's the same thing: I just need to move the mail out of the spam folder, the script will detect that and call dspam with the signature of the moved email.

The script is a single-file Python script: dspam_auto.py

It works with Maildir style mailboxes, dspam and a mysql database. However, the principle is simple and can easily be adapted. The implementation is currently really dumb and could be enhanced (especially resource-wise, for the regular scan) but it's working.

The principle of the script is to scan the directory regularly to look for missing and added mails. The script must also be plugged into the delivery system (procmail in my case) to avoid trying to re-learn a spam that was already classified as spam.
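The scan principle can be sketched in a few lines of Python. This is not dspam_auto.py itself (which keeps its state in mysql); it is a hypothetical, simplified version assuming that the signature is in an X-DSPAM-Signature header, that the previous snapshot and the signatures pushed by procmail are kept as plain sets, and that the retraining command mirrors the one shown in the log lines further down. Like any scan based purely on folder content, it cannot tell a mail moved out of the folder from one deleted outright.

```python
import os
from email.parser import BytesHeaderParser


def folder_signatures(maildir_folder):
    """Collect the dspam signatures of all mails in a Maildir folder."""
    parser = BytesHeaderParser()
    signatures = set()
    for sub in ('cur', 'new'):
        path = os.path.join(maildir_folder, sub)
        if not os.path.isdir(path):
            continue
        for filename in os.listdir(path):
            with open(os.path.join(path, filename), 'rb') as f:
                headers = parser.parse(f)
            signature = headers.get('X-DSPAM-Signature')
            if signature:
                signatures.add(signature.strip())
    return signatures


def diff_signatures(previous, current, pushed):
    """Mails moved into the folder (FN) and out of it (FP) since the last scan.

    Signatures announced by the delivery hook ('pushed') were already
    classified as spam by dspam, so they are not treated as FN.
    """
    moved_in = current - previous - pushed
    moved_out = previous - current
    return moved_in, moved_out


def classify_command(signature, klass, user):
    """dspam retraining command, modelled on the log lines of this post."""
    return ['/usr/bin/dspam', '--signature=' + signature,
            '--class=' + klass, '--source=error', '--client', '--user', user]
```

A cron run would then compare folder_signatures() with the saved snapshot, and for each signature either run classify_command(sig, 'spam', ...) or classify_command(sig, 'innocent', ...) through subprocess, or just log it in dry-run mode.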

How to use it:

  • First, your dspam must be configured to put the signature in a header, not in the body of emails.
  • Download the script to the server, let's say to ~/bin/dspam_auto.py, and don't forget to make it executable.
  • Edit the beginning of the script to adapt the settings to your configuration:
    • DB_USER, DB_PASS, DB_NAME: access to the mysql database. You can reuse the DB you're using for dspam.
    • DB_TABLE is the name of the table used to store the script's information. It must be a non-existing table; the default value is probably fine in most cases.
    • DSPAM_USER is the name of the dspam user you're using; usually your login name.
    • DSPAM_UID is the uid of the user for the script. It's probably good practice to use the same as in dspam, but in practice it's independent. You can look up user/uid pairs in the table 'dspam_virtual_uids' of your dspam database.
    • LOG_FILE: where to log all script runs. It's really useful for debugging or just checking that the script isn't going rogue.
    • The dspam command used to reclassify FN/FP is in the classify function; feel free to adapt it to your installation. E.g., the script currently uses the option --client, which you might not need.
  • Check that DRY_RUN is True; that's needed to correctly initialize the script's database without polluting the dspam database.
  • Initialize the database: ~/bin/dspam_auto.py init
  • Add a regular scan, in cron (using crontab -e as example, all on one line):
*/10 * * * * $HOME/bin/dspam_auto.py update $HOME/Maildir/.Spam

This line makes the scan run every 10 minutes, which is probably more than enough (especially since the current version of the script is not really nice to the database :). Note that the first scan will detect all existing spam as FN, so double-check that DRY_RUN is True before you mess up your dspam database.
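The decision taken at each scan can be sketched as a pure function. This is an illustration under assumptions, not the logic of dspam_auto.py: plan_actions is a hypothetical name, mails are identified by their dspam signature, and known_spam holds the signatures dspam already classified as spam (reported by the procmail hook, or retrained by an earlier scan). Since a removed file can no longer be read, the signatures of removed mails would have to come from the state saved at the previous scan. With an empty known_spam, every mail already in the folder comes out as a false negative, which is why the first scan must run with DRY_RUN set to True.

```python
def plan_actions(added, removed, known_spam):
    """Return (actions, new_known_spam) for one scan.

    added / removed: dspam signatures of mails that appeared in or
    disappeared from the Spam folder since the previous scan.
    known_spam: signatures dspam already considers spam.
    actions is a list of (signature, dspam_class) pairs.
    """
    known = set(known_spam)
    actions = []
    for sig in sorted(removed):
        # Moved out of the Spam folder: a false positive.
        actions.append((sig, "innocent"))
        known.discard(sig)
    for sig in sorted(added):
        if sig in known:
            # Delivered here by procmail: dspam already knows it is spam.
            continue
        # Moved into the Spam folder by hand: a false negative.
        actions.append((sig, "spam"))
        known.add(sig)
    return actions, known
```

In this simplified version, deleting a spam from the Spam folder looks exactly like moving it out, so it would be retrained as innocent.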


  • Modify your .procmailrc to tell the script each time a spam is detected. I have something like this in my .procmailrc:
# Spam filtering:
:0fw
| /usr/bin/dspam --stdout --deliver=spam,innocent --user pierre

# Tell the script for each detected spam
:0 ic
* ^X-DSPAM-Result: spam
| /home/pierre/dspam/dspam_auto.py push

# And deliver spam in the spam folder
:0:
* ^X-DSPAM-Result: spam
.Spam/

Note that the script is slightly racy, as calling the script and delivering the mail is not atomic. However, as long as you don't run the scan every 10 seconds, it should not matter much, and the script recovers from previous mistakes anyway. The way to avoid the race entirely would be to do the delivery from the script itself, but I prefer not to for reliability reasons: if my script is screwed up, it won't trash mails.
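A possible shape for the push step, assuming dspam writes the signature in an X-DSPAM-Signature header (the header-based signature setup required in the first step). sqlite3 is swapped in for MySQL only to keep the sketch self-contained, and the known_spam table name is made up; the real script uses its own DB_NAME/DB_TABLE settings.

```python
import email
import sqlite3

SIGNATURE_HEADER = "X-DSPAM-Signature"


def extract_signature(raw):
    """Return the dspam signature found in the headers of a raw mail, or None."""
    msg = email.message_from_string(raw)
    sig = msg.get(SIGNATURE_HEADER)  # header lookup is case-insensitive
    return sig.strip() if sig else None


def push(raw, conn):
    """Record the signature of a mail dspam tagged as spam.

    Returns True if a signature was found and stored, False otherwise.
    """
    sig = extract_signature(raw)
    if sig is None:
        return False
    conn.execute("CREATE TABLE IF NOT EXISTS known_spam "
                 "(signature TEXT PRIMARY KEY)")
    conn.execute("INSERT OR IGNORE INTO known_spam (signature) VALUES (?)",
                 (sig,))
    conn.commit()
    return True
```

Since procmail pipes the whole message to the command, a command-line wrapper would simply call push(sys.stdin.read(), conn).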

  • Configuration is done. As long as you're in dry run mode, you can watch the effect of the script by moving mails in and out of the Spam folder. Typically, moving a spam out and then back in (don't forget to wait for the cron scan between operations) will produce log lines like these:
INFO 2008-11-09 11:40:09,710 [dryrun] Classify command: /usr/bin/dspam --signature=4916ba3b179033708835974 --class=innocent --source=error --client --user pierre
INFO 2008-11-09 11:50:09,338 [dryrun] Classify command: /usr/bin/dspam --signature=4916ba3b179033708835974 --class=spam --source=error --client --user pierre
  • Once you think you're all set (i.e., you've configured the above and at least one scan has fully completed), you can set DRY_RUN to False and enjoy a simple way to mark FP and FN over IMAP :)
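The dry-run behaviour can be sketched as follows, reconstructed only from the dspam command visible in the log lines above. The names classify and classify_command are illustrative; the body is an assumption, not the actual code of dspam_auto.py. The example values for DSPAM and DSPAM_USER come from those log lines; drop --client if your installation doesn't need it.

```python
import logging
import subprocess

# Example values taken from the log lines above; adapt them to your setup.
DSPAM = "/usr/bin/dspam"
DSPAM_USER = "pierre"
DRY_RUN = True

log = logging.getLogger("dspam_auto")


def classify_command(signature, klass, user=DSPAM_USER):
    """Build the dspam command that retrains one message by signature."""
    return [DSPAM, "--signature=" + signature, "--class=" + klass,
            "--source=error", "--client", "--user", user]


def classify(signature, klass, dry_run=DRY_RUN):
    """Retrain dspam, or only log the command when dry_run is set.

    Returns None in dry-run mode, otherwise dspam's exit status.
    """
    if klass not in ("spam", "innocent"):
        raise ValueError("class must be 'spam' or 'innocent', got %r" % klass)
    cmd = classify_command(signature, klass)
    if dry_run:
        log.info("[dryrun] Classify command: %s", " ".join(cmd))
        return None
    return subprocess.call(cmd)
```

Keeping the command construction separate from its execution makes it easy to check, in dry-run mode, exactly what would be sent to dspam.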

Sunday, November 2 2008

Simple integration of Gallery2 in Dotclear2

This patch enables integration of Gallery2 images into a Dotclear 2 blog.

The patch is really basic and crude; a plugin might be better, but it's an adaptation of a patch for Dotclear 1 :) Nevertheless, it works well. You can apply it with a simple patch -p1 < dc2-gallery2.patch from the directory of your Dotclear 2 installation.

It only works in wiki mode and adds a new kind of tag:

##folder/image.jpg##

That will put the corresponding image from the gallery in the blog post. You can easily get the path and image name from the Gallery URL of the image you want to insert, without the base path and the html extension.
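As a sketch of that conversion, assume a Gallery2 installation with URL rewriting, where an item page looks like http://example.com/gallery/v/folder/image.jpg.html (the host name and the /gallery/v/ prefix are hypothetical and depend on your configuration). The tag path is what remains once the base and the .html suffix are removed. gallery_path_from_url is a made-up helper, not part of the patch.

```python
def gallery_path_from_url(url, base):
    """Strip the gallery base and the .html suffix from an item page URL."""
    if not url.startswith(base):
        raise ValueError("URL %r does not start with %r" % (url, base))
    path = url[len(base):]
    if path.endswith(".html"):
        path = path[:-len(".html")]
    return path.strip("/")
```

For example, gallery_path_from_url("http://example.com/gallery/v/folder/image.jpg.html", "http://example.com/gallery/v/") gives folder/image.jpg, ready to be written as ##folder/image.jpg##.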

The full tag syntax is similar to the one for regular images:

##folder/image.jpg|position|size##

where:

  • position is L or G for a left position, R or D for a right position, or C for centered (default).
  • size determines the size of the image, in pixels.

Both are optional.
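To make the syntax concrete, here is a Python sketch of how such a tag could be parsed; the patch itself is PHP and may do it differently. parse_gallery_tag and find_gallery_tags are hypothetical names, and the default size of 400 is just the gallery_size example value. Position letters follow the list above, with C (centered) as the default.

```python
import re

# Anything between a pair of "##" markers, matched non-greedily.
TAG_RE = re.compile(r"##(.+?)##")

POSITIONS = {"L": "left", "G": "left", "R": "right", "D": "right", "C": "center"}


def parse_gallery_tag(inner, default_size=400):
    """Parse 'path|position|size' into (path, position, size)."""
    parts = [p.strip() for p in inner.split("|")]
    path = parts[0]
    position = parts[1] if len(parts) > 1 and parts[1] else "C"
    size = parts[2] if len(parts) > 2 and parts[2] else None
    if position not in POSITIONS:
        raise ValueError("unknown position %r" % position)
    return path, POSITIONS[position], int(size) if size else default_size


def find_gallery_tags(text, default_size=400):
    """Return the parsed form of every ##...## tag found in a post."""
    return [parse_gallery_tag(m.group(1), default_size)
            for m in TAG_RE.finditer(text)]
```

An empty position field, as in ##image.jpg||300##, falls back to centered.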

The patch adds a few parameters available in the gallery section of the about:config of your blog instance:

  • gallery_enable: activate Gallery2 integration.
  • gallery_embed: location on the local server of the embed.php file from the Gallery2 embedding system. E.g., /home/user/gallery-site/embed.php.
  • gallery_uri: path of the gallery instance, relative to your domain. E.g., /gallery/main.php.
  • gallery_size: default size of images. E.g., 400.

Unfortunately, this patch doesn't yet support integrating a gallery hosted on another domain name.

Sunday, October 26 2008

US keyboard with non intrusive easy accents

I now always use a US keyboard. However, I still often need to write French text on it, hence the need for accents. Of course, I don't like it when what's printed on the keyboard doesn't match what it does :-)

On a standard Linux setup, there is an international variant of the US layout. However, it modifies the "standard" behaviour of some keys: e.g., if you want to type a double quote, you need to press it twice, because in this intl layout it is a dead key by default. The same applies to a couple of other symbols, such as the single quote and the tilde. So I've made a custom version of the US layout that provides easy access to accents through AltGr (both as dead keys and as direct keys for the common French accented letters), without modifying the standard behaviour of any key.

It's partly based on us-intl, with dead keys removed from the default bindings and, mainly, the following bindings added:

  • altgr-` : dead grave accent
  • altgr-' : dead acute accent
  • altgr-^ : dead circumflex accent
  • altgr-" : dead diaeresis
  • altgr-a : à
  • altgr-e : é
  • altgr-c : ç
  • altgr-u : ù
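As a rough model of what the four dead keys produce, a dead key followed by a letter amounts to appending the matching Unicode combining mark and normalizing to the precomposed character. This is only an illustration of the resulting characters, using a hypothetical dead_key helper; XKB and the input method actually resolve dead keys through their own compose tables.

```python
import unicodedata

# Combining mark for each dead key bound above (altgr-`, altgr-', altgr-^, altgr-").
DEAD_KEYS = {
    "`": "\u0300",  # combining grave accent
    "'": "\u0301",  # combining acute accent
    "^": "\u0302",  # combining circumflex accent
    '"': "\u0308",  # combining diaeresis
}


def dead_key(symbol, letter):
    """Compose a letter with the accent of a dead key, as one character if possible."""
    return unicodedata.normalize("NFC", letter + DEAD_KEYS[symbol])
```

For instance, dead_key("^", "e") gives ê, which is what typing altgr-^ then e should output with this layout.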

You can find the xkbmap file here: us-custom

It still needs tuning, but it already works quite well for my purposes. You just need to copy the file to /usr/share/X11/xkb/symbols and run setxkbmap us-custom (if you named the file us-custom) to load it.
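To check that the layout is actually active, setxkbmap -query prints the current settings as "name: value" lines. A small Python sketch that reads them back; the helper names are hypothetical, and subprocess.run with capture_output assumes Python 3.7 or later.

```python
import subprocess


def parse_setxkbmap_query(output):
    """Turn `setxkbmap -query` output ("key:   value" lines) into a dict."""
    settings = {}
    for line in output.splitlines():
        # Split on the first colon only: option values may contain colons.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def current_layout():
    """Return the active XKB layout name (e.g. 'us-custom' once it is loaded)."""
    out = subprocess.run(["setxkbmap", "-query"], capture_output=True,
                         text=True, check=True).stdout
    return parse_setxkbmap_query(out).get("layout")
```

After setxkbmap us-custom, current_layout() should return us-custom.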

Wednesday, October 22 2008

Using Qemu in screen

The following Python script starts Qemu inside screen, with separate screen windows for the Qemu monitor and for the serial port.

#!/usr/bin/env python
import os
import subprocess
import sys

# Qemu command. Parameters to this script will be added to the command.
CMD = ['qemu-system-i386',
       '-serial', 'pty', '-monitor', 'pty']
CMD += sys.argv[1:]

# Name of the screen session we're running in (set by screen itself).
try:
    SCREENNAME = os.environ['STY']
except KeyError:
    sys.exit('This script must be run inside a screen session.')

# Qemu is not really verbose, so we need to know in which order pty names
# will appear.
TITLES = ['Monitor', 'Serial']
PREFIX = 'char device redirected to '

# And then start qemu. universal_newlines makes stdout yield text, not bytes.
p = subprocess.Popen(CMD, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     bufsize=1, universal_newlines=True)

devcount = 0
# Iterating directly over p.stdout uses a read-ahead buffer (in Python 2),
# so read line by line until EOF instead.
for line in iter(p.stdout.readline, ''):
    sys.stdout.write(line)
    sys.stdout.flush()
    if line.startswith(PREFIX):
        # Newer qemu versions append " (label ...)"; keep only the pty path.
        devname = line[len(PREFIX):].split()[0]
        try:
            title = TITLES[devcount]
        except IndexError:
            title = devname
        # Add a window to the current screen session, attached to the pty.
        subprocess.call(['screen', '-x', SCREENNAME, '-X', 'screen',
                         '-t', title, devname])
        devcount += 1

p.wait()

To use it, save this script as something like qemu-wrapper, make it executable and run screen qemu-wrapper <qemuargs> to start Qemu inside screen with the nice dedicated windows.
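Newer Qemu releases print the chardev label after the pty path, e.g. "char device redirected to /dev/pts/3 (label serial0)". That format is stated here as an assumption based on recent versions, so check your own output. A hypothetical helper that extracts both parts would let the windows be titled by label instead of relying on the order of TITLES:

```python
import re

# Matches the old "char device redirected to /dev/pts/N" form as well as the
# newer form with a trailing "(label NAME)".
REDIRECT_RE = re.compile(
    r"char device redirected to (\S+)(?: \(label ([^)]+)\))?")


def parse_redirect(line):
    """Return (pty, label) for a pty redirection line, else None.

    label is None when the Qemu version does not print one.
    """
    m = REDIRECT_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

With labels available, TITLES could become a dict keyed by label; since label names vary between Qemu versions, print them once before relying on them.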

UPDATE: A new version is available here