# Parsing your mail with Perl

If you’re like me, you enjoy categorizing and sorting your email so that you can quickly find the important messages.

One advantage that I have is direct access to the filestore on which my email resides. That is, I don’t use a protocol like IMAP to get at my mail; instead, all of it gets delivered to a directory on my mail server, which I can log into to view and modify my messages.

I’ve uploaded my scripts to my GitHub repo, where you can view them; to download them, clone the repo.

## Running the script

There are a few hardcoded assumptions made by the script.

1. It assumes that you’re executing it from the directory in which it is located.
2. It assumes that the rule files are located in the same directory as the script.
3. It assumes that a “lib” directory exists as a subdirectory of the directory in which the script resides.

If you don’t change anything, this is how things are set up in the git repo, so nothing should require modification.

Because expanding “*” in BASH runs into the argument-list limit at around 1000 files, we will use the find and xargs commands to help us parse our mail.
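Sketched as a small shell helper (the parser name `parse_mail.pl` and the maildir path are placeholders, not the actual names from the repo):

```shell
# Run a command over every message file in batches, sidestepping the
# argument-list limit that a bare "*" glob runs into.
parse_batches() {  # usage: parse_batches MAILDIR command [args...]
  dir=$1; shift
  # -print0/-0 keep odd filenames safe; -n 100 caps each invocation's argument count
  find "$dir" -type f -print0 | xargs -0 -r -n 100 "$@"
}
# e.g. parse_batches "$HOME/Maildir/cur" ./parse_mail.pl
```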

What we are seeing is the result of our LinkedIn rule:

You see that we can easily place any regex within the quotes to be matched against the body of the message. The plugins also support header checks; see the Perl module’s code for how to define those.

## Takeaways

Sure, this is re-inventing the wheel. There is already plenty of good email-filtering software out there that you can use, but I find it useful to write my own tools from time to time to ensure that I’m getting exactly what I need. Sometimes a custom tool is worth the effort to create.

# Handling spam with mutt

If you’re like me, you use mutt as your primary mail client and need a way to easily handle the spam coming into your mailbox.

I’ll make some assumptions:

1. You’re using procmail as your LDA (local delivery agent)
2. You’re using mutt as your MUA (mail user agent)

Assuming that your setup is similar to mine, you’ll be able to leverage some of the simple scripts I have written to handle the miscategorized messages entering your inbox.

My spam macros live in a file at ~/.mutt/macros.spam, which I load from my .muttrc with the following line:

    source ~/.mutt/macros.spam

Now, when you have a missed spam in your inbox, press “S” to have it moved to the appropriate location for learning.
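If you need a starting point for the macro itself, it can be as simple as this sketch; the `=Spam.Learn` folder name is a placeholder for wherever your training scripts look:

```
# press S to file the current message into the spam-learning folder
macro index S "<save-message>=Spam.Learn<enter>" "move message to spam-learning folder"
macro pager S "<save-message>=Spam.Learn<enter>" "move message to spam-learning folder"
```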

Once you have identified missed spam and false positives, and mutt has moved them into their respective folders, it’s time to run a few little scripts on them to help you to train your system.

### False Positives

These are messages that were in your spam folder erroneously.

You’ll have to change the directories to suit your environment, but the script is pretty simple: it strips any SpamAssassin markup from the message, trains your SpamAssassin Bayes DB to recognize messages like it as ham, and feeds the message back into procmail to be filtered and delivered accordingly.
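A minimal sketch of that flow, assuming the folder path is a placeholder for wherever mutt files your false positives (`spamassassin -d` is the stock flag for removing markup):

```shell
# Hypothetical false-positive handler: strip markup, train as ham, redeliver.
learn_ham() {  # usage: learn_ham HAMDIR
  for msg in "$1"/*; do
    [ -f "$msg" ] || continue
    spamassassin -d < "$msg" > "$msg.clean"  # -d removes SpamAssassin markup
    sa-learn --ham "$msg.clean"              # train the Bayes DB as ham
    procmail < "$msg.clean"                  # redeliver through procmail
    rm -f "$msg" "$msg.clean"
  done
}
# e.g. learn_ham "$HOME/Maildir/.Learn.Ham/cur"
```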

### Missed Spam

The missed-spam script is very similar to the false-positives script, but it also takes the message and sends it to your SpamCop account via the SpamCop reporter script.
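A sketch of the spam side, with the folder path and the reporter-script location left as placeholders for your own setup:

```shell
# Hypothetical missed-spam handler: train as spam, then report.
report_spam() {  # usage: report_spam SPAMDIR
  for msg in "$1"/*; do
    [ -f "$msg" ] || continue
    sa-learn --spam "$msg"    # train the Bayes DB as spam
    # submit to SpamCop via your reporter script (placeholder path):
    # "$HOME/bin/spamcop-report" "$msg"
    rm -f "$msg"
  done
}
# e.g. report_spam "$HOME/Maildir/.Learn.Spam/cur"
```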

### Automating

Now that the scripts are set up and running, consider placing them in a crontab that fires every five minutes.
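For example, a pair of crontab entries like these (the script names and paths are placeholders) would run both handlers every five minutes:

```
# min  hour  dom  mon  dow  command
*/5    *     *    *    *    $HOME/bin/handle-false-positives.sh
*/5    *     *    *    *    $HOME/bin/handle-missed-spam.sh
```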

### Room for improvement

There’s always room for improvement. One easy win would be to whitelist every address you’ve written to in your sent folder. Please share your tips and techniques in the comments!
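That whitelisting idea can be sketched like this; the sent-folder path is a placeholder, and `whitelist_from` is a standard SpamAssassin preference directive:

```shell
# Turn sent-folder recipients into SpamAssassin whitelist_from lines.
whitelist_from_sent() {  # usage: whitelist_from_sent SENTDIR
  grep -rh '^To:' "$1" 2>/dev/null |                  # recipient headers
    grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+' |     # bare addresses
    sort -u |                                          # dedupe
    sed 's/^/whitelist_from /'                         # SA directive per line
}
# e.g. whitelist_from_sent ~/Maildir/.Sent/cur >> ~/.spamassassin/user_prefs
```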

# Skills

As an adult I have the gift of hindsight, with which I can re-evaluate some of the decisions made while I was growing up. As a child, many of the choices governing my life were made for me, and as an adolescent I was too narrow-minded and self-absorbed to do any meaningful critical thinking about my education outside of school.

I’m making a list of things that I wish I had learned to do and a list of things that I would like my children to learn.

I wish that I had known how to:

* use a multimeter to test electronics, car batteries, and car alternators
* troubleshoot and replace household appliance switches
* do basic plumbing and soldering
* wire an outlet

These are simple skills that many people get by without, but more and more often I find that my life would be less complicated and less expensive if I had them.

For my children, I would like to teach them all of these things plus a few other niche skills:

* what TCP/IP is
* HTTP/1.1 fundamentals (cookies, POSTs, GETs)
* JavaScript
* DNS
* Bash scripting

I believe that the top four items are important so that you can navigate online services and better understand how these systems are put together. We live in a world where we operate within these services and many people do not have the slightest clue how these services operate.

I feel that Bash scripting is important because it teaches a computer user how to become a computer operator. If you haven’t already, please visit Douglas Rushkoff’s Program or Be Programmed site and book.

# Working socially

In high school, before we all had the Internet, I would log into local BBSes for games and discussion. At 14, this was my first exposure to adult discourse and near-instant communication. My computer would log on every day at 6am and pull the latest messages, and I would read and reply to them over breakfast. With any luck I would have a reply by the time school ended.

By the mid ’90s most of my friends’ families were going online, sharing a single family email account. At the time I was experimenting with Unix and learning C. I had a Slackware server running on a dedicated phone line with a USR Courier 33.6, dialed into my ISP with a static PPP IP address and DNS records pointing at me.

I provided HTTP, FTP, telnet, and mail services for free to my friends. This was 1996, IIRC.

We didn’t have any social networks then. What we did instead was update our .plan files each day with whatever plans we had or where we would likely be. None of us had cell phones, and it wasn’t easy keeping tabs on each other.

My server had a PHP page that iterated over each user’s .plan file and rendered them chronologically by mtime. It was rudimentary, but it worked.

By 2000 I was on a friend’s social network site. I was a beta user (my user ID was below 20), and I left the site a year or two later, after the membership numbers were in the tens of thousands and my friend had stopped developing it.

Half a decade passed as I was busy with work and family. I had dropped off of the internet for the most part. I was writing C++ for embedded devices, and although there was an online component, I wasn’t spending that much time on the Internet.

In 2007 I switched jobs. I went from being a developer to an operations guy, and began working on the email team of a mailbox provider with millions of mailboxes under management.

My favourite part of the job was the social aspect: I was once again helping people connect and interact. I had loved email since my days using Eudora on Windows 3.1. Now I was writing MTAs and anti-spam systems to keep my users’ experience good, removing the noise from the signal.

My move to Facebook makes total sense to me. There are plenty of workplaces where I would be happy, but throughout my adolescent and adult life I’ve found the most rewarding work experience in the act of helping to bring people together.

I find it fundamentally important that we have a vehicle for reaching out and connecting with each other. Creating tools that make this easier is what I find most rewarding.

# UNIX Testing

In April I posted a quick little technical blog post about BDD-style testing of a UNIX environment. It strives to be lean on dependencies, letting you write cucumber-style tests against your UNIX system to check configurations, working environments, etc. This lets you validate that your server’s running environment is in line with production expectations.
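The flavour of such checks can be sketched in plain shell; this is just an illustration of the style, not the tool from that post, and the two sample checks are arbitrary:

```shell
# Minimal BDD-ish environment checks: each "then_ok" is an assertion
# with a human-readable description, reported TAP-style.
pass=0; fail=0
then_ok() {  # usage: then_ok "description" command [args...]
  desc=$1; shift
  if "$@" >/dev/null 2>&1; then
    pass=$((pass+1)); echo "ok - $desc"
  else
    fail=$((fail+1)); echo "not ok - $desc"
  fi
}
then_ok "/etc exists"      test -d /etc
then_ok "sh is executable" test -x /bin/sh
echo "$pass passed, $fail failed"
```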

# Fun with Markov Chains

Markov chains are fun. They’re used in all sorts of things, including anything where you’re trying to learn or predict the next item in a sequence, given a historical set from which you can build a probability table.

I hacked together a tiny little Python script to demonstrate how Markov chains work. There’s a bash and Perl glue script that formats the input file into something the Python script can easily eat. Yum.
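You can see the core idea with nothing but standard tools; this sketch (not the scripts from the repo) builds the transition-frequency table that a first-order Markov chain samples from:

```shell
# Count word-to-word transitions in a tiny corpus: this frequency table
# is the "probability table" a first-order Markov chain is built on.
corpus="the cat sat the cat ran"
transitions=$(printf '%s\n' $corpus |
  awk 'NR > 1 { print prev, $0 } { prev = $0 }' |  # emit (previous, current) pairs
  sort | uniq -c)                                  # count each distinct pair
echo "$transitions"
# "the cat" occurs twice, so after "the" this chain always predicts "cat"
```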

You’ll need something like Linux or UNIX to run these.

# gvim colours that are nice on the eyes

I spend most of my time editing software in vim or gvim. I’m most comfortable in vim, since I like to drop into a shell and run snippets, which isn’t as convenient in gvim. But gvim can be quite nice on the eyes.

I use the following in my .gvimrc file:

    syntax on
    colorscheme darkslategray
    set gfn=Inconsolata\ Medium\ 10

# Managing mailing lists – The right tool for the job

I find that using mutt as an email client really helps when managing the different mailing lists I subscribe to. One of the features it supports is the send-hook, which can change a couple of headers depending upon the recipient address of the email.

Meaning, if I’m emailing the mailing list address, I want my Reply-To to be the mailing list address, and I may want to change my From address to match the address I subscribed with.
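As a sketch in .muttrc (all addresses here are placeholders): a catch-all hook resets the headers, then a per-list hook overrides them when the recipient matches.

```
# reset custom headers for ordinary mail
send-hook . 'unmy_hdr From:; unmy_hdr Reply-To:'
# when writing to the list, use the subscribed address and point replies at the list
send-hook '~t perl-list@lists\.example\.org' \
  'my_hdr From: me+perl@example.org; my_hdr Reply-To: perl-list@lists.example.org'
```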

Not a lot of people take the time to follow proper mailing list etiquette, but they should.

I’m becoming a grumpy grey beard. I know.

# Perl closures — the visitor pattern

For some reason, Perl closures had been eluding me for the past couple of months, and I just realized that they’re simply a visitor pattern. I made a rough example using C++11 and Perl:

Note, I understand that strictly speaking a closure is “a function or reference to a function together with a referencing environment”, but it made a lot more sense when I thought of it as a visitor.

# Creating debian packages from Perl modules

I just wrote a post on my other blog that outlines how to create Debian packages from the latest Perl module tarball.

I use these scripts because I don’t like maintaining Debian package info; I just publish my distribution tarballs to a public dir, then use my script to pull the latest version from that dir (via HTTP) and create a .deb file that reflects it.
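The flow can be sketched like this; `My-Module`, `DIST_URL`, and the index format are assumptions rather than the actual scripts, and `dh-make-perl` is one existing tool that builds a .deb from an unpacked Perl distribution:

```shell
# Pick the highest-versioned tarball name out of an HTTP directory index
# read on stdin ("My-Module" is a placeholder distribution name).
latest_tarball() {
  grep -o 'My-Module-[0-9][0-9.]*\.tar\.gz' | sort -V | tail -n 1
}
# The network/build steps would then be (not run here, paths hypothetical):
#   latest=$(curl -s "$DIST_URL/" | latest_tarball)   # newest published version
#   curl -sO "$DIST_URL/$latest"                      # fetch the tarball
#   tar xzf "$latest" && dh-make-perl --build "${latest%.tar.gz}"
```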

Take a look — enjoy.