# 14 years later

Katharine and I are coming up to our 14th wedding anniversary.  Our marriage is still full of passion after all of this time, as evidenced by our Twitter interactions.

# IRC Life: Creating a better workflow: Notification scripts

I like to use irssi.  It’s not for everyone, but it supports a nice(ish) Perl scripting environment and you can run it via an SSH session, which is super handy.

When I’m working from home, people at work use IRC to get my attention.  Sometimes I’m not in front of my laptop, and I don’t want to leave people waiting, so I’ve come up with the following two scripts that really help me to keep my acknowledgement latency down.

First, I use the notify.pl script to format out some of the event messages from IRC, and subsequently call the ~/bin/notify script passing in the text from the notify.pl script.

You can find the scripts here.

You’ll have to update the ~/bin/notify shell script to use your mailserver, credentials, etc.  And you’ll need to have swaks installed, as it is doing the heavy SMTP lifting here.

Now, go into your Gmail (I’m assuming that you’re sending these to your Gmail) and configure a filter that applies a particular label to all mail sent to the address you configured in the script, bypassing the inbox.

Now, in your phone’s Gmail app, set up an alert on that particular label.

What will happen is as soon as someone uses your nick in a channel, or addresses you in a private message, that line will be passed to the notify shell script.  The shell script in turn will make note of the time, and check when the last message was received.  If the time between messages was greater than 15 minutes, it will assume that you don’t currently have eyes on your IRC terminal and fire off an email notification to your phone.
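The throttling step described above can be sketched in Python. This is only an illustration of the logic, not the actual ~/bin/notify shell script: the state file, the function name and the choice to notify when there is no history yet are all assumptions.

```python
from pathlib import Path

QUIET_PERIOD_SECONDS = 15 * 60  # the 15-minute gap described above


def should_notify(state_file: Path, now: float,
                  quiet_period: float = QUIET_PERIOD_SECONDS) -> bool:
    """Return True when the previous message is older than the quiet period.

    The time of the current message is always recorded, so an ongoing
    conversation keeps suppressing notifications.
    """
    try:
        last_seen = float(state_file.read_text().strip())
    except (FileNotFoundError, ValueError):
        last_seen = None  # assumption: no usable history means "notify"
    state_file.write_text(f"{now}\n")
    return last_seen is None or (now - last_seen) > quiet_period
```

A caller would pass `time.time()` as `now` and only hand the line to the mailer when the function returns True.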

Of course, you may not want this to be running all of the time, so as long as you’re running a connection multiplexer you can set up a dedicated irssi session that only runs this script.

# IRC Life: Creating a better IRC workflow: BIP

When working in Operations it’s important to have a work environment that is easy and quick to operate in.  IRC is the communication backbone of many Unix shops, and you’ll need to be able to take full advantage of it.

IRC Bouncer: BIP

An IRC bouncer allows you to multiplex a single IRC session amongst multiple clients, effectively sharing your IRC nick.  This is very handy for a number of reasons:

• Your bouncer keeps you in the channels
• You don’t have to keep your IRC client connected
• You can run certain clients that are amenable to scripting, while running a different client that has a look/feel that suits you better
• Logging.

First install bip and take a look at my example configuration.  It has a server and user section.  For mine, I have the user “pete” and the server “tucows” which connects to my work’s internal IRC server.  This way, when I connect via my IRC client, I first connect to the machine on which bip is running, then issue

/quote pass pete:PASSWORD:tucows

which will drop me into my sessions.  From here I can join channels, and they will be logged for me on my workstation for easier reference later on.
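For a scripted client, the same password line can be assembled in code. A minimal sketch, assuming the user:password:connection layout shown above and that user and connection names contain no colons; the function name is made up for illustration.

```python
def bip_pass_line(user: str, password: str, connection: str) -> str:
    """Build the raw IRC PASS command in the user:password:connection form."""
    for name, value in (("user", user), ("connection", connection)):
        # Assumption: a colon here would make the three fields ambiguous.
        if not value or ":" in value:
            raise ValueError(f"{name} must be non-empty and contain no ':'")
    if any(ch in password for ch in "\r\n"):
        raise ValueError("password must not contain line breaks")
    return f"PASS {user}:{password}:{connection}"
```

`bip_pass_line("pete", "PASSWORD", "tucows")` yields the same text that `/quote pass pete:PASSWORD:tucows` sends, with the command name in upper case.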

# Programmer or engineer?

My wife recently asked me what the difference is between software developers and software engineers.

She quickly realized that I was going to talk for far too long and she said “forget it, I don’t really care.”

Most people don’t care because for the most part it doesn’t matter, but it got me thinking.

You have to back up and look at the different schools of computer thought:

• computer science
• computer engineering
• software engineering.

Computer Science:

In computer science (CS) you are taught to look at solving problems by thinking of problems in terms of data structures and algorithms.  The hardest part is learning the many different structures in which you can contain your data so that you can apply algorithms to the data containers.

(This isn’t explaining anything yet, I know. It’s why my wife stopped listening.)

As a programmer, you are given the task of taking data in, doing something with it, and outputting some form of that data for another system to consume. The difficulty arises in scaling your process, or pipeline, so that you can operate this “work” on your data stream faster than the data is arriving. If you’re working on a small amount of data every day, then it could be quite sufficient to use a naive algorithm to accomplish the task. But if your data continues to grow to the point where your naive algorithm can’t keep up, then you have two choices: buy more hardware, or rewrite your software to implement algorithms that accomplish your work in less time.

Certain algorithms are served better when the data is stored in particular containers (lists, arrays, sets, trees, heaps, etc).
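As a small illustration of that point (my example, not tied to any particular system), here is the same task, spotting a duplicate, written naively over a list and then again with a set as the container:

```python
def has_duplicates_naive(items):
    """Compare every pair of items: O(n^2) comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicates_with_set(items):
    """Remember what has been seen in a set: roughly O(n) on average."""
    seen = set()
    for item in items:
        if item in seen:
            return True
        seen.add(item)
    return False
```

Both give the same answer; the second keeps up far longer as the data grows, at the cost of extra memory for the set.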

The majority of a computer science degree is spent learning algorithms and which are best to apply to different problems.

Computer Engineering:

This is less of a science discipline and more of an electrical engineering school of thought. You must learn the fundamentals of how computers work (boolean algebra, truth tables, NAND/NOR circuit design, CPUs, buses, memory, etc). You must also learn how to take the requirements for a given piece of hardware and turn out circuit designs that account for circuit path routing and clock timing, while ensuring that signals stay strong and that no part of your device is ‘noisy’ enough to create interference that would slow down or corrupt the data signals.

Software Engineering:

It is similar to Computer Engineering, but has more focus on the systems and processes that allow you to take a requirements spec, generate software, and verify that the software written is valid, meaning it reflects the requirements set out at the beginning.

You cannot verify every line of code in large code bases, but you can ensure that the process of writing software conforms in such a way that it is verifiably valid.

When you need to write software that conforms to certain legal regulations (landing gear software, pacemaker software, etc) you need to ensure that the software driving your device will operate the device within the bounds of its governing legislation.

Programmer:

A programmer fills a multidisciplinary role: someone who can

• recognize data patterns and apply effective algorithms to make the most of their computer system’s resources
• understand the fundamental operation of a computer (how the CPU caches data, how hard drives operate) so as to waste as little wall time as possible
• write software that can be trusted. You can’t always be certain that your code will be without bugs, but you need to be able to write code that can be tested and verified to operate properly.

Engineer? Developer?  These are titles that may be misleading. We’re all programmers; we just have different strengths and areas of focus.

# Testing your mail with Flingo and DMARC

DMARC has been getting a lot of attention recently with Yahoo! implementing a reject policy on their personal mail platform, so it seemed like an appropriate time to write a little blog post on how you can use Flingo and DMARC to test for potential brand infringements or phishes.

I’ll assume that you are running a Debian-like system and can compile software.  So, let’s create the flingo package and install our libraries and binaries:

Now we can execute:

    $ cd /tmp
    $ sudo dpkg -i flingo*.deb

to install our newly minted package.

## Testing the script

Now that Flingo has been installed, let’s try to test a supplied example set.

    $ cd /tmp/Flingo/examples
    $ flingoc dmarc.flingo paypal-spam.rfc821

And we’ll see:

which shows us a number of things:

• The subject
• The sender
• The recipient
• A DMARC reject warning

This is enough to tell us that the domain used in the RFC821.From address has a published DMARC record with a reject policy, and that the message does not pass DKIM verification.  We are honouring the domain owner’s request to reject these messages and will stop processing this mail immediately.
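The decision being made here can be sketched in Python. This is not how Flingo implements it; it is a rough illustration that assumes the DMARC TXT record has already been fetched from DNS and that DKIM (and SPF) alignment results are supplied by the caller. The function names are made up.

```python
def parse_dmarc_record(txt: str) -> dict:
    """Split a DMARC TXT record into lower-cased tag names and their values."""
    tags = {}
    for part in txt.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dmarc_disposition(txt: str, dkim_aligned_pass: bool,
                      spf_aligned_pass: bool = False) -> str:
    """Return 'pass', or the domain owner's requested policy for failures."""
    tags = parse_dmarc_record(txt)
    if tags.get("v") != "DMARC1":
        return "none"  # not a DMARC record, so no policy to honour
    if dkim_aligned_pass or spf_aligned_pass:
        return "pass"  # DMARC passes when either mechanism passes aligned
    return tags.get("p", "none").lower()
```

For a hypothetical record `v=DMARC1; p=reject` and a message that fails both checks, the function returns `reject`, which is the case the example output above reports.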

# Logging in C++

I’ve always found logging in C++ to be rather annoying.  With each project I inevitably take a different route.  I’ve had enough, and have finally decided to sit down and write a package that will allow me to re-use the same logging with each project.

My goal was to overload the std::clog object so that it can write to either a given file or syslog.

It’s still under active development and I encourage all pull requests.

# Filtering DKIM messages with Flingo

Flingo is my little mail filtering language used to parse emails and flag them for any kind of policy action: spam folder placement, deletion, etc.

I’ve recently implemented DKIM verification within Flingo allowing for messages to be parsed using the libopendkim library.

Let’s create a rule file that:

1. Matches emails with a From address matching the gmail.com domain
2. Fails DKIM verification where the d= must be “gmail.com”

Now, let’s execute this and see what happens:

    $ flingoc dkim-gmail-fail.flingo ../tests/t004_dkim_fail.rfc821
    METARULE[dkim-test-fail]
    ACTION:log:Failed-to-pass-DKIM-signature-verification
    ACTION:log:sender:some.user@gmail.com
    ACTION:log:subject:Test Email
    ACTION:reject:
    END:

Great! It detected that it was scanning a gmail message, but caught the failure to verify.

Now, let’s do this with a valid message:

    $ flingoc dkim-gmail-fail.flingo ../tests/t003_dkim.rfc821
    END:

Perfect.
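For readers without Flingo at hand, the two conditions in the rule can be approximated in Python. This sketch only inspects headers: it reads the From domain and the d= tags of any DKIM-Signature headers, and takes the result of the cryptographic verification as an input from some other tool. It is an illustration, not Flingo's implementation, and the function names are made up.

```python
from email import message_from_string
from email.utils import parseaddr


def from_domain(msg) -> str:
    """Domain part of the From header address, lower-cased."""
    _, addr = parseaddr(msg.get("From", ""))
    return addr.rpartition("@")[2].lower()


def dkim_signing_domains(msg) -> set:
    """Collect the d= tag of every DKIM-Signature header on the message."""
    domains = set()
    for header in msg.get_all("DKIM-Signature", []):
        for tag in header.split(";"):
            name, _, value = tag.partition("=")
            if name.strip().lower() == "d":
                domains.add(value.strip().lower())
    return domains


def rule_matches(msg, dkim_verified: bool, domain: str = "gmail.com") -> bool:
    """True when From claims `domain` but no verified d=`domain` signature exists."""
    if from_domain(msg) != domain:
        return False
    return not (dkim_verified and domain in dkim_signing_domains(msg))
```

A message for which `rule_matches` returns True corresponds to the first run above, where the reject action fires; a verified gmail.com message corresponds to the second run, where nothing is logged.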

# Displaying how much git work has been done

Sometimes I want to know how much work I’ve done over a certain time period.  Many people (myself included) will argue that work cannot be quantified by lines of code alone, but I still find myself using this little script at least once a week.
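The idea behind such a tally can be sketched in Python. This is not the accumulate.pl script itself; it assumes the numbers come from summing the summary lines that `git log --shortstat` prints for each commit, and the function name is made up.

```python
import re

# Summary lines look like: " 3 files changed, 10 insertions(+), 2 deletions(-)"
_INSERTIONS = re.compile(r"(\d+) insertions?\(\+\)")
_DELETIONS = re.compile(r"(\d+) deletions?\(-\)")


def accumulate(shortstat_output: str):
    """Sum deleted and inserted line counts across all shortstat lines."""
    inserted = sum(int(n) for n in _INSERTIONS.findall(shortstat_output))
    deleted = sum(int(n) for n in _DELETIONS.findall(shortstat_output))
    return deleted, inserted
```

The input could come from something like `git log --since="1 week ago" --shortstat --pretty=format:`, captured with `subprocess.run`.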

    11:12 ~/git/abuse-fraud (master)$ git accumulate.pl last week
    Deleted 199 and inserted 1966 lines

# Using IntervalTrees for matching IP addresses to CIDR ranges

First, we need some data. Thankfully I have access to a number of datasets, including connection addresses from our mail servers, and an RBL in CIDR notation.

First, let’s extract just the CIDR addresses from our RBL:

    $ perl -lane 'print $1 if /^((?:\d+\.){3}\d+\/\d+)/' rbl > rbl.cidr

And extract just the connecting IP addresses from our mail server log:

    $ zfgrep -i connect\ from postfix.log.gz | awk '{print $NF}' | cut -d\[ -f2 | cut -d\] -f1 | grep "\." > connections.ip

Now, we will use the software found on my GitHub repo and compile using “make t.it”.

Now, test it against the first 500000 IP records:

    $ head -500000 /tmp/test/connections.ip | /usr/bin/time ./t.it /tmp/test/rbl.cidr
    12658 elements in interval tree
    Matched 10262
    4.30user 0.03system 0:04.33elapsed 99%CPU (0avgtext+0avgdata 4864maxresident)k
    0inputs+0outputs (0major+1353minor)pagefaults 0swaps

We see that it took 4.33s to check half a million IP addresses against 12658 CIDR ranges, of which 10262 addresses matched.  We can run this multiple times with different numbers of CIDR ranges to test how fast/efficient this software is at different sizes.
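To experiment with the same kind of lookup without compiling anything, here is a Python sketch. Note that it is not an interval tree: it swaps in a simpler stand-in (merged, sorted ranges searched with binary search), which works for this "is the address in any range" question. It assumes IPv4-only input, and the class name is made up.

```python
import bisect
import ipaddress


class CidrMatcher:
    """Answer 'is this IP inside any of these CIDR ranges?' via binary search."""

    def __init__(self, cidrs):
        spans = sorted(
            (int(net.network_address), int(net.broadcast_address))
            for net in (ipaddress.ip_network(c, strict=False) for c in cidrs)
        )
        # Merge overlapping or adjacent ranges so one lookup per IP suffices.
        self.ranges = []
        for start, end in spans:
            if self.ranges and start <= self.ranges[-1][1] + 1:
                prev_start, prev_end = self.ranges[-1]
                self.ranges[-1] = (prev_start, max(prev_end, end))
            else:
                self.ranges.append((start, end))
        self.starts = [start for start, _ in self.ranges]

    def __contains__(self, ip: str) -> bool:
        addr = int(ipaddress.ip_address(ip))
        # Find the last range starting at or before addr, then check its end.
        i = bisect.bisect_right(self.starts, addr) - 1
        return i >= 0 and addr <= self.ranges[i][1]
```

Usage is `matcher = CidrMatcher(open("rbl.cidr").read().split())` followed by `ip in matcher` for each line of connections.ip. A real interval tree is the better fit when you need to know which range matched, since merging discards that information.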

# Parsing your mail with Perl

If you’re like me, then you enjoy categorizing and sorting your email so that you can find the important messages quickly.

One advantage that I have is the ability to access the filestore on which my email resides.  Meaning, I don’t use any protocols like IMAP to access my mail.  Instead, all of my mail gets delivered to a directory on my mail server, which I can log into to view and/or modify my mail.

I’ve uploaded my scripts to my GitHub repo, where you can view them.  To download them, you will need to be able to clone my repo.

## Running the script

There are a few hardcoded assumptions made by the script.

1. it assumes that you’re executing it from the path in which it is located.
2. it assumes that the rule files are located in the same directory as the script
3. it assumes that the “lib” dir exists as a subdirectory of the directory in which the script resides.

Unless you change anything, this is how it is setup in the git repo, so nothing should require any modification.

Because expanding the “*” glob in BASH craps out once a directory holds too many files (at around 1000 for me), we will use the “find” command and the “xargs” command to help us parse our mail.

What we are seeing is the result of our LinkedIn rule:

You see that we can easily place any regex within the quotes to be matched against the body of the message.  The plugins also support header checks, and you can view the code to the Perl module to see how to define those.
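The body-matching idea translates to other languages too. Below is a rough Python sketch of it, not a port of the Perl module: rules are a plain mapping from a made-up rule name to a regex, and each regex is searched against the text parts of the message.

```python
import re
from email import message_from_string


def body_text(msg) -> str:
    """Concatenate the decoded text parts of a message."""
    parts = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            parts.append(payload.decode(charset, errors="replace"))
    return "\n".join(parts)


def matching_rules(raw_message: str, rules: dict) -> list:
    """Return the names of all rules whose regex matches the message body."""
    text = body_text(message_from_string(raw_message))
    return [name for name, pattern in rules.items() if re.search(pattern, text)]
```

With a rule set such as `{"linkedin": r"(?i)linkedin"}`, every message whose body mentions LinkedIn is reported under that name, and the caller decides where to file it.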

## Takeaways

Sure, this is re-inventing the wheel.  There is already plenty of good email filtering software out there that you can use, but I find it useful to write my own tools from time to time to ensure that I’m getting exactly what I need.  Sometimes a custom tool is worth the effort to create.