Programmer or engineer?

My wife recently asked me what the difference is between software developers and software engineers.

She quickly realized that I was going to talk for far too long and she said “forget it, I don’t really care.”

Most people don’t care because for the most part it doesn’t matter, but it got me thinking.

You have to back up and look at the different schools of computer thought:

  • computer science
  • computer engineering
  • software engineering

Computer Science:

In computer science (CS) you are taught to solve problems by thinking of them in terms of data structures and algorithms.  The hardest part is learning the many different structures that can hold your data, so that you can apply the right algorithms to those containers.

(This isn’t explaining anything yet, I know. It’s why my wife stopped listening.)

As a programmer, you are given the task of taking data in, doing something with it, and outputting some form of that data for another system to consume. The difficulty arises in scaling your process, or pipeline, so that you can perform this “work” on your data stream faster than the data arrives. If you’re working on a small amount of data every day, a naive algorithm may be quite sufficient. But if your data grows to the point where the naive algorithm can’t keep up, you have two choices: buy more hardware, or rewrite your software to use algorithms that accomplish the work in less time.

Certain algorithms are served better when the data is stored in particular containers (lists, arrays, sets, trees, heaps, etc.).
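As a small illustration of why the container matters, here is a sketch in Python. The function names are made up for this example. Both functions answer the same question, but the first scans a list on every lookup while the second copies the data into a set first, so each lookup is a hash lookup instead of a scan.

```python
def find_known_naive(incoming, known):
    """Return the items of `incoming` that appear in `known` (a list).

    Each `in` test scans the list, so the cost grows with
    len(incoming) * len(known).
    """
    return [item for item in incoming if item in known]


def find_known_fast(incoming, known):
    """Same result, but `known` is copied into a set first.

    Set membership is a hash lookup, so the cost grows roughly with
    len(incoming) + len(known).
    """
    known_set = set(known)
    return [item for item in incoming if item in known_set]


if __name__ == "__main__":
    known = ["alice", "bob", "carol"]
    incoming = ["bob", "dave", "alice", "bob"]
    print(find_known_naive(incoming, known))
    print(find_known_fast(incoming, known))
```

On three names the difference is invisible; it only starts to matter once `known` holds many thousands of entries and the data keeps arriving.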

The majority of a computer science degree is spent learning algorithms and which are best to apply to different problems.

Computer Engineering:

This is less of a science discipline and more of an electrical engineering school of thought. You must learn the fundamentals of how computers work (Boolean algebra, truth tables, NAND/NOR circuit design, CPUs, buses, memory, etc.). You must also learn how to take the requirements for a given piece of hardware and turn out circuit designs that account for circuit path routing and clock timing, while ensuring that the signal is strong and that no part of your device is ‘noisy’ enough to create interference that would slow down or corrupt the data signals.

Software Engineering:

It is similar to Computer Engineering, but has more focus on the systems and processes that allow you to take a requirements spec, generate software, and verify that the software written is valid, meaning it reflects the requirements set out at the beginning.

You cannot verify every line of code in a large code base, but you can ensure that the process of writing the software is rigorous enough that the result is verifiably valid.

When you need to write software that must conform to legal regulations (landing gear software, pacemaker software, etc.), you need to ensure that the software driving your device will operate the device within the bounds of its governing legislation.

Programmer:

A programmer fills a multidisciplinary role, and should be able to:

  • recognize data patterns and apply effective algorithms to make the most of their computer system’s resources
  • understand the fundamental operation of a computer (how the CPU caches, how hard drives operate) so as to waste as little wall time as possible
  • write software that can be trusted. You can’t always be certain that your code will be without bugs, but you need to be able to write code that can be tested to operate properly and verifiably.

Engineer? Developer?  These are titles that may be misleading. We’re all programmers; we just have different strengths and areas of focus.

Testing your mail with Flingo and DMARC

DMARC has been getting a lot of attention recently with Yahoo! implementing a reject policy on their personal mail platform, so it seemed like an appropriate time to write a little blog post on how you can use Flingo and DMARC to test for potential brand infringements or phishes.

I’ll assume that you are running a Debian-like system and can compile software.  So, let’s create the flingo package and install our libraries and binaries:

Now we can execute:

$ cd /tmp
$ sudo dpkg -i flingo*.deb

to install our newly minted package.

Testing the script

Now that Flingo has been installed, let’s try to test a supplied example set.

$ cd /tmp/Flingo/examples
$ flingoc dmarc.flingo paypal-spam.rfc821

And we’ll see:

which shows us a number of things:

  • The subject
  • The sender
  • The recipient
  • A DMARC reject warning

This is enough to tell us that the domain used in the From header (RFC5322.From) address has a published DMARC record with a reject policy, and that the message does not pass DKIM verification.  We are honouring the domain owner’s request to reject these messages and will stop processing this mail immediately.
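For readers who have not looked at one, a DMARC record is a DNS TXT record published at `_dmarc.<domain>`, made of `tag=value` pairs separated by semicolons. The sketch below is in Python and is not how Flingo does it. It only shows how a reject policy can be read out of such a record. The function names and the sample record are made up for illustration.

```python
def parse_dmarc_record(txt):
    """Split a DMARC TXT record into a dict of tag -> value."""
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if not part or "=" not in part:
            continue
        name, value = part.split("=", 1)
        tags[name.strip().lower()] = value.strip()
    return tags


def policy_is_reject(txt):
    """True if this is a DMARC record whose policy (p=) is reject."""
    tags = parse_dmarc_record(txt)
    return tags.get("v") == "DMARC1" and tags.get("p", "").lower() == "reject"


if __name__ == "__main__":
    # Made-up record for illustration; a real one comes from a DNS lookup.
    record = "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com"
    print(parse_dmarc_record(record))
    print(policy_is_reject(record))
```

The DNS lookup itself is left out on purpose, since it needs a resolver library that the standard library does not provide.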

Logging in C++

I’ve always found logging in C++ to be rather annoying.  With each project I inevitably take a different route.  I’ve had enough, and have finally decided to sit down and write a package that will allow me to re-use the same logging with each project.

My goal was to overload the std::clog object so that it can write to either a given file or syslog.

It’s still under active development and I welcome pull requests.

Filtering DKIM messages with Flingo

Flingo is my little mail filtering language, used to parse emails and flag them for any kind of policy action: spam folder placement, deletion, etc.

I’ve recently implemented DKIM verification within Flingo, allowing messages to be verified using the libopendkim library.

Let’s create a rule file that:

  1. Matches emails with a From address matching the gmail.com domain
  2. Fails DKIM verification where the d= must be “gmail.com”

Now, let’s execute this and see what happens:

$ flingoc dkim-gmail-fail.flingo ../tests/t004_dkim_fail.rfc821
METARULE[dkim-test-fail]
ACTION:log:Failed-to-pass-DKIM-signature-verification
ACTION:log:sender:some.user@gmail.com
ACTION:log:subject:Test Email
ACTION:reject:
END:

Great! It detected that it was scanning a Gmail message and caught the failure to verify.

Now, let’s do this with a valid message:

$ flingoc dkim-gmail-fail.flingo ../tests/t003_dkim.rfc821
END:

Perfect.
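To make the "d= must be gmail.com" condition concrete outside of Flingo, here is a rough Python sketch. It only compares the d= tag of any DKIM-Signature header with the domain of the From address; it does NOT verify the signature, which is the part libopendkim handles. All function names here are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr


def dkim_tags(header_value):
    """Parse a DKIM-Signature header value into a dict of tag -> value."""
    tags = {}
    for part in header_value.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)
            # Folded headers may carry whitespace inside values; drop it.
            tags[name.strip()] = "".join(value.split())
    return tags


def from_domain(msg):
    """Return the lowercased domain of the From header address."""
    _, addr = parseaddr(msg.get("From", ""))
    return addr.rpartition("@")[2].lower()


def signed_by_from_domain(raw_message):
    """True if some DKIM-Signature has d= equal to the From domain.

    This is a domain comparison only, not a signature check.
    """
    msg = message_from_string(raw_message)
    domain = from_domain(msg)
    if not domain:
        return False
    for value in msg.get_all("DKIM-Signature", []):
        if dkim_tags(value).get("d", "").lower() == domain:
            return True
    return False
```

A message that passes this comparison can still fail real verification, which is why the rule above needs both conditions.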

Using IntervalTrees for matching IP addresses to CIDR ranges

First, we need some data.

Thankfully I have access to a number of datasets, including connection addresses from our mail servers and an RBL in CIDR notation.

First, let’s extract out just the CIDR addresses from our RBL:

$ perl -lane 'print $1 if /^((?:\d+\.){3}\d+\/\d+)/' rbl > rbl.cidr

And extract just the connecting IP addresses from our mail server log:

$ zfgrep -i connect\ from postfix.log.gz | awk '{print $NF}' | cut -d\[ -f2 | cut -d\] -f1 | grep "\." > connections.ip

Now, we will use the software found on my GitHub repo and compile it using “make t.it”.

Now, test it against the first 500000 IP records:

$ head -500000 /tmp/test/connections.ip | /usr/bin/time ./t.it /tmp/test/rbl.cidr 
12658 elements in interval tree
Matched 10262
4.30user 0.03system 0:04.33elapsed 99%CPU (0avgtext+0avgdata 4864maxresident)k
0inputs+0outputs (0major+1353minor)pagefaults 0swaps

We see that it took 4.33s to check half a million IP addresses against 12658 CIDR ranges, with 10262 of the addresses matching.  We can run this multiple times with different numbers of CIDR ranges to test how fast/efficient this software is at different sizes.
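If you just want to try the idea without compiling anything, here is a minimal Python sketch. Note that it is not the interval tree from my repo: it swaps in a simpler technique, merging the CIDR ranges into sorted, non-overlapping intervals and doing a binary search. That is enough to answer "is this IP in any range?" but not "which ranges matched?". It assumes IPv4 only, and the function names and sample addresses are made up.

```python
import bisect
import ipaddress


def build_intervals(cidrs):
    """Turn CIDR strings into sorted, non-overlapping (start, end) ints."""
    ranges = []
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        ranges.append((int(net.network_address), int(net.broadcast_address)))
    ranges.sort()
    merged = []
    for start, end in ranges:
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent: extend the previous interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def matches(ip, intervals, starts):
    """True if `ip` falls inside any interval (binary search on starts)."""
    value = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(starts, value) - 1
    return i >= 0 and value <= intervals[i][1]


if __name__ == "__main__":
    # Made-up ranges taken from the documentation address blocks.
    intervals = build_intervals(["192.0.2.0/24", "198.51.100.0/25"])
    starts = [start for start, _ in intervals]
    for ip in ["192.0.2.17", "198.51.100.200", "203.0.113.5"]:
        print(ip, matches(ip, intervals, starts))
```

Each lookup is a single binary search over the merged list, so the per-address cost grows with the logarithm of the number of ranges.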

Parsing your mail with Perl

If you’re like me, then you enjoy categorizing and sorting your email so that you can find important messages quickly.

One advantage that I have is the ability to access the filestore on which my email resides.  Meaning, I don’t use any protocols like IMAP to access my mail.  Instead, all of my mail gets delivered to a directory on my mail server, which I can log into to view and/or modify my mail.

Downloading the scripts

I’ve uploaded my scripts to my GitHub repo, where you can view them.  To download them, you will need to be able to clone my repo.

Running the script

There are a few hardcoded assumptions made by the script.

  1. it assumes that you’re executing it from the path in which it is located.
  2. it assumes that the rule files are located in the same directory as the script
  3. it assumes that the “lib” dir exists as a subdirectory of the directory in which the script resides.

Unless you change anything, this is how it is set up in the git repo, so nothing should require any modification.

Find is your friend

Because the shell’s “*” glob falls over for me at around 1000 files, we will use the “find” and “xargs” commands to help us parse our mail.

What we are seeing is the result of our LinkedIn rule:

You see that we can easily place any regex within the quotes to be matched against the body of the message.  The plugins also support header checks, and you can view the code to the Perl module to see how to define those.
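The Perl plugins themselves live in the repo. Purely as an illustration of the "regex against the body" idea, here is a rough Python equivalent. The rule names, the patterns, and the function names are all made up for this sketch and are not taken from my scripts.

```python
import re
from email import message_from_string

# Hypothetical rules: a name mapped to a regex tried against the body.
RULES = {
    "linkedin": re.compile(r"linkedin\.com", re.IGNORECASE),
    "invoice": re.compile(r"\binvoice\s+#?\d+", re.IGNORECASE),
}


def body_text(msg):
    """Concatenate the text/plain parts of a parsed message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def matching_rules(raw_message, rules=RULES):
    """Return the sorted names of all rules whose regex matches the body."""
    body = body_text(message_from_string(raw_message))
    return sorted(name for name, rx in rules.items() if rx.search(body))
```

A header check would work the same way, with the pattern applied to a header value such as `msg["Subject"]` instead of the body.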

Takeaways

Sure, this is re-inventing the wheel.  There is already plenty of good email filtering software out there that you can use, but I find it useful to write my own tools from time to time to ensure that I’m getting exactly what I need.  Sometimes a custom tool is worth the effort to create.

Handling spam with mutt

If you’re like me, then you use mutt as your primary mail client and will need a way to easily handle the spam coming into your mailbox.

I’ll make some assumptions:

  1. You’re using procmail as your LDA
  2. You’re using spamassassin as your main content filter
  3. You’re using mutt as your MUA

Assuming that your setup is similar to mine, you’ll be able to leverage some of the simple scripts that I have written to handle the miscategorized messages that land in your inbox.

Configuring your mutt macros

I have a macro file under ~/.mutt/macros.spam with the following contents:

Then place this into your .muttrc:

source ~/.mutt/macros.spam

Now, when you have a missed spam in your inbox, press “S” to have it moved to the appropriate location for learning.

Configuring your learning scripts

Once you have identified missed spam and false positives, and mutt has moved them into their respective folders, it’s time to run a few little scripts on them to help you to train your system.

False Positives

These are messages that were in your spam folder erroneously.

You’ll have to change the directories to suit your environment.  But the script is pretty simple.  It attempts to strip off any spamassassin markup from the message, it will train your spamassassin-bayes DB to recognize messages like this as ham, and it will feed the message back into procmail to be filtered/delivered accordingly.
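As a rough picture of the "strip off the markup" step, here is a small Python sketch that removes every X-Spam-* header from a message. This is an illustration only, not my script. It does not undo a rewritten subject line or unwrap a message that spamassassin attached to a report, both of which `spamassassin -d` handles for you. The function name is made up.

```python
from email import message_from_string


def strip_spam_headers(raw_message):
    """Return the message text with every X-Spam-* header removed."""
    msg = message_from_string(raw_message)
    for name in set(msg.keys()):
        if name.lower().startswith("x-spam-"):
            del msg[name]  # removes all occurrences of that header
    return msg.as_string()


if __name__ == "__main__":
    raw = (
        "From: friend@example.com\n"
        "X-Spam-Flag: YES\n"
        "X-Spam-Status: Yes, score=7.1\n"
        "Subject: hello\n"
        "\n"
        "Not actually spam.\n"
    )
    print(strip_spam_headers(raw))
```

The cleaned message is what you would then hand to the bayes learner as ham and feed back into procmail.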

Missed Spam

The missed spam script is very similar to the false positives script, but it takes the message and sends it to your spamcop account via the spamcop reporter script.

Automating

Now that the scripts are set up and running, consider placing them into a crontab that fires every 5 minutes.

Room for improvement

There’s always room for improvement.  One easy solution would be to whitelist all of the mail from your sent folder.   Please share your tips/techniques in the comments!

Skills

As an adult I have the gift of hindsight with which I can re-evaluate some of my decisions growing up.  As a child many of the choices governing my life were made for me, and as an adolescent I was too narrow-minded and self-absorbed to conduct any meaningful critical thinking with regard to my education outside of school.

I’m making a list of things that I wish I had learned to do and a list of things that I would like my children to learn.

I wish that I had known how to:

* read a circuit diagram
* use a multimeter to test electronics, car batteries, car alternators
* troubleshoot and replace household appliance switches
* do basic plumbing and soldering
* wire an outlet

These are simple skills that many people do without, but more and more often I’m finding that my life would be rather less complicated and less expensive if I had these skills.

For my children, I would like to teach them all of these things plus a few other niche skills:

* what TCP/IP is
* HTTP 1.1 fundamentals (cookies, posts, gets)
* JavaScript
* DNS
* Bash scripting

I believe that the top four items are important so that you can navigate online services and better understand how these systems are put together. We live in a world where we operate within these services and many people do not have the slightest clue how these services operate.

I feel that Bash scripting is important as it teaches a computer user how to become a computer operator. If you haven’t already, please visit Douglas Rushkoff’s Program or Be Programmed site and book.

Working socially

In high school, before we as a whole had the Internet, I would log into a local BBS for games and discussion. At 14 it was my first exposure to adult discourse and near instant communications.  My computer would log on every day at 6am, pull the latest messages and I would read and reply to them over breakfast. With any luck I would have a reply by the time school ended.

By the mid 90s most of my friends’ families were going online and they would share a family email account.  At this time I was experimenting with Unix and learning C. I had a Slackware server running on a dedicated phone line with a USR Courier 33.6, dialed into my ISP with a static PPP IP address and DNS records pointed to me.

I provided HTTP, FTP, telnet and mail services for free to my friends.  This was 1996 IIRC.

We didn’t have any social networks then. What we would do instead is update our .plan file each day with whatever plans we had or where we would likely be. None of us had cell phones and it wasn’t easy keeping tabs on each other.

My server had a PHP page that would iterate over each user’s .plan file and render it out chronologically by mtime. It was rudimentary, but it worked.

By 2000 I was on a friend’s social network site. I was a beta user (my user ID was sub 20) and I left the site a year or two later, after the membership numbers were in the tens of thousands and my friend had dropped out of developing it.

Half a decade passed as I was busy with work and family. I had dropped off of the internet for the most part. I was writing C++ for embedded devices, and although there was an online component, I wasn’t spending that much time on the Internet.

In 2007 I switched jobs. I went from being a developer to an operations guy, and began working on the email team of a mailbox provider with millions of mailboxes under management.

My favourite part of the job was the social aspect. I was once again helping people to connect and interact. I had loved email since my days using Eudora on Windows 3.1.  Now I was writing MTAs and anti-spam systems to keep my users’ experience good and to remove the noise from the signal.

My move to Facebook makes total sense to me. There are plenty of workplaces that I would find myself happy to be a part of, but throughout my adolescence and adult life I’ve found the most rewarding working experience in the act of helping to bring people together.

I find it fundamentally important that we have a vehicle for reaching out and connecting with each other. Creating tools that make this easier is what I find to be the most rewarding.