Using PEAR’s mimeDecode module

Most MIME email is totally useless and would be better sent as a plain text message, but sometimes MIME is useful: when sending attachments, for example. In this article, we will see how we can decode MIME messages with PHP.

We will use the mimeDecode module to do the actual decoding for us. mimeDecode is part of the PEAR library. PEAR is already installed if you have a newer version of PHP, but if you don't it's also easy to install.

How to get PEAR and mimeDecode

Before trying to install PEAR and the mimeDecode module, first make sure that you don't already have them. If you have a recent version of PHP (> 4.3.0pre1), the PEAR base installation is already present on your system. Since the mimeDecode module is part of the PEAR core, that's also installed. To check whether you need to install mimeDecode, you can run the following PHP script:

<?php
include('Mail/mimeDecode.php');
?>

If this doesn't give an error, the mimeDecode module is installed on your system. If it does, you'll have to install PEAR (or upgrade your version of PHP). Read the PEAR manual for instructions.
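If you would rather have the check report its result than wait for a warning, something like the following sketch may help. It assumes PHP 5.3.2 or later, where stream_resolve_include_path() is available; the function name mime_decode_available() is made up for this example.

```php
<?php
// Hypothetical helper: reports whether Mail/mimeDecode.php can be found
// somewhere on PHP's include_path, without actually including it.
function mime_decode_available()
{
    return stream_resolve_include_path('Mail/mimeDecode.php') !== false;
}

echo mime_decode_available()
    ? "mimeDecode was found on the include_path\n"
    : "mimeDecode was not found on the include_path\n";
```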

MIME: an introduction

Before you can successfully write a script that decodes MIME email, you'll have to know a little about the anatomy of a MIME message. I'll give you a short introduction to MIME. If you want to read more about the details of MIME, there is a chapter available from O'Reilly.

A traditional email

The source of a basic, non-MIME email looks like this:

From: Gijs van Tulder <gvtulder@example.com>
To: thelist@lists.evolt.org
Subject: Decoding MIME mail
Date: Wed, 12 Mar 2003 10:26:59 +0100

Hi, this is my message.

The first lines of this email contain headers, data about this message. These headers consist of a header name, before the colon, and some data, after the colon. (If you are familiar with the HTTP headers, you'll notice that email headers use the same syntax.) There are actually many more possible headers that I didn't include in this example, but all headers are in the form Name: Data. You can see the headers of your email in your email program. (In Microsoft Outlook 2000, for example, right-click on the message and select 'Options'.)

The body of the message ("Hi, this is my message." in our example) starts after the blank line that follows the headers.
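To make the header/body split concrete, here is a minimal sketch in plain PHP that uses no PEAR code at all. The function name split_message() is invented for this illustration, and the parsing is deliberately simplified: folded (multi-line) headers are not handled, and a repeated header overwrites the earlier one.

```php
<?php
// Split a raw, non-MIME message into a header array and a body string.
function split_message($raw)
{
    // Normalise line endings so both \r\n and \n input work.
    $raw = str_replace("\r\n", "\n", $raw);

    // The first blank line separates the headers from the body.
    $pieces = explode("\n\n", $raw, 2);
    $body = isset($pieces[1]) ? $pieces[1] : '';

    $headers = array();
    foreach (explode("\n", $pieces[0]) as $line) {
        // Each header has the form "Name: Data".
        $pair = explode(':', $line, 2);
        if (count($pair) == 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }
    return array($headers, $body);
}

$raw = "From: Gijs van Tulder <gvtulder@example.com>\n"
     . "To: thelist@lists.evolt.org\n"
     . "Subject: Decoding MIME mail\n"
     . "Date: Wed, 12 Mar 2003 10:26:59 +0100\n"
     . "\n"
     . "Hi, this is my message.\n";

list($headers, $body) = split_message($raw);
echo $headers['subject'], "\n"; // Decoding MIME mail
echo $body;                     // Hi, this is my message.
```

Lower-casing the header names mirrors what mimeDecode does later in this article, where the Subject: header ends up under the key 'subject'.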

A MIME email

We'll now see what happens with our message when we add a MIME attachment to it.

From: Gijs van Tulder <gvtulder@example.com>
To: thelist@lists.evolt.org
Subject: Decoding MIME mail
Date: Wed, 12 Mar 2003 10:26:59 +0100
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MyBoundary"

This is a multi-part message in MIME format.

--MyBoundary
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Hi, this is my message. See the attached image!

--MyBoundary
Content-Type: image/gif; name="myimage.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="myimage.gif"

R0lGODlhogBrAPcAAAAAAP///zwKC2UhIrdPUKU0OMmLjagEDJIEDZEmLduztW8HD64YJlUJEXkX
...omitted many lines like the above...
gahQUgx4n5kQWQQJ0sqEREAAADs=

--MyBoundary--

As you can see, our message now contains two different parts: the message body, Hi...image!, and an encoded form of the image, R0lGO...AADs=. We'll take a closer look at these parts in a moment. How does your mail program know where the different parts begin and end? Look at the headers, and you'll see the answer: there is a new header called Content-Type, with a boundary parameter. In this example, I've set the boundary to MyBoundary. The MIME parts are separated by a line with two dashes and that boundary string: --MyBoundary. At the end of the message, the boundary is followed by two more dashes to indicate the end: --MyBoundary--.

Note that this boundary can be set to almost any string, as long as that string does not occur inside the parts themselves. Real mail programs don't use 'MyBoundary' as the boundary string, but generate a longer random string to avoid such clashes. It has the same effect, though.
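The boundary mechanism can be sketched in a few lines of plain PHP. This is only an illustration of the idea, not how mimeDecode works internally: the function name split_multipart() is made up, and the splitting is simplified (for instance, it does not check that the boundary stands at the start of a line).

```php
<?php
// Find the boundary in a Content-Type value and cut the body into parts.
function split_multipart($contentType, $body)
{
    // Pull the boundary parameter out of the Content-Type header value.
    if (!preg_match('/boundary="?([^";]+)"?/i', $contentType, $m)) {
        return array(); // not a multipart message
    }
    $boundary = $m[1];
    $body = str_replace("\r\n", "\n", $body);

    // Everything from the closing "--boundary--" line onwards is ignored.
    $end = strpos($body, '--' . $boundary . '--');
    if ($end !== false) {
        $body = substr($body, 0, $end);
    }

    // Split on the "--boundary" separator lines.
    $chunks = explode('--' . $boundary . "\n", $body);
    // The first chunk is the preamble, not a real part.
    array_shift($chunks);

    return array_map('trim', $chunks);
}

$body = "This is a multi-part message in MIME format.\n\n"
      . "--MyBoundary\n"
      . "Content-Type: text/plain; charset=\"iso-8859-1\"\n\n"
      . "Hi, this is my message. See the attached image!\n\n"
      . "--MyBoundary\n"
      . "Content-Type: image/gif; name=\"myimage.gif\"\n\n"
      . "(base64 data omitted)\n\n"
      . "--MyBoundary--\n";

$parts = split_multipart('multipart/mixed; boundary="MyBoundary"', $body);
echo count($parts), " parts found\n"; // 2 parts found
```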

The MIME parts

Now, let's take a look at the MIME parts. You'll notice that these parts look a lot like the email message itself: each starts with a number of headers, followed by an empty line and then the body.

Content-Type: image/gif; name="myimage.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="myimage.gif"

R0lGODlhogBrAPcAAAAAAP///zwKC2UhIrdPUKU0OMmLjagEDJIEDZEmLduztW8HD64YJlUJEXkX
...omitted many lines like the above...
gahQUgx4n5kQWQQJ0sqEREAAADs=

This is the MIME part containing the attached image. You see the Content-Type header, which tells us more about the type of this part. In our example, this part is a GIF image. The original filename is also included in the headers.

The body of this part looks very strange. Many similar lines followed the first line of characters, but I deleted all of them except for the last line. What you see here is an encoded version of the original file. Since an email message can only contain normal text, the binary form of the image had to be translated to a text form. The Content-Transfer-Encoding header tells us which encoding was used for that: base64 in this case. The PEAR mimeDecode module, like every other MIME-compatible email reader, will use this information to decode the file.
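You can try the encoding yourself with PHP's built-in base64 functions. The short sketch below decodes the first eight characters of the example attachment and then encodes some made-up binary data the way a mail program might, wrapped into lines of 76 characters. The variable names and the sample data are only for illustration.

```php
<?php
// The first eight characters of the encoded image from the example.
$encoded = 'R0lGODlh';

// base64_decode() turns the text back into the original bytes.
$decoded = base64_decode($encoded);
echo $decoded, "\n"; // GIF89a, the signature at the start of a GIF file

// Going the other way: encode binary data and wrap it into
// 76-character lines that end in \r\n.
$binary = "GIF89a" . str_repeat("\0", 60); // made-up sample data
echo chunk_split(base64_encode($binary), 76, "\r\n");
```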

Decoding the message

Now that we know a little about MIME email, we can almost start writing the script. But wait, you'll first have to get a message that your script can parse. If you're using Linux/Unix, you can send yourself a MIME email and copy your mbox file to get the source of that message. You can also set up an email-to-PHP script to get the message source. If you just want to test the mimeDecode module, you can also just download the source of my example message. In this example script, I will assume that the source of the email is stored in the variable $input.

Initialising mimeDecode

To load the mimeDecode module, we just have to include Mail/mimeDecode.php (mind the capital M, since file names are case-sensitive on most systems). The PEAR path is specified in PHP's include_path, so we don't have to worry about the absolute location of this file.

include('Mail/mimeDecode.php');

Setting the parameters

The mimeDecode module accepts five parameters. include_bodies determines whether we want the bodies of the MIME parts to be returned. If decode_bodies is set to true, the encoded MIME parts (files etc.) are decoded. If decode_headers is true, mimeDecode decodes the message headers. Since we want the whole message to be decoded, we'll set these three boolean parameters to true.

The fourth parameter, crlf, tells mimeDecode which line ending the message uses. The default is \r\n (carriage return, line feed), which works in most cases, so we won't specify it. The last parameter is called input and is used to pass the input message to mimeDecode.

$params['include_bodies'] = true;
$params['decode_bodies'] = true;
$params['decode_headers'] = true;
$params['input'] = $input;

Running decode()

It's time to run mimeDecode's decode() function, the function that decodes the MIME message and returns a nice structure. The easiest way to run decode() is without creating a new object. That does mean that we have to tell PHP in which class decode() is to be found.

$structure = Mail_mimeDecode::decode($params);

The decoded message is saved in $structure.
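On recent PHP versions, calling a non-static method statically is no longer permitted, so the static call may fail there. In that case you can create an object instead. The sketch below is written under a few assumptions: that the constructor takes the raw message (which makes the input parameter unnecessary), that a failed decode comes back as a PEAR_Error object, and that the wrapper name decode_message() is simply made up for this example. It returns null when the module is missing or decoding fails.

```php
<?php
// Object-style call: returns the decoded structure, or null if
// Mail_mimeDecode is not available or the message could not be decoded.
function decode_message($input)
{
    if (!class_exists('Mail_mimeDecode')) {
        // Suppress the warning so a missing PEAR install is handled below.
        @include_once 'Mail/mimeDecode.php';
    }
    if (!class_exists('Mail_mimeDecode')) {
        return null;
    }

    $params = array(
        'include_bodies' => true,
        'decode_bodies'  => true,
        'decode_headers' => true,
    );

    // The message goes to the constructor, so 'input' is not set here.
    $decoder = new Mail_mimeDecode($input);
    $structure = $decoder->decode($params);

    // Assumption: failures are reported as a PEAR_Error object.
    if (!is_object($structure) || is_a($structure, 'PEAR_Error')) {
        return null;
    }
    return $structure;
}

$structure = decode_message("Subject: Decoding MIME mail\r\n\r\nHi, this is my message.");
echo $structure === null ? "Could not decode the message\n" : "Message decoded\n";
```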

The mimeDecode output

We've now got the decoded message in the $structure variable. If you run print_r($structure);, you'll get a nice view of the anatomy of that structure.

$structure->headers: An array containing the headers of the message. $structure->headers['subject'] contains the Subject: header and so on. Note that multiple headers with the same name are saved in an array, like: $structure->headers['received'][0].
$structure->ctype_primary: The first part of the message's content type. If the content type is multipart/mixed, this is set to multipart.
$structure->ctype_secondary: The second part of the message's content type. (multipart/mixed gives a value of mixed.)
$structure->ctype_parameters: An array of the parameters of the content type. In case of a MIME message this includes the boundary= value.
$structure->disposition: Contains the value of the Content-Disposition: header, if set. In case of an attachment, this is set to attachment.
$structure->d_parameters: As with the Content-Type: header, any parameters of the Content-Disposition: header are returned in this array. Attachments, for instance, bring a filename parameter.
$structure->body: The body of the message or the MIME part. In general, the main message of a MIME email doesn't have a body value.
$structure->parts: This is an array of all MIME parts found in the message. Each of these parts has the same properties as the $structure variable described here. E.g. the headers of $structure->parts[0] can be found in $structure->parts[0]->headers etc.
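Because every part has the same properties as the message itself, the structure lends itself to a recursive walk. The sketch below lists the content type of a message and all of its parts. To keep it runnable without PEAR, it uses a small hand-built stand-in object that only mimics the properties described above; the function name list_content_types() is invented for this example.

```php
<?php
// Recursively list the content type of a structure and all of its parts.
function list_content_types($structure, $depth = 0)
{
    $lines = array(str_repeat('  ', $depth)
        . $structure->ctype_primary . '/' . $structure->ctype_secondary);

    // Only multipart messages have a parts array.
    if (isset($structure->parts)) {
        foreach ($structure->parts as $part) {
            $lines = array_merge($lines, list_content_types($part, $depth + 1));
        }
    }
    return $lines;
}

// Hand-built stand-in for a decoded message, for illustration only.
$text = new stdClass();
$text->ctype_primary = 'text';
$text->ctype_secondary = 'plain';
$text->body = 'Hi, this is my message. See the attached image!';

$image = new stdClass();
$image->ctype_primary = 'image';
$image->ctype_secondary = 'gif';
$image->disposition = 'attachment';
$image->d_parameters = array('filename' => 'myimage.gif');

$message = new stdClass();
$message->ctype_primary = 'multipart';
$message->ctype_secondary = 'mixed';
$message->parts = array($text, $image);

echo implode("\n", list_content_types($message)), "\n";
```

For the example message this prints multipart/mixed on the first line, followed by the indented lines text/plain and image/gif.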

Saving all attached files

We can now walk through the $structure->parts array and save each attachment we find.

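foreach ($structure->parts as $part) {
    // only save if an attachment with a filename
    if (isset($part->disposition) and
        ($part->disposition=='attachment') and
        isset($part->d_parameters['filename'])) {
        // strip any directory from the name, so the file
        // cannot be written outside the current directory
        $filename = basename($part->d_parameters['filename']);
        // open file (binary-safe)
        $fp = fopen($filename, 'wb');
        // write body
        fwrite($fp, $part->body);
        // close file
        fclose($fp);
    }
}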

For each part, we check whether it has a Content-Disposition: header set to 'attachment'. If so, we open a file with the name given by the filename parameter of that Content-Disposition: header, and save the (decoded) body of the part in that file.
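Keep in mind that the filename comes straight from the email, so whoever sent the message controls it. Before writing files on a server, it may be wise to clean the name up first. The helper below is a hypothetical sketch, not part of mimeDecode: it drops any directory, keeps a conservative set of characters, and falls back to a made-up default name when nothing usable is left.

```php
<?php
// Hypothetical helper: reduce a filename taken from an email to something
// that is safe to create in the current directory.
function safe_attachment_name($name)
{
    // Treat Windows-style separators like Unix ones, then drop any path.
    $name = basename(str_replace('\\', '/', $name));
    // Keep letters, digits, dot, underscore and dash; replace the rest.
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    // Avoid hidden files and names such as "." or "..".
    $name = ltrim($name, '.');
    return $name === '' ? 'attachment.bin' : $name;
}

echo safe_attachment_name('../../etc/passwd'), "\n"; // passwd
echo safe_attachment_name('my image.gif'), "\n";     // my_image.gif
```

You could then pass safe_attachment_name($part->d_parameters['filename']) to fopen() instead of the raw value.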

Listing all images

In the same way, we can list all images and their sizes by checking the ctype_primary and body values. We can then send this list to the sender of the original email.

$list = '';
foreach ($structure->parts as $part) {
    // is this an image?
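    if ($part->ctype_primary=='image') {
        // inline images may come without a filename parameter
        $name = isset($part->d_parameters['filename'])
              ? $part->d_parameters['filename'] : '(unnamed)';
        $list .= $name.': '.strlen($part->body)." bytes\n";
    }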
}
// send this list
$to = $structure->headers['from'];
$subject = 'Re: '.$structure->headers['subject'];
$body = "You sent us these images:\n\n$list\n\nThank you very much!";
$headers = 'From: '.$structure->headers['to'];
mail($to, $subject, $body, $headers);

If the ctype_primary is 'image' (from image/gif, image/jpeg etc.), we add the filename and the length of the body string to the list. After we have gone through all parts, we send the list to the sender of the original email. We use the From:, Subject: and To: headers of the original email to make a nice reply.
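Since the reply reuses header values taken from an incoming message, it may be worth making sure they cannot carry line breaks into the new headers. The following is a small defensive sketch; the helper name clean_header_value() is made up, and whether you need it depends on your PHP version and setup.

```php
<?php
// Hypothetical helper: make a decoded header value safe to reuse in mail().
function clean_header_value($value)
{
    // Collapse CR/LF sequences so no extra header lines can be smuggled in.
    return trim(preg_replace('/[\r\n]+/', ' ', $value));
}

$subject = clean_header_value("Decoding MIME mail\r\nBcc: someone@example.com");
echo 'Re: ', $subject, "\n"; // Re: Decoding MIME mail Bcc: someone@example.com
```

In the script above, you would wrap $structure->headers['from'], $structure->headers['subject'] and $structure->headers['to'] in this function before passing them to mail().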

What's next?

By now, you should be able to write your own scripts using the mimeDecode module. You could, for example, write a script that lets you just email images and get them published on your web log. In the rare event that you are a teacher, you could save all documents that your students send you in your 'incoming files' directory, without having to open each message first.