Using PEAR’s mimeDecode module
While most MIME email is totally useless and would be better sent as a plain text message, MIME is sometimes useful, for example when sending attachments. In this article, we will see how to decode MIME messages with PHP.
We will use the mimeDecode module to do the actual decoding for us. mimeDecode is part of the PEAR library. PEAR is already installed if you have a newer version of PHP; if you don't, it is easy to install.
How to get PEAR and mimeDecode
Before trying to install PEAR and the mimeDecode module, first make sure that you don't already have them. If you have a recent version of PHP (later than 4.3.0pre1), the PEAR base installation is already on your system. Since the mimeDecode module is part of the PEAR core, it is installed as well. To check whether you need to install mimeDecode, you can run the following PHP script:
<?php include('Mail/mimeDecode.php'); ?>
If this doesn't give an error, the mimeDecode module is installed on your system. If it does, you'll have to install PEAR (or upgrade your version of PHP). Read the PEAR manual for instructions.
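If you would rather test for the file without triggering an include warning, something like the following sketch could work. It assumes PHP 7 or later, and isOnIncludePath() is a hypothetical helper name, not part of PEAR.

```php
<?php
// Hypothetical helper (not part of PEAR): reports whether a file can be
// found on PHP's include_path, without actually including it.
function isOnIncludePath(string $file): bool
{
    // stream_resolve_include_path() returns the resolved path, or false
    // when the file is not found on the include_path.
    return stream_resolve_include_path($file) !== false;
}

if (isOnIncludePath('Mail/mimeDecode.php')) {
    echo "mimeDecode is available\n";
} else {
    echo "mimeDecode was not found on the include_path\n";
}
```

This only tells you that the file exists on the include_path, not that it loads without errors.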
MIME: an introduction
Before you can successfully write a script that decodes MIME email, you'll have to know a little about the anatomy of a MIME message. I'll give you a short introduction to MIME. If you want to read more about the details of MIME, there is a chapter available from O'Reilly.
A traditional email
The source of a basic, non-MIME email looks like this:
From: Gijs van Tulder <gvtulder@example.com>
To: thelist@lists.evolt.org
Subject: Decoding MIME mail
Date: Wed, 12 Mar 2003 10:26:59 +0100

Hi, this is my message.
The first lines of this email contain headers, data about this message. These headers consist of a header name, before the colon, and some data, after the colon. (If you are familiar with HTTP headers, you'll notice that email headers use the same syntax.) There are actually many more possible headers that I didn't include in this example, but all headers are in the form Name: Data. You can see the headers of your email in your email program. (In Microsoft Outlook 2000, for example, right-click on the message and select 'Options'.)
The body of the message, in our example Hi, this is my message., starts after the blank line that follows the headers.
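To make the header/body split concrete, here is a minimal sketch in plain PHP. splitMessage() is a hypothetical helper written for this illustration; it ignores details such as headers folded over several lines, which mimeDecode handles for you.

```php
<?php
// Hypothetical helper for illustration only: splits a raw message into
// its headers and its body at the first blank line.
function splitMessage(string $source): array
{
    // Normalise line endings so "\r\n" and "\n" messages behave the same.
    $source = str_replace("\r\n", "\n", $source);

    // The first blank line separates the headers from the body.
    $pieces = explode("\n\n", $source, 2);
    $body = $pieces[1] ?? '';

    $headers = [];
    foreach (explode("\n", $pieces[0]) as $line) {
        // Split "Name: Data" at the first colon only, because the data
        // itself may contain colons (the Date: header does).
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    return ['headers' => $headers, 'body' => $body];
}

$source = "From: Gijs van Tulder <gvtulder@example.com>\n"
        . "To: thelist@lists.evolt.org\n"
        . "Subject: Decoding MIME mail\n"
        . "Date: Wed, 12 Mar 2003 10:26:59 +0100\n"
        . "\n"
        . "Hi, this is my message.\n";

$message = splitMessage($source);
echo $message['headers']['subject'], "\n";
echo $message['body'];
```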
A MIME email
We'll now see what happens with our message when we add a MIME attachment to it.
From: Gijs van Tulder <gvtulder@example.com>
To: thelist@lists.evolt.org
Subject: Decoding MIME mail
Date: Wed, 12 Mar 2003 10:26:59 +0100
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="MyBoundary"

This is a multi-part message in MIME format.

--MyBoundary
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Hi, this is my message. See the attached image!

--MyBoundary
Content-Type: image/gif; name="myimage.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="myimage.gif"

R0lGODlhogBrAPcAAAAAAP///zwKC2UhIrdPUKU0OMmLjagEDJIEDZEmLduztW8HD64YJlUJEXkX
...omitted many lines like the above...
gahQUgx4n5kQWQQJ0sqEREAAADs=
--MyBoundary--
As you can see, our message now contains two different parts: the message body, Hi...image!, and an encoded form of the image, R0lGO...AADs=. We'll take a closer look at these parts in a moment. How does your mail program know where the different parts begin and end? Look at the headers, and you'll see the answer: there is a new header called Content-Type, with a boundary parameter. In this example, I've set the boundary to MyBoundary. The MIME parts are separated by a line with two dashes and that boundary string: --MyBoundary. At the end of the message, the boundary is followed by two more dashes to indicate the end: --MyBoundary--.
Note that the boundary can be almost any string, as long as it does not appear in the content of the message itself. Normal mail programs obviously don't use 'MyBoundary' as the boundary string, but generate a longer random string. It still has the same effect, though.
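The way a boundary separates the parts can be sketched in a few lines of plain PHP. splitParts() is a hypothetical helper for this illustration. It assumes the boundary lines match exactly, without trailing whitespace, and it simply skips the text before the first boundary.

```php
<?php
// Hypothetical helper for illustration only: splits the body of a
// multipart message into its raw MIME parts.
function splitParts(string $body, string $boundary): array
{
    $body = str_replace("\r\n", "\n", $body);
    $parts = [];
    $current = null; // stays null until the first boundary line is seen

    foreach (explode("\n", $body) as $line) {
        if ($line === '--' . $boundary . '--') {
            // Closing boundary: nothing after this belongs to a part.
            break;
        }
        if ($line === '--' . $boundary) {
            // A new part starts; store the one collected so far.
            if ($current !== null) {
                $parts[] = implode("\n", $current);
            }
            $current = [];
            continue;
        }
        if ($current !== null) {
            $current[] = $line;
        }
    }
    // Store the last part that was being collected.
    if ($current !== null) {
        $parts[] = implode("\n", $current);
    }

    return $parts;
}

$body = "This is a multi-part message in MIME format.\n"
      . "--MyBoundary\n"
      . "Content-Type: text/plain\n"
      . "\n"
      . "Hi, this is my message.\n"
      . "--MyBoundary\n"
      . "Content-Type: image/gif\n"
      . "\n"
      . "R0lGODlh\n"
      . "--MyBoundary--\n";

$parts = splitParts($body, 'MyBoundary');
echo count($parts), " parts found\n";
```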
The MIME parts
Now, let's take a look at the MIME parts. You'll notice that these parts look a lot like the email message itself: each starts with a number of headers, followed by an empty line and then the body.
Content-Type: image/gif; name="myimage.gif"
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename="myimage.gif"

R0lGODlhogBrAPcAAAAAAP///zwKC2UhIrdPUKU0OMmLjagEDJIEDZEmLduztW8HD64YJlUJEXkX
...omitted many lines like the above...
gahQUgx4n5kQWQQJ0sqEREAAADs=
This is the MIME part containing the attached image. You see the Content-Type header, which tells us more about the type of this part. In our example, this part is a GIF image. The original filename is also included in the headers.
The body of this part looks very strange. Many similar lines followed the first line of characters, but I deleted all of them except the last. What you see here is an encoded version of the original file. Since an email message can only contain normal text, the binary form of the image had to be translated to a text form. This is called MIME encoding. The Content-Transfer-Encoding header tells us what type of encoding was used, base64 in this case. The PEAR mimeDecode module, and every other MIME-compatible email reader, will use this information to decode the file.
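PHP's built-in base64_encode() and base64_decode() show what this encoding does. The sketch below is only an illustration with a few made-up bytes, not the real image from the example. It also shows why the encoded image starts with R0lGODlh: that is the base64 form of the GIF signature GIF89a.

```php
<?php
// A few bytes standing in for a binary file: the "GIF89a" signature
// followed by two arbitrary raw bytes (made up for this example).
$binary = "GIF89a\x01\x00";

// Encode to plain text and wrap at 76 characters per line, the way
// encoded bodies usually appear in mail.
$encoded = chunk_split(base64_encode($binary), 76, "\r\n");

// base64_decode() skips the line breaks and restores the original bytes.
$decoded = base64_decode($encoded);

echo base64_encode('GIF89a'), "\n";
var_dump($decoded === $binary);
```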
Decoding the message
Now that we know a little about MIME email, we can almost start writing the script. But wait, you'll first have to get a message that your script can parse. If you're using Linux/Unix, you can send yourself a MIME email and copy your mbox file to get the source of that message. You can also set up an email-to-PHP script to get the message source. If you just want to test the mimeDecode module, you can also download the source of my example message. In this example script, I will assume that the source of the email is saved in $input.
Initialising mimeDecode
To load the mimeDecode module, we just have to include Mail/mimeDecode.php. The PEAR path is specified in PHP's include_path, so we don't have to worry about the absolute location of this file.
include('Mail/mimeDecode.php');
Setting the parameters
The mimeDecode module accepts five parameters. include_bodies determines whether we want the bodies of the MIME parts to be returned. If decode_bodies is set to true, the encoded MIME parts (files etc.) are decoded. If decode_headers is true, mimeDecode decodes the message headers. Since we want the whole message to be decoded, we'll set these three boolean parameters to true. The fourth parameter, crlf, tells mimeDecode the line ending type. The default is \r\n (carriage return, line feed), which works in most cases, so we won't specify it. The last parameter is called input and is used to pass the message source to mimeDecode.
$params['include_bodies'] = true;
$params['decode_bodies'] = true;
$params['decode_headers'] = true;
$params['input'] = $input;
Running decode()
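It's time to run mimeDecode's decode() function, the function that decodes the MIME message and returns a nice structure. The easiest way to run decode() is without creating a new object. That does mean that we have to tell PHP in which class decode() is to be found. (Recent PHP versions no longer allow calling a non-static method this way. There, create an object with new Mail_mimeDecode($input) and call its decode($params) method instead.)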
$structure = Mail_mimeDecode::decode($params);
The decoded message is saved in $structure.
The mimeDecode output
We've now got the decoded message in the $structure variable. If you run print_r($structure);, you'll get a nice view of the anatomy of that structure.
- $structure->headers: An array containing the headers of the message. $structure->headers['subject'] contains the Subject: header and so on. Note that multiple headers with the same name are saved in an array, like $structure->headers['received'][0].
- $structure->ctype_primary: The first part of the message's content type. If the content type is multipart/mixed, this is set to multipart.
- $structure->ctype_secondary: The second part of the message's content type. (multipart/mixed gives a value of mixed.)
- $structure->ctype_parameters: An array of the parameters of the content type. In case of a MIME message this includes the boundary= value.
- $structure->disposition: Contains the value of the Content-Disposition: header, if set. In case of an attachment, this is set to attachment.
- $structure->d_parameters: As with the Content-Type: header, any parameters of the Content-Disposition: header are returned in this array. Attachments, for instance, bring a filename parameter.
- $structure->body: The body of the message or the MIME part. In general, the main message of a MIME email doesn't have a body value.
- $structure->parts: This is an array of all MIME parts found in the message. Each of these parts has the same properties as the $structure variable described here. E.g. the headers of $structure->parts[0] can be found in $structure->parts[0]->headers etc.
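Because every part has the same properties as the message itself, the structure can be walked recursively. The sketch below uses a hand-built stdClass object as a stand-in for real decode() output, so that it runs without PEAR. listContentTypes() is a hypothetical helper, and only the property names come from the list above.

```php
<?php
// Hypothetical helper: returns the content type of a part and of all
// parts nested inside it, indented by nesting depth.
function listContentTypes($part, int $depth = 0): array
{
    $lines = [
        str_repeat('  ', $depth) . $part->ctype_primary . '/' . $part->ctype_secondary,
    ];
    // Parts without children simply have no "parts" property.
    foreach ($part->parts ?? [] as $child) {
        $lines = array_merge($lines, listContentTypes($child, $depth + 1));
    }
    return $lines;
}

// Hand-built stand-in for the structure of the example message.
$text = new stdClass();
$text->ctype_primary = 'text';
$text->ctype_secondary = 'plain';

$image = new stdClass();
$image->ctype_primary = 'image';
$image->ctype_secondary = 'gif';

$structure = new stdClass();
$structure->ctype_primary = 'multipart';
$structure->ctype_secondary = 'mixed';
$structure->parts = [$text, $image];

echo implode("\n", listContentTypes($structure)), "\n";
```

The recursion matters for real mail, where a multipart part can itself contain further multipart parts.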
Saving all attached files
We can now walk through the $structure->parts array and save each attachment we find.
foreach ($structure->parts as $part) {
    // only save if an attachment
    if (isset($part->disposition) and ($part->disposition == 'attachment')) {
        // keep only the base name, so the sender cannot choose the directory
        $filename = basename($part->d_parameters['filename']);
        // open file (binary mode)
        $fp = fopen($filename, 'wb');
        // write body
        fwrite($fp, $part->body);
        // close file
        fclose($fp);
    }
}
For each part, we check if it has a Content-Disposition: header set to 'attachment'. We open a file with the name given by the filename parameter of the Content-Disposition: header. We then save the (decoded) body of this part in that file.
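The filename is chosen by the sender, so it is worth cleaning it up before handing it to fopen(). The following is a minimal sketch of one way to do that. safeAttachmentName() and its fallback name are hypothetical, and the rules are deliberately strict.

```php
<?php
// Hypothetical helper: turns a sender-supplied filename into a name
// that is safe to create in the current directory.
function safeAttachmentName(string $filename, string $fallback = 'attachment.bin'): string
{
    // Treat backslashes as separators too, then drop any directory part.
    $name = basename(str_replace('\\', '/', $filename));
    // Replace everything except letters, digits, dots, dashes and underscores.
    $name = preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
    // Strip leading dots, so we never create hidden files or "..".
    $name = ltrim($name, '.');

    return $name === '' ? $fallback : $name;
}

echo safeAttachmentName('myimage.gif'), "\n";
echo safeAttachmentName('../../etc/passwd'), "\n";
```

A script that saves many attachments would also need to decide what to do when two messages use the same filename.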
Listing all images
In the same way, we can list all images and their sizes by checking the ctype_primary and body values. We can then send this list to the sender of the original email.
$list = '';
foreach ($structure->parts as $part) {
    // is this an image?
    if ($part->ctype_primary == 'image') {
        $list .= $part->d_parameters['filename'] . ': ' . strlen($part->body) . " bytes\n";
    }
}

// send this list
$to = $structure->headers['from'];
$subject = 'Re: ' . $structure->headers['subject'];
$body = "You sent us these images:\n\n$list\n\nThank you very much!";
$headers = 'From: ' . $structure->headers['to'];
mail($to, $subject, $body, $headers);
If the ctype_primary is 'image' (from image/gif, image/jpeg etc.), we add the filename and the length of the body string to the list. After we have looked at all parts, we send the list to the sender of the original email. We use the information of the original email, the From:, Subject: and To: headers, to make a nice reply.
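Not every image part carries a Content-Disposition: header, so d_parameters['filename'] may be missing. One possible fallback, sketched here, is the name parameter of the Content-Type: header, which the example message also sets. partFilename() is a hypothetical helper, and the objects below are hand-built stand-ins for decoded parts.

```php
<?php
// Hypothetical helper: picks the best available filename for a part,
// or null when the part does not name a file at all.
function partFilename($part): ?string
{
    return $part->d_parameters['filename']   // Content-Disposition: filename=
        ?? $part->ctype_parameters['name']   // Content-Type: name=
        ?? null;
}

// Stand-in for an attachment with both parameters set.
$attached = new stdClass();
$attached->d_parameters = ['filename' => 'myimage.gif'];
$attached->ctype_parameters = ['name' => 'myimage.gif'];

// Stand-in for an image that only has a Content-Type name.
$inline = new stdClass();
$inline->ctype_parameters = ['name' => 'logo.gif'];

echo partFilename($attached), "\n";
echo partFilename($inline), "\n";
```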
What's next?
By now, you should be able to write your own scripts using the mimeDecode module. You could, for example, write a script that lets you just email images and get them published on your web log. In the rare event that you are a teacher, you could save all documents that your students send you in your 'incoming files' directory, without having to open each message first.