Archiving emails: how and why?

Digitally archiving emails is a challenge for many organisations. Mailboxes are bursting with incoming and outgoing messages, and are increasingly becoming a repository for information and knowledge, partly because messages can include all sorts of attached files. Emails are born-digital documents, and this digital nature is an essential characteristic that needs to be preserved.[1]

When storing emails, retaining the link between attachments and the email itself poses an extra challenge.


Why?

There is no standard format for storing emails on mail servers or in email clients. Email clients such as Microsoft Outlook or Apple Mail use their own closed file formats. As a result, not all metadata may be stored, and emails can become unreadable if the email client is discontinued. People are also increasingly using webmail services such as Gmail or Outlook (formerly Hotmail); if these services shut down or start charging (high) fees, you risk losing all your emails.

How?

Organise your emails

You can organise your emails by structuring your mailbox[2] according to your organisation's folder structure. (See Draw up an organisational plan/folder structure). If you also use your personal email address, you can create separate folders for your personal emails and emails that you've sent and received to perform work or other tasks for your organisation. This makes it easier to find emails again, and to save emails and attachments outside your email client in the right folders in your folder structure at a later date.

Clean up your mailbox

Your email archive is more accessible if you regularly clean up your mailbox. After all, email correspondence always contains a lot of dead weight – such as spam, adverts and news articles – which doesn't need to be kept. So cleaning up your mailbox makes it easier to find emails and ensures you only keep emails that are worth archiving. You can agree a selection list in your organisation to decide which emails need to be stored and which don't.[3] Emails that are sent or received as part of your organisation's work or other activities are kept; purely informative emails that don't have any direct link with your organisation can be deleted.

Use a suitable protocol

To archive emails, you first need to retrieve them from the mail server, and you should use the IMAPS protocol for this. IMAP is a standard protocol[4] for retrieving emails from the mail server into an email application; IMAPS is the variant that does this over an encrypted (and therefore secure) connection. It retains the folder structure you use to organise your mailbox, and doesn't delete emails from the server when they are retrieved.
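As a rough illustration of this step, the sketch below uses Python's standard imaplib module to download every message in one mailbox folder over IMAPS and write each one to its own file. The server name, login details, folder name and output directory are placeholders, and export_folder is a hypothetical helper written for this example, not part of any tool mentioned in this article. The folder is opened read-only, so nothing on the server is changed or deleted.

```python
import imaplib
from pathlib import Path


def export_folder(conn, folder: str, out_dir: Path) -> int:
    """Save every message in `folder` as a separate .eml file; return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # readonly=True: the server copy is left untouched (no flags set, no deletion).
    status, _ = conn.select(folder, readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot open folder {folder!r}")
    status, data = conn.uid("SEARCH", None, "ALL")
    if status != "OK":
        raise RuntimeError("search failed")
    uids = data[0].split()
    for uid in uids:
        # BODY.PEEK[] asks for the complete raw message without marking it as read.
        status, parts = conn.uid("FETCH", uid, "(BODY.PEEK[])")
        if status != "OK":
            raise RuntimeError(f"could not fetch message {uid!r}")
        raw = parts[0][1]  # raw message bytes, headers included
        (out_dir / f"{uid.decode()}.eml").write_bytes(raw)
    return len(uids)


def main() -> None:
    # Placeholder host and credentials: replace with your own before calling main().
    with imaplib.IMAP4_SSL("imap.example.org", 993) as conn:
        conn.login("user@example.org", "app-password")
        export_folder(conn, "INBOX", Path("export/INBOX"))
```

The files are named after the server's message identifiers only to keep the sketch short; in practice you would probably choose more meaningful filenames and repeat the call for each folder you want to keep.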

Store all the essential email properties

Authenticity and integrity are central concepts for archiving. Authenticity assures you that an archive item is what it claims to be, and integrity ensures that the content of an archive item is complete and true. The following elements need to be saved in order to preserve these two properties for emails:

  • Origin context: all the details that place an archive item in relation to the archive creator's activities. It clarifies the subject or matter that the email relates to, its origin, and the mutual relationship between related emails, attachments and archive documents.
  • Structure: the relationships between the different components of an email (header, body[5] and attachments) and between related emails (e.g. when replying to or forwarding an email).
  • Content: the email subject, the text that is sent, and the attachments.
  • Appearance: layout is not an essential feature of emails; after all, it depends on the email client and the device on which you open the email. However, if an email has artistic value or the layout clarifies the structure or content of the message, it can be important to save this aspect as well.

You can store these essential features by choosing a file format that saves emails in accordance with the Internet Message Format (IMF). IMF is a standard format for transporting emails, which makes it possible for your email application to read any email that someone sends to you via their email client or webmail service, even if you don't use the same application.

You should therefore never archive emails by printing them out: they contain hidden metadata that you don't see when you open them in your email application. This metadata contains information about the elements of the document that you want to save, and you lose it if you print the email out. You can also preserve the origin context of emails by filing them in a well-organised folder structure (see Organise your emails).
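To make the idea of hidden metadata more concrete, the following sketch parses a small message in Internet Message Format with Python's standard email package and lists its header fields. The sample message, the addresses and the describe helper are invented for this illustration. Fields such as Received, Message-ID and In-Reply-To are usually not shown by an email application and would not appear on a printout, yet they document the route of the message and its relationship to other emails.

```python
from email import message_from_bytes, policy

# An invented sample message in Internet Message Format (headers, blank line, body).
RAW = (
    b"Received: from mail.example.org by mx.example.net; Mon, 6 Jan 2020 10:15:02 +0100\r\n"
    b"Message-ID: <abc123@example.org>\r\n"
    b"In-Reply-To: <xyz789@example.net>\r\n"
    b"Date: Mon, 6 Jan 2020 10:15:00 +0100\r\n"
    b"From: Alice <alice@example.org>\r\n"
    b"To: Bob <bob@example.net>\r\n"
    b"Subject: Meeting notes\r\n"
    b"\r\n"
    b"See you on Friday.\r\n"
)


def describe(raw: bytes) -> dict:
    """Return the header names, two linking headers and the plain-text body."""
    msg = message_from_bytes(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    return {
        "headers": [name for name, _ in msg.items()],
        "message_id": str(msg["Message-ID"]),
        "in_reply_to": str(msg["In-Reply-To"]),
        "body": body.get_content().strip() if body is not None else "",
    }


info = describe(RAW)
print(info["headers"])
print(info["in_reply_to"])
```

The In-Reply-To value is what links a reply to the message it answers, which is part of the structure described in the list of essential properties.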

Separate emails and attachments

When attachments are not sent in a durable file format, there's a risk of obsolescence. Emails written in HTML also sometimes refer to images that are stored on an external web server, because including them would make the message too big; you can lose these images if you don't store them separately. You should therefore save images and attachments separately from the email in a suitable archiving format, but make sure that the relationship between the emails, images and attachments remains clear by using the same filename for the different components.
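One possible way to keep that relationship visible is sketched below in Python, using only the standard library. The extract_attachments helper and the naming pattern (the email's filename, followed by a sequence number and the original attachment name) are assumptions made for this example, not a prescribed convention. The sketch only copies attachments out of the message: it does not convert them to an archiving format, and it does not fetch images that are hosted on an external web server.

```python
from email import message_from_bytes, policy
from pathlib import Path


def extract_attachments(eml_path: Path, out_dir: Path) -> list[Path]:
    """Write each attachment next to a name derived from the .eml filename."""
    msg = message_from_bytes(eml_path.read_bytes(), policy=policy.default)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, part in enumerate(msg.iter_attachments(), start=1):
        # Keep only the final path component so a crafted name cannot leave out_dir.
        original = Path(part.get_filename() or "unnamed").name
        payload = part.get_payload(decode=True)
        if payload is None:
            # Nested messages (e.g. forwarded emails) have no single payload; skipped here.
            continue
        target = out_dir / f"{eml_path.stem}_att{index}_{original}"
        target.write_bytes(payload)
        written.append(target)
    return written
```

For an email saved as 2020-01-06_report.eml with one attachment called report.pdf, this naming pattern would produce 2020-01-06_report_att1_report.pdf, so the attachment can always be traced back to its email.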

Choose a suitable file format

A suitable file format is a format that is standardised, with an open file specification, and which can be read by different applications, so you're not reliant on a particular software supplier. It is crucial that the file format stores all the essential email properties.

EML and MBOX are the de facto standards for storing emails.[6] Both formats save emails, including attachments, in accordance with the Internet Message Format (IMF). Because they are plain text, they can be opened by most email applications and also by text editors. EML saves each email separately (one email is one EML file), whereas MBOX can save an entire email archive (one email archive is one MBOX file), following the structure in which the emails are organised in the email client. Because MBOX stores an entire email archive in one file, it is difficult to store attachments and emails separately and still maintain the connection between them through the filename, so use EML where possible. MBOX can, however, be a useful format when you want to bulk export an email archive, for example when an employee leaves the organisation and wants to transfer all their emails.
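If you receive a bulk export as a single MBOX file, it can be split into individual EML files afterwards. The sketch below does this with Python's standard mailbox module; the mbox_to_eml helper and the numbered filenames are assumptions made for this example. Note that any body lines that were escaped as ">From " inside the MBOX file are written out unchanged.

```python
import mailbox
from pathlib import Path


def mbox_to_eml(mbox_path: Path, out_dir: Path) -> int:
    """Write every message in an MBOX file to its own .eml file; return the count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # create=False: raise an error instead of silently creating an empty mailbox.
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        count = 0
        for key in box.iterkeys():
            # get_bytes returns the message without the leading "From " separator line.
            raw = box.get_bytes(key)
            (out_dir / f"{mbox_path.stem}_{key + 1:05d}.eml").write_bytes(raw)
            count += 1
        return count
    finally:
        box.close()
```

A call such as mbox_to_eml(Path("export/inbox.mbox"), Path("export/inbox_eml")) would produce inbox_00001.eml, inbox_00002.eml and so on; both paths are placeholders.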

Save emails permanently

Email clients and mail servers are not designed to store emails permanently, so emails that are worth archiving always need to be saved outside the email application. Some email clients and webmail applications provide an archiving function, but this is not permanent storage. Email clients use proprietary and compressed formats that can result in a loss of metadata and information. It is not clear how emails are archived with webmail, and you remain reliant on companies such as Google and Microsoft to manage your files.

It is often not possible to easily export emails with webmail applications, so it's better to use an email client such as Mozilla Thunderbird. This is a free and open source email application that makes it possible to export emails in EML and MBOX format. Email clients have the advantage that you can build a folder structure in the mailbox.[7] Note that some email clients, such as Microsoft Outlook, cannot export in EML or MBOX.[8]

Apart from that, the general rules for long-term email storage apply. Always make sure that you use good back-up procedures and that you store different back-ups of your files in different (geographical) locations. Use checksums to safeguard the integrity of your files and check the files periodically, and keep a close eye on developments in file formats. This is particularly important for attachments because of the wide variety of file formats available. (See Storing your digital archive).
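The checksum step can be illustrated with a short Python sketch that uses the standard hashlib module. It records a SHA-256 checksum for every file in an archive folder and, at a later check, reports the files whose content has changed or that have gone missing. The function names and the choice of SHA-256 are assumptions made for this example; other checksum algorithms and dedicated tools work in the same way.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file below `root` (as a relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files from an earlier manifest that are now missing or different."""
    current = build_manifest(root)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

In practice you would store the manifest alongside each back-up (for example as a text file) and run the comparison periodically; an empty result means that every file recorded earlier is still present and unchanged.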

Authors: Nastasia Vanderperren (meemoo) with help from Rony Vissers (meemoo) and Pieter De Praetere

  1. See also: Guideline 1 from Edavid (in Dutch).
  2. 'Mailbox' refers to both the inbox and the outbox.
  3. A selection list is a document that determines which documents to keep or delete. You can find an example of a selection list for emails here (in Dutch).
  4. Protocols are rules that computers need to follow to communicate with each other. POP and IMAP are protocols for retrieving emails from the server. Internet Message Access Protocol, usually abbreviated to IMAP, makes it possible to synchronise emails so that you can look up your emails on all your devices – smartphone, tablet, laptop, computer, etc. POP, short for Post Office Protocol, removes the emails from the mail server when you retrieve them. IMAPS is a form of IMAP that encrypts the traffic, which makes it more secure.
  5. The body is the text field in an email, the section in which the sender writes their message.
  6. EML and MBOX are standardised and widely supported, but not open.
  7. See also: https://kadoc.kuleuven.be/advies/digitaalorde (link in Dutch)
  8. The aforementioned Apple Mail can export emails in EML, however.
