Is there any way to reliably parse email messages?

What I’m talking about is constructing a tree from an email message. Usually, when you get an email and reply back and forth a few times, you end up with something that looks like this:

[cc lang="email"]
Lorem ipsum dolor sit amet, consectetur adipisicing elit,
sed do eiusmod tempor incididunt ut labore et dolore magna
aliqua.

On 01/01/1970, Someone <some@email.com> wrote:
> Ut enim ad minim veniam, quis nostrud exercitation ullamco
> laboris nisi ut aliquip ex ea commodo consequat. Duis aute
> irure dolor in reprehenderit in voluptate velit esse cillum
> dolore eu fugiat nulla pariatur.

Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum.

> Sed ut perspiciatis unde omnis iste natus error sit
> voluptatem accusantium doloremque laudantium, totam
> rem aperiam, eaque ipsa quae ab illo inventore veritatis
> et quasi architecto beatae vitae dicta sunt explicabo.
>
> On 01/01/2000, Someone else <some.other@email.com>
> wrote:
>> Nemo enim ipsam voluptatem quia voluptas sit
>> aspernatur aut odit aut fugit, sed quia consequuntur
>> magni dolores eos qui ratione voluptatem sequi nesciunt.
>> Neque porro quisquam est, qui dolorem ipsum quia dolor
>> sit amet, consectetur, adipisci velit, sed quia non
>> numquam eius modi tempora incidunt ut labore et dolore
>> magnam aliquam quaerat voluptatem.
>
> Ut enim ad minima veniam, quis nostrum exercitationem ullam
> corporis suscipit laboriosam, nisi ut aliquid ex ea commodi
> consequatur?

Quis autem vel eum iure reprehenderit qui in ea
voluptate velit esse quam nihil molestiae consequatur, vel
illum qui dolorem eum fugiat quo voluptas nulla pariatur?
[/cc]

Is there any way to reliably parse a message like this and get a tree of messages with headers, top-posts, inline replies, bottom-posts, etc.? I’ve been looking around for some time, but I haven’t been able to find anything close to that.
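As a rough starting point, here is a minimal Python sketch that nests blocks of text by quote depth (the number of leading `>` markers). The names `Block`, `split_quote` and `parse_quotes` are made up for illustration. It assumes plain-text bodies quoted with `>`, treats blank lines as block separators, and does not try to recognize attribution lines such as "On ..., Someone wrote:", which can wrap across lines as in the example above.

```python
import re
from dataclasses import dataclass, field

# Leading quote markers such as ">", "> ", ">> " or "> > ", then the text.
QUOTE_RE = re.compile(r"^((?:>\s?)*)(.*)$")


@dataclass
class Block:
    """A run of consecutive lines that share the same quote depth."""
    depth: int
    lines: list = field(default_factory=list)
    children: list = field(default_factory=list)


def split_quote(line):
    """Return (depth, text), where depth counts the leading '>' markers."""
    match = QUOTE_RE.match(line)
    prefix, text = match.group(1), match.group(2)
    return prefix.count(">"), text


def parse_quotes(body):
    """Build a tree of Blocks; deeper quotes nest under the block before them."""
    root = Block(depth=-1)
    stack = [root]      # chain of currently open blocks, shallowest first
    current = None      # block that is still collecting lines, if any
    for raw in body.splitlines():
        depth, text = split_quote(raw)
        if not text.strip():
            current = None  # a blank line closes the current block
            continue
        if current is None or depth != current.depth:
            # Close every open block that is at least as deep as this line.
            while stack[-1].depth >= depth:
                stack.pop()
            current = Block(depth=depth)
            stack[-1].children.append(current)
            stack.append(current)
        current.lines.append(text)
    return root
```

Calling `parse_quotes(body)` returns a root node whose children are the unquoted blocks, each holding the quoted blocks that follow it. Mapping those blocks to actual messages and authors would still need the attribution lines, which this sketch leaves alone.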

This is something I plan on adding to the Notifiy wave robot, so that it parses incoming emails and posts the relevant content to the wave, and maybe even applies corrections to previous posts. This case may be easier than a multi-level email, since replies will always start from a clean message, so there’s just one level of quoting to deal with.
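For that single-level case, something as simple as dropping the quoted lines might be enough. The sketch below is only an assumption about how it could work: `extract_new_text` and `ATTRIBUTION_RE` are hypothetical names, and the attribution pattern only matches a one-line "On ... wrote:" header in English, so other mail clients and languages would need their own patterns.

```python
import re

# Assumed attribution format, e.g.
# "On 01/01/1970, Someone <some@email.com> wrote:"
ATTRIBUTION_RE = re.compile(r"^On .+ wrote:\s*$")


def extract_new_text(body):
    """Return the reply body without quoted lines or the attribution line."""
    kept = []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted text from the original message
        if ATTRIBUTION_RE.match(line.strip()):
            continue  # "On ..., Someone wrote:" header
        kept.append(line)
    return "\n".join(kept).strip()
```

Whatever is left is the new content that could be posted to the wave. Blank lines between the kept paragraphs are preserved as they were.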
