How to read an eml file in Python?


I don't know how to load an eml file in Python 3.4.
I want to list them all and read all of them in Python.


This is how to get the content of an e-mail, i.e. an *.eml file. I first wrote it for Python 2.5 - 2.7; the version below is for Python 3 and should work on 3.4 as well.

    import os
    from email import message_from_binary_file

    # Path to the directory where extracted attachments are stored:
    path = "./msgfiles"

    # If you want the attachments kept in memory instead,
    # change the behaviour of the 2 following functions:

    def file_exists(f):
        """Checks whether the extracted file was extracted before."""
        return os.path.exists(os.path.join(path, f))

    def save_file(fn, cont):
        """Saves cont to the file fn."""
        os.makedirs(path, exist_ok=True)  # create the target directory if needed
        with open(os.path.join(path, fn), "wb") as file:
            file.write(cont)

    def construct_name(key, fn):
        """Constructs a file name out of the message's key and the packed file name."""
        # Join the first two dot-separated pieces of the key ("message.eml" -> "messageeml")
        pieces = os.path.basename(key).split(".")
        return "".join(pieces[:2]) + "." + fn

    def disqo(s):
        """Removes double or single quotations."""
        s = s.strip()
        if s.startswith("'") and s.endswith("'"): return s[1:-1]
        if s.startswith('"') and s.endswith('"'): return s[1:-1]
        return s

    def disgra(s):
        """Removes < and > from an HTML-like tag, an e-mail address or an e-mail ID."""
        s = s.strip()
        if s.startswith("<") and s.endswith(">"): return s[1:-1]
        return s

    def decode_text(m):
        """Returns the payload of a text part decoded to str using its declared charset."""
        payload = m.get_payload(decode=True) or b""
        try:
            return payload.decode(m.get_content_charset() or "utf-8", errors="replace")
        except LookupError:  # unknown charset name in the header
            return payload.decode("utf-8", errors="replace")

    def pullout(m, key):
        """Extracts the content from an e-mail message.
        Works for multipart and nested multipart messages too.
        m   -- email.message.Message() or mailbox.Message()
        key -- initial message ID (some string)
        Returns tuple(text, html, files, parts)
        text  -- all text parts.
        html  -- all HTML parts.
        files -- dictionary mapping each extracted file name to (saved file name, content ID).
        parts -- number of parts in the original message.
        """
        html = ""
        text = ""
        files = {}
        parts = 0
        if not m.is_multipart():
            if m.get_filename():  # It's an attachment
                fn = m.get_filename()
                cfn = construct_name(key, fn)
                files[fn] = (cfn, None)
                if file_exists(cfn): return text, html, files, 1
                save_file(cfn, m.get_payload(decode=True))
                return text, html, files, 1
            # Not an attachment!
            # See where it belongs: text, HTML or some other data.
            cp = m.get_content_type()
            if cp == "text/plain": text += decode_text(m)
            elif cp == "text/html": html += decode_text(m)
            else:
                # Something else!
                # Extract the message ID and a file name if there is one:
                # a packed file whose name is in the Content-Type header
                # instead of an explicit Content-Disposition header.
                cp = m.get("content-type", "")
                cid = m.get("content-id")
                id = disgra(cid) if cid else None
                # Find the file name:
                o = cp.find("name=")
                if o == -1: return text, html, files, 1
                ox = cp.find(";", o)
                if ox == -1: ox = None
                o += 5; fn = cp[o:ox]
                fn = disqo(fn)
                cfn = construct_name(key, fn)
                files[fn] = (cfn, id)
                if file_exists(cfn): return text, html, files, 1
                save_file(cfn, m.get_payload(decode=True))
            return text, html, files, 1
        # This is a multipart message.
        # So, iterate over it and call pullout() recursively for each part.
        for pl in m.get_payload():
            # pl is a new Message object which goes back to pullout()
            t, h, f, p = pullout(pl, key)
            text += t; html += h; files.update(f); parts += p
        return text, html, files, parts

    def extract(msgfile, key):
        """Extracts all data from an e-mail, including From, To, etc., and returns it as a dictionary.
        msgfile -- a file-like readable object opened in binary mode
        key     -- some ID string for that particular message. Can be a file name or anything.
        Returns dict()
        Keys: from, to, subject, date, text, html, parts[, files]
        The key files is present only when the message contained binary files.
        For more see the __doc__ of the pullout() and caption() functions.
        """
        m = message_from_binary_file(msgfile)
        From, To, Subject, Date = caption(m)
        text, html, files, parts = pullout(m, key)
        text = text.strip(); html = html.strip()
        msg = {"subject": Subject, "from": From, "to": To, "date": Date,
            "text": text, "html": html, "parts": parts}
        if files: msg["files"] = files
        return msg

    def caption(origin):
        """Extracts To, From, Subject and Date from email.message.Message() or mailbox.Message().
        origin -- Message() object
        Returns tuple(From, To, Subject, Date)
        If the message doesn't contain one or more of them, empty strings are returned.
        """
        Date = ""
        if "date" in origin: Date = str(origin["date"]).strip()
        From = ""
        if "from" in origin: From = str(origin["from"]).strip()
        To = ""
        if "to" in origin: To = str(origin["to"]).strip()
        Subject = ""
        if "subject" in origin: Subject = str(origin["subject"]).strip()
        return From, To, Subject, Date

    # Usage:
    with open("message.eml", "rb") as f:
        print(extract(f, f.name))
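For comparison, on Python 3.6 or later the standard library's newer EmailMessage API can do a good part of this on its own. This is a sketch of that alternative, not the code above: it parses with policy.default, takes the preferred body via get_body(), and lists attachment names via iter_attachments(). The function name read_eml is only an illustration.

```python
from email import policy
from email.parser import BytesParser


def read_eml(fp):
    """Parse a binary .eml stream and return its headers, body text and attachment names."""
    # policy.default makes the parser return an email.message.EmailMessage
    msg = BytesParser(policy=policy.default).parse(fp)
    # get_body() picks the preferred body part, or returns None if there is none
    body = msg.get_body(preferencelist=("plain", "html"))
    return {
        "subject": msg["subject"],
        "from": msg["from"],
        "to": msg["to"],
        "date": msg["date"],
        "text": body.get_content() if body is not None else "",
        # iter_attachments() yields nothing for a non-multipart message
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
    }
```

It is used the same way as extract(): with open("message.eml", "rb") as f: print(read_eml(f)).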

I programmed a mailgroup using mailbox; that is why it is so convoluted. It never failed me. Never any junk. If the message is multipart, the output dictionary will contain a key "files" (a sub-dict) with the file names of all extracted files that were not text or HTML. That was my way of extracting attachments and other binary data. You may change it in pullout(), or just change the behaviour of file_exists() and save_file().
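For example, to keep attachments in memory instead of writing them under path, the two functions could be replaced by dictionary-backed versions. A minimal sketch; extracted is a hypothetical module-level name, and pullout() keeps calling file_exists() and save_file() exactly as before.

```python
# Hypothetical in-memory store: maps constructed file names to raw attachment bytes.
extracted = {}


def file_exists(f):
    """Checks whether the file f was already extracted (kept in memory)."""
    return f in extracted


def save_file(fn, cont):
    """Keeps the bytes cont under the name fn instead of writing a file."""
    extracted[fn] = cont
```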

construct_name() constructs a file name out of the message ID and the file name packed in the multipart message, if there is one.
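With f.name as the key, as in the usage line, the naming works out as below. The sketch restates the joining rule (the first two dot-separated pieces of the key, then a dot, then the attachment name); the file names are made up.

```python
def construct_name(key, fn):
    """Builds a file name from the message key and the packed file name."""
    pieces = key.split(".")
    # Join the first two dot-separated pieces, e.g. "message.eml" -> "messageeml"
    prefix = "".join(pieces[:2])
    return prefix + "." + fn


print(construct_name("message.eml", "report.pdf"))  # messageeml.report.pdf
```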

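In pullout() the text and html variables are plain strings. For the online mailgroup it was OK to get all the text or HTML packed into the multipart message (that wasn't an attachment) at once, concatenated into one string.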

If you need something more sophisticated, change text and html into lists, append to them, and add them up as needed. Nothing problematic.
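A sketch of that list-based variant follows. It is not the pullout() above: to stay short it uses Message.walk() from the standard email package to visit every part instead of recursing by hand, and it skips any part that has a file name. collect_bodies is a hypothetical name.

```python
from email.message import Message


def collect_bodies(m: Message) -> tuple:
    """Returns (texts, htmls): one decoded string per text/plain or text/html part."""
    texts, htmls = [], []
    for part in m.walk():  # walk() yields the message itself and every nested part
        if part.is_multipart() or part.get_filename():
            continue  # containers and attachments are not body text
        ctype = part.get_content_type()
        if ctype not in ("text/plain", "text/html"):
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            body = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            body = payload.decode("utf-8", errors="replace")
        (texts if ctype == "text/plain" else htmls).append(body)
    return texts, htmls
```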

Maybe there are some errors here, because it was intended to work with mailbox.Message(), not email.message.Message(). I tried it on email.message.Message() and it worked fine.
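Both work because mailbox.Message is a subclass of email.message.Message, so the messages a mailbox yields can go straight into caption() or pullout(). A minimal sketch with the mailbox module, assuming an mbox file (Maildir, MH and other formats have their own classes in the same module); subjects_in_mbox is a hypothetical name.

```python
import mailbox


def subjects_in_mbox(mbox_path):
    """Returns the Subject header of every message in an mbox file."""
    box = mailbox.mbox(mbox_path)
    try:
        # Each item is a mailbox.mboxMessage, which is also an email.message.Message,
        # so functions written for email.message.Message accept it unchanged.
        return [message.get("subject", "") for message in box]
    finally:
        box.close()
```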

You said, "I wish to list them all". Where? If you mean a POP3 mailbox or the mailbox of some nice open-source mailer, then you do it using the mailbox module. If you want to list them from others, then you have a problem. For example, to get mails from MS Outlook, you have to know how to read OLE2 compound files. Other mailers refer to them as *.eml files, which I think is what you have. Search PyPI for the olefile or compoundfiles module and Google around for how to extract an e-mail from an MS Outlook inbox file. Or save yourself the mess and just export them from there to some directory. When you have them as eml files, apply this code.
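Once the messages are exported as .eml files into one directory, listing them all can be as simple as scanning that directory and parsing each file. This sketch assumes a flat folder of *.eml files and, to stay short, only pulls Subject and From with the standard email parser; each open file could be handed to extract() instead. list_eml is a hypothetical name.

```python
import os
from email import message_from_binary_file


def list_eml(folder):
    """Yields (file name, Subject, From) for every *.eml file in folder, sorted by name."""
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".eml"):
            continue  # skip anything that is not an exported message
        with open(os.path.join(folder, name), "rb") as f:
            m = message_from_binary_file(f)
        yield name, m.get("subject", ""), m.get("from", "")
```

For example: for name, subject, sender in list_eml("./mail"): print(name, subject, sender), where "./mail" stands for wherever you exported the messages.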

