April 15, 2023 I have 63,000+ email files (file extension: .eml) in a folder on my Mac ( filemac:/Macintosh HD/Users/onemac/Documents/emails ). I want to import each of these .eml files into a container field ( field name: Email_Container ), with a separate record for each email. What is the best way to handle this? For the container field contents, I want to "Store only a reference to the file." It appears that I need to use Insert File (not Import Records) because I'll later need GetContainerAttribute to pull out the file name for parsing purposes. When I use Import Records for these .eml files, GetContainerAttribute does not work; when I use Insert File, it does. However, the Insert File script step does not seem to support a path to a folder (only to a file). From what I understand, I'll need to use the Get Folder Path script step along with the Get(DocumentPath) function, but I must be doing something wrong. This should be really easy, but I'm not getting my mind around it. Help with writing this one would certainly be appreciated. 🙂
April 15, 2023 I think the easiest way would be to open the folder in Finder, select all files, copy and paste into a text file. Then import the text file (as .tab or .csv) and use a calculation field to return the full path to each file. If you set the result type to Container, the effect will be the same as inserting the file as reference only. Or you could go an extra step and use Replace Field Contents to populate an actual container field with the calculated path.
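For readers who would rather script the file list than copy it from Finder, here is a minimal Python sketch of the same idea. It is an alternative to the copy-and-paste step, not something used in this thread; the function names `list_eml_names` and `write_name_list` are made up for illustration. It writes one file name per line so the result can be imported as a single text column.

```python
import os


def list_eml_names(folder):
    """Return the sorted names of the .eml files directly inside folder."""
    return sorted(
        name
        for name in os.listdir(folder)
        if name.lower().endswith(".eml")
        and os.path.isfile(os.path.join(folder, name))
    )


def write_name_list(folder, out_path):
    """Write one file name per line to out_path and return how many were written."""
    names = list_eml_names(folder)
    with open(out_path, "w", encoding="utf-8", newline="\n") as out:
        for name in names:
            out.write(name + "\n")
    return len(names)
```

Called with the folder from the original post (for example `write_name_list("/Users/onemac/Documents/emails", "eml_names.tab")`), this would produce the text file to import. The full path is still built afterwards in FileMaker by the calculation field described above.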
April 15, 2023 Author Comment, you're great. Both of your options worked like a charm. 🙂 As far as parsing these .eml file names into three separate calculation fields, do you have a streamlined solution?
Example file name #1: New EOB Posting - "EOB Notice" <[email protected]> - 2010-01-10 0944.eml
cSubject: New EOB Posting
cEmail: [email protected]
cDate: 2010-01-10 (YYYY-MM-DD format; a calculation result of Text rather than Date is fine)
Example file name #2: BMI Work Registration Report - <[email protected]> - 2012-06-21 1534.eml
cSubject: BMI Work Registration Report
cEmail: [email protected]
cDate: 2012-06-21 (YYYY-MM-DD format; a calculation result of Text rather than Date is fine)
April 15, 2023 I don't know if it's safe to generalize from only 2 examples, but try:
cSubject = Left ( Filename ; Position ( Filename ; " - " ; 1 ; 1 ) - 1 )
cEmail = Let ( [ start = Position ( Filename ; "<" ; 1 ; 1 ) + 1 ; end = Position ( Filename ; ">" ; start ; 1 ) ] ; Middle ( Filename ; start ; end - start ) )
cDate = Middle ( Filename ; Position ( Filename ; " - " ; 1 ; 2 ) + 3 ; 10 )
Edited April 16, 2023 by comment
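To try these three rules against a list of file names outside FileMaker, the logic can be sketched in Python. This is only an illustration of the same rules (subject before the first " - ", email between the first "<" and the following ">", date as the 10 characters after the second " - "). The function name `parse_filename` is hypothetical, and returning an empty string when a marker is missing is an assumption of this sketch, not necessarily what the FileMaker calculations return in that case.

```python
def parse_filename(filename):
    """Split an .eml file name into (subject, email, date) using the first-pass rules."""
    # Subject: everything before the first " - " separator.
    first_sep = filename.find(" - ")
    subject = filename[:first_sep] if first_sep != -1 else ""

    # Email: text between the first "<" and the next ">".
    lt = filename.find("<")
    gt = filename.find(">", lt + 1) if lt != -1 else -1
    email = filename[lt + 1:gt] if gt != -1 else ""

    # Date: the 10 characters that follow the second " - " separator.
    second_sep = filename.find(" - ", first_sep + 1) if first_sep != -1 else -1
    date = filename[second_sep + 3:second_sep + 13] if second_sep != -1 else ""

    return subject, email, date
```

Running it over the whole name list and flagging results whose date part does not look like YYYY-MM-DD is a quick way to find names that do not follow the assumed three-part pattern.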
April 17, 2023 Author Hey comment, thanks for your attention to detail. 🙂 I have imported 66,884 emails successfully. You were TOTALLY correct when you said: "I don't know if it's safe to generalize from only 2 examples, but try:"
cDate parsing is failing with file names like these:
* Statements and Payment Reports Request - HarryFox - NoReply <[email protected]> - 2020-01-22 0941.eml
* It takes less than 10 seconds to rate your latest SugarSync Suppor...SugarSync <[email protected]> - 2019-08-08 1607.eml
cEmail parsing is failing with this type of file name:
* Form submission from peckmusicgroup.net - [email protected] - 2007-06-21 1645-2.eml
Both cDate parsing and cEmail parsing are failing on this type of file name:
* My Email Address Has Changed Re_ Over any earthly rule...- [email protected] - 2016-11-08 1547.eml
Keep in mind (this may be helpful) that in certain .eml file names the @ symbol appears more than once:
* An Seong Bok has just paid for your invoice [email protected]" <[email protected]> - 2012-09-20 1835.eml
Thoughts? Thanks
April 17, 2023 I think we need to understand the rules that were used to construct these strings, before we can formulate the rules to take them apart. Looking at your added examples, I must admit I don't see the logic. I thought there were 3 main components separated by " - " but now you show an example with 3 such separators and others with only one. Possibly the date could be extracted by looking for the last separator instead of the second one, but that still leaves the other two components in a rather hazy state. I can help you with reversing the logic, but I have no advantage in deducing the original one.
April 17, 2023 Author This is a challenging situation for sure. 🤔 The accuracy of cDate is the most important element. The accuracy of cEmail is something I'd like to have fairly close; I could always adjust some manually. I believe there are very few .eml file names with more than one @ symbol, so maybe that helps. cSubject is not that important. Thanks 🙂
April 17, 2023 I don't see how that's moving us forward. As I said, the date could possibly be extracted more reliably using:
Middle ( Filename ; Position ( Filename ; " - " ; Length ( Filename ) ; -1 ) + 3 ; 10 )
As for the email, if we could assume that the email you want contains the last @ character in the entire string AND that the email is surrounded either by spaces or by angle brackets, then we could do something with that. But I am purely guessing at this point.
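The revised date rule (search backwards for the last " - " and take the 10 characters after it) can be checked outside FileMaker with a short Python sketch. The names `parse_date` and `looks_like_date` are hypothetical, the pattern check is an extra added here to flag names for manual review, and the empty-string result for a missing separator is an assumption of the sketch.

```python
import re


def parse_date(filename):
    """Return the 10 characters after the LAST ' - ' separator, or '' if there is none."""
    sep = filename.rfind(" - ")
    if sep == -1:
        return ""
    return filename[sep + 3:sep + 13]


def looks_like_date(text):
    """True if text has the shape YYYY-MM-DD (digits only; no calendar validation)."""
    return re.fullmatch(r"\d{4}-\d{2}-\d{2}", text) is not None
```

Any file name for which `looks_like_date(parse_date(name))` is false is a candidate for manual correction.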
April 17, 2023 Author It looks like your latest cDate is EXTREMELY accurate. 🙂 🙂 Your latest thoughts concerning cEmail should nail it: the email address we want to parse contains the last @ character in the entire string AND is surrounded either by spaces or by angle brackets. Thanks!
April 17, 2023 Ok, so try:
Let ( [
mask = Substitute ( Filename ; [ "<" ; " " ] ; [ ">" ; " " ] ) ;
at = Position ( mask ; "@" ; 1 ; PatternCount ( Filename ; "@" ) ) ;
start = Position ( mask ; " " ; at ; -1 ) ;
end = Position ( mask ; " " ; at ; 1 )
] ;
Trim ( Middle ( mask ; start ; end - start ) )
)
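The same masking idea can be sketched in Python for testing against a list of file names: replace the angle brackets with spaces, find the last @, then take the run of non-space characters around it. The function name `parse_email` is hypothetical, and the handling of names with no @ or no trailing space is an assumption of this sketch rather than a description of what the FileMaker calculation returns.

```python
def parse_email(filename):
    """Return the space- or bracket-delimited token that contains the last '@'."""
    # Treat angle brackets as spaces so both delimiter styles look the same.
    mask = filename.replace("<", " ").replace(">", " ")

    at = mask.rfind("@")
    if at == -1:
        return ""  # no address in this name

    # Last space before the '@' (-1 if the token starts the string).
    start = mask.rfind(" ", 0, at)
    # First space after the '@' (end of string if there is none).
    end = mask.find(" ", at)
    if end == -1:
        end = len(mask)

    return mask[start + 1:end]
```

Because only the last @ is used, an earlier address that leaked into the subject part of the name is ignored.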