Container field: How to insert all files from a folder.

kcep · April 15, 2023

I have 63,000+ email files (file extension: .eml) in a folder on my Mac (filemac:/Macintosh HD/Users/onemac/Documents/emails). I want to import each of these .eml files into a container field (field name: Email_Container), with a separate record for each email. What is the best way to handle this? For the container field contents, I want to "Store only a reference to the file."

It appears that I need to use Insert File (not Import Records) because I’ll later need GetContainerAttribute to pull out the file name for parsing purposes. When I use Import Records for these .eml files, GetContainerAttribute does not work; when I use Insert File, it does. However, the Insert File script step does not seem to support a path to a folder (only to a file).

From what I understand, I’ll need to use the Get Folder Path script step along with the Get(DocumentsPath) function, but I must be doing something wrong. This should be really easy, but I’m not getting my mind around this one. Help with writing this one would certainly be appreciated. 🙂

comment · April 15, 2023

I think the easiest way would be to open the folder in Finder, select all files, copy and paste into a text file. Then import the text file (as .tab or .csv) and use a calculation field to return the full path to each file. If you set the result type to Container, the effect will be the same as inserting the file as reference only. Or you could go an extra step and use Replace Field Contents to populate an actual container field with the calculated path.
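For anyone who would rather script the listing than copy it out of Finder, here is a minimal Python sketch of the same idea. It lists the .eml files in a folder and writes a tab-separated text file with one row per file (file name, then a filemac: path) that could then be imported. The function names and the output file are hypothetical, and the default volume name "Macintosh HD" is an assumption taken from the path in the question.

```python
from pathlib import Path


def eml_rows(folder, volume="Macintosh HD"):
    """Return (file name, filemac path) pairs for every .eml file in folder.

    Assumption: the folder is on the startup volume, so its FileMaker path
    is "filemac:/" + volume name + the POSIX path of the file.
    """
    folder = Path(folder).resolve()
    rows = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and path.suffix.lower() == ".eml":
            rows.append((path.name, f"filemac:/{volume}{path.as_posix()}"))
    return rows


def write_listing(folder, out_file, volume="Macintosh HD"):
    """Write one tab-separated line per file and return the row count."""
    rows = eml_rows(folder, volume)
    with open(out_file, "w", encoding="utf-8") as handle:
        for name, fm_path in rows:
            # Plain join rather than the csv module, so quotes that are
            # part of a file name are written unchanged.
            handle.write(f"{name}\t{fm_path}\n")
    return len(rows)
```

Calling write_listing with the emails folder and an output path of your choice produces the text file; the second column plays the role of the calculated path described above.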

kcep · April 15, 2023

Comment,

You're great. Both of your options worked like a charm. 🙂

As for parsing these .eml file names into three separate calculation fields, do you have a streamlined solution?

Example file name #1 ...

New EOB Posting - "EOB Notice" <[email protected]> - 2010-01-10 0944.eml

cSubject: New EOB Posting

cEmail: [email protected]

cDate: 2010-01-10 (YYYY-MM-DD format - Calculation result of TEXT (vs DATE) is fine)

Example file name #2 ...

BMI Work Registration Report - <[email protected]> - 2012-06-21 1534.eml

cSubject: BMI Work Registration Report

cEmail: [email protected]

cDate: 2012-06-21 (YYYY-MM-DD format - Calculation result of TEXT (vs DATE) is fine)

comment · April 15, 2023

I don't know if it's safe to generalize from only 2 examples, but try:

cSubject =

Left ( Filename ; Position ( Filename ; " - " ; 1 ; 1 ) - 1 )

cEmail =

Let ( [
start = Position ( Filename ; "<" ; 1 ; 1 ) + 1 ;
end = Position ( Filename ; ">" ; start ; 1 )
] ;
Middle ( Filename ; start ; end - start )
)

cDate =

Middle ( Filename ; Position ( Filename ; " - " ; 1 ; 2 ) + 3 ; 10 )
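To sanity-check the logic outside FileMaker, the three calculations can be roughly mirrored in Python. This is only a sketch: the function name is hypothetical, the addresses in the usage note are made-up placeholders (the real ones are not shown in the thread), and it assumes names shaped like "Subject - sender <address> - YYYY-MM-DD HHMM.eml".

```python
def parse_first_pass(filename):
    """Return (subject, email, date) using the same rules as the
    cSubject, cEmail and cDate calculations above."""
    # cSubject: everything before the first " - "
    first_sep = filename.find(" - ")
    subject = filename[:first_sep] if first_sep != -1 else ""

    # cEmail: the text between the first "<" and the next ">"
    start = filename.find("<") + 1  # 0 when there is no "<"
    end = filename.find(">", start)
    email = filename[start:end] if start > 0 and end != -1 else ""

    # cDate: the 10 characters after the second " - "
    second_sep = filename.find(" - ", first_sep + 1) if first_sep != -1 else -1
    date = filename[second_sep + 3:second_sep + 13] if second_sep != -1 else ""
    return subject, email, date
```

For a placeholder name such as 'BMI Work Registration Report - <reports@example.com> - 2012-06-21 1534.eml', this returns the subject, the address between the angle brackets, and '2012-06-21'.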

Edited April 16, 2023 by comment

kcep · April 16, 2023

Fantastic 🙂

I am SO grateful for your expert advice. 🙂

Thanks !

comment · April 16, 2023

Now that I look at it, cDate could be simplified - see the edited version.

kcep · April 17, 2023

Hey comment,

Thanks for your attention to detail. 🙂

I have imported 66,884 emails successfully.

You were TOTALLY correct when you said:

“I don't know if it's safe to generalize from only 2 examples, but try:”

cDate parsing is failing with file names like these …

* Statements and Payment Reports Request - HarryFox - NoReply <[email protected]> - 2020-01-22 0941.eml

* It takes less than 10 seconds to rate your latest SugarSync Suppor...SugarSync <[email protected]> - 2019-08-08 1607.eml

cEmail parsing is failing with this type of email address …

* Form submission from peckmusicgroup.net - [email protected] - 2007-06-21 1645-2.eml

cDate parsing and cEmail parsing are failing on this type of email address …

* My Email Address Has Changed Re_ Over any earthly rule...- [email protected] - 2016-11-08 1547.eml

Keep in mind (this may be helpful), there are certain email addresses where the @ symbol appears more than once in the .eml file name …

* An Seong Bok has just paid for your invoice [email protected]" <[email protected]> - 2012-09-20 1835.eml

Thoughts?

Thanks

comment · April 17, 2023

I think we need to understand the rules that were used to construct these strings, before we can formulate the rules to take them apart.

Looking at your added examples, I must admit I don't see the logic. I thought there were 3 main components separated by " - " but now you show an example with 3 such separators and others with only one. Possibly the date could be extracted by looking for the last separator instead of the second one, but that still leaves the other two components in a rather hazy state.

I can help you with reversing the logic, but I have no advantage in deducing the original one.

kcep · April 17, 2023

This is a challenging situation for sure. 🤔

The accuracy of the cDate is the most important element.

The accuracy of the cEmail is something I'd like to have fairly close ... I could always manually adjust some. I believe there are very few .eml file names with more than one @ symbol, so maybe that's something that helps.

The cSubject is not that important.

Thanks 🙂

comment · April 17, 2023

I don't see how that's moving us forward. As I said, the date could possibly be extracted more reliably using:

Middle ( Filename ; Position ( Filename ; " - " ; Length ( Filename ) ; -1 ) + 3 ; 10 )
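The same "last separator" rule can be sketched in Python for checking against a handful of file names. The function name is hypothetical. One deliberate difference from the calculation: when there is no " - " at all, this sketch returns an empty string instead of slicing from the start of the name.

```python
def date_from_last_separator(filename):
    """Return the 10 characters that follow the LAST ' - ' in filename."""
    sep = filename.rfind(" - ")  # search backwards, like occurrence -1
    if sep == -1:
        return ""  # no separator found; nothing reliable to extract
    return filename[sep + 3:sep + 13]
```

Because only the final " - " matters, extra separators in the subject or sender name no longer shift the result, and a trailing suffix such as "1645-2" is ignored since it has no spaces around the hyphen.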

As for the email, if we could assume that the email you want contains the last @ character in the entire string AND that the email is surrounded either by spaces or by angle brackets, then we could do something with that. But I am purely guessing at this point.

kcep · April 17, 2023

It looks like your latest cDate is EXTREMELY accurate. 🙂 🙂

Your latest thoughts concerning the cEmail should nail it ...

The email address we want to parse contains the last @ character in the entire string AND that email address is surrounded either by spaces or by angle brackets.

Thanks!

comment · April 17, 2023

Ok, so try:

Let ( [
mask = Substitute ( Filename ; [ "<" ; " " ] ; [ ">" ; " " ] ) ;
at = Position ( mask ; "@" ; 1 ; PatternCount ( Filename ; "@" ) ) ;
start = Position ( mask ; " " ; at ; -1 ) ;
end = Position ( mask ; " " ; at ; 1 )
] ;
Trim ( Middle ( mask ; start ; end - start ) )
)
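For reference, here is a rough Python equivalent of that calculation, useful for spot-checking odd file names. The function name is hypothetical, and it relies on the same two assumptions agreed on above: the wanted address contains the last @ in the name, and it is bounded by spaces or angle brackets.

```python
def email_from_last_at(filename):
    """Return the space-delimited token that contains the last '@'."""
    # Turn angle brackets into spaces so both delimiters look alike.
    mask = filename.replace("<", " ").replace(">", " ")
    at = mask.rfind("@")
    if at == -1:
        return ""  # no address in this name
    start = mask.rfind(" ", 0, at)  # last space before the @ (-1 if none)
    end = mask.find(" ", at)        # first space after the @
    if end == -1:
        return ""  # nothing closes the address; give up rather than guess
    return mask[start + 1:end].strip()
```

With a placeholder name containing two addresses, such as 'Invoice paid first@example.com" <second@example.com> - 2012-09-20 1835.eml', only the one holding the last @ is returned.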

kcep · April 17, 2023

Unbelievable ... and SO fast !

You're a genius.

I am VERY grateful 🙂
