Extract Email address from text field

rlinsurf · August 17, 2003

Hi--

I have gotten this far in trying to extract an email address wherever it may be found in a text field:

If(PatternCount(Answer, "@") (does not equal char.) 0, Middle(Answer,Position(Answer, "@", 1,1), Position(Answer, " ", (Position(Answer, "@", 1,1)) ,1) -(Position(Answer, "@", 1,1))) , "")

This gets everything from the @ to the next space in the text...but I can't figure out how to get the first part of the address.

Thanks!

All My Best,

Jeffrey

CobaltSky · August 17, 2003

Hello Jeffrey,

To reliably parse out an email address from a text field, regardless of where in the text the address occurs, and irrespective of the format of the address (ie as long as it is a valid email address), I suggest you use:

Case(

PatternCount(Substitute(Substitute(SourceData, "@ ", "

rogermax · August 22, 2003

Ray, you come up with the greatest calcs and tips. I would like to understand this calc better. If my question seems ignorant to others, please excuse me.

In a couple of spots there appears to be bullets, one with a space after and one with a space before, (correct?), why?

And what is the significance of the string "ZGXBYJZFXBYRZPXQY" ?

CobaltSky · August 22, 2003

Hello Roger,

Thanks for the compliment. -And BTW I don't imagine folks will think your question ignorant - just curious. And curiosity ranks high among the known antidotes for ignorance.

As regards the calc, yes it has bullets.

The bullets are being used as a substitution character, and are inserted in place of occurrences of the @ symbol that are either preceded or followed by a space, prior to testing for the presence of (and location of) an email address in the supplied text string. This is, of course, because the @ symbol can be used in contexts other than an email address and such uses will cause the calculation to return unintended results if they are not 'filtered out' before the calc is applied.

Since all or nearly all uses of @ - other than in email addreses - are either preceded or followed (or both) by a space, searching for and substituting all occurrences of "@ " and " @" with a different character - preferably one that is inert from the point of view of an email address (eg a bullet) effectively deals with this threat. And since valid email addresses cannot include spaces, the method greatly improves the robustness of the technique while posing no risk.

Meanwhile, the inclusion of "ZGXBYJZFXBYRZPXQY" is also as a substitution - this time for the underscore character. It is required in this formula because the underscore is acceptable in email addresses but is generally treated as a word separator, so it would therefore cause the use of the MiddleWords( ) function to break over addresses which included underscores. Substitution before parsing (and its reversal afterwards) ensures that this does not occur. For this purpose I simply included a string of characters which do not count as word separators, but which are extraordinarily unlikely to ever be found within a legitimate email address. I thought that ZGXBYJZFXBYRZPXQY was as good as anything for this purpose, but you could use any of a few trillion other highly obscure strings of nonsense just as effectively.

In fact, the choice of both 'ZGXBYJZFXBYRZPXQY' and '

jeffer · August 22, 2003

Case(
PatternCount(Substitute(Substitute(SourceData, "@ ", "

CobaltSky · August 22, 2003

Greetz to you also Jeffer,

Actually, I was of two minds about posting on this subject. Not because I mind sharing with forum colleagues, but because I have a horror of inadvertently lending aid to would-be spammers.

However I guess I take the view that despite the awfulness of junk email, there are probably more legitimate uses than corrupt ones to which technology of this type can be put. Actually, it's because of qualms of this kind that I have not previously posted the attached demo anywhere, so it has been mouldering on my hard drive instead. However on reflection, it is a dreadful pity if the 'self-serving few' spoil it for everyone else who is doing the right thing, so I've decided to place my faith in the FMForums community.

So... attached is a demo file that shows how the technique I previously posted can be very simply applied to the task of retrieving all the email addresses in a block of up to 64k of text (or more if you set it to loop through multiple fields) and listing all unique occurrences of them in alphabetical order in a separate field.

Please use it wisely and thoughtfully.

EmailExtractor.zip

jeffer · August 22, 2003

Hi Ray,

Thnx for the demo file! It's a cool way to extract email adresses from text. And don't worry...i am in no way a spammer.

I am just a Filemaker Developer who likes to break his mind on a nice formula so once in a while:)

so...thnx again.

greetz,

Jeffer

rogermax · August 22, 2003

Thanks again Ray. Once again I appreciate your willingness to share and educate.

AlexandervB · September 3, 2003

@ CobaltSky,

Really nice formula :-) Works flawlessly in FMP4.1 but doesn't want to work in FMP 5.5/6.0. As far as my experience goes I've tried all sorts of things but maybe somebody here has an idea. Thanks so far!

Alex

CobaltSky · September 3, 2003

AlexandervB said:
Works flawlessly in FMP4.1 but doesn't want to work in FMP 5.5/6.0...

Strange you should say that, Alex, as the formula was written in 6.0 and tested in 6.0v4, 5.5v2 and 5.0v3. Moreover the demo file that I also posted to this thread (see above) uses a slight variation of the same formula - and it too was tested in both 5.5 and 6.0 and works fine.

That aside, the functions that the calc uses are all standard fare that operate similarly from 4.0 through to 6.0, so if you have got it working in 4.1 there should be no impediment to you implementing it with equal success in 5.5 or 6.0.

AlexandervB · September 4, 2003

@CobaltSky

Well when using your excellent demo file everything works fine, but as soon as I open the calc and try to change things I get some weird errors when trying to save the calc. Also when just cutting and pasting the formula into a new DB it becomes unusable. I've tried pasting it in a 4.1 DB and it works fine over there. As soon as I try to paste the formula in a 5.5 or 6.0 DB it won't work again... Opening the 4.1 file in 6.0 leaves the calc working but I can't edit it... Very very strange...

BTW Using FileMakerPro 6.0v4 on Mac OS X 10.2.6... Hmm wondering if the problem might be in the difference between using comma's or semicolons....

CobaltSky · September 4, 2003

AlexandervB said:Hmm wondering if the problem might be in the difference between using comma's or semicolons....

Could be, Alex, Maybe your v4.1 file was created on a machine that was configured for commas but your 5.5 and 6.0 files are using the local Netherlands system formats and require semicolons. That might explain it.

Sign In

Extract Email address from text field

Recommended Posts

rlinsurf

CobaltSky

rogermax

CobaltSky

jeffer

CobaltSky

jeffer

rogermax

AlexandervB

CobaltSky

AlexandervB

CobaltSky

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information