Need GREP ability

May 18, 2007

The customer (big UK company, NOT amenable to suggestions) supply data in a very unstructured way. Basically, I need to look at a field like this:

513/3831
Acoustic Solutions 256MB Pink MP3 Player

£17.99-£8.99

save £9

513/3161 Silver

5133178 Pink

Philips Silver MP3 Player

£29.99-£14.99

half price

and extract catalogue numbers, i.e. 513/3831, 513/3161, 513/3178 - note that they don't always put in the '/'!
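For comparison, the pattern described above can be written as a single regular expression outside FileMaker. This is a minimal Python sketch, not a FileMaker solution; the names `CATALOGUE` and `catalogue_numbers` are made up for illustration, and it assumes a catalogue number is always three digits, an optional '/', then four digits.

```python
import re

# Three digits, an optional slash, four digits, not embedded in a longer run of digits.
CATALOGUE = re.compile(r"(?<!\d)(\d{3})/?(\d{4})(?!\d)")

def catalogue_numbers(text):
    """Return every catalogue number found, normalised to the 513/3831 form."""
    return [f"{a}/{b}" for a, b in CATALOGUE.findall(text)]

sample = """513/3831
Acoustic Solutions 256MB Pink MP3 Player
£17.99-£8.99
save £9
513/3161 Silver
5133178 Pink
Philips Silver MP3 Player
£29.99-£14.99
half price"""

print(catalogue_numbers(sample))
```

Because the match does not depend on line position, it also copes with several products typed on one line.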

I have been doing this successfully using AppleScript with Satimage's OSAX. But the main user of this system is on a PC (though she'd love a Mac!), and I have read David Kachel's excellent White Paper and can see that if I can do this within FM, I should.

Any pointers? Is there an array structure in FileMaker that I could set to {1,2,3,4,5,6,7,8,9,0}? (Sorry if this is a very dumb question, but I'm still waiting for a fat textbook to make its way across the Atlantic).

May 18, 2007

Although you say it's unstructured, would the catalogue number always be 3 characters followed by 4 characters (with or without the /)?

Does the number always start a new line?

I would be looking to create 1 record in a temp table for the first 8 characters of each new line in your field, substitute out the "/" if it's there, and see if you're left with just a number.
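The per-line check suggested here can be sketched in Python to show the logic. This is only an illustration, not FileMaker code: `leading_catalogue_number` is a hypothetical helper, the list stands in for the temp table, and it assumes every catalogue number starts its line and has 7 digits once the "/" is removed.

```python
def leading_catalogue_number(line):
    """Return the 7-digit number at the start of a line, or None if there is none."""
    # First 8 characters, slash removed, trailing space trimmed.
    head = line[:8].replace("/", "").strip()
    if len(head) == 7 and head.isascii() and head.isdigit():
        return head
    return None

field = "513/3831\nAcoustic Solutions 256MB Pink MP3 Player\n513/3161 Silver\n5133178 Pink"

# One entry per line that starts with a number (stand-in for the temp table records).
temp_records = [n for n in map(leading_catalogue_number, field.split("\n")) if n]
print(temp_records)
```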

May 18, 2007

Author

Solved my own problem, though my head is aching badly now! I've written a script which splits the field into values, filters them and then checks each word - if it has a full stop (period), it's a price and is ignored. Otherwise, if it has 7 digits, it's a catalogue number.

The worst line looks like this:

Position(MiddleWords(Filter(GetValue ( GREP::Text to GREP ; $x );"1234567890 .");$y;1);".";1;1)=0 and

Length(MiddleWords(Filter(GetValue ( GREP::Text to GREP ; $x );"1234567890 .");$y;1))=7

which scares me. Style tips welcome!
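The logic of the script (split into values, filter, then test each word) reads more easily when laid out as nested loops. The Python sketch below mirrors it under one loud assumption: after filtering, words are separated by spaces only, which may not match FileMaker's own word-breaking rules exactly. The function name is made up for illustration.

```python
KEEP = set("1234567890 .")  # same characters as the Filter call in the calculation

def catalogue_numbers_by_words(field):
    """Split into lines, filter each line, keep 7-character words without a full stop."""
    found = []
    for value in field.split("\n"):                      # outer loop, like $x over values
        filtered = "".join(ch for ch in value if ch in KEEP)
        for word in filtered.split():                    # inner loop, like $y over words
            if "." not in word and len(word) == 7:       # a full stop marks a price
                found.append(word)
    return found

field = "513/3831\nAcoustic Solutions 256MB Pink MP3 Player\n£17.99-£8.99\nsave £9\n513/3161 Silver\n5133178 Pink"
print(catalogue_numbers_by_words(field))
```

Naming the filtered word once, instead of repeating the whole nested expression in each test, is what keeps the two conditions short.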

And now I've got to transfer it into the actual database.

May 18, 2007

Author

Nice try, Robert, but I'm afraid not - sometimes they just type a list of products all on one line!

I seem to have double-posted my question, for which apologies. I have developed a tortuous calculation which solves my problem - see other post!

May 18, 2007

See if this helps. This is one of Shaun Flisakowski's files.

Regex.fp7.zip

May 18, 2007

Author

Unfortunately I don't have FileMaker Advanced! But I feel quite proud to have solved it anyway. Thanks for the file - I'll keep it for future reference.

May 18, 2007

What about "£1234-£999"? No spaces, no "."

May 18, 2007

Ah, it may be OK: the hyphen splits it into 2 words.

May 18, 2007

I wonder why you need to deal with values, if you're going to check each individual word anyway. And of course, as Robert points out, any word with 7 digits is a potential false positive - "18/5/2007" for example.
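A quick way to see which inputs slip through is to run the same filter-then-test rule over a few strings. This Python sketch assumes the filter is applied before the text is split into words, as in the calculation posted earlier in the thread; in that case the hyphen in a price range is stripped along with the slash. The helper names are hypothetical.

```python
KEEP = set("1234567890 .")  # characters kept by the filter step

def filtered_words(text):
    """Filter first, then split what is left on spaces."""
    return "".join(ch for ch in text if ch in KEEP).split()

def looks_like_catalogue_number(word):
    """The rule under test: no full stop and exactly 7 characters."""
    return "." not in word and len(word) == 7

for text in ["513/3161 Silver", "18/5/2007", "£1234-£999", "£17.99-£8.99"]:
    hits = [w for w in filtered_words(text) if looks_like_catalogue_number(w)]
    print(repr(text), hits)
```

Under that assumption, both the date and the price range without pence collapse to 7 digits and pass the test.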

May 21, 2007

Author

Hi Comment,

I have to deal with values, otherwise I get false catalogue numbers appearing across line breaks:

save £9

513/3161 Silver

becomes

save 95133161

- and the 9 confuses everything! Pity as it added a layer of repeats, but there you go.

You're quite right about the other dangers. However, I've checked two publications' worth of files, and these sorts of combinations do not occur. It is possible, and then a false record will be created, but as it will not find any associated text or price details it will get spotted and can be deleted. To be honest, that's the least of my worries right now!

May 21, 2007

Line breaks are also word delimiters. Using your example above, MiddleWords ( yourtext ; 3 ; 1 ) should return "513/3161".

If that's not the case, you have a VERY strange line-break character. I don't think it's very likely, but even if so, you could substitute another character for it that WILL break words.

May 21, 2007

Author

I'd begun to forget what I'd done... OK: the first calculation on the field is the 'filter':

Filter ( text ; "1234567890 ." )

- and that lost all the line endings. But I've just had a play with it and realise I can add a return character to my filter, which should mean I can ignore values and just do words! Thanks! (disappears for about 3 hours to work out what on earth those calculations were and struggle with opening and closing parentheses....)
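The effect of adding the return character to the filter can be checked with a small stand-in. In this Python sketch `fm_filter` is a hypothetical imitation of FileMaker's Filter function (keep only the listed characters, in their original order), and "\n" plays the part of the return character.

```python
def fm_filter(text, keep):
    """Keep only the characters of text that appear in keep, preserving order."""
    return "".join(ch for ch in text if ch in keep)

text = "save £9\n513/3161 Silver"

# Without the return character the line break vanishes and the 9 joins the number.
without_return = fm_filter(text, "1234567890 .")
# With it, the line break survives and still separates the two words.
with_return = fm_filter(text, "1234567890 .\n")

print(without_return.split())
print(with_return.split())
```

With the line break preserved, the stray 9 stays a separate word and the 7-digit test applies to the catalogue number alone.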