Need GREP ability

Emma in England · May 18, 2007

The customer (big UK company, NOT amenable to suggestions) supply data in a very unstructured way. Basically, I need to look at a field like this:

513/3831
Acoustic Solutions 256MB Pink MP3 Player

£17.99-£8.99

save £9

513/3161 Silver

5133178 Pink

Philips Silver MP3 Player

£29.99-£14.99

half price

and extract catalogue numbers, i.e. 513/3831, 513/3161, 513/3178 - note that they don't always put in the '/'!

I have been doing this successfully using AppleScript with Satimage's OSAX. But the main user of this system is on a PC (though she'd love a Mac!), and I have read David Kachel's excellent White Paper and can see that if I can do this within FM, I should.

Any pointers? Is there an array structure in Filemaker that I could set to {1,2,3,4,5,6,7,8,9,0}? (Sorry if this is a very dumb question, but I'm still waiting for a fat text book to make its way across the Atlantic).

Robert Kidd · May 18, 2007

Although you say it's unstructured would the catalogue number always be 3 characters followed by 4 characters ( with or without the / )?

Does the number always start a new line ?

I would be looking to create 1 record in a temp table for the first 8 characters of each new line in your field,substitute out the "/" if it's there, and see if your left with just a number.

Emma in England · May 18, 2007

Solved my own problem, though my head is aching badly now! I've written a script which splits the field into values, filters them and then checks each word - if it has a full stop (period), it's a price and is ignored. Otherwise, if it has 7 digits, it's a catalogue number.

The worst line looks like this:

Position(MiddleWords(Filter(GetValue ( GREP::Text to GREP ; $x );"1234567890 .");$y;1);".";1;1)=0 and

Length(MiddleWords(Filter(GetValue ( GREP::Text to GREP ; $x );"1234567890 .");$y;1))=7

which scares me. Style tips welcome!

And now I've got to transfer it into the actual database.

Emma in England · May 18, 2007

Nice try, Robert, but I'm afraid not - sometimes they just type a list of products all on one line!

I seem to have double-posted my question, for which apologies. I have developed a tortuous calculation which solves my problem - see other post!

Mikhail Edoshin · May 18, 2007

See if this helps. This is one of Shaun Flisakowski's files.

Regex.fp7.zip

Emma in England · May 18, 2007

Unfortunately I don't have Filemaker Advanced! But feel quite proud to have solved it anyway. Thanks for the file - I'll keep it for future reference.

Robert Kidd · May 18, 2007

What about "£1234-£999" ? No spaces, no "."

Robert Kidd · May 18, 2007

Ah it may be ok, the hyphen splits it into 2 words.

comment · May 18, 2007

I wonder why you need to deal with values, if you're going to check each individual word anyway. And of course, as Robert points out, any word with 7 digits is a potential false positive - "18/5/2007" for example.

Emma in England · May 21, 2007

Hi Comment,

I have to deal with values otherwise I get false catalogue numbers appearing across line breaks

save £9

513/3161 Silver

becomes

save 95133161

- and the 9 confuses everything! Pity as it added a layer of repeats, but there you go.

You're quite right about the other dangers. However, I've checked two publications' worth of files, and these sort of combinations do not occur. It is possible, and then a false record will be created, but as it will not find any associated text or price details it will get spotted and can be deleted. To be honest, that's the least of my worries right now!

comment · May 21, 2007

Line breaks are also word delimiters. Using your example above, MiddleWords ( yourtext ; 3 ; 1 ) should return "513/3161".

If that's not the case, you have a VERY strange line-break character. I don't think it's very likely, but even if so, you could substitute it for another character that WILL break words.

Emma in England · May 21, 2007

I'd begun to forget what I'd done... OK: the first calculation on the field is the 'filter':

Filter ( text ; "1234567890 ." )

- and that lost all the line endings. But I've just had a play with it and realise I can add a return character to my filter, which should mean I can ignore values and just do words! Thanks! (disappears for about 3 hours to work out what on earth those calculations were and struggle with opening and closing parentheses....)

comment · May 21, 2007

The order, I think, should be this:

1. Extract the current word (MiddleWords);

2. Examine the word (Filter and PatternCount);

3. If passed, create a record;

4. On to the next word.

Emma in England · May 21, 2007

Thanks, that was the clear thinking I needed. I've tidied the script up considerably and eliminated a whole variable. (I love variables)

Sign In

Need GREP ability

Recommended Posts

Emma in England

Robert Kidd

Emma in England

Emma in England

Mikhail Edoshin

Emma in England

Robert Kidd

Robert Kidd

comment

Emma in England

comment

Emma in England

comment

Emma in England

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Explore

Affiliate Forums

Activity

Important Information