Emma in England Posted May 18, 2007 Posted May 18, 2007 The customer (big UK company, NOT amenable to suggestions) supply data in a very unstructured way. Basically, I need to look at a field like this: 513/3831 Acoustic Solutions 256MB Pink MP3 Player £17.99-£8.99 save £9 513/3161 Silver 5133178 Pink Philips Silver MP3 Player £29.99-£14.99 half price and extract catalogue numbers, i.e. 513/3831, 513/3161, 513/3178 - note that they don't always put in the '/'! I have been doing this successfully using AppleScript with Satimage's OSAX. But the main user of this system is on a PC (though she'd love a Mac!), and I have read David Kachel's excellent White Paper and can see that if I can do this within FM, I should. Any pointers? Is there an array structure in Filemaker that I could set to {1,2,3,4,5,6,7,8,9,0}? (Sorry if this is a very dumb question, but I'm still waiting for a fat text book to make its way across the Atlantic).
Robert Kidd Posted May 18, 2007 Posted May 18, 2007 Although you say it's unstructured would the catalogue number always be 3 characters followed by 4 characters ( with or without the / )? Does the number always start a new line ? I would be looking to create 1 record in a temp table for the first 8 characters of each new line in your field,substitute out the "/" if it's there, and see if your left with just a number.
Emma in England Posted May 18, 2007 Author Posted May 18, 2007 Solved my own problem, though my head is aching badly now! I've written a script which splits the field into values, filters them and then checks each word - if it has a full stop (period), it's a price and is ignored. Otherwise, if it has 7 digits, it's a catalogue number. The worst line looks like this: Position(MiddleWords(Filter(GetValue ( GREP::Text to GREP ; $x );"1234567890 .");$y;1);".";1;1)=0 and Length(MiddleWords(Filter(GetValue ( GREP::Text to GREP ; $x );"1234567890 .");$y;1))=7 which scares me. Style tips welcome! And now I've got to transfer it into the actual database.
Emma in England Posted May 18, 2007 Author Posted May 18, 2007 Nice try, Robert, but I'm afraid not - sometimes they just type a list of products all on one line! I seem to have double-posted my question, for which apologies. I have developed a tortuous calculation which solves my problem - see other post!
Mikhail Edoshin Posted May 18, 2007 Posted May 18, 2007 See if this helps. This is one of Shaun Flisakowski's files. Regex.fp7.zip
Emma in England Posted May 18, 2007 Author Posted May 18, 2007 Unfortunately I don't have Filemaker Advanced! But feel quite proud to have solved it anyway. Thanks for the file - I'll keep it for future reference.
Robert Kidd Posted May 18, 2007 Posted May 18, 2007 Ah it may be ok, the hyphen splits it into 2 words.
comment Posted May 18, 2007 Posted May 18, 2007 I wonder why you need to deal with values, if you're going to check each individual word anyway. And of course, as Robert points out, any word with 7 digits is a potential false positive - "18/5/2007" for example.
Emma in England Posted May 21, 2007 Author Posted May 21, 2007 Hi Comment, I have to deal with values otherwise I get false catalogue numbers appearing across line breaks save £9 513/3161 Silver becomes save 95133161 - and the 9 confuses everything! Pity as it added a layer of repeats, but there you go. You're quite right about the other dangers. However, I've checked two publications' worth of files, and these sort of combinations do not occur. It is possible, and then a false record will be created, but as it will not find any associated text or price details it will get spotted and can be deleted. To be honest, that's the least of my worries right now!
comment Posted May 21, 2007 Posted May 21, 2007 Line breaks are also word delimiters. Using your example above, MiddleWords ( yourtext ; 3 ; 1 ) should return "513/3161". If that's not the case, you have a VERY strange line-break character. I don't think it's very likely, but even if so, you could substitute it for another character that WILL break words.
Emma in England Posted May 21, 2007 Author Posted May 21, 2007 I'd begun to forget what I'd done... OK: the first calculation on the field is the 'filter': Filter ( text ; "1234567890 ." ) - and that lost all the line endings. But I've just had a play with it and realise I can add a return character to my filter, which should mean I can ignore values and just do words! Thanks! (disappears for about 3 hours to work out what on earth those calculations were and struggle with opening and closing parentheses....)
comment Posted May 21, 2007 Posted May 21, 2007 The order, I think, should be this: 1. Extract the current word (MiddleWords); 2. Examine the word (Filter and PatternCount); 3. If passed, create a record; 4. On to the next word.
Emma in England Posted May 21, 2007 Author Posted May 21, 2007 Thanks, that was the clear thinking I needed. I've tidied the script up considerably and eliminated a whole variable. (I love variables)
Recommended Posts
This topic is 6457 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now