Jump to content

Data extraction

This topic is 6514 days old. Please don't post here. Open a new topic instead.

Recommended Posts

Please help, I have for example the following data in a field, here are two examples:

BHC Components Ltd

20-21 Cumberland Drive

Granby Industrial Estate


DT4 9TE (Road Map)


Tel: 01305 782871

Fax: 01305 760670


Reveal Ltd

34 High Street


RG19 3JD (Road Map)


Tel: 01635 873374

Fax: 01635 873375

Internet: www.reveal.co.uk

I want each one of those lines extracted and put into a different field ie company name, address line1, address line2, phone fax etc etc. I have managed to get the phone number out easliy as it is between tel: and Fax:

My problem is that as you can see each record does not have the same amount of lines in as the address for each company is sometimes 3 4 or 5 lines.

Constants in the records are:

Poscode always has Road Map to the right of it

Tel: and Fax: before each of the numbers

www. in a web address

Un-constants are:

Website addresses sometimes have Internet: infront of them

Addresses are sometimes either 3 4 or 5 lines

I have my actual database if you would like to see that to help.

Thankyou very much for your time


Link to comment
Share on other sites

Hi harry,

Others may have some quick calculations or scripts to do this. However, if this was a text file before you imported it, I would prefer to examine it to see if there isn't a better way to separate the file prior to import.


Link to comment
Share on other sites

AudioFreak said:


This thread is looking similiar smirk.gif Looks like possibly another BBEDIT fan.


Very good Michael, you are getting to be verrrrrrrrry Clairvoyant wink.gif

However, if Harry is a Windows user, or it the sample file "example.csv" hasn't been changed prior to what he posted. Then I will really have my hands full trying to help him.

I know that there is a Windows equivalent of BBEdit, so I can work through that, but the sample shows a lot of inconsistencies in the records and even between them.

Lee confused.gif

Version: v6.x

Platform: Mac OS 9

Link to comment
Share on other sites

This is a hard one. It looks like fairly well organized info, but because of the different number of lines, it's difficult to get it into a consistent format. I used BBEdit, with grep. EditPad Pro on PC should work.

The first task was to separate it a bit more, using the quotes, adding returns between "records." Easy enough. The 2nd task was to get rid of all the extra info at the bottom. Not so easy. I managed eventually, by finding either the "Internet: " line or the "www." line. First I changed the "field" separator to Tab. This is the "grep" for the "www" address:





That's about as far as I can take it right now. The whole job would require at least a couple hours (more really) of rather tedious work; I can't justify it right now just for fun.

I believe the top part would require a looping script in FileMaker, to move the data from field to field, after counting which field the darn "(road map)" was in.

It is doable however.


Link to comment
Share on other sites

No, yours is better. Mine was more "cleaning the data" than "creating a tool to clean the data." I guess you were able to drop the junk 'cause it was far enough down the line.

The data is still not quite done though. In some records (road map) ends up in Address3, but on most it's in Address4.

It should not be too hard to create a loop to test Address3 for the map, then move the fields 3 and 4 down one field if it's there, using a global to hold. In every one of those, Address5 is empty, so they could be moved.

It would also mean first importing into a file where the fields are just text.

Link to comment
Share on other sites

Corrected to fit the Zip Code contraint.

There are 5 adress fields and a Postal Code field has been added.

In order to have your adress fields ready for another import into the definitive file, another set of Definitive calcs would parse the Postal Code out of the Adress Fields.

I played with several of your datas, and it appears to be working AFAICS

I'd use this as an intermediary file, import the csv data, and then import it back to the new file.



Link to comment
Share on other sites

Hi Fenton,

Actually, I hadn't see the need for the Postal Code in my earlier solution.

You're right that cleaning the data would surely be safer anyway.

Now, I'm unsure what should stay in Field 1 to 5, as some appear to be Cities, other States and even Countries.

Wait and see smirk.gif

Link to comment
Share on other sites

Well done. I don't know if it's a postal code, or map coordinates. But it's on its own in either case.

Yes, the other address lines are not going to be able to be handled by calculations. From what I could see the only way to know which was which would be to have a couple large lists, of cities and counties (or whatever those are). Then you could use relationships to see what was what.

Link to comment
Share on other sites

Hi guys, sounds like you are all having fun with this. Although I must admit you are starting to lose me as I read through things.

Answers to a couple of questions, yess I am a PC user, and that is exactly how the text comes to me. I have used Internet Macros to retrieve info from a website, I then need to copy all of that info into a filemaker database for my sales guys to use.

That code next to Road Map is a UK Postcode

I do not mind how the address fields are laid out as long as I have an address for each of them for example Line 3 can contain either a county, city or country.

Important things are: Company name postcode tel fax and website



Link to comment
Share on other sites

This topic is 6514 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Create New...

Important Information

By using this site, you agree to our Terms of Use.