Importing badly formed data

iantri · December 10, 2005

Hi all,

I've got a problem.

I'm trying to migrate from a contact management solution with Maximizer 97is to Filemaker 7.

The problem is, Maximizer exports it's records in a very broken way.

If I pared it down to just a few fields the output would look like this ([CR] and [LF] denote the control codes)

"ID", "First", "Last", "Notes"[CR][LF]

"12345", "John", "Smith", "January 1, 1900 12:00AM Customer called, blah blah blah. [CR][LF]

January 2, 1900 2:32PM Customer called again still waiting for answer."[CR][LF]

"12346", "Jane", "Smith", "Feb 2, 1954 4:23PM Equipment not working -- schedule service call."[CR][LF]

As you can see this is horribly, horribly broken. Almost any program that imports comma-delimited data sees every [CR][LF] as an end-of-row indicator, so very bad things happen when I try and import this.

Does anyone have any suggestion of how I could co about fixing this badly munged file so that I end up with something importable into Filemaker? Ideally i'd like to preseve the linebreaks inside the notes field some way (perhaps by replacing it with another character, doesn't matter to me) so I can split the field apart later into records for each note (in a seperate linked table obviously).

EDIT: If I can avoid writing custom code to parse these files it would be nice.

Edited December 10, 2005 by Guest

comment · December 10, 2005

Almost any program that imports comma-delimited data sees every [CR][LF] as an end-of-row indicator

I don't think so. Carriage returns INSIDE quotes should be interpreted as line breaks in field contents. Carriage returns OUTSIDE quotes are record delimiters. What you are showing us seems to be a properly formatted .csv file. Filemaker should be able to import this without a hitch.

iantri · December 10, 2005

Unless I am misinterpreting you, it appears to be what you describe. Here is an actual piece of data, trimmed to the header and one customer record (personal info omitted).

"IDentification","First Name","Last Name","Notes (All)"

"000814043832045904601C","[censored]","[censored]","November 17, 2005 12:58 PM Customer inquired about winterizing spa. I left a manual in the pickup room for him and information about Pool antifreeze pricing (we sell) and left message on his phone with these details. He will be in later to pick up. [censored] Manual and antifreeze picked up.

November 17, 2005 12:56 PM [censored] and [censored] have moved. New Homeowner [censored]. Maureen

December 18, 2004 3:53 PM Cash/Carry - 4"-3" spa cover for Hydropool 575/ 36" split built Sept. 27/2000

November 30, 2001 12:36 PM Vitron Filter "O" Ring for Hydropool Spa.

October 1, 2001 9:27 AM Repair kit for Seaspray 575.

December 14, 2000 3:26 PM Need to replace control board on spa. [censored]

Note: ordered Dec14/00. HPL-100/5HYDRO #51686 Warranty. [censored]

December 5, 2000 12:08 PM Replace ozonator under warranty. [censored]"

The CRLFs are indeed inside the quotes.. but filemaker chokes on the first one it hits (in this file it would start a new record at "17, 2005 12:58PM") and throw the rest of the data totally out of whack.

comment · December 10, 2005

You have quotes in your content - these must be escaped.

iantri · December 10, 2005

.. ah.

not sure how i missed something so obvious.

Unfortunately, I can't make the source program spit out well-formed comma-separted files in the first place, so any suggestion on how I can fix this? I'm hoping i can avoid having to write something that attempts to parse the file and match up quotes to figure out which are the record separators and which are unescaped quotes inside..

For what it's worth, I can get tab-delimited output out of program too but I don't think that gets along with multiline fields at all.. and off the top of my head I can't think of a way to preprocess it to convert the newlines inside each field to the group separator character since there isn't anything to mark the start of a row like in a comma-seperated file (.. where you could do something like search and replace to turn [CRLF]" into ROWSEPARATOR, then turn remaining [CRLF] into the group separator character, then ROWSEPARATOR back into [CRLF]")

comment · December 10, 2005

I too think this has better be fixed BEFORE importing.

I think I would try something like this:

1. Replace [color:purple]"," with some code, let's say &TAB;

2. Replace [color:purple]"CRLF" with another code, say &RETURN;

3. Replace all remaining quotes with [color:purple]"" (i.e. two quotes for each one);

4 & 5 = the reverse of 1 & 2.

It's not 100% fool-proof, but it should be good enough for a one-time conversion - any exceptions should be few enough to fix manually.

iantri · December 10, 2005

nice.. i like it

I'll try that out later.

Thanks

Sign In

Importing badly formed data

Recommended Posts

iantri

comment

iantri

comment

iantri

comment

iantri

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information