Jump to content

Importing badly formed data


This topic is 5770 days old. Please don't post here. Open a new topic instead.

Recommended Posts

  • Newbies

Hi all,

I've got a problem.

I'm trying to migrate from a contact management solution with Maximizer 97is to Filemaker 7.

The problem is, Maximizer exports it's records in a very broken way.

If I pared it down to just a few fields the output would look like this ([CR] and [LF] denote the control codes):P

"ID", "First", "Last", "Notes"[CR][LF]

"12345", "John", "Smith", "January 1, 1900 12:00AM Customer called, blah blah blah. [CR][LF]

January 2, 1900 2:32PM Customer called again still waiting for answer."[CR][LF]

"12346", "Jane", "Smith", "Feb 2, 1954 4:23PM Equipment not working -- schedule service call."[CR][LF]

As you can see this is horribly, horribly broken. Almost any program that imports comma-delimited data sees every [CR][LF] as an end-of-row indicator, so very bad things happen when I try and import this.

Does anyone have any suggestion of how I could co about fixing this badly munged file so that I end up with something importable into Filemaker? Ideally i'd like to preseve the linebreaks inside the notes field some way (perhaps by replacing it with another character, doesn't matter to me) so I can split the field apart later into records for each note (in a seperate linked table obviously).

EDIT: If I can avoid writing custom code to parse these files it would be nice.

Edited by Guest
Link to comment
Share on other sites

Almost any program that imports comma-delimited data sees every [CR][LF] as an end-of-row indicator

I don't think so. Carriage returns INSIDE quotes should be interpreted as line breaks in field contents. Carriage returns OUTSIDE quotes are record delimiters. What you are showing us seems to be a properly formatted .csv file. Filemaker should be able to import this without a hitch.

Link to comment
Share on other sites

  • Newbies

Unless I am misinterpreting you, it appears to be what you describe. Here is an actual piece of data, trimmed to the header and one customer record (personal info omitted).

"IDentification","First Name","Last Name","Notes (All)"

"000814043832045904601C","[censored]","[censored]","November 17, 2005 12:58 PM Customer inquired about winterizing spa. I left a manual in the pickup room for him and information about Pool antifreeze pricing (we sell) and left message on his phone with these details. He will be in later to pick up. [censored] Manual and antifreeze picked up.

November 17, 2005 12:56 PM [censored] and [censored] have moved. New Homeowner [censored]. Maureen

December 18, 2004 3:53 PM Cash/Carry - 4"-3" spa cover for Hydropool 575/ 36" split built Sept. 27/2000

November 30, 2001 12:36 PM Vitron Filter "O" Ring for Hydropool Spa.

October 1, 2001 9:27 AM Repair kit for Seaspray 575.

December 14, 2000 3:26 PM Need to replace control board on spa. [censored]

Note: ordered Dec14/00. HPL-100/5HYDRO #51686 Warranty. [censored]

December 5, 2000 12:08 PM Replace ozonator under warranty. [censored]"

The CRLFs are indeed inside the quotes.. but filemaker chokes on the first one it hits (in this file it would start a new record at "17, 2005 12:58PM") and throw the rest of the data totally out of whack.

Link to comment
Share on other sites

  • Newbies

.. ah.

not sure how i missed something so obvious.

Unfortunately, I can't make the source program spit out well-formed comma-separted files in the first place, so any suggestion on how I can fix this? I'm hoping i can avoid having to write something that attempts to parse the file and match up quotes to figure out which are the record separators and which are unescaped quotes inside..

For what it's worth, I can get tab-delimited output out of program too but I don't think that gets along with multiline fields at all.. and off the top of my head I can't think of a way to preprocess it to convert the newlines inside each field to the group separator character since there isn't anything to mark the start of a row like in a comma-seperated file (.. where you could do something like search and replace to turn [CRLF]" into ROWSEPARATOR, then turn remaining [CRLF] into the group separator character, then ROWSEPARATOR back into [CRLF]")

Link to comment
Share on other sites

I too think this has better be fixed BEFORE importing.

I think I would try something like this:

1. Replace [color:purple]"," with some code, let's say &TAB;

2. Replace [color:purple]"CRLF" with another code, say &RETURN;

3. Replace all remaining quotes with [color:purple]"" (i.e. two quotes for each one);

4 & 5 = the reverse of 1 & 2.

It's not 100% fool-proof, but it should be good enough for a one-time conversion - any exceptions should be few enough to fix manually.

Link to comment
Share on other sites

This topic is 5770 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Create New...

Important Information

By using this site, you agree to our Terms of Use.