How to import records from an XML file larger than 10 GB

import xml


April 7, 2014

How can I import an XML file that is about 12 GB?

I get an error after about 4 hours of parsing the file, and the error dialog window contains no text.

April 7, 2014

I would probably run that file through some pre-processing outside of FM first. Is it the sheer number of records that makes for such a big XML file?

April 7, 2014

Author

Oh, it's the government's Federal State Address database, which is openly available in XML, DBF and the Russian Cladr database format. Cladr is impossible to use in FM, and DBF imports Russian letters with incorrect CP1251 encoding. So my only chance is to import from XML.

Thank you. Could you advise how and where I can do the pre-processing outside of FM in the easiest way?

April 7, 2014

Pre-processing kinda depends on what you are comfortable with. For instance you could create a VBScript / PowerShell / AppleScript / shell script that fixes the CP1251 encoding in the DBF format, or that parses the XML into a CSV for import into FM, or that goes through the XML and deletes any nodes you do not need but otherwise preserves the XML...

Something along those lines.
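
To make the XML-to-CSV option a bit more concrete, here is a minimal sketch in Python (my choice for the illustration; any of the scripting languages above could do the same job). It streams the file with the standard library's `xml.etree.ElementTree.iterparse`, so the whole document never has to sit in memory. It assumes every record is a direct child of the root element, keeps its data in attributes, and that the file uses no XML namespaces. The tag name `Object` and the field names in the usage line are hypothetical placeholders, so check them against the real file.

```python
import csv
import xml.etree.ElementTree as ET


def xml_to_csv(xml_path, csv_path, record_tag, fields):
    """Stream records out of a large XML file into a CSV file.

    Each <record_tag> element becomes one row; `fields` lists the
    attributes to copy (a missing attribute becomes an empty cell).
    Returns the number of rows written.
    """
    rows = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(fields)  # header row
        context = ET.iterparse(xml_path, events=("start", "end"))
        _, root = next(context)  # first event is the start of the root element
        for event, elem in context:
            if event == "end" and elem.tag == record_tag:
                writer.writerow([elem.get(name, "") for name in fields])
                rows += 1
                root.clear()  # drop records already handled so memory stays flat
    return rows
```

Usage would look something like `xml_to_csv("addresses.xml", "addresses.csv", "Object", ["ID", "NAME"])`, with all four arguments adjusted to the actual data. The resulting UTF-8 CSV can then be imported into FM like any other text file.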

April 7, 2014

Author


I think I now need to learn AppleScript... I can't imagine how a script would fix the CP1251 encoding. As I understand it, the best way to convert the file from XML into CSV is to use a text-replace function. Is that right?

April 7, 2014

How can I import an XML file that is about 12 GB?

I get an error after about 4 hours of parsing the file, and the error dialog window contains no text.

It sounds like you have run out of memory. Your best way to proceed, IMHO, would be to split the file into smaller chunks. See for example:

http://fmforums.com/forum/topic/90048-import-plain-text-to-fm/?p=418221

Note, however, that splitting an XML file is not as trivial as splitting a .csv file, for example.
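
To show why splitting XML takes more care than splitting a .csv, here is a minimal Python sketch (Python is only my choice for the example). You cannot simply cut the file every N lines, because each chunk has to be a well-formed document of its own, so the sketch re-wraps every group of records in a copy of the root tag. It assumes the records are direct children of the root element, that the file uses no XML namespaces, and that the root element's own attributes are not needed in the chunks. The `record_tag` value and the output prefix are hypothetical and have to be matched to the real file.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_path, out_prefix, record_tag, records_per_file):
    """Split a large, flat XML file into smaller well-formed XML files.

    Returns the list of file names written.
    """
    written = []
    out = None
    count = 0
    context = ET.iterparse(xml_path, events=("start", "end"))
    _, root = next(context)  # the root element, seen before any record
    for event, elem in context:
        if event != "end" or elem.tag != record_tag:
            continue
        if out is None:  # start the next chunk and repeat the root tag in it
            name = f"{out_prefix}{len(written) + 1:04d}.xml"
            out = open(name, "w", encoding="utf-8")
            out.write(f'<?xml version="1.0" encoding="utf-8"?>\n<{root.tag}>\n')
            written.append(name)
        elem.tail = None  # do not copy text that followed the record in the source
        out.write(ET.tostring(elem, encoding="unicode") + "\n")
        count += 1
        root.clear()  # release parsed records so memory use stays flat
        if count % records_per_file == 0:  # chunk is full: close it properly
            out.write(f"</{root.tag}>\n")
            out.close()
            out = None
    if out is not None:  # close the last, partially filled chunk
        out.write(f"</{root.tag}>\n")
        out.close()
    return written
```

Each resulting file has the same structure as the original, so the same import (and the same XSLT stylesheet, if one is used) should work on every chunk in turn.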

---

DBF imports Russian letters with incorrect CP1251 encoding.

This would be a lot easier to answer if we knew exactly what the problem with the encoding is. Did you manage to import the data? Perhaps all you need to do is a series of character substitutions (in FileMaker, after importing)?
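
For illustration only, here is what such a substitution amounts to when the garbling comes from bytes being decoded with the wrong code page. This is a Python sketch under a big assumption: that the text really is CP1251 and was merely read as Latin-1. The thread has not established what actually went wrong, so both encoding names are placeholders to experiment with. The first function repairs a string in one step; the second lists every wrong-character to right-character pair, which is the complete set of substitutions one would otherwise have to hunt for by hand.

```python
def repair_mojibake(text, wrong_encoding="latin-1", real_encoding="cp1251"):
    """Undo a wrong decode: turn the characters back into the original
    bytes, then decode those bytes with the encoding they were written in."""
    return text.encode(wrong_encoding).decode(real_encoding)


def substitution_table(wrong_encoding="latin-1", real_encoding="cp1251"):
    """Map each wrongly decoded character to the intended one for the
    bytes 0x80-0xFF, skipping bytes that either encoding leaves undefined."""
    table = {}
    for byte in range(0x80, 0x100):
        raw = bytes([byte])
        try:
            table[raw.decode(wrong_encoding)] = raw.decode(real_encoding)
        except UnicodeDecodeError:
            continue  # this byte has no meaning in one of the two encodings
    return table


# Simulate the assumed damage, then repair it.
garbled = "Москва".encode("cp1251").decode("latin-1")
print(repair_mojibake(garbled))
```

If the repaired sample comes out as readable Russian, the guess about the two encodings was right, and the pairs from `substitution_table()` could be turned into a single nested substitution inside FileMaker. If not, try other values for `wrong_encoding`.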

Edited April 8, 2014 by comment

April 9, 2014

Author

Yes, it is easier to import from DBF, but there are 30 billion records in which to do the character substitutions. And it's difficult to find all the different characters in the DB.