Importing UTF-8 records in scripts

milefaker · May 20, 2009

When importing records from a UTF-8 file (which may have extension .u8 or .txt), I have no trouble getting FMP to recognize and display the contents of the records correctly when I do the import manually.

But when I do the step as part of a script, for some reason the character set reverts to "Macintosh" every time I run the script. What ends up getting imported is therefore gobbledygook.

I am

specifying the data source as a file (have tried ending filename with extensions .u8 and .txt, to no avail)
checking "add new records" when specifying the import order
performing without dialog.

If I perform with dialog, I can manually set the character set to UTF-8, and it all works fine. In that case, it doesn't matter whether the import file has extension .txt or .u8 . But I'd like to be able to run the whole script without dialog, and I don't understand why the character set doesn't stay set the way I have set it. When I view the steps of the script in the edit window, this function is summarized as:

Import Records ["searchforme.txt"; Add; UTF-8]

So it seems that the character set is being recorded as UTF-8, but for some reason when the script is actually being run without dialog, the character set is interpreted as being "Macintosh". Have I forgotten something? Or is this a bug?

Gracious thanks for any suggestions!

comment · May 21, 2009

I have tried this, and I cannot reproduce the problem. Perhaps you should post some files that illustrate the issue.

milefaker · May 21, 2009

I should have added that the text involved is Chinese. I don't imagine there would be a problem for plain ASCII text.

As an illustration, please download this zipped directory of five files, which contains:

exhibit_UTF-8_import_problem.fp7 (a simple db with two scripts, one of which imports text without dialog and one of which imports text with dialog)
清白.txt (a UTF-8-encoded text file with a few lines of Chinese text - for you to experiment importing)
清白.u8 (the same file as 清白.txt , but with a different extension)
screenshot_if_imported_without_dialog.tiff (screenshot of the garbled text that gets imported if the script is run "without dialog")
screenshot_if_imported_*with*_dialog_and_charset_adjusted_manually.tiff (screenshot of what the imported text looks like if you adjust the character set manually)

To recapitulate, using the script that proceeds without dialog leads to importing garbled text, even though "UTF-8" is specified in the script. If the character set is set to UTF-8 by manual intervention (either with a script or from the menu), there is no such garbling.

Thanks for any light you may be able to shed on this problem.

comment · May 21, 2009

The bad news: I have managed to reproduce the problem using version 9.

The good news: It works fine in version 10.

Now here's the interesting part: if I change the file's encoding to UTF-8 with BOM (instead of UTF-8, no BOM), it works fine in version 9 too. Not only that, but once this file is selected in the script step, the charset is automatically set to Unicode and cannot be changed.

I am attaching the modified file.

NewFile.zip
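For anyone who would rather make the same change outside a text editor, here is a minimal Python sketch of the conversion. The function name `add_utf8_bom` and the output file name are my own placeholders, not anything FileMaker provides; the sketch only assumes the input is already valid UTF-8.

```python
import codecs
from pathlib import Path


def add_utf8_bom(src, dst):
    """Copy src to dst, prepending a UTF-8 BOM if one is not already there.

    Hypothetical helper for illustration; assumes src holds valid UTF-8.
    """
    data = Path(src).read_bytes()
    # Raises UnicodeDecodeError if the bytes are not UTF-8 after all,
    # so a file in some other encoding is never mislabeled.
    data.decode("utf-8")
    if not data.startswith(codecs.BOM_UTF8):
        data = codecs.BOM_UTF8 + data  # BOM is the three bytes EF BB BF
    Path(dst).write_bytes(data)
```

Something like `add_utf8_bom("清白.txt", "清白_bom.txt")` would then produce a copy that should behave like the attached file, though I have only tested the attachment itself in FileMaker.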

milefaker · May 21, 2009

Ah, that's very interesting! I see the BOM as the first character of the file. Works like a dream.
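In case it helps anyone checking their own export files, the BOM can also be confirmed without opening an editor. This is just a small Python sketch; `has_utf8_bom` is a name I made up for the example.

```python
import codecs


def has_utf8_bom(path):
    """Return True if the file at path begins with the UTF-8 BOM (EF BB BF)."""
    with open(path, "rb") as f:
        # Read only the first three bytes; the rest of the file is irrelevant.
        return f.read(3) == codecs.BOM_UTF8
```

Running it on a file before pointing the import script at it shows whether the file carries the marker that made the difference in version 9 here.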

I'm glad to hear that even without a BOM, FMPv10 will do this correctly. For now, however, I've set my BBEdit preferences to save everything as UTF-8 with BOM.

Am now deleting the files I had placed on-line.

Thanks for your labor!
