
Importing UTF-8 records in scripts


This topic is 5454 days old. Please don't post here. Open a new topic instead.


When importing records from a UTF-8 file (which may have extension .u8 or .txt), I have no trouble getting FMP to recognize and display the contents of the records correctly when I do the import manually.

But when I do the step as part of a script, for some reason the character set reverts to "Macintosh" every time I run the script. What ends up getting imported is therefore gobbledygook.

I am

  • specifying the data source as a file (have tried ending filename with extensions .u8 and .txt, to no avail)
  • checking "add new records" when specifying the import order
  • performing without dialog.

If I perform with dialog, I can manually set the character set to UTF-8, and it all works fine. In that case, it doesn't matter whether the import file has extension .txt or .u8 . But I'd like to be able to run the whole script without dialog, and I don't understand why the character set doesn't stay set the way I have set it. When I view the steps of the script in the edit window, this function is summarized as:

Import Records ["searchforme.txt"; Add; UTF-8]

So it seems that the character set is being recorded as UTF-8, but for some reason when the script is actually being run without dialog, the character set is interpreted as being "Macintosh". Have I forgotten something? Or is this a bug?

Gracious thanks for any suggestions!


I should have added that the text involved is Chinese. I don't imagine there would be a problem for plain ASCII text, whose bytes are the same in UTF-8 and in the "Macintosh" character set.

As an illustration, please download this zipped directory of five files, which contains:

  • exhibit_UTF-8_import_problem.fp7 (a simple db with two scripts, one of which imports text without dialog and one of which imports text with dialog)
  • 清白.txt (a UTF-8-encoded text file with a few lines of Chinese text - for you to experiment importing)
  • 清白.u8 (the same file as 清白.txt , but with a different extension)
  • screenshot_if_imported_without_dialog.tiff (screenshot of the garbled text that gets imported if the script is run "without dialog")
  • screenshot_if_imported_*with*_dialog_and_charset_adjusted_manually.tiff (screenshot of what the imported text looks like if you adjust the character set manually)

To recapitulate, using the script that proceeds without dialog leads to importing garbled text, even though "UTF-8" is specified in the script. If the character set is set to UTF-8 by manual intervention (either with a script or from the menu), there is no such garbling.
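
For anyone who wants to see the mismatch outside FileMaker, here is a minimal Python sketch of what seems to be happening. It assumes that FileMaker's "Macintosh" character set corresponds to Mac OS Roman (Python's `mac_roman` codec); that mapping is my assumption, not something confirmed in this thread. The sample string is just the filename stem from the test files.

```python
# Illustration only: decode UTF-8 bytes as Mac OS Roman to mimic the garbling.
# Assumption: FileMaker's "Macintosh" charset behaves like Python's mac_roman codec.

original = "清白"
utf8_bytes = original.encode("utf-8")

# What an importer produces if it reads the UTF-8 bytes as "Macintosh" text:
# every byte becomes one unrelated character.
garbled = utf8_bytes.decode("mac_roman")

# Each Chinese character occupies three bytes in UTF-8, so two characters
# turn into six garbled ones.
original_length = len(original)
garbled_length = len(garbled)

# Plain ASCII is encoded identically in both character sets, so it survives.
ascii_text = "searchforme.txt"
ascii_survives = ascii_text.encode("utf-8").decode("mac_roman") == ascii_text

# The damage is reversible as long as no bytes were lost along the way.
recovered = garbled.encode("mac_roman").decode("utf-8")

print(original_length, garbled_length, ascii_survives, recovered == original)
```

This would also explain why a file containing only ASCII imports cleanly regardless of the charset setting.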

Thanks for any light you may be able to shed on this problem.


The bad news: I have managed to reproduce the problem using version 9.

The good news: It works fine in version 10.

Now here's the interesting part: if I change the file's encoding to UTF-8 with BOM (instead of UTF-8, no BOM), it works fine in version 9 too. Not only that, but once this file is selected in the script step, the charset is automatically set to Unicode and cannot be changed.
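
If the text editor at hand cannot save with a BOM, the same change can be made with a few lines of Python. This is only a sketch: the function names `ensure_utf8_bom` and `add_bom_to_file` are made up for illustration, and it assumes the input really is valid UTF-8.

```python
import codecs


def ensure_utf8_bom(data: bytes) -> bytes:
    """Return data with a UTF-8 BOM (EF BB BF) in front, adding one if missing."""
    # Fail early if the input is not valid UTF-8; a BOM would not help then.
    data.decode("utf-8")
    if data.startswith(codecs.BOM_UTF8):
        return data  # already marked, leave untouched
    return codecs.BOM_UTF8 + data


def add_bom_to_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, making sure the copy starts with a BOM."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(ensure_utf8_bom(data))
```

For example, `add_bom_to_file("searchforme.txt", "searchforme_bom.txt")` would write a BOM-prefixed copy that the scripted import can then point at (the output filename here is arbitrary).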

I am attaching the modified file.

NewFile.zip


Ah, that's very interesting! I see the BOM as the first character of the file. Works like a dream.

I'm glad to hear that even without a BOM, FMPv10 will do this correctly. For now, however, I've set my BBEdit preferences to save everything as UTF-8 with BOM.

Am now deleting the files I had placed on-line.

Thanks for your labor!
