milefaker Posted May 20, 2009

When importing records from a UTF-8 file (which may have the extension .u8 or .txt), I have no trouble getting FMP to recognize and display the contents of the records correctly when I do the import manually. But when I perform the import as part of a script, the character set reverts to "Macintosh" every time I run the script, so what gets imported is gobbledygook.

In the script step I specify the data source as a file (I have tried ending the filename with both .u8 and .txt, to no avail), check "Add new records" when specifying the import order, and perform without dialog.

If I perform with dialog, I can manually set the character set to UTF-8, and it all works fine; in that case it doesn't matter whether the import file has the extension .txt or .u8. But I'd like to run the whole script without dialog, and I don't understand why the character set doesn't stay the way I set it.

When I view the steps of the script in the edit window, this step is summarized as:

Import Records ["searchforme.txt"; Add; UTF-8]

So the character set does seem to be recorded as UTF-8, but when the script actually runs without dialog, it is interpreted as "Macintosh". Have I forgotten something, or is this a bug? Gracious thanks for any suggestions!
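The "gobbledygook" described above is what happens when UTF-8 bytes are read with a single-byte character set such as Mac Roman: each multi-byte Chinese character turns into several unrelated Latin symbols. The sketch below reproduces that effect using Python's standard `mac_roman` codec; it only illustrates the mismatch and is not a model of FileMaker's internals. The function name `as_mac_roman` is hypothetical.

```python
def as_mac_roman(text: str) -> str:
    """Encode `text` as UTF-8, then (wrongly) decode those bytes as Mac Roman.

    This mimics an importer that assumes the "Macintosh" character set
    for a file that was actually saved as UTF-8.
    """
    return text.encode("utf-8").decode("mac_roman")


if __name__ == "__main__":
    original = "清白"  # two CJK characters, 3 UTF-8 bytes each
    garbled = as_mac_roman(original)
    print(garbled)  # six Mac Roman symbols instead of two Chinese characters
    # Mac Roman maps every byte value, so the damage is reversible:
    print(garbled.encode("mac_roman").decode("utf-8"))
```

Because Mac Roman assigns a character to every byte, the wrong decode never raises an error; it silently produces plausible-looking junk, which is why the problem shows up only when the imported text is inspected.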
comment Posted May 21, 2009

I have tried this, and I cannot reproduce the problem. Perhaps you should post some files that illustrate the issue.
milefaker Posted May 21, 2009 Author

I should have added that the text involved is Chinese. I don't imagine there would be a problem for the lowest Unicode planes.

As an illustration, please download this zipped directory of five files, which contains:

exhibit_UTF-8_import_problem.fp7 (a simple database with two scripts, one of which imports text without dialog and one of which imports text with dialog)
清白.txt (a UTF-8-encoded text file with a few lines of Chinese text, for you to experiment with importing)
清白.u8 (the same file as 清白.txt, but with a different extension)
screenshot_if_imported_without_dialog.tiff (screenshot of the garbled text that gets imported if the script is run "without dialog")
screenshot_if_imported_*with*_dialog_and_charset_adjusted_manually.tiff (screenshot of what the imported text looks like if you adjust the character set manually)

To recapitulate: the script that proceeds without dialog imports garbled text, even though "UTF-8" is specified in the script. If the character set is set to UTF-8 by manual intervention (either with a script or from the menu), there is no such garbling. Thanks for any light you may be able to shed on this problem.
comment Posted May 21, 2009

The bad news: I have managed to reproduce the problem using version 9. The good news: it works fine in version 10.

Now here's the interesting part: if I re-save the file with the encoding "UTF-8" (with a byte-order mark, BOM) instead of "UTF-8, no BOM", it works fine in version 9 too. Not only that, but once this file is selected in the script step, the character set is automatically set to Unicode and cannot be changed. I am attaching the modified file.

Attachment: NewFile.zip
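The fix reported above is simply to make sure the import file begins with the three-byte UTF-8 BOM (EF BB BF). If re-saving from a text editor is not convenient, a small script can add the BOM before the FileMaker import runs. This is a minimal sketch, assuming (as the thread reports for version 9) that a BOM is all FileMaker needs to detect the encoding; the function name `add_utf8_bom` and the file path are hypothetical.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path) -> bool:
    """Prepend a UTF-8 byte-order mark to `path` if it does not have one.

    Returns True if the file was modified, False if it already had a BOM.
    Raises UnicodeDecodeError if the file is not valid UTF-8, so that a
    BOM is never stamped onto a file in some other encoding.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # already "UTF-8 with BOM"; nothing to do
    data.decode("utf-8")  # validate before touching the file
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True


if __name__ == "__main__":
    # Hypothetical usage: run this on the file before the scripted import.
    changed = add_utf8_bom("searchforme.txt")
    print("BOM added" if changed else "BOM already present")
```

The validation step is a deliberate design choice: adding a BOM to a Mac Roman or Latin-1 file would make an importer treat it as UTF-8 and garble it in the opposite direction. Python's `utf-8-sig` codec reads such files back with the BOM stripped.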
milefaker Posted May 21, 2009 Author

Ah, that's very interesting! I can see the BOM as the first character of the file. Works like a dream. I'm glad to hear that FMP 10 handles this correctly even without a BOM. For now, however, I've set my BBEdit preferences to save everything as UTF-8 with BOM. I am now deleting the files I had placed online. Thanks for your labor!