SCRIBE File As Text and Unicode

March 26, 2015

Hi,

I use a lot of 360works plugins happily but I have found something I can't figure out with the Scribe FileAsText function. Here's the set-up:

FM 12 Server Advanced -Middle Eastern version from winSoft (I need Arabic text among others for this db)

FM 12 Pro Advanced -Middle Eastern version from winSoft (I need Arabic text among others for this db)

IWP -hosted database on a Windows 2008 Server; SuperContainer is installed on this server. SC is working fine.

text fields are set for Unicode storage.

I use SuperContainer to browse and upload a file with Russian text.

I use a script that contains error trapping and the two essential functions:

#extract the text into a variable

Set Variable $text; Value: ScribeFileAsText(uploadedFile)

#put the extracted text into the text field with Unicode storage

Set Field (unicodeTextField; $text)

When I run this script to "extract the text," I get no error. Instead, I get gobbledygook: what looks like some half-width dreaded Unicode boxes interspersed with numbers, Roman letters, question marks and other alphanumeric symbols. It is just a mess. Hex code? It is certainly something I've never seen before.

However, if I copy and paste the Russian text into the unicodeTextField, it looks fine --good Russian Cyrillic text. So that makes me think that the function ScribeFileAsText is not working with Unicode?

I have also tested this with Chinese --again, big problem: the extracted text looks like half-width Unicode boxes, not good Chinese ideographs. This also happens when I paste in the text from a Word docx.

I have also tested this with Korean --no problem: the extracted text is good Korean hangul.

Any ideas how to fix the Scribe output of FileAsText?

Are there settings on the FM server for special fonts that I need to set? Or on the Windows server? I use a Toshiba laptop to interface with the server and it can show all the writing systems I've mentioned just fine in Word docx (Roman, Cyrillic, Chinese and Korean).

Many thanks,

Carole

March 26, 2015

Hi Carole,

It's possible that the source file is not Unicode but uses some other encoding, and that we are reading it as Unicode, hence the jumbled result you are seeing. Can you send us an example file so that I can test from here? If you can, you can upload the file at 360works.com/upload.
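To illustrate the kind of mismatch described above, here is a minimal Python sketch. It is not Scribe code and makes no claim about how Scribe decodes files internally; the choice of Windows-1251 as the file's real encoding and UTF-16 LE as the encoding assumed by the reader are both assumptions picked for the demonstration.

```python
# Illustration only: decoding bytes with the wrong encoding garbles the text.
# Assumption: the file was saved in Windows-1251 (a legacy Cyrillic code page)
# but is read as if it were UTF-16 little-endian.
russian = "Привет, мир"

# The bytes a Windows-1251 text file would contain for this string.
raw = russian.encode("cp1251")

# Reading them as UTF-16 LE pairs up unrelated bytes into wrong characters;
# errors="replace" substitutes U+FFFD for anything that cannot be decoded.
wrong = raw.decode("utf-16-le", errors="replace")

# Reading them with the encoding they were actually written in works.
right = raw.decode("cp1251")

print(repr(wrong))
print(right)
```

The first line printed is unrelated characters and replacement marks, while the second is the original Cyrillic, even though both come from the same bytes.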

March 26, 2015

Author

Thanks, Ryan! I made sure that the Russian document was encoded in Unicode (not UTF-8, just plain old Unicode on the Windows options) and SCRIBE worked to extract the text.

I am still working on the Chinese document--this is a docx document, but it still is coming through as half-width boxes.

Is there a way to error trap this problem --check the encoding of a file before extracting text?

Thanks,

Carole

March 27, 2015

Carole,

I wouldn't expect Scribe to set an error, since you are still getting a result back, albeit not a very useful one. However, you can use Scribe last error to see if anything is being set.

Docx files are a little more complicated. They have a lot going on "under the hood" compared to a simple .txt file. I know that you can set the encoding of a docx file when you save it, but I believe that if it's not specified it is Unicode by default. Check to make sure the Chinese document is indeed saved as Unicode. You can also set the encoding when you open a file. Consult this link for more details. It may be that when you open the file, Word changes the encoding to display the text properly.
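On the question of checking a file's encoding before extracting text: one approach that works outside FileMaker is to look at the first bytes of the file. The Python sketch below is only an assumption about how such a pre-check could look, not a Scribe feature; `sniff_encoding` and the `"zip-container"` label are hypothetical names. It relies on two general facts: Unicode text files often start with a byte order mark (the "Unicode" option in Windows save dialogs generally writes UTF-16 little-endian with one), and a .docx file is a ZIP archive, so it starts with the ZIP signature rather than with text.

```python
import codecs

# Byte order marks, longest first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark, so it has to be tested earlier.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_encoding(data: bytes):
    """Best-guess label for the leading bytes of a file (hypothetical helper).

    Returns an encoding name, "zip-container" for ZIP-based formats such
    as .docx, or None when the encoding cannot be determined.
    """
    # ZIP local file header signature; .docx files begin with it.
    if data.startswith(b"PK\x03\x04"):
        return "zip-container"
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # No BOM: accept the data as UTF-8 only if it decodes cleanly.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return None  # probably a legacy code page; cannot tell which one


if __name__ == "__main__":
    sample = codecs.BOM_UTF16_LE + "Привет".encode("utf-16-le")
    print(sniff_encoding(sample))
```

A result of None would be the signal to re-save the file as Unicode before extracting its text. This check is a heuristic: a legacy-encoded file that happens to be valid UTF-8 would be mislabeled, and a "zip-container" result says nothing about the text stored inside the archive.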