
SCRIBE File As Text and Unicode


This topic is 1622 days old. Please don't post here. Open a new topic instead.


Hi,

I happily use a lot of 360Works plugins, but I have run into something I can't figure out with the Scribe FileAsText function. Here's the setup:

FM 12 Server Advanced, Middle Eastern version from WinSoft (I need Arabic text, among others, for this database)

FM 12 Pro Advanced, Middle Eastern version from WinSoft (I need Arabic text, among others, for this database)

 

IWP-hosted database on a Windows Server 2008 machine. SuperContainer is installed on this server and is working fine.

 

Text fields are set for Unicode storage.

 

I use SuperContainer to browse to and upload a file containing Russian text.

I use a script that contains error trapping and these two essential steps:

 

#Extract the text into a variable
Set Variable [ $text ; Value: ScribeFileAsText ( uploadedFile ) ]
#Put the extracted text into the text field with Unicode storage
Set Field [ unicodeTextField ; $text ]

 

When I run this script to "extract the text," I get no error. Instead, I get gobbledygook: what looks like the dreaded half-width Unicode boxes interspersed with numbers, Roman letters, question marks and other symbols. It is just a mess (hex code?), and certainly something I've never seen before.

 

However, if I copy and paste the Russian text into unicodeTextField, it looks fine: proper Cyrillic text. That makes me think the ScribeFileAsText function is not working with Unicode.

 

I have also tested this with Chinese, and again there is a big problem: the extracted text looks like half-width Unicode boxes, not proper Chinese characters. This also happens when I paste in the text from a Word .docx file.

 

I have also tested this with Korean, with no problem: the extracted text is proper Korean Hangul.

 

Any ideas on how to fix the output of Scribe's FileAsText?

 

Are there font settings I need to change on the FileMaker server? Or on the Windows server? I use a Toshiba laptop to connect to the server, and it displays all the writing systems I've mentioned (Roman, Cyrillic, Chinese and Korean) just fine in Word .docx files.

 

Many thanks,

Carole

 

 


Hi Carole,

 

It's possible that the source file is not Unicode but uses another encoding, and we are reading it as Unicode, hence the jumbled result you are seeing. Can you send us an example file so I can test it here? If so, you can upload it at 360works.com/upload.
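To illustrate Ryan's point, here is a minimal Python sketch (Python is used only for illustration; it is not part of the FileMaker setup) showing how the same Russian word turns into gibberish when its bytes are read with the wrong encoding. It does not reproduce Scribe's internals; which encoding Scribe assumes is not stated in this thread.

```python
# Sketch: text saved in one encoding but read as another.
# Illustrates the general mechanism only, not Scribe's actual behavior.
text = "Привет"  # Russian "hello"

utf16 = text.encode("utf-16-le")   # Windows "Unicode" (BOM omitted here)
cp1251 = text.encode("cp1251")     # legacy Windows Cyrillic code page

# Read with the matching encoding: the round trip succeeds.
assert utf16.decode("utf-16-le") == text

# UTF-16 bytes read as UTF-8: each Cyrillic code unit is a low byte in the
# ASCII range followed by 0x04, so you get control characters mixed with
# ordinary letters and digits instead of an error.
as_utf8 = utf16.decode("utf-8", errors="replace")
printable = "".join(ch for ch in as_utf8 if ch.isprintable())

# cp1251 bytes read as Latin-1: accented Latin letters instead of Cyrillic.
as_latin1 = cp1251.decode("latin-1")

print(repr(as_utf8))
print(printable)   # @825B
print(as_latin1)   # Ïðèâåò
```

The control characters usually render as boxes, so the result looks much like the "boxes interspersed with numbers and Roman letters" described in the first post. That resemblance is suggestive, not proof of what happened here.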


Thanks, Ryan! I made sure the Russian document was encoded as Unicode (not UTF-8, just the plain "Unicode" option Windows offers), and Scribe then extracted the text correctly.

 

I am still working on the Chinese document. It is a .docx file, but it is still coming through as half-width boxes.

 

Is there a way to error-trap this problem, i.e., check the encoding of a file before extracting the text?

 

Thanks,

Carole


Carole,
 
I wouldn't expect Scribe to set any error, since you are still getting a result, albeit not a very useful one. However, you can use Scribe last error to see whether anything is being set.
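Because a wrong encoding produces a result rather than an error, one option is to inspect the file's leading bytes before calling Scribe. Below is a minimal Python sketch, assuming whatever runs the check can read the file's raw bytes; `sniff_encoding` and its return labels are hypothetical. A byte-order mark (BOM) is only a hint: a file without one may still be UTF-16, and legacy code pages cannot be identified reliably this way.

```python
import codecs

def sniff_encoding(data: bytes) -> str:
    """Hypothetical pre-flight check: best-effort guess from the first bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"   # what Windows Notepad labels "Unicode"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(b"PK\x03\x04"):
        return "zip"         # .docx/.xlsx packages: not plain text at all
    try:
        data.decode("utf-8")
        return "utf-8"       # no BOM, but valid UTF-8 (includes pure ASCII)
    except UnicodeDecodeError:
        return "unknown"     # probably a legacy code page; cannot tell reliably
```

In a FileMaker workflow, a result of "zip" or "unknown" would be the point to stop and show a message rather than write gibberish into unicodeTextField.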
 
Docx files are a little more complicated; they have a lot more going on "under the hood" than a simple .txt file. I know you can set the encoding of a docx file when you save it, but I believe that if it's not specified, it is Unicode by default. Check that the Chinese document is indeed saved as Unicode. You can also set the encoding when you open a file; consult this link for more details. It may be that Word changes the encoding when it opens the file so that the text displays properly.
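The "under the hood" part is concrete: a .docx file is a ZIP package (Office Open XML), and the body text lives in `word/document.xml`, an XML part whose encoding is declared in its XML header (normally UTF-8). Here is a minimal Python sketch using only the standard library; `docx_text` is a hypothetical helper, and it is rough: it does not handle tabs, line breaks, headers, footers, footnotes or text boxes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_text(data: bytes) -> str:
    """Return paragraph text from the raw bytes of a .docx (rough sketch)."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        xml_bytes = package.read("word/document.xml")
    # Passing bytes lets the parser honor the encoding in the XML declaration.
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):          # each <w:p> paragraph
        runs = [t.text or "" for t in para.iter(W_NS + "t")]  # <w:t> text runs
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Reading a .docx as if it were a plain text file skips the unzip and XML steps entirely, which is one way compressed bytes could end up in a text field.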


  • 4 years later...

I am having a similar issue, only I don't think Word document encoding is to blame.

When I run ScribeFileAsText on the .docx file locally, I get the correct readout.

However, the very same call on the very same .docx file, run in a server-side script, returns this kind of output:

PK0????[Content_Types].xml???N?0?_??%.,?j??K@???3i

??=???3N B?m??D?g?9_?2?^[?? &?]??? g??o??W???]q?YB?i???o ????y 1ҺT?b?#DR?2?>??J룕H?q.?T?r?b2??;?f>??@+??]?z??-g?ƜUqm?????]?|?(???7??h%

 

Is there anything else I am missing? The .docx file is stored in a SuperContainer location. 
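The `PK` at the very start of that output is the ZIP file signature, and `[Content_Types].xml` is a part every Office Open XML package contains, so the server-side call appears to have returned the .docx file's raw bytes instead of its text. The thread does not say why the local and server runs differ. As an error trap for either symptom in this thread, here is a minimal Python sketch of a post-extraction sanity check; the function name and the 5% threshold are hypothetical choices, not anything Scribe provides.

```python
import unicodedata

def looks_like_binary_dump(extracted: str, max_bad_ratio: float = 0.05) -> bool:
    """Heuristic check on a text-extraction result (threshold is a guess)."""
    if not extracted:
        return False
    head = extracted[:512]
    # An unparsed .docx starts with the ZIP signature "PK" and usually names
    # [Content_Types].xml near the beginning.
    if head.startswith("PK") and "[Content_Types].xml" in head:
        return True
    # Otherwise count control characters (except tab/newline/CR) and U+FFFD
    # replacement characters, which tend to display as boxes or "?" marks.
    bad = sum(
        1 for ch in extracted
        if ch == "\ufffd"
        or (unicodedata.category(ch) == "Cc" and ch not in "\t\n\r")
    )
    return bad / len(extracted) > max_bad_ratio
```

A True result would be the cue to skip Set Field and log the file for inspection instead.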

