cchaski

SCRIBE File As Text and Unicode


Hi,

I use a lot of 360works plugins happily but I have found something I can't figure out with the Scribe FileAsText function. Here's the set-up:

FM 12 Server Advanced -Middle Eastern version from winSoft (I need Arabic text among others for this db)

FM 12 Pro Advanced -Middle Eastern version from winSoft  (I need Arabic text among others for this db)

 

IWP -hosted database on a Windows 2008 Server; SuperContainer is installed on this server. SC is working fine.

 

text fields are set for Unicode storage.

 

I use SuperContainer to browse and upload a file with Russian text.

I use a script that contains error trapping and the two essential functions:

 

# Extract the text into a variable
Set Variable [ $text ; Value: ScribeFileAsText ( uploadedFile ) ]

# Put the extracted text into the text field with Unicode storage
Set Field [ unicodeTextField ; $text ]

 

When I run this script to "extract the text," I get no error. Instead, I get gobbledygook: what looks like the dreaded half-width Unicode boxes interspersed with numbers, Roman letters, question marks, and other alphanumeric symbols. It is just a mess. Hex code? It is certainly something I've never seen before.

 

However, if I copy and paste the Russian text into unicodeTextField, it looks fine: good Russian Cyrillic text. That makes me think that the ScribeFileAsText function is not working with Unicode?

 

I have also tested this with Chinese --again, big problem: the extracted text looks like half-width Unicode boxes, not good Chinese ideographs. This also happens when I paste in the text from a Word docx.

 

I have also tested this with Korean --no problem: the extracted text is good Korean hangul.

 

Any ideas how to fix the Scribe output of FileAsText?

 

Are there settings on the FM server for special fonts that I need to set? Or on the Windows server? I use a Toshiba laptop to interface with the server and it can show all the writing systems I've mentioned just fine in Word docx (Roman, Cyrillic, Chinese and Korean).

 

Many thanks,

Carole

 

 


Hi Carole,

 

It's possible that the source file is not Unicode but uses another encoding, and we are reading it as Unicode, hence the jumbled result you are seeing. Can you send us an example file so that I can test it here? If so, you can upload the file at 360works.com/upload.
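
To illustrate the kind of mismatch described here, a minimal Python sketch (not Scribe code, and not how Scribe works internally). The sample string and the choice of Windows-1251 as the file's real encoding are assumptions for illustration only; the thread does not say which encoding the original file used.

```python
# Sketch: what happens when bytes written in one encoding are read as another.
# Assumption: the file was saved as Windows-1251 (cp1251), a common legacy
# Cyrillic encoding. The real file's encoding is not known from the thread.
original = "Привет, мир"            # hypothetical sample text

raw = original.encode("cp1251")     # the bytes as they would sit on disk

# Reading the same bytes as UTF-16 pairs them up into unrelated code points,
# which typically display as boxes or stray symbols.
misread = raw.decode("utf-16-le", errors="replace")

# Reading them with the encoding they were written in round-trips cleanly.
reread = raw.decode("cp1251")

print(repr(misread))
print(reread)
```

Nothing is lost in the bytes themselves; only the interpretation is wrong, which is why re-saving the document in the expected encoding can fix the result.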


Thanks, Ryan! I made sure that the Russian document was encoded in Unicode (not UTF-8, just plain old Unicode on the Windows options) and SCRIBE worked to extract the text.

 

I am still working on the Chinese document--this is a docx document, but it still is coming through as half-width boxes.

 

Is there a way to error trap this problem --check the encoding of a file before extracting text?

 

Thanks,

Carole


Carole,
 
I wouldn't expect Scribe to set any errors, since you are still getting a result back, albeit not a very useful one. However, you can use Scribe last error to see if anything is being set.
 
Docx files are a little more complicated. They have a lot going on "under the hood" compared to a simple .txt file. I know that you can set the encoding of a docx file when you save it, but I believe that if it's not specified, it is Unicode by default. Check to make sure the Chinese document is indeed saved in Unicode. You can also set the encoding when you open a file. Consult this link for more details. It may be that when you open the file, Word changes the encoding to properly display the text.
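
On the question of checking a file's encoding before extracting text: one rough pre-check for plain-text files is to look for a byte-order mark (BOM) at the start of the file. The Python sketch below is an illustration only and is not part of Scribe; `sniff_bom` is a hypothetical helper name. It assumes the check runs outside FileMaker on the raw bytes of the file. A missing BOM proves nothing (plain UTF-8 and legacy code pages usually have none), so treat this as a hint rather than a guarantee.

```python
import codecs
from typing import Optional

# BOM -> name of a Python codec that consumes that BOM while decoding.
# UTF-32 is tested before UTF-16 because the UTF-32 LE BOM (FF FE 00 00)
# begins with the UTF-16 LE BOM (FF FE). Edge case: a UTF-16 LE file whose
# first character is NUL would be misreported as UTF-32.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known BOM, else None."""
    for bom, codec_name in _BOMS:
        if data.startswith(bom):
            return codec_name
    return None


# Usage with in-memory bytes standing in for a file's contents.
sample = codecs.BOM_UTF16_LE + "Привет".encode("utf-16-le")
codec = sniff_bom(sample)
if codec is not None:
    print(codec, sample.decode(codec))
else:
    print("No BOM found; the encoding cannot be determined this way.")
```

This only applies to plain-text files. A .docx is a packaged format, so its leading bytes say nothing about the encoding of the text inside it.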


I am having a similar issue, only I don't think Word document encoding is to blame.

When I run ScribeFileAsText on the .docx file locally, I get the correct readout.

 

However, the very same call on the very same .docx, run in a server-side script, returns the

 

PK0????[Content_Types].xml???N?0?_??%.,?j??K@???3i

??=???3N B?m??D?g?9_?2?^[?? &?]??? g??o??W???]q?YB?i???o ????y 1ҺT?b?#DR?2?>??J룕H?q.?T?r?b2??;?f>??@+??]?z??-g?ƜUqm?????]?|?(???7??h%

kind of output. 

 

Is there anything else I am missing? The .docx file is stored in a SuperContainer location. 
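
One observation on the output quoted in this post: it begins with `PK` followed by `[Content_Types].xml`. ZIP archives start with `PK`, and a .docx file is a ZIP package, so this looks like the raw bytes of the file rather than extracted text. The Python sketch below is an illustration only and is not Scribe code; `looks_like_zip` and `docx_paragraphs` are hypothetical helper names. It shows how to recognize that situation and how the paragraph text sits inside `word/document.xml`. It does not explain why the server-side call behaves differently from the local one.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import List

ZIP_MAGIC = b"PK\x03\x04"  # signature at the start of a non-empty ZIP archive
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def looks_like_zip(data: bytes) -> bool:
    """True if the bytes start with the ZIP local-file-header signature."""
    return data.startswith(ZIP_MAGIC)


def docx_paragraphs(data: bytes) -> List[str]:
    """Return the text of each paragraph in a .docx given its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        document_xml = package.read("word/document.xml")
    root = ET.fromstring(document_xml)
    paragraphs = []
    # Each w:p element is a paragraph; its text lives in nested w:t elements.
    for para in root.iter("{%s}p" % W_NS):
        runs = [t.text or "" for t in para.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(runs))
    return paragraphs
```

If "extracted text" passes `looks_like_zip` after being encoded back to bytes, or simply starts with `PK`, the extraction step most likely returned the container itself instead of its contents.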

