FMForums.com

Hello all.

I've just realized that if I can extract a few fields of data from about 2000 MS Word documents and get that data into a format that can be imported into my FM Pro database, I can save the many hours of data entry that would otherwise be needed to fill the database with archived data.

Please share any ideas you have on the subject.

The fields would be: Gender (text, limited to value list), MedicalRecordNumber (text), SubmitterID (text), SecondSubmitterID (text), AccessionNumber (our ID, text), and ProvidedClinicalInfo (text entries of observed abnormalities or suspected syndromes separated by commas). I figure if I can get the information that's in these documents into a CSV file or Excel spreadsheet, then I can import it into FM Pro. After that I can use a script tied to a GUI layout to take the "ProvidedClinicalInfo" and use it to populate the clinical data fields that categorize the type of abnormality. For example, the script might take "micrognathia" from the "ProvidedClinicalInfo" field and copy it to the "FacialAbnormalities" field. Terms that haven't been added to the script's lists yet could be turned over to staff, who would decide which list they belong in.
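The categorization idea above can be sketched in a few lines of shell. This is only an illustration of the logic, not a FileMaker script: the term lists, the "LimbAbnormalities" field, the "NeedsStaffReview" bucket, and both function names are hypothetical placeholders; only "FacialAbnormalities" and "micrognathia" come from the post.

```shell
# Map one clinical term to a category field name.
# The term lists here are hypothetical placeholders, not a real vocabulary.
categorize_term() {
    case "$1" in
        micrognathia|"cleft palate") echo "FacialAbnormalities" ;;
        polydactyly|syndactyly)      echo "LimbAbnormalities" ;;
        *)                           echo "NeedsStaffReview" ;;
    esac
}

# Split a comma-separated ProvidedClinicalInfo value and print
# "term<TAB>category" for each term it contains.
categorize_info() {
    printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r term; do
        # Trim surrounding whitespace and lowercase the term before lookup.
        term=$(printf '%s' "$term" \
            | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
            | tr '[:upper:]' '[:lower:]')
        [ -n "$term" ] || continue
        printf '%s\t%s\n' "$term" "$(categorize_term "$term")"
    done
}
```

Calling `categorize_info "Micrognathia, polydactyly, hypotonia"` prints one term/category pair per line, with "hypotonia" falling into the NeedsStaffReview bucket because it is not in either list.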

Thanks for any help you can offer.

Do you possibly have access to a Mac? Because using AppleScript you could create copies of all those files as plain text files. Then you could Import them all into FileMaker as one operation. You would still need to parse out the data.

Alternatively you could write a Word macro to save each as a text file on a PC, but I wouldn't know how to run it on the whole folder of files.

  • Author

Yes, I do have access to Macs here; biologists tend to love Macs and this place is full of both.

However, although I love the Mac interface, I'm a new convert to Linux. I'm sure I can find Linux software that would convert the MS Word files to text in batch mode.

Fenton, what would be the general process flow you're alluding to?

Step 1: Convert thousands of MS Word docs to text.

Step 2: Import to FileMaker Pro. (How?)

Step 3: ? . . .

Thanks again. You all have really helped a lot.

Yes, on Linux you can probably use the "antiword" command-line tool. There is a "textutil" tool for Mac OS (10.4?), which also works on Word files. Or you can just use AppleScript and TextEdit, which can open a Word file and get its text.

Once the files are just text, you have a couple options.

1. One is to parse/extract the text using AppleScript and/or command line tools, such as grep and cut.

AppleScript can run Unix command-line tools, using: do shell script "the command goes here". You can mix AppleScript variables in there too (outside the quotes). You can set AppleScript variables to the resulting values, then later set them into FileMaker fields, all with AppleScript.

You would likely be parsing the files one at a time, so as to put the values in separate records.

2. Or, you can just use the FileMaker Import Folder command. Though it's usually used for image files, it also supports importing a folder full of text files. The contents of the file go into a single FileMaker field, line returns and all. Then you could use a script with a loop, and FileMaker text functions to parse the field.

3. Or some unholy mixture of the two methods above :-]

You could do #3 in different ways. But there are also a few plug-ins which add grep capability to FileMaker. None free though, I don't think. There is a set of Custom Functions that are free however, somewhere....
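As a rough illustration of option 1, here is a minimal shell sketch using grep and cut. It assumes each converted file contains lines of the form Label<TAB>Value, which is a hypothetical layout (the real reports may look quite different), and the function name get_field is made up for this example.

```shell
# Print the value for one label from a converted text file.
# Assumes (hypothetically) that the file holds "Label<TAB>Value" lines.
# $1 = label, $2 = path to the text file.
# Note: the label is used as a regular expression, so keep it to plain
# letters and digits.
get_field() {
    tab=$(printf '\t')
    # Keep only the first line starting with "Label<TAB>",
    # then take the second tab-separated column.
    grep "^${1}${tab}" "$2" | head -n 1 | cut -f 2
}
```

With a file report.txt (hypothetical name) in that layout, `get_field Gender report.txt` would print the gender value. The same command line could be passed to AppleScript's do shell script and the result set into a FileMaker field.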


Could you post a sample Word document?

  • Author

Thanks, Fenton. I think I'll go with the first option; I'll use some Linux software to convert the docs to text and then parse them. Maybe we'll parse with Perl, as there's someone here who's familiar with writing Perl scripts.

As far as posting a sample report document, I'm a bit nervous about that as I don't want to be advertising the name of my company and I certainly don't want to accidentally reveal patient information. (Word docs are notoriously insecure.) Perhaps I could "anonymize" a report and convert it to pdf first in order to give you an idea of the layout. Would you like me to do that?

  • Author

Whoops. Complication. The patient ID, birth date, gender, and outside medical record numbers are all given in an MS Word table that isn't read by the text converter in OpenOffice.org. That probably means that most doc-to-text converters won't read the table, I guess.

Ugh! I'll do a little research and ask around about this.

You don't say what version of Word you are using, but years ago there was a batch conversion wizard included with Word. It would process a batch of files, saving them in a different format. The macro as such doesn't appear to be there any more, but it does seem to be on the Microsoft support site:

http://support.microsoft.com/kb/826174/

If you can get this installed to work with your version of Word, it will probably do what you want.

James

www.james-mc.com

  • Author

Ahah!

catdoc -a -f ascii FILEname.doc > Newfile.txt

The catdoc software is able to convert the table to in-line tab-delimited text. Cool!

I also found something called wvText that, if you install elinks, makes a pretty-printed text file with the table borders drawn with * | - + like on old dot-matrix printers. Tee-hee! But I don't want that because it will just make parsing more difficult later. Neat though.

This is a big project for an amateur though, and there's other work to be tended to. So, to be continued. . .

  • Author

Wahoo!

for i in *.doc ; do catdoc -a -f ascii "$i" > "${i/%doc/txt}" ; done

Just cd into the directory and issue that command in bash. Every FileName.doc is converted to FileName.txt; the quotes keep file names containing spaces intact.

Whoopee!
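With the reports converted, the remaining step from earlier in the thread is getting the values into something FileMaker can import. The sketch below builds one CSV row per .txt file. It assumes, hypothetically, that each file holds Label<TAB>Value lines whose labels match the field names from the first post; the real catdoc output will almost certainly need different parsing, and build_csv is a made-up name.

```shell
# Emit a CSV (header plus one row per converted .txt file) on stdout.
# Assumes (hypothetically) each file contains "Label<TAB>Value" lines.
# $1 = directory holding the .txt files.
build_csv() {
    dir=$1
    echo 'Gender,MedicalRecordNumber,SubmitterID,SecondSubmitterID,AccessionNumber,ProvidedClinicalInfo'
    for f in "$dir"/*.txt; do
        [ -e "$f" ] || continue   # skip if the glob matched nothing
        awk -F '\t' '
            # Wrap a value in double quotes, doubling any embedded quotes.
            function q(s) { gsub(/"/, "\"\"", s); return "\"" s "\"" }
            # Remember the value seen for each label.
            { val[$1] = $2 }
            END {
                printf "%s,%s,%s,%s,%s,%s\n",
                    q(val["Gender"]), q(val["MedicalRecordNumber"]),
                    q(val["SubmitterID"]), q(val["SecondSubmitterID"]),
                    q(val["AccessionNumber"]), q(val["ProvidedClinicalInfo"])
            }
        ' "$f"
    done
}
```

Running `build_csv . > reports.csv` in the folder of converted files would produce a single file to import (reports.csv is a hypothetical name). Every value is quoted so that the commas inside ProvidedClinicalInfo do not split the column.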
