Paco (Newbies), posted August 30, 2010

Hello,

I have a database that consists of about 10,000 records. I want to export all the words from one text field as an alphabetical index. For example, the field in the first record contains "The bog turtle is a semiaquatic turtle" and in the second record "It is the smallest North American turtle". I want to export this data like:

a
American
bog
is
It
North
semiaquatic
smallest
the
turtle

Thank you for your help.
Paco
comment, posted August 30, 2010

I can think of a couple of ways; unfortunately, neither one is quite simple:

1. Export as XML, using a custom XSLT stylesheet. The stylesheet needs to (a) tokenize the field contents into words; (b) sort the tokens; and (c) remove duplicate tokens.
2. Define a repeating calculation field to split the text into individual words. Import this into another table as separate records, then export grouped by word.
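For readers who end up doing this outside FileMaker, the three steps named above (tokenize, sort, remove duplicates) can be sketched in a few lines of Python. This is only an illustration, not the XSLT route itself: the function name `word_index` and the word pattern are assumptions, and duplicates are merged without regard to case because the sample output in the question lists "the" only once.

```python
import re

# Assumed definition of a "word": letters/digits, optionally joined by an
# apostrophe or hyphen (e.g. "don't", "semi-aquatic").
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")

def word_index(records):
    """Return the unique words of all records, sorted alphabetically.

    Words that differ only by case are treated as duplicates; the first
    spelling encountered is the one that is kept.
    """
    first_seen = {}
    for text in records:
        for word in WORD_RE.findall(text):
            first_seen.setdefault(word.casefold(), word)
    return [first_seen[key] for key in sorted(first_seen)]

if __name__ == "__main__":
    sample = [
        "The bog turtle is a semiaquatic turtle",
        "It is the smallest North American turtle",
    ]
    print("\n".join(word_index(sample)))
```

If variant spellings that differ only by case matter (as they might for a translator), keying the dictionary on `word` instead of `word.casefold()` would keep them as separate entries.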
Paco (Newbies), posted August 31, 2010

I can see my data as separate words when I use the "Insert From Index" option on the field. Is it possible to export this index data as separate words?
comment, posted August 31, 2010

I don't know of a way to export (or even copy) the index. Is this a one-time operation or do you need to do this periodically? For a one-time thing, I believe it would be easier to export the text as is, and produce the word index in another application.
Lee Smith, posted August 31, 2010

I'm not sure why you would need this, but I would do it this way:

Export the field using the tab-delimited format
Open it in TextWrangler
Find space and replace with \r
Sort Lines
Process Duplicate Lines

You now have your list.

HTH
Lee
Fenton, posted August 31, 2010

It is also possible, if your text field is not very large,* to use a Custom Function (requires FileMaker Pro Advanced) to get the unique words of a field in a single record. It must be a stored calculation. You can then use the Design function ValueListItems ( Get ( FileName ) ; "value list name" ) to produce an index. I don't know if there's a limit on the size of what ValueListItems can return.

UniqueWords
http://www.briandunning.com/cf/478

*Actually, you don't really need "unique" words in this first stage. You just need words. ValueListItems will take care of the uniqueness.

P.S. As comment says, if you only need to do this once in a while, it may be easier (since you don't have FileMaker Pro Advanced) to use external tools. The free TextWrangler can do a Find/Replace to create "words", then sort and de-dupe the lines.
comment, posted August 31, 2010

Quote: "Find space and replace with \r"

You would probably want to replace punctuation symbols with \r, too.
Lee Smith, posted August 31, 2010

Good point. After replacing the spaces, do a Grep find and replace:

Find: [?|.|,|!]
Replace with: nothing

HTH
Lee
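A side note on the pattern above: inside square brackets the `|` is not an "or" but a literal character, so the class as posted also deletes any pipe characters in the text. The sketch below shows the difference using Python's `re` module, on the assumption that TextWrangler's grep treats character classes the same way; the names `AS_POSTED`, `WITHOUT_PIPES` and `strip_chars` are made up for this example.

```python
import re

# The class exactly as posted: matches ?  .  ,  !  and also a literal "|".
AS_POSTED = re.compile(r"[?|.|,|!]")

# The same class without the pipes: matches only ?  .  ,  !
WITHOUT_PIPES = re.compile(r"[?.,!]")

def strip_chars(pattern, text):
    """Delete every character in `text` that the pattern matches."""
    return pattern.sub("", text)

if __name__ == "__main__":
    line = "bog, turtle! a|b?"
    print(strip_chars(AS_POSTED, line))      # pipe is removed too
    print(strip_chars(WITHOUT_PIPES, line))  # pipe is kept
```

For ordinary prose with no pipe characters, both patterns give the same result.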
Paco (Newbies), posted September 1, 2010

Thank you for your replies. I think it is impossible to remove the duplicate words, because my text data is very large (about 5,000 pages) and removing duplicates would take too much time. I am a translator, and I need this word list because the same words are sometimes written in different ways. OK, maybe I can check this using the FileMaker index window.
LaRetta, posted September 1, 2010 (edited)

I may be missing something here, but wouldn't something like this work (attached)? By its very nature, it eliminates duplicates, and it can be sorted. If you want one field with the final results, you can always create a value list based upon this field and export just the calculation, but records would work, wouldn't they?

BTW, by using xWords, non-word characters are dropped automatically.

UPDATE: This will delete the records as it goes.

Attachment: Peel.zip
comment, posted September 1, 2010

Nice (I'm just not sure why you delete the original records, though).
LaRetta, posted September 1, 2010

No reason, other than that I envisioned feeding multiple records into it, with it spitting the words out the back side and returning to an empty state, waiting for a new supply of records to process. FEED ME SEYMOUR :)
bruceR, posted September 1, 2010

If AppleScript is allowed, try this. It gets the field contents (in AppleScript, field = cell contents across the found set; think column), breaks them into newline-delimited words, and passes the result to a shell script that uses the sort and uniq commands.

Attachment: GetWords.fp7.zip
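The attached file is not reproduced here, but the sort-and-uniq stage it describes can be approximated without the shell. The sketch below is an assumption-laden stand-in, not bruceR's script: it expects one word per line, reads the input one line at a time, and goes one step beyond what the post states by also counting occurrences (roughly what `sort | uniq -c` would report). The name `count_words` is invented for illustration.

```python
from collections import Counter

def count_words(lines):
    """Return (word, count) pairs sorted by word.

    `lines` is any iterable of strings holding one word per line, such as
    an open text file. Blank lines are skipped. Matching is exact, so
    "Turtle" and "turtle" are counted separately.
    """
    counts = Counter()
    for line in lines:
        word = line.strip()
        if word:
            counts[word] += 1
    return sorted(counts.items())

if __name__ == "__main__":
    for word, n in count_words(["turtle\n", "bog\n", "turtle\n", "\n", "a\n"]):
        print(n, word)
```

Passing an open file (for example `open("words.txt", encoding="utf-8")`, where the file name is hypothetical) avoids building one large string first; only the distinct words and their counts are held in memory.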
Paco (Newbies), posted September 7, 2010

Your solution is GREAT! It takes some time to split the text data into words because of the large amount (about 500,000 words), but it works! Thanks for sharing.
Paco (Newbies), posted September 7, 2010

Quote: "If applescript is allowed try this. It gets the field contents (field = cell contents across found set in applescript; think column.) Breaks it into new-line delimited words, passes it to shell script that uses sort and uniq functions."

Your solution also works if you have a small amount of text. I got an error message because my text is so large.

Paco