Muralidharan Posted May 26, 2011 Hi, is there any way to extract text/numeric values from a Word file? I need to extract the same kind of values from multiple documents. All the documents follow the same pattern.
bcooney Posted May 26, 2011 That's probably going to be a huge challenge and a parsing nightmare. Perhaps you can provide an example? Take a look at 360Works' Scribe.
Muralidharan Posted May 28, 2011 Hi bcooney, I tried the 360Works Scribe plugin. I need to extract the Name, Contact No and Address from a CV and save these values into the corresponding fields, i.e., Name into the Name field, Contact No into the Contact No field, and Address into the Address field. I can extract these values with the Scribe plugin using regular expressions. But how do I get these values into a particular field?
bcooney Posted May 28, 2011 I'll move this into the 360 forum, and we'll see if they post a comment or suggestion.
Muralidharan Posted May 31, 2011 Hi bcooney, has there been any comment or suggestion from the 360 forum yet? Please let me know.
Smef Posted May 31, 2011 This can be done pretty easily with Scribe if your Word document uses content control fields. If it doesn't, you will need to extract the entire contents of your document and then do regex parsing with Scribe and FileMaker text functions. As for how to get the values into a particular field: if you've already done the regex parsing to get the parts of the text that you want, you can just use the Set Field script step as you normally would. If your variable $result holds the value you want from Scribe, then you can just do Set Field [Myfield; $result].
Muralidharan Posted June 1, 2011 Hi Smef, I'm using the keywords as $Field. These values are stored in another table. First I extract the contents of the Word document into one edit box, and then extract values based on the given keywords using this script. In the CV:
Name : AAAAA
Father’s Name : XXXXX
Date of Birth : 12-12-2050
Marital Status : Single
Gender : Male
Here Name, Father's Name, Date of Birth, Marital Status and Gender are the keywords. The script runs in a loop, and the keyword changes on every iteration. The script is:
Set Variable [$result; Value:ScribePatternMatchAll (Employee_details::Import; "^\s?"&$Field&"(\s*)+:?(\s*.*?)(?:#|$)")]
It extracts values based on the given keywords. My problem is that the keywords are different in every CV. How can I handle this? Is there any way to do it without using keywords?
Smef Posted June 1, 2011 You could use regex to look for " : " if every document is formatted in that way and then check for text around it, but if formatting is different between documents and you don't know what values you're looking for, I think it would be very difficult to write regex to parse the information out.
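The idea of looking for " : " and taking the text on either side can be sketched roughly as follows. This is a minimal Python illustration, not Scribe or FileMaker code, and it assumes each label/value pair sits on its own line. The function name `extract_pairs` is hypothetical, and the sample text is the CV excerpt quoted earlier in the thread.

```python
import re

# One "Label : Value" pair per line. The label may not contain a colon,
# so the first colon on the line is treated as the separator.
PAIR_RE = re.compile(
    r"^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE,
)

def extract_pairs(text):
    """Return a dict mapping each label to its value (hypothetical helper)."""
    return {label: value for label, value in PAIR_RE.findall(text)}

sample = (
    "Name : AAAAA\n"
    "Father\u2019s Name : XXXXX\n"
    "Date of Birth : 12-12-2050\n"
    "Marital Status : Single\n"
    "Gender : Male"
)

pairs = extract_pairs(sample)
for label, value in pairs.items():
    print(label, "->", value)
```

Because no keyword list is involved, this only works when every document really does use the "Label : Value" layout. Lines without a colon are ignored, and free-form text that happens to contain a colon would be picked up as a pair.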
Muralidharan Posted June 3, 2011 Is there any other possible option for extracting the values?
Smef Posted June 3, 2011 If you don't know what you want to extract, what format it is written in, or where it is in the document, it becomes very, very difficult to write a method of text parsing. If you knew the 10 (let's say) sets of text you wanted to look for it would be much easier, but with what you're saying it sounds like a difficult parsing task.
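The easier case mentioned here, where the set of labels to look for is known in advance, could be handled by mapping each label variant to one target field. The following is a rough Python sketch under that assumption, not Scribe or FileMaker code. The alias lists, the sample CV text, and the names `normalize` and `extract_fields` are all made up for illustration.

```python
import re

# Hypothetical alias table: target field -> label variants seen in CVs.
ALIASES = {
    "Name": ["Name", "Full Name", "Candidate Name"],
    "Contact No": ["Contact No", "Phone", "Mobile"],
    "Address": ["Address", "Permanent Address"],
}

def normalize(label):
    """Lowercase, straighten curly apostrophes, collapse whitespace,
    and trim surrounding spaces and dots."""
    label = label.replace("\u2019", "'").lower()
    return re.sub(r"\s+", " ", label).strip(" .")

# Reverse lookup: normalized label variant -> target field.
LOOKUP = {
    normalize(variant): field
    for field, variants in ALIASES.items()
    for variant in variants
}

# One "Label : Value" pair per line, split at the first colon.
LINE_RE = re.compile(
    r"^[ \t]*([^:\n]+?)[ \t]*:[ \t]*(.+?)[ \t]*$",
    re.MULTILINE,
)

def extract_fields(text):
    """Return target field -> value for every recognized label.
    The first occurrence of a field wins; unknown labels are skipped."""
    result = {}
    for label, value in LINE_RE.findall(text):
        field = LOOKUP.get(normalize(label))
        if field and field not in result:
            result[field] = value
    return result

# Made-up sample CV text.
cv = (
    "Full Name : AAAAA\n"
    "Mobile: 9999999999\n"
    "Hobbies : Chess\n"
    "Permanent Address : 12 Example Street"
)

fields = extract_fields(cv)
print(fields)
```

Each entry in the resulting dictionary corresponds to one Set Field call in the FileMaker script. The alias table still has to be extended by hand whenever a CV uses a label that has not been seen before.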