Adam123 Posted July 16, 2004

Okay, I need to parse data on vehicles. I'm going to make a field that I paste the following into:

--------------------------------------------------------------------------
Exterior Length: 188 in. Width: 70.9 in. Height: 56.5 in. Wheel Base: 111.4 in. Curb Weight: 3748 lbs. Interior Front Head Room: 38.7 in. Front Shoulder Room: 56.8 in. Rear Head Room: 37.8 in. Rear Shoulder Room: 55.9 in. Front Leg Room: 41.7 in. Rear Leg Room: 34.2 in. Luggage Capacity: 11.1 cu. ft. Maximum Seating: 5 Performance Data Performance Braking Distance (60-0 mph) 136 ft. Road Holding Index: .89 g Base Number of Cylinders: 8 Base Engine Size: 4.4 liters Base Engine Type: V8 Horsepower: 282 hp Max Horsepower: 5400 rpm Torque: 324 ft-lbs. Max Torque: 3600 rpm Drive Type: RWD Fuel Data Fuel Fuel Tank Capacity: 18.5 gal. EPA Mileage Estimates: (City/Highway) Manual: 15 mpg / 24 mpg Automatic: : 18 mpg / 24 mpg Range in Miles: (City/Highway) Automatic: 333 mi. / 444 mi. Manual: 277.5 mi. / 444 mi.
------------------------------------------------------------------

Then I want to have a button next to this large field that I can click to parse the different data into different fields. For example, take "Horsepower: 282 hp", extract the 282 from it, and place that into the horsepower field, along with about 7 other things.
Fenton Posted July 17, 2004

Here's another little discussion on the subject, with a few example files, from myself, Ugo, and Jim McKee: http://www.fmforums.com/threads/showflat...=true#Post81884

But it could be simpler with version 7, using the MiddleValues() function. All the examples are still "sixish."

The real problem with parsing your example is that you have lines with "multiple values": lines which have 2 labels, each followed by ":". (One line has 3 ":", but I think/hope it's a typo. "Manual: 15 mpg / 24 mpg Automatic: 18 mpg / 24 mpg") How would you tell where the data from the first label ended and the 2nd label began? Sometimes the labels have more than one word. I see no generic way to do so; it appears that each of these instances would have to be "hard-coded," using the particulars (in this case the labels are "Manual:" and "Automatic:").

It seems to me that with a limited set of labels, and the above serious problem with the structure, it would be easiest to just hard-code the whole thing. Explicitly target each label, get its data, then set it into its field. The data would be whatever is in between 2 labels, minus any carriage returns.

My preferred method, which I think is easiest for beginners, is to set each chunk into its own global "line" field, then remove it (using Substitute) from the original field (which I have saved as a whole in another global, in case something goes wrong). So it's a Loop within the lines of a global, until all data is removed.

Setting the data into fields is a whole 'nother technique (trick).

Go To Next Field
Set Field ["", "the data"]

i.e., no targeted field. It can only be done on a dedicated layout, with the fields (and only the fields) in the correct tab order. The data MUST ALWAYS be in the same order. EVERY label must be present in the data, whether it is blank or not. Otherwise it would have to be done with a god-awful mess of nested Ifs.
On the bright side, you can ignore the "sections," not needed. Anyone have a magic bullet?
Fenton Posted July 17, 2004

I should modify the above to say that while the label values would be "hard-coded," they would not have to be hard-coded into the script itself. They could be read, one at a time, from another permanent multi-line global, using MiddleValues. To get that field, just paste in your example data and remove everything but the labels. Add returns where needed so that each label is on its own line. During the parsing Loop, you'd go get the value for the current label and the next label from this global field. The data is what's between them.
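As a rough illustration of the "data is what's between the current label and the next label" idea, here is a minimal sketch in Python rather than FileMaker. The function name `sandwich_parse`, the `LABELS` list, and the sample text are all made up for the example; it assumes every label is present and in the listed order, as described above.

```python
def sandwich_parse(text, labels):
    """Return {label: data}, where data is the text between a label and the next one.

    Hypothetical helper: assumes labels appear in `text` in the same order as in `labels`.
    """
    flat = " ".join(text.split())  # drop carriage returns and repeated spaces
    result = {}
    pos = 0
    for i, label in enumerate(labels):
        start = flat.find(label, pos)
        if start == -1:
            continue  # label missing from the data; nothing to extract
        data_start = start + len(label)
        end = len(flat)  # last label: data runs to the end of the text
        if i + 1 < len(labels):
            nxt = flat.find(labels[i + 1], data_start)
            if nxt != -1:
                end = nxt  # data stops where the next label starts
        result[label] = flat[data_start:end].strip()
        pos = data_start
    return result


# Stand-in for the multi-line global holding one label per line.
LABELS = ["Length:", "Width:", "Height:", "Wheel Base:", "Curb Weight:"]
sample = (
    "Length: 188 in. Width: 70.9 in.\n"
    "Height: 56.5 in. Wheel Base: 111.4 in.\n"
    "Curb Weight: 3748 lbs."
)
specs = sandwich_parse(sample, LABELS)
```

Note that if the next label is missing, this sketch lets the current data run to the end of the text, which is why the approach depends on every label being present.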
Fenton Posted July 17, 2004

BTW, the "data is what's between one label and the next label" only needs to kick in when there's more than 1 ":" in a line:

PatternCount(_gLine, ":") > 1

If there's 1 colon, then the data is from the end of the label to the end of the line. And nothing happens when there's no ":" in a line. This enables you to ignore all the extra lines. Except ones like this:

EPA Mileage Estimates: (City/Highway)

sigh. Well, that's a start. Have to go. It's parse my bed time -]
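The colon-count rule can be sketched like this, again in Python as a stand-in for the FileMaker calculation. `parse_line` is a hypothetical name, and `str.count` plays the role of PatternCount here.

```python
def parse_line(line):
    """Classify one line by its number of colons (hypothetical helper).

    Returns (kind, label, data):
      - no colon   -> ("skip", None, None): section headings and other extra lines
      - one colon  -> ("single", label, data): data runs to the end of the line
      - more       -> ("multi", None, None): needs the label-to-label approach
    """
    colons = line.count(":")
    if colons == 0:
        return ("skip", None, None)
    if colons == 1:
        label, _, data = line.partition(":")
        return ("single", label.strip(), data.strip())
    return ("multi", None, None)
```

The exception mentioned above shows up as expected: "EPA Mileage Estimates: (City/Highway)" has exactly one colon, so it is treated as a normal label with "(City/Highway)" as its data and would need special handling.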
Fenton Posted July 18, 2004

Well, I couldn't pass up such a challenge. I would not expect a beginner to succeed at this, not without some serious stress. It wasn't all that easy, and I've parsed a few in the past. There were a couple of small glitches, mostly outlined in previous posts. The interesting part was in using the current label and the next label in order to get at the first data in the line, which I could see no other way of getting accurately.
MoonShadow Posted July 18, 2004

This is very interesting, Fenton. And exactly what I need right now. Thank you. This is the kind of thing Bob Weaver excels at also!! It'll take some time to wrap my head around it though.
Ugo DI LUCA Posted July 18, 2004

So that you won't be alone working on this puzzle...

My guess was that labels would start with capital letters.

Height: 56.5 in. Wheel Base: 111.4 in.

Then, from my label and ":" to the next capital letter is the value. There are a few traps though, as usual:

Base Engine Type: V8 Type: RWD EPA Mileage Estimates: (City/Highway)

I'm quite sure a few others would show if we had the entire file, rather than a single record to work with. I agree with you that hardcoding the labels would be a pain, as surely some others would show (or be missing) in other records.

In order to have it working as much as possible, I'd think the first step would be to re-order this data sheet, so that it is easy to parse at the end. I used nested Substitute( ) in a script. Added is a script that parses out labels into a value list, from which you can get the values afterward.

Well, nice Sunday, wasn't it Fenton

ParsingVehicles.zip
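The capital-letter guess can be tried out with a short Python sketch. This is not what the attached file does (that uses nested Substitute( ) in a FileMaker script); the regular expression and the name `split_on_capitals` are assumptions made only to show where the heuristic works and where it trips.

```python
import re

# Label: starts with a capital, made of letters and spaces, ends at ":".
# Value: everything after the colon up to the next capital letter (or colon).
PAIR = re.compile(r"([A-Z][A-Za-z ]*?):\s*([^A-Z:]*)")


def split_on_capitals(line):
    """Split a line into (label, value) pairs using the capital-letter heuristic."""
    return [(label, value.strip()) for label, value in PAIR.findall(line)]


ok = split_on_capitals("Height: 56.5 in. Wheel Base: 111.4 in.")
trap = split_on_capitals("Base Engine Type: V8")
```

On the first line this yields the two expected pairs. On the second, the value "V8" itself starts with a capital letter, so the heuristic stops immediately and returns an empty value; "RWD" fails the same way.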
Ugo DI LUCA Posted July 18, 2004

Sorry for the fm.5 file, guys. Didn't even notice this was the 7 section...
LaRetta Posted July 18, 2004

Hey Ugo, I make that mistake all the time. But not to worry ... us'ens with 7 can drop your file on our shortcut in a snap. It's when we blow it - and post a 7 file - that it is embarrassing.
Fenton Posted July 19, 2004

I'm posting more or less the same file again. This one has a (heavily) commented script. I admit, even with the help blurb I'd written, it was probably not easy to see what each step did.

It's a 7-only file because it uses Values quite a bit. This could be translated into 5-6, using the old fallback of paragraph return positions; but it would be longer and more tedious.

The central trick, solving the difficult problem of multiple labelled, but not properly separated, pieces of data on the same line, was to use the current and next labels as a kind of "sandwich," with the data in between. One hopes to seldom need such a technique, and one would never design such a form, but it's a good way to deal with data that has lame structural separation but known labels.

The labels would still need to be all of them present, and ordered correctly.

CarSpecsParse.zip
Fenton Posted July 19, 2004

I wrote: The labels would still need to be all of them present, and ordered correctly.

This is not actually necessary. The problem: 1 line with 2 labels. You know the 1st label, so you know where the 1st data begins; but you don't know precisely where the 1st data ends and the 2nd label begins. There is only a space, which is not a definitive separator.

The 1st data ends just before the 2nd label starts. It actually doesn't matter that it's the "next" label, in a known order of labels. It only has to be "any" label of the known labels. A loop through the known labels would tell you.

But you don't know how many words the label is; otherwise none of this would be necessary. That's also not insurmountable. You start with the 1st word before the colon, try that, and if it fails, add the previous word and try again. Once you've got both labels you can get the data between them.

But the very fact that the order is not continuous means that you can't just walk down all the fields, setting data as you go. So you don't know where to put the data, because you can't go to or set a field by a text name (actually AppleScript can).

At that point there are 2 alternatives.

1. A largish script of nested If, ElseIf steps.
If [ label = "something"]
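The "start with the 1st word before the colon, then add the previous word" search described in this post could look roughly like the following Python sketch. The function name `find_label_before_colon` and the `KNOWN` set are hypothetical; the sketch takes the shortest match, so if both "Type" and "Drive Type" were known labels, "Type" would win.

```python
def find_label_before_colon(text, colon_index, known_labels):
    """Grow a candidate label backward from the colon, one word at a time.

    Hypothetical helper: returns the first (shortest) candidate found in
    `known_labels`, or None if no known label ends at this colon.
    """
    words = text[:colon_index].split()
    for n in range(1, len(words) + 1):
        candidate = " ".join(words[-n:])  # last n words before the colon
        if candidate in known_labels:
            return candidate
    return None


KNOWN = {"Height", "Wheel Base"}
line = "Height: 56.5 in. Wheel Base: 111.4 in."
second = find_label_before_colon(line, line.rfind(":"), KNOWN)
```

Here "Base" alone is not a known label, so the search adds "Wheel" and matches "Wheel Base". Once both labels on a line are identified this way, the first data is whatever sits between them.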