Script to extract multiple values from a field as individual records?

alana2 · May 23, 2007

Hi all,

I am a newbie here so my apologies if I stumble.

I am a marine biologist working with a database of ~2000 records of encounters with groups of killer whales. Each record is one encounter, and I have a field called "Individual ID" in which up to 30 individuals are listed, space delimited (eg "T001 T002 T002A" etc). What I want to do is to parse this data so that it is organized by individual, rather than by encounter, with all of the relevant data for each encounter associated with each individual. In a separate table, I have a list of all the known individual whales (n=300).

I'm thinking that what I need to do is create a script that looks up the table of whale IDs, selects the first ID on the list, executes a Find on the Encounter data, and extracts all of the records that include that ID within the Individual_ID field. It could then save those records to a new file. It would have to also create a new field called Target_ID and fill all of the new records with the relevant ID. Then it would loop to the next ID on the list and do the same thing. It would be great if each loop could extract to the same file, appending each set of records as it went. This would end up with a new file with ~10,000 records.

Am I on the right track? I have been trying to figure out how to do this, and have searched this forum for similar scenarios but I think my experience might be too limited to find what I need.

I have done this once manually in Excel and it took 3 days of mind-numbing cutting and pasting, so I would be VERY grateful for any advice! Thanks!

-- Alana

comment · May 23, 2007

Splitting the data into separate records is easy: define a calculation field (result is Number) =

Let ( [

text = Substitute ( Extend ( Individual ID ) ; " " ; ¶ ) ;

i =  Get ( CalculationRepetitionNumber ) 

] ;

Substitute ( MiddleValues ( text ; i ; 1 ) ; ¶ ; "" )

)





Let the calculation have 30 repetitions. Next, because of a bug in version 7, make a similar repeating calculation field =


Extend ( EncounterID )

Show all records in the Encounters table, and import them to your new table. Map the first calculation field into the WhaleID field, and the second one into EncounterID. When asked about values in repeating fields, select 'Splitting them into separate records'.

But I am not all together following your strategy. I believe you should have a table of Whales, a table of Encounters, and this new table joining them in a many-to-many relationship (see a demo here). And you should be using serial IDs for your relationships, so something like T002, T002A shouldn't occur.

David Jondreau · May 23, 2007

I think you're on the right track. You really do need three tables: a Whales table, an Encounters table, and a WhaleEnounters table. Each individual whale/encounter will be a separate record in that last (new) table and only needs three fields: WhaleID, EncounterID and PrimaryID (auto-enter serial).

Are new encounters still be entered? If they are, you'll want to split them up, then use the new table for entering new ones so a new record is created for each whale/encounter.

I would probably do the split a little differently. I would create a calculated repeating field in the Encounter table with 30 reps. The calculation would be

WhaleIDsRep=

GetValue(Extend(EncounterIDs) ; Get(CalculationRepetitionNumber)).

That will place each whale ID in a separate repetition of the calc field. Then from the new table you will import all the records from the Encounters table at once, using the option to split repeating fields into separate records (which shows up after you click Import).

Edited May 23, 2007 by Guest

alana2 · May 24, 2007

Hi, thanks for your comments.

I definitely see the need to go to three tables. However, these data are being entered on an on-going basis, by a number of users with varying degrees of computer savvy. So I am looking for a solution that can really run in the background without having to change the whole structure of the database and risk having the data entered incorrectly. Can I set up the third table for my use in extracting the data I need, while still having users enter data in the original Encounters table?

Also, we do need to keep the IDs in their current format, as they incorporate genealogy in their names, so T002A is the daughter of T002, T002A1 is the daughter of T002A and so on. It is the way that the gurus have defined the individual names so that is what I have to work around...

David, I tried using the calculation you suggested:

WhaleIDsRep=

GetValue(Extend(Encounter IDs) ; Get(CalculationRepetition Number)).

... but "GetValue" was not recognized and not included in the list of functions in the Specify Calculation dialog box. Am I in the right place?

Thanks again, I really appreciate the help.

-- Alana

David Jondreau · May 24, 2007

I'm coming from 8.5, I don't know if GetValue was a feature of 7. Sorry. The bug comment mentions may also be in play, you're better off using his post.

You'll save everyone a lot of trouble if you change the data entry layout. It won't be seamless, but pretty close.

Simplest way would be make that relationship between WhaleEncounters and Encounters and allow "Creation of new records..". Then, on the Encounters layout, delete the WhaleIDs field (you can in fact delete that from the database entirely once all the dust has settled). Put a portal in it's place showing records from the WhalesEncounters table with one field: the Whale ID field. You can play around with borders and colors to make it look pretty close to the old field.

comment · May 24, 2007

Can I set up the third table for my use in extracting the data I need, while still having users enter data in the original Encounters table?

I think that would me more trouble than it's worth - maintaining 2 solutions and trying to keep them synchronized. If it were me, I'd go for a clean cut. You need to work on the user interface so that other users can adapt to it quickly and painlessly (if there is such a thing).

we do need to keep the IDs in their current format, as they incorporate genealogy in their names, so T002A is the daughter of T002,

You can keep this data - but don't use it as match fields for relationships. Those are really names, not IDs. You should use meaningless, auto-generated serial numbers for your primary keys.

Incidentally, the genealogy could be recorded via a self-join relationship (ParentID to WhaleID).

alana2 · May 24, 2007

OK, I am trying to recreate comment's suggestions. I just want to make sure I am doing this correctly:

I have 3 tables:

Whales - list of all unique IDs of each whale

Encounters - list of each encounter

WhaleEncounters - new table that will relate the other 2 tables to each other.

In Encounters, I have a field called Individual_IDs; this is the field that needs to be separated out. Sample contents:

T090 T090A T124A T124A1 T124A2 T124B T124B1

So, I would use comment's code above to define a new calculated repeating field - in Encounters, right? And the second code to define a 2nd calculated field, also in Encounters?

... Then once those two new fields are created, I create the new table, WhaleEncounters, with two fields, WhaleID and EncounterID - to which I map the two calculated fields.

Is this correct? I'm sorry if I appear dense here - I'm pretty sure I'm in way over my head so I do appreciate your advice.

-- Alana

comment · May 24, 2007

Yes, that sounds about right. I would also add a SerialID in the new table (again, an auto-entered serial number). You won't be needing it right away, but it's better to plan ahead. Why don't you try it with a copy of your file (offline, if your file is served).

But you have another task to complete the conversion, and that is to release the "T" field from serving as key field in relationships, and replace it with a proper serial ID.

alana2 · June 6, 2007

Hi comment and David,

I just wanted to let you know that I got comment's script working and it does just what it is supposed to. Although my manager is reluctant to restructure the database at this time, this at least gives me a great work-around so that I can extract data in the background without affecting the way data is being entered by the various users.

AND - both of you have opened my eyes to the sorts of things we need to do to improve our databases, and I appreciate that.

So - thank you both very much for your help!

-- Alana

Sign In

Script to extract multiple values from a field as individual records?

Recommended Posts

alana2

Link to comment

Share on other sites

comment

Link to comment

Share on other sites

David Jondreau

Link to comment

Share on other sites

alana2

Link to comment

Share on other sites

David Jondreau

Link to comment

Share on other sites

comment

Link to comment

Share on other sites

alana2

Link to comment

Share on other sites

comment

Link to comment

Share on other sites

alana2

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information