Pat Posted April 25, 2012 Posted April 25, 2012 I am trying to figure out a way to separate out a number/modifier from the beginning of a field entry. For example in these two entries: 25 MG Valium 2 MG Niacin 25 MG I want to separate the leading '25 MG' from the Valium entry into a new field called Quantity. I tried this: Position ( drugfield ; " MG " ; 1 ; 1 ) ≤ 10 and PatternCount ( drugfield ; " MG " ) = 1 ; LeftWords ( drugfield ; 2 ); only to realize it picked up Niacin in the second record. Some quantities can have long numbers, eg 2000000 UNT so it is very variable. Is there a way to test if the first position is a number? Other ideas? Thanks for any assistance.
comment Posted April 25, 2012 Posted April 25, 2012 It's hard to tell what is consistent among all your entries. How about = LeftWords ( Filter ( drugfield ; "0123456789 " ) ; 1 )
Pat Posted April 25, 2012 Author Posted April 25, 2012 That's the problem, although more entries do not have leading quantities than those that do. I want to remove the leading quantities to get down to the root part of the field. The quantities can be of different units, ie MG, ML, UNT, etc. That's why I thought if I could detect a number is the first position, that would be the simplest. Everything starts with a letter if it doesn't have a quantity leading it. All will have numbers and units at the end of the entry (drug name and then strength). Tried your idea, and not successful yet. Thanks for responding.
comment Posted April 25, 2012 Posted April 25, 2012 I want to remove the leading quantities to get down to the root part of the field. I a not sure what you mean by that. I thought the first word that is a number would be the Quantity. Probably the next word after that would be the Unit.
Pat Posted April 25, 2012 Author Posted April 25, 2012 Sorry. Entries can be either: 1st number : unit : drug : 2nd number : unit quantity( = 1st number + unit) : drug name : drug strength( = 2nd number + unit) or 500 MG Valium 2 MG 2000000 UNT Vaccine 2 UNT/ML drug : 1st number : unit drug name : drug strength( = 1st number + unit) Valium 2 MG Vaccine 2 UNT/ML I refer to 1st and 2nd numbers to represent when a number is encountered in the field. Some entries will have the two numbers, most others not. I want to get all entries into the form of the second entry: drug name : drug strength I am not as interested in the quantity provided of the (drug name drug strength) as I am in the actual (drug name drug strength).
comment Posted April 25, 2012 Posted April 25, 2012 If all entries fall within one of the two patterns then it should be easy: all you need to do is determine if the first word is a number or not, then extract the elements according to the pattern, e.g.: DrugName = Let ( [ firstWord = LeftWords ( drugfield ; 1 ) ; N = Case ( firstWord = Filter ( firstWord ; "0123456789" ) ; 3 ; 1 ) ] ; MiddleWords ( drugfield ; N ; 1 ) )
Lee Smith Posted April 25, 2012 Posted April 25, 2012 Sorry. Entries can be either: Hi Pat, could you share with us a sample of the actual data, or a site where your are getting it from? Also the names of your fields that you want to populate, and with what. Sometimes there are separators that we can use that can speed things up. Lee
Pat Posted April 25, 2012 Author Posted April 25, 2012 Well, quasi-success! A problem that has now occurred is for entries that have a decimal number in front: 0.1 ML for example. I had broken down your calc to better understand it, by creating a firstWord field and an N field. For regular numbers, N is 3 from firstWord. For the decimal number in firstWord, N is 1. So for all the other entries, I have successfully pulled out the drug name, but not for quantity entries with a decimal. And to further complicate the issue, some drug names come with a numerical designation, for example 4-aminobenzoate. The four is not a quantity, it is a position of a chemical moiety in the compound. So now I need to figure out a way around this issue as well. Lee, each entry is one field with no separators. The source is RxTerms from the NIM, and comes with pipes which I used to separate each field, and thus have this field with "quantitiy drug name strength" in one field as it was included between pipes this way. For my purposes, I want one field to be just name and strength, stripping off the quantity, and then ultimately, I want a field with just name. I am successful for all the entries without leading quantities, quantities keep tripping me up. Thanks for all you efforts! I do appreciate it.
comment Posted April 25, 2012 Posted April 25, 2012 OK, then how about = Let ( [ text = TrimAll ( drugfield ; 0 ; 0 ) ; firstWord = LeftWords ( text ; 1 ) ; pos = Position ( text ; " " ; 1 ; 2 ) ; start = Case ( firstWord = Filter ( firstWord ; "0123456789." ) ; pos + 1 ; 1 ) ; end = Position ( text ; " " ; start + 1 ; 1 ) ] ; Middle ( text ; start ; end - start ) )
Recommended Posts
This topic is 4664 days old. Please don't post here. Open a new topic instead.
Create an account or sign in to comment
You need to be a member in order to leave a comment
Create an account
Sign up for a new account in our community. It's easy!
Register a new accountSign in
Already have an account? Sign in here.
Sign In Now