how to detect numerical data

Pat · April 25, 2012

I am trying to figure out a way to separate out a number/modifier from the beginning of a field entry. For example in these two entries:

25 MG Valium 2 MG

Niacin 25 MG

I want to separate the leading '25 MG' from the Valium entry into a new field called Quantity. I tried this:

Position ( drugfield ; " MG " ; 1 ; 1 ) ≤ 10 and

PatternCount ( drugfield ; " MG " ) = 1 ;

LeftWords ( drugfield ; 2 );

only to realize it picked up Niacin in the second record. Some quantities can have long numbers, eg 2000000 UNT so it is very variable.

Is there a way to test if the first position is a number? Other ideas? Thanks for any assistance.

comment · April 25, 2012

It's hard to tell what is consistent among all your entries. How about =

LeftWords ( Filter ( drugfield ; "0123456789 " ) ; 1 )

Pat · April 25, 2012

That's the problem, although more entries do not have leading quantities than those that do. I want to remove the leading quantities to get down to the root part of the field. The quantities can be of different units, ie MG, ML, UNT, etc. That's why I thought if I could detect a number is the first position, that would be the simplest. Everything starts with a letter if it doesn't have a quantity leading it. All will have numbers and units at the end of the entry (drug name and then strength).

Tried your idea, and not successful yet. Thanks for responding.

comment · April 25, 2012

I want to remove the leading quantities to get down to the root part of the field.

I a not sure what you mean by that. I thought the first word that is a number would be the Quantity. Probably the next word after that would be the Unit.

Pat · April 25, 2012

Sorry. Entries can be either:

1st number : unit : drug : 2nd number : unit

quantity( = 1st number + unit) : drug name : drug strength( = 2nd number + unit) or

500 MG Valium 2 MG

2000000 UNT Vaccine 2 UNT/ML

drug : 1st number : unit

drug name : drug strength( = 1st number + unit)

Valium 2 MG

Vaccine 2 UNT/ML

I refer to 1st and 2nd numbers to represent when a number is encountered in the field. Some entries will have the two numbers, most others not.

I want to get all entries into the form of the second entry: drug name : drug strength

I am not as interested in the quantity provided of the (drug name drug strength) as I am in the actual (drug name drug strength).

comment · April 25, 2012

If all entries fall within one of the two patterns then it should be easy: all you need to do is determine if the first word is a number or not, then extract the elements according to the pattern, e.g.:

DrugName =

Let ( [

firstWord = LeftWords ( drugfield ; 1 ) ;

N = Case ( firstWord = Filter ( firstWord ; "0123456789" ) ; 3 ; 1 )

] ;

MiddleWords ( drugfield ; N ; 1 )

)

Lee Smith · April 25, 2012

Sorry. Entries can be either:

Hi Pat, could you share with us a sample of the actual data, or a site where your are getting it from?

Also the names of your fields that you want to populate, and with what.

Sometimes there are separators that we can use that can speed things up.

Lee

Pat · April 25, 2012

Well, quasi-success!

A problem that has now occurred is for entries that have a decimal number in front: 0.1 ML for example. I had broken down your calc to better understand it, by creating a firstWord field and an N field. For regular numbers, N is 3 from firstWord. For the decimal number in firstWord, N is 1. So for all the other entries, I have successfully pulled out the drug name, but not for quantity entries with a decimal. And to further complicate the issue, some drug names come with a numerical designation, for example 4-aminobenzoate. The four is not a quantity, it is a position of a chemical moiety in the compound. So now I need to figure out a way around this issue as well.

Lee, each entry is one field with no separators. The source is RxTerms from the NIM, and comes with pipes which I used to separate each field, and thus have this field with "quantitiy drug name strength" in one field as it was included between pipes this way. For my purposes, I want one field to be just name and strength, stripping off the quantity, and then ultimately, I want a field with just name. I am successful for all the entries without leading quantities, quantities keep tripping me up.

Thanks for all you efforts! I do appreciate it.

comment · April 25, 2012

OK, then how about =


Let ( [

text = TrimAll ( drugfield ; 0 ; 0 ) ;

firstWord = LeftWords ( text ; 1 ) ;

pos = Position ( text ; " " ; 1 ; 2 ) ;

start = Case ( firstWord = Filter ( firstWord ; "0123456789." ) ; pos + 1 ; 1 ) ;

end =  Position ( text ; " " ; start + 1 ; 1 )

] ;

Middle ( text ; start ; end - start )

)

Sign In

how to detect numerical data

Recommended Posts

Pat

comment

Pat

comment

Pat

comment

Lee Smith

Pat

comment

Create an account or sign in to comment

Create an account

Sign in

Browse

Site Support

Forums

Blogs

Marketplace

Activity

Important Information