November 15, 2012 I often receive name strings that vary in format, e.g.

Harris, John Dr
Harris, Dr John
Dr John Harris
John Harris

I would like to extract the component names. My plan, which is also my question, is to find the word number of the title (e.g. Dr = 3 in the first example). Once I have the word number for the title, I can use various middlewords calls to extract the name components. BTW, titles are likely to include a variety of options: Dr, A/Prof, Prof, Ms, Mrs, Mr. For info, I have a clunky workaround for components that might be read as more than one word (e.g. A/Prof, Smith-Harris): use substitute to change "/" to "slash", "-" to "hyphen", etc., extract via middlewords, and then substitute back. All suggestions welcome.
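The plan in the post above (find the word number of the title, then pick out the remaining words) can be sketched outside the database to check the logic. This is only a rough Python illustration, not the poster's solution: the names `TITLES`, `title_word_number` and `split_name` are made up for the example, a comma is assumed to mean "surname first", and the surname before the comma is assumed to be a single word. Splitting on whitespace only keeps "A/Prof" and "Smith-Harris" as single words, so no substitute-and-restore step is needed here.

```python
# Assumed title list, copied from the post; extend it for your own sources.
TITLES = {"dr", "a/prof", "prof", "ms", "mrs", "mr"}

def title_word_number(words):
    """Return the 1-based word number of the title, or 0 if there is none."""
    for number, word in enumerate(words, start=1):
        if word.rstrip(".").lower() in TITLES:
            return number
    return 0

def split_name(raw):
    """Return (title, first, last) for one loosely formatted name string."""
    # A comma is taken to mean "surname first" (assumption: one-word surname).
    surname_first = "," in raw
    # Split on whitespace only, so "A/Prof" and "Smith-Harris" stay whole.
    words = raw.replace(",", " ").split()
    position = title_word_number(words)
    title = words[position - 1] if position else ""
    # Everything except the title word is treated as name components.
    rest = [w for n, w in enumerate(words, start=1) if n != position]
    if not rest:
        return title, "", ""
    if surname_first:
        return title, " ".join(rest[1:]), rest[0]
    return title, " ".join(rest[:-1]), rest[-1]
```

With these assumptions, `split_name("Harris, John Dr")`, `split_name("Harris, Dr John")` and `split_name("Dr John Harris")` all return `("Dr", "John", "Harris")`, and `split_name("John Harris")` returns an empty title. Multi-word surnames without a comma would still need a rule of their own.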
November 15, 2012 Why the big difference in the entries? Is the collection of this data under your control? I posted a file ("Links to files here") that you might find helpful, but really your example of the data is convoluted. HTH, Lee
November 16, 2012 Author Indeed it is convoluted, but the sources of information are different and definitely out of my control. Actually it is worse than I indicated, as sometimes a nickname is shown in brackets as well, or a shortened version of a name is used. I am trying to bring together multiple lists which have different additional information. Edited November 16, 2012 by Lee Smith: removed quote, it was distorting your question.
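One way to handle the bracketed nicknames mentioned above is to lift them out before any word counting happens, so they do not shift the word positions. A small Python sketch, under the assumption that there is at most one nickname per string and that it sits in round or square brackets; `pull_nickname` and the sample nickname "Jack" are hypothetical, not taken from the real data.

```python
import re

def pull_nickname(raw):
    """Return (name_without_nickname, nickname); nickname is "" if absent."""
    # Match the first (...) or [...] group in the string.
    match = re.search(r"[(\[]([^)\]]*)[)\]]", raw)
    if not match:
        return raw.strip(), ""
    nickname = match.group(1).strip()
    # Cut the bracketed part out and collapse the leftover whitespace.
    cleaned = raw[:match.start()] + " " + raw[match.end():]
    return " ".join(cleaned.split()), nickname
```

For example, `pull_nickname("Dr John (Jack) Harris")` returns `("Dr John Harris", "Jack")`. Shortened first names are a different problem, since they need a lookup list rather than a pattern.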
November 16, 2012 Hey John, Wow, good luck with this project. It sounds like you can use Lynn's file by using scripts, but it certainly appears that it will be a hands-on project. I'm curious though: is there any consistency within each individual list, or does each file have all of the same problems? Lee
November 16, 2012 Author Thanks Lee. Each source tends to be internally consistent, though it sometimes drops the title.