September 9, 2001

Well folks, I’ve now posted over 30 answers but only asked one question… so I guess I’m entitled to a little help now! LOL

I don’t have a problem as such, as I’ve already found an answer. However, if someone could help me do this a better way, I would be very happy.

The situation: I have an address file with over 130,000 company names and addresses. I get these names and addresses from CDs, which are imported into the DB. The problem is keeping out the duplicate records. If every company address CD manufacturer used the same formats, this would be easy, but they don’t. So for example:

Mueller & Mueller GmbH
Ring Strasse 6
46789 Munich
Deutschland

Is identical with: M
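The matching problem described above can be illustrated outside FileMaker. The sketch below (Python, not the poster's actual calculation) normalizes each company name into a comparison key by stripping punctuation and a user-supplied list of noise words, then groups records that share a key. The `NOISE` set is an assumption for illustration; in practice it would hold whatever legal forms and filler words the data contains.

```python
import re
from collections import defaultdict

# Tokens dropped when comparing names (assumed list: legal forms, fillers).
NOISE = {"gmbh", "ag", "kg", "co", "inc", "ltd", "und", "and"}

def normalize(name: str) -> str:
    """Reduce a company name to an order-insensitive comparison key."""
    name = name.lower()
    name = re.sub(r"[^a-z0-9äöüß ]+", " ", name)   # strip punctuation
    words = [w for w in name.split() if w not in NOISE]
    return " ".join(sorted(set(words)))

def find_duplicates(records):
    """Group record indices by normalized key; return groups of size > 1."""
    groups = defaultdict(list)
    for i, name in enumerate(records):
        groups[normalize(name)].append(i)
    return [idxs for idxs in groups.values() if len(idxs) > 1]

names = ["Mueller & Mueller GmbH", "Mueller und Mueller", "Schmidt AG"]
print(find_duplicates(names))  # → [[0, 1]]: the two Mueller entries match
```

The key idea is the same as the calc-field approach discussed in this thread: reduce every name to a canonical form first, then duplicate detection becomes a simple equality test.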
September 10, 2001

Instead of using a hi-tech self-join, why not just perform a lo-tech find on the calc field for duplicates (search for the exclamation character, "!", which FileMaker uses to find duplicate values)? It can be scripted: perform the find, sort, then start sifting through the records. Because you’re no longer using the relationship, the calculation based on the global field will now work.
September 18, 2001 Author

I think you missed my point a little! My problem isn’t finding the duplicated records. The system of taking out specific characters works great for this. The problem is telling FM which letters, numbers, words, and other characters to remove. If I set these up permanently as part of the calculation, the user has no way of changing them, which is important for my client.

An example: there are 4 fields:

<CompanyName> <Sub1> <Sub1With> <CompanyNameResult>

The user enters in field <Sub1> what should be removed from <CompanyName>, and enters in <Sub1With> what the text in <Sub1> should be replaced with, so it would look like this:

<CompanyName> = “Boris M
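The user-editable substitution scheme above can be sketched as a table of (find, replace-with) pairs applied in order. This is Python rather than a FileMaker calculation, and the field names are taken from the post; the example pairs are illustrative assumptions, not values from the thread.

```python
# Each row pairs a <Sub1> value (text to remove from <CompanyName>)
# with a <Sub1With> value (what to replace it with). Because the user
# edits this table rather than the calculation, the rules can change
# without touching the logic.
substitutions = [
    ("GmbH", ""),        # drop the legal form entirely
    ("Strasse", "Str"),  # unify street spellings
    ("&", "und"),        # unify ampersand vs. "und"
]

def apply_substitutions(company_name: str, subs) -> str:
    """Apply each (find, replace_with) pair in order; tidy leftover spaces."""
    result = company_name
    for find, replace_with in subs:
        result = result.replace(find, replace_with)
    return " ".join(result.split())  # collapse double spaces left behind

print(apply_substitutions("Mueller & Mueller GmbH", substitutions))
# → Mueller und Mueller
```

Note that order matters: if one rule's output can match a later rule's pattern, the pairs must be listed accordingly, which is also why a scripted, controlled re-run (as the author settles on below) is safer than a live calculation.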
September 19, 2001

I don’t feel like wrapping my brain around your problem at the moment (it’s 10:00 am here, just waking up), but my intuition suggests that a problem like this might be solved by making a database of “substitutions” and looking through all the records to search and replace. Time consuming and complicated, yes, but at least adding and removing substitutions will be clean-cut. Having said that, maybe you are asking for the impossible. Most databases of addresses acknowledge that a certain percentage of records are duplicates and duds.
September 19, 2001

This really does spiral off into the area of AI. What do you do about incorrectly typed abbreviations you want to remove, etc.? Remember the short-lived AI stock boom of the early ’80s, sold on the premise that artificial intelligence was only a few years away? We sure kicked the heck out of that problem (maybe in another 100 years)! What it boils down to is that for the price of all the engineering required to produce a poor-to-marginal solution, you can hire an army of low-cost clerical workers who can solve it just fine. Not as much fun as tinkering with databases, but a lot faster and cheaper. -bd
September 19, 2001 Author

OK guys, thanks for your comments! To be honest, I thought I had already found the best way (read my first posting on this matter), but I thought I’d get your ideas first in case any of you had a way of speeding this up. I’ll stick with the related file, using a lookup and forcing the user to use a script if any values are changed. I’m also going to put a limit on the number of users for this command, so that it can only be run when there is little network activity. If I come up with a better way, I’ll let you know. Rigsby