Jump to content

How to identify and collate duplicates?


This topic is 5512 days old. Please don't post here. Open a new topic instead.

Recommended Posts

I'm trying to find a way to identify duplicates contained within 20 fields in a single record.

Duplicates are possible and acceptable, though unlikely.

The 20 fields are populated by choosing from a valuelist drop-down via an alternate table.

I've been researching possible recursive functions, hoping that I might bump into something that (a) I understood and (:( having understood it I might be able to bend it to my own ends - but I am a custom function virgin!

Where I've got to is a concatenated list of all the fields (List1) and a custom function which strips out the duplicates (List2). Hopes that I might be able to compare one list with the other (and display the difference between them) have failed because nearly everything I've tried seems to treat the duplicates as if they were present in both lists - and in one respect they are... [or if something has worked it invariably fails if the last list item is a dupe.] Jonathan Stark's SubtractValues (List1, List2) comes very close, but it only captures the first appearance of a duplicate and fails completely if the last entry in List1 is a duplicate.

I've tried John Osborne's True Value Filtering and XORvalues from Brian Dunning's site, but so far the only custom function which works is RemoveDuplicates.

Any suggestions would be gratefully received - but whether I'll understand them is another matter!

Link to comment
Share on other sites

This simple custom function will list all duplicate values in a list:

DuplicateValues ( listOfValues )


Let ( [

item = GetValue ( listOfValues ; 1 ) ; 

duplicate = ValueCount ( FilterValues ( listOfValues ; item ) ) > 1 ;

countValues = ValueCount ( listOfValues ) 

] ;

Case ( duplicate ; item & ¶ ) 

& 

Case ( countValues > 1 ; DuplicateValues ( RightValues ( listOfValues ; countValues - 1 ) ) )

)

Note:

1. The test for duplicates is NOT case-sensitive.

2. Triplicate values will be listed twice in the resulting list, and so on.

BTW, if you need to test for duplicates among fields of the same record, there is very likely something wrong with your data structure.

Link to comment
Share on other sites

Many, many thanks Comment.

I've spent hours footling around in all the wrong areas and you throw this back in seconds!

In terms of whether there's something wrong with the data structure, that's always a possibility! If you've got yet another 30 seconds I'll briefly outline what I'm trying to achieve - I'd value your opinion.

What I'm trying to create is a set-list db: something that retains a catalog of performance (song) titles; a printable list of selected titles, by performance (location/date); and ending up with a summary report showing where each catalog title is performed...

So basically 3 tables...

Catalog

Location/Performance

Report

[And a bunch of TOs]

Songs are selected from a drop-down valuelist, based on the catalog - duplicates are in effect second performances of the same title.

I did try to break the Location/Performance table into at least two (if not 3) tables, which would give me the opportunity to select 1 title per records, but I couldn't make that work.

If you can see a better way to achieve this please don't hesitate to tell me!

Thanks again.

MU

Link to comment
Share on other sites

I'm afraid I don't understand your explanation too well. It seems there are Songs, and each song can have any number of Performances. Not sure where do locations and dates come into this - are you cataloging existing performances of the songs, or are you playing these songs at various locations and keep a log of what you did?

Link to comment
Share on other sites

It's all about where songs are performed, live.

So the catalog is effectively all the possible titles, whereas the venue is where some of the songs are performed...

And the report is merely a summary of titles performed and where.

Link to comment
Share on other sites

OK, so where do 20 fields come into this? I see a table of Songs, a table of Gigs (for lack of better name), and a table of PerformedSongs:

Songs -< PerformedSongs >- Gigs

Place a portal to PerformedSongs on a layout of Gigs, and all you need to enter is the SongID to create the gig's playlist. A report from PerformedSongs can be sub-summarized by songs to eliminate duplicates.

Link to comment
Share on other sites

Place a portal to PerformedSongs on a layout of Gigs, and all you need to enter is the SongID to create the gig's playlist.

Unfortunately you've lost me! You're still on three tables, right?

The PerformedSongs table is empty, yes? Songs have to be selected from the Songs (Catalog), from the context of gigs, first...

So I have to get the Catalog into Gigs and then each song selection out of gigs and into PerformedSongs (presuming that this represents the history?

Or maybe I really haven't got what you've proposed!

Link to comment
Share on other sites

This topic is 5512 days old. Please don't post here. Open a new topic instead.

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.