# De-duplicating a list of values

This topic is 2877 days old. Please don't post here. Open a new topic instead.


I wrote a script that takes a return-separated list of values and returns the list with any duplicates removed. So, if your list is:

apple

banana

pear

banana

apple

The script returns

apple

banana

pear

Here's the script:

```
Set Variable [ $inputlist; Value:Get(ScriptParameter) ]
Set Variable [ $outputlist; Value:"" ]
Set Variable [ $num; Value:1 ]
Loop
  If [ Let([numberinoriginal=ValueCount( FilterValues($inputlist; GetValue($inputlist;$num)));
    numberinnew=ValueCount( FilterValues($outputlist; GetValue($inputlist;$num)))];
    numberinoriginal=1 or
    (numberinoriginal>1 and numberinnew=0)) ]
    Set Variable [ $outputlist; Value:$outputlist & GetValue($inputlist;$num) & "¶" ]
  End If
  Set Variable [ $num; Value:$num+1 ]
  Exit Loop If [ $num>ValueCount($inputlist) ]
End Loop
Exit Script [ Result: $outputlist ]
```

Works great! But from reading this forum, I've realized that I tend to do things less efficiently than I could. Is there any room for improvement here?

Basically, the script starts with an empty output list and then steps through each value of the input list. A value is added to the output list if it isn't a duplicate, that is, if it appears just once in the input list, or if it appears multiple times but has not yet been added to the output list.

I was going to do some recursive scripting trickery, but instead figured it would be easier to leverage FileMaker's FilterValues and ValueCount functions. So "numberinoriginal" is the number of times the current value appears in the input list (computed with these two functions), and "numberinnew" is the same count for the output list.
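For readers who don't have FileMaker at hand, the logic described above can be sketched in Python. This is only an illustration, not the script itself: `filter_values` is a hypothetical stand-in for FileMaker's FilterValues with a single filter value, and `dedupe_by_counts` is a made-up name. The stand-in compares values without regard to case, which matches how FilterValues behaves.

```python
def filter_values(values, filter_value):
    """Hypothetical stand-in for FilterValues with one filter value:
    keep every item equal to filter_value, ignoring case."""
    return [v for v in values if v.lower() == filter_value.lower()]


def dedupe_by_counts(input_list):
    """Mirror the script: count occurrences in the input and output lists."""
    output_list = []
    for value in input_list:
        number_in_original = len(filter_values(input_list, value))
        number_in_new = len(filter_values(output_list, value))
        # Keep the value if it occurs once, or if it occurs several
        # times but has not been copied to the output yet.
        if number_in_original == 1 or (number_in_original > 1 and number_in_new == 0):
            output_list.append(value)
    return output_list


result = dedupe_by_counts(["apple", "banana", "pear", "banana", "apple"])
print(result)  # ['apple', 'banana', 'pear']
```

The first occurrence of each value is the one that survives, so the output keeps the order of the input.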

##### Share on other sites

Typically I do this with two custom functions, CustomList and UniqueList. Using UniqueList also lets you take case sensitivity into account, since FilterValues ignores case.
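To make the case-sensitivity point concrete, here is a small Python sketch. It is not the CustomList or UniqueList custom functions mentioned above (their code isn't shown in this thread); `unique_values` and its `case_sensitive` parameter are hypothetical names used only to show how the two behaviours differ.

```python
def unique_values(values, case_sensitive=False):
    """Return values without duplicates, keeping the first occurrence.

    With case_sensitive=False, "Apple" and "apple" count as the same
    value, which is how a FilterValues-based check would treat them.
    """
    seen = set()
    unique = []
    for value in values:
        key = value if case_sensitive else value.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(value)
    return unique


fruit = ["Apple", "apple", "Pear"]
ignoring_case = unique_values(fruit)
respecting_case = unique_values(fruit, case_sensitive=True)
print(ignoring_case)    # ['Apple', 'Pear']
print(respecting_case)  # ['Apple', 'apple', 'Pear']
```

For lists that contain only digits the two modes give the same result, so the difference only matters for text values.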

##### Share on other sites

> Is there any room for improvement here?

Maybe. Why, where, when do you need this?

##### Share on other sites

Ahh -- interesting about FilterValues ignoring case.  Luckily in my case, the lists are composed only of numeric values!

> Maybe. Why, where, when do you need this?

I'm not sure if I follow?  By "this" do you mean the script itself, or the reason why I need the script?

##### Share on other sites

> By "this" do you mean the script itself, or the reason why I need the script?

The latter.

##### Share on other sites

> The latter.

We generate a particular type of form for submission to the patent office, which lists various patents and patent applications.  The user is requested to enter in a list of everything to be on the form.  In practice, what I've found is that some users are copying and pasting various lists from emails received from clients, and when doing so, can introduce duplicates.  Therefore, before my script generates the form, it calls this script to remove any duplicates.

Basically, the master script does this:

1. Clean up the list (removing any non-numeric characters from each entry).
2. Call this script to remove any duplicates.
3. Verify the validity of each number on the list and remove any invalid numbers. This is pretty cool: I look for a relevant Google patents page for the number; if it exists, the number is a valid patent or patent application; if it does not exist, the number is invalid.
4. Download information for each number (again by going to Google).
5. Parse the information and construct an XML file.
6. Control Adobe Acrobat Pro to open the form, import the XML file to populate the form, and then save it.
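The first two steps can be sketched in Python as follows. This is an illustration under assumptions, not the master script: `clean_entry` and `prepare_list` are hypothetical names, the sample numbers are made up, and the sketch assumes that stripping every non-digit character is an adequate clean-up for each entry.

```python
import re


def clean_entry(entry):
    """Step 1: remove every non-digit character from one entry."""
    return re.sub(r"\D", "", entry)


def prepare_list(raw_text):
    """Steps 1 and 2: clean each line, drop empty lines, remove duplicates."""
    cleaned = [clean_entry(line) for line in raw_text.splitlines()]
    non_empty = [number for number in cleaned if number]
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(non_empty))


# Made-up sample of a pasted list with inconsistent formatting.
pasted = "US 7,654,321\n7654321\n\n8,123,456\n"
numbers = prepare_list(pasted)
print(numbers)  # ['7654321', '8123456']
```

Cleaning before de-duplicating matters here: "US 7,654,321" and "7654321" only compare as equal once the formatting has been stripped.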

Sadly, the form is a dynamic PDF form, so I can't use 360works' Scribe plug-in to populate it. Instead, I have to create an XML file and import it into Adobe Acrobat. And Adobe Acrobat's AppleScript support is awful.

But I digress!

##### Share on other sites

If you still want feedback, here's a modified version. New or changed steps are commented.

```
Set Variable [ $inputList; Value:Get(ScriptParameter) ]
# Set Variable [ $outputlist; Value:"" // not needed; if undefined at a later point, it evaluates to empty anyway ]
# Set Variable [ $num; Value:1 // not needed when using Let() in Exit Loop [] ]
Set Variable [ $inputCount ; ValueCount ( $inputlist ) // calculate once outside the loop ]
Loop
  Exit Loop If [ Let ( $i = $i + 1 ; $i > $inputCount ) // traditional loop counter variable name ]
  Set Variable [ $currentInputItem ; GetValue ( $inputList ; $i ) // evaluate this expression only once ]
  If [ IsEmpty ( FilterValues ( $outputList ; $currentInputItem ) ) // only check membership in output; # of input occurrences is immaterial ]
    Set Variable [ $outputList ; List ( $outputList ; $currentInputItem ) // use List() ]
  End If
End Loop
Exit Script [ Result: $outputList ]
```
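The key simplification is that only membership in the output list needs checking. A value that occurs once in the input cannot already be in the output when the loop reaches it, so the "occurs once" test is covered by the "not yet in the output" test. A rough Python equivalent of the modified loop, with the hypothetical name `dedupe` and made-up sample numbers, looks like this:

```python
def dedupe(input_list):
    """Append each item only if it is not already in the output list."""
    output_list = []
    for item in input_list:
        if item not in output_list:  # membership in the output is all that matters
            output_list.append(item)
    return output_list


# Made-up numeric strings, since the lists in question hold only numbers.
numbers = ["123", "456", "123", "789", "456"]
unique_numbers = dedupe(numbers)
print(unique_numbers)  # ['123', '456', '789']

# Python's built-in order-preserving idiom gives the same result.
assert unique_numbers == list(dict.fromkeys(numbers))
```

Unlike FilterValues, Python's `in` compares strings case-sensitively, which makes no difference for purely numeric values.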
##### Share on other sites

I like it!!!  Thanks so much!  I tend to over-do things, and like the various optimizations here.

##### Share on other sites

> The user is requested to enter in a list of everything to be on the form.  In practice, what I've found is that some users are copying and pasting various lists from emails received from clients, and when doing so, can introduce duplicates.

Best practice would be to prevent the duplicates from being created in the first place, by field validation.

Note also that a value list based on a field automatically removes any duplicates - and does it faster than any calculation or script you can devise yourself.

##### Share on other sites

> Best practice would eliminate the duplicates from being created in the first place by field validation.
>
> Note also that a value list based on a field automatically removes any duplicates - and does it faster than any calculation or script you can devise yourself.

This is sort of a placeholder solution for the time being. Basically, the user types a list of references (patent numbers), but what I've found is that there is generally a lot of copying and pasting being done. It's "on the list" to have each reference in a separate field/record, but that is a battle to fight in the future.

##### Share on other sites
