mikedr

De-duplicating a list of values


I wrote a script that takes a list of values separated by carriage returns and returns the list without any duplicates.  So, if your list is:

 

apple
banana
pear
banana
apple

 

The script returns

 

apple
banana
pear

 

Here's the script:

Set Variable [ $inputlist; Value:Get(ScriptParameter) ]
Set Variable [ $outputlist; Value:"" ]
Set Variable [ $num; Value:1 ]
Loop
  If [ Let([numberinoriginal=ValueCount( FilterValues($inputlist; GetValue($inputlist;$num)));
           numberinnew=ValueCount( FilterValues($outputlist; GetValue($inputlist;$num)))];
       numberinoriginal=1 or
           (numberinoriginal>1 and numberinnew=0)) ]
    Set Variable [ $outputlist; Value:$outputlist & GetValue($inputlist;$num) & "¶" ] 
  End If
  Set Variable [ $num; Value:$num+1 ]
  Exit Loop If [ $num>ValueCount($inputlist) ] 
End Loop
Exit Script [ Result: $outputlist ]

Works great! But in reading this forum, I've realized that I tend to do things inefficiently or less than optimally. Is there any room for improvement here?

 

Basically, the script starts with an empty output list and then steps through each value of the input list. A value is added to the output list if it isn't a duplicate, i.e. if it appears just once in the input list, or appears multiple times in the input list but has not yet been added to the output list.

 

I was going to do some recursive scripting trickery, but instead figured it would be easier to leverage FM's FilterValues and ValueCount functions.  So, "numberinoriginal" is the number of times the current value is in the input list (using these two functions), and similarly with "numberinnew". 
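
For anyone who wants to try the logic outside FileMaker, here is a rough Python sketch of the same two-count test. It is only an illustration: the function names are made up for this example, "\n" stands in for the return separator, and the case-insensitive comparison is an assumption meant to approximate how FilterValues matches values.

```python
def count_matches(values, target):
    """Count entries equal to target, ignoring case.

    Stands in for ValueCount(FilterValues(list; value)); the
    case-insensitive match is an assumption of this sketch.
    """
    key = target.casefold()
    return sum(1 for v in values if v.casefold() == key)


def dedupe_original(input_text):
    """Keep a value if it occurs once in the input, or occurs several
    times in the input but is not yet in the output."""
    # Split on newlines and skip empty entries.
    input_list = [v for v in input_text.split("\n") if v != ""]
    output_list = []
    for value in input_list:
        number_in_original = count_matches(input_list, value)
        number_in_new = count_matches(output_list, value)
        if number_in_original == 1 or (number_in_original > 1 and number_in_new == 0):
            output_list.append(value)
    return "\n".join(output_list)
```

Calling dedupe_original("apple\nbanana\npear\nbanana\napple") gives back the three values apple, banana and pear, in that order.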

 


Typically I do this with two custom functions, CustomList and UniqueList. UniqueList also lets you take case sensitivity into account, which matters because FilterValues ignores case.
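
To show why the case question matters, here is a small Python sketch of an order-preserving de-duplication with an optional case-sensitive mode. This is not the UniqueList custom function mentioned above (its definition isn't shown in this thread); the name unique_list and the case_sensitive flag are hypothetical and used only for illustration.

```python
def unique_list(values, case_sensitive=False):
    """Return values with duplicates removed, keeping first occurrences.

    With case_sensitive=False, "Apple" and "apple" count as the same
    value, similar to a check built on a case-insensitive match.
    """
    seen = set()
    result = []
    for value in values:
        # Compare on a case-folded key unless case matters.
        key = value if case_sensitive else value.casefold()
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result
```

With the default setting, ["Apple", "apple", "pear"] collapses to ["Apple", "pear"]; with case_sensitive=True all three values are kept.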

Is there any room for improvement here?

 

Maybe. Why, where, when do you need this?


Ahh -- interesting about FilterValues ignoring case.  Luckily in my case, the lists are composed only of numeric values!


Maybe. Why, where, when do you need this?

 

I'm not sure if I follow?  By "this" do you mean the script itself, or the reason why I need the script?


By "this" do you mean the script itself, or the reason why I need the script?

 

The latter.


The latter.

 

We generate a particular type of form for submission to the patent office, which lists various patents and patent applications.  The user is requested to enter in a list of everything to be on the form.  In practice, what I've found is that some users are copying and pasting various lists from emails received from clients, and when doing so, can introduce duplicates.  Therefore, before my script generates the form, it calls this script to remove any duplicates.

 

Basically, the master script does this:

1. Clean up the list (removing any non-numeric characters from each entry).
2. Call this script to remove any duplicates.
3. Verify the validity of each number on the list (this is pretty cool: I look for a relevant Google patents page for the number; if it exists, then this means the number is a valid patent or patent application; if it does not exist, then this means the number is invalid), and remove any invalid numbers.
4. Download information for each number (again by going to Google).
5. Parse the information and construct an xml file.
6. Control Adobe Acrobat Pro to open the form, import the xml file to populate the form, and then save it.
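
As a rough illustration of steps 1 and 2 only, here is how the clean-up and de-duplication might look in Python. The function names and the sample numbers are invented for this sketch, and it assumes one entry per line and that "non-numeric" means anything other than a digit.

```python
import re


def clean_entry(entry):
    """Step 1: strip every non-digit character from one entry."""
    return re.sub(r"\D", "", entry)


def clean_and_dedupe(raw_text):
    """Steps 1 and 2: clean each line, then drop blanks and repeats,
    keeping the first occurrence of each number."""
    seen = set()
    result = []
    for line in raw_text.splitlines():
        number = clean_entry(line)
        if number and number not in seen:
            seen.add(number)
            result.append(number)
    return result
```

For example, clean_and_dedupe("US 7,123,456\n7123456\n\nUS 8,000,001") returns ["7123456", "8000001"]. Note that stripping non-digits would also fold a trailing kind code such as "B2" into the number, so real input may need more careful handling.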

 

The form sadly is a dynamic PDF form, so I can't use 360works' Scribe plug-in to populate it, but rather have to create an xml that has to be imported into Adobe Acrobat.  And Adobe Acrobat's AppleScript support is awful.

 

But I digress!  :)


If you still want feedback – here's a modified version. New steps/parts commented.

Set Variable [ $inputList; Value:Get(ScriptParameter) ]
# Set Variable [ $outputlist; Value:"" // not needed; if undefined at a later point, it evaluates to empty anyway ] 
# Set Variable [ $num; Value:1 // not needed when using Let() in Exit Loop [] ]
Set Variable [ $inputCount; Value:ValueCount ( $inputList ) // calculate once outside the loop ]
Loop
  Exit Loop If [ Let ( $i = $i + 1 ; $i > $inputCount ) // traditional loop counter variable name ]
  Set Variable [ $currentInputItem; Value:GetValue ( $inputList ; $i ) // evaluate this expression only once ]
  If [ IsEmpty ( FilterValues ( $outputList ; $currentInputItem ) ) // only check membership in output; # of input occurrences is immaterial ]
    Set Variable [ $outputList; Value:List ( $outputList ; $currentInputItem ) // use List() ]
  End If 
End Loop 
Exit Script [ Result: $outputList ]

I like it!!!  Thanks so much!  I tend to over-do things, and like the various optimizations here.

The user is requested to enter in a list of everything to be on the form.  In practice, what I've found is that some users are copying and pasting various lists from emails received from clients, and when doing so, can introduce duplicates.

 

Best practice would be to prevent the duplicates from being created in the first place, by field validation.

 

 

Note also that a value list based on a field automatically removes any duplicates - and does it faster than any calculation or script you can devise yourself.


Best practice would be to prevent the duplicates from being created in the first place, by field validation.

 

 

Note also that a value list based on a field automatically removes any duplicates - and does it faster than any calculation or script you can devise yourself.

This is sort of a placeholder solution for the time being. Basically, the user types a list of references (patent numbers), but what I've found is that there is generally a lot of copying and pasting being done. It's "on the list" to have each reference in a separate field/record, but that is a battle to fight in the future.
