Creating a Top 10 list (take 2)

August 13, 2001

Ok, let's try this again.

I have a text field that is populated by grabbing website URL strings from a referenced file, so the contents of this field are entirely unknown beforehand.

What I want to do is create a Top 10 (or 20 or 30 etc.) list of the results from that field. For example, if 345 people visited www.yahoo.com, and that's the number one site, I want to create a report that says something like:

1. http://www.yahoo.com (345 hits)

2. http://www.lycos.com (211 hits)

3. http://fmforums.com/ubb/cgi-bin/ultimatebb.bgi (205 hits)

etc.

Help? Please? Thanks.

August 14, 2001

How is FMP going to know how many hits each site has received?

August 15, 2001

Let's assume that you are importing CLEAN urls, i.e. short & sweet like www.yahoo.com, and you aren't going to try to PARSE the domain name from a full-length URL (your question did not ask how to parse this - it can be done, of course).

So, you import the text file, and let's say you have 300 records with only one field per record, the URL text field.

Add another field (number) called COUNT.

Write a script that does this (in plain English):

1) start at first record

2) copy URL field

3) enter find mode

4) paste the results of our previous COPY into the URL field - perhaps inside " " (quotes) to make the match literal if you need

5) perform find

6) use the CURRENTFOUNDCOUNT status function now to SET THE COUNT FIELD

7) loop

Now you have all 300 records with a COUNT field, and after running the script, each record will show how many records in the database share its URL.

At this point, if you want, make another script to eliminate DUPLICATES and you are left with the ORIGINAL URLs and, in the COUNT field, the NUMBER OF HITS that were found before you weeded out the DUPS.
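For anyone who wants to see the logic outside of FMP, here is a minimal Python sketch of the same two passes: count the matches for every record, then weed out the duplicates. The `records` list and its sample URLs are made up for illustration; each dict just stands in for one FileMaker record.

```python
# Each dict stands in for one record with a single url field (sample data is made up).
records = [{"url": u} for u in ["www.yahoo.com", "www.lycos.com", "www.yahoo.com"]]

# Pass 1: the "find" step, repeated for every record.
# The count is the number of records whose url matches this one exactly.
for rec in records:
    rec["count"] = sum(1 for other in records if other["url"] == rec["url"])

# Pass 2: eliminate duplicates, keeping the first record seen for each url.
seen = set()
unique = []
for rec in records:
    if rec["url"] not in seen:
        seen.add(rec["url"])
        unique.append(rec)
```

After both passes, `unique` holds one record per URL, and its count still reflects the number of hits before the duplicates were removed.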

have a nice day

August 15, 2001

Actually, you may want to eliminate DUPS on the fly. Otherwise you will have to write a little more in the loop to keep track of which record you need to be at next, since doing the find throws off the sequence.

MAYBE THIS:

Write a script that does this (in plain English):

1) start at first record

2) START LOOP

3) copy URL field

4) enter find mode

5) paste the results of our previous COPY into the URL field - perhaps inside " " (quotes) to make the match literal if you need

6) perform find

7) use the CURRENTFOUNDCOUNT status function now to SET THE COUNT FIELD IN THE CURRENT RECORD IN THE SET

8) OMIT ONE RECORD (THE RECORD WE JUST SET THE COUNT FIELD)

9) DELETE FOUND SET (now you have just eliminated all but the ONE copy of the URL you looked for)

At this point, you are back to the full record set, and you need to figure out a way to move on to the next unique URL and start over. You now have only ONE copy of that URL, and its COUNT field is set to how many were there before you nuked them.
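Roughly, the dedupe-on-the-fly loop can be sketched in Python like this. It is only an illustration of the idea, not the FMP script itself: the function name `count_and_dedupe` is made up, and the sketch moves each finished URL into a separate results list, which sidesteps the "which record is next" problem rather than solving it inside one table.

```python
def count_and_dedupe(urls):
    """Count each distinct url, removing its duplicates as we go."""
    remaining = list(urls)   # stands in for the records still to process
    results = []             # one (url, count) pair per distinct url
    while remaining:
        target = remaining[0]                              # start at first record
        found = [u for u in remaining if u == target]      # perform find
        results.append((target, len(found)))               # found count -> COUNT
        remaining = [u for u in remaining if u != target]  # delete found set
    return results
```

Calling `count_and_dedupe(["a.com", "b.com", "a.com"])` gives one pair per URL, in the order each URL first appears.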

I will try and simulate this for you and provide the rest of the answer to cycle the script properly. I think using a script in this way is going to solve your problem.

It's hard for me to do this without an actual data set to play with :-)

Kind of like trying to build Mr. Bill without playdough...

luckily it's only a 2-fielder so no big deal - I can whip one up to play around with!

August 15, 2001

HERE YOU GO - TESTED AND READY

Here is a SAMPLE of the database - ALL READY TO DOWNLOAD AND USE!!

URLHIT Database

it is preloaded with 30 "urls" - hit the button and see what happens!

INSTRUCTIONS & HOW TO MAKE YOUR OWN

Make FIVE FIELDS:

1) url (text)

2) count (text)

3) urlLIST (text)

4) countLIST (number)

5) QUOTE (global)

SET THE GLOBAL QUOTE FIELD TO " (a quotation mark - don't ask)

PREPARE THESE 2 SCRIPTS (exactly)

1) SCRIPT NAME: GO

Loop

Show All Records

Go To Record/Request [First]

If [url <> ""]

Copy [select entire contents]

Else

Perform Script ["cycle"] [sub-scripts]

Halt Script

End If

Enter Find Mode

Paste [select entire contents]

Perform Find

Set Field [count, quote & url & quote & "," & quote & Status(CurrentFoundCount) & quote ]

Copy [count] [select entire contents]

New Record/Request

Paste [urlLIST]

OmitRecord

Delete All Records [No dialog]

End Loop

2) SCRIPT NAME: CYCLE

Show All Records

Export Records [Filename: "temp.csv"; Export Order: urlLIST(Text)] [Restore export order, No dialog]

Delete All Records [No dialog]

Import Records [Filename: "temp.csv"; Import Order:urlLIST(Text),countLIST(Number)] [Restore import order, No dialog]
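To make the trick in these two scripts easier to follow, here is a small Python sketch of what the Set Field calculation and the export/import round trip appear to do. This is one reading of the scripts, not a translation of them: the function names `build_line` and `cycle` are invented for illustration, and an in-memory string stands in for the temp.csv file.

```python
import csv
import io

QUOTE = '"'  # stands in for the global QUOTE field


def build_line(url, found_count):
    """Mimic the Set Field step: quote & url & quote & "," & quote & count & quote."""
    return QUOTE + url + QUOTE + "," + QUOTE + str(found_count) + QUOTE


def cycle(lines):
    """Mimic CYCLE: write the lines out as-is, then read them back as two fields."""
    exported = "\n".join(lines)  # the single urlLIST field, one line per record
    reader = csv.reader(io.StringIO(exported))
    return [(url, int(count)) for url, count in reader]


lines = [build_line("www.yahoo.com", 3), build_line("www.lycos.com", 1)]
rows = cycle(lines)
```

The point of the sketch is that each record carries both values packed into one text field, and the re-import is what splits them into urlLIST and countLIST.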

You must be familiar with the KEEP/REPLACE script option when building FMP scripts.

1) You need to prepare ONE TIME the export order for the script. The EXPORT needs to be a TAB formatted file to the name "temp.csv" - and you are ONLY exporting the urlLIST field.

2) You need to prepare ONE TIME the import order for the script. The IMPORT needs to be from the "temp.csv" file which will have TWO FIELDS - you need to import the domain name into the urlLIST field. You need to import the # of hits into the countLIST field. You may need to create manually a text file to do this, one time. It would look like this:

filename: temp.csv contents of file: one line that looks like this -

"field 1","field 2"

Import this ONE TIME - field 1 into urlLIST and field 2 into countLIST - and then UPDATE THE CYCLE SCRIPT to reflect these new import and export orders.

INSTRUCTIONS TO USE

3) Now you are ready for action! Import your URL file into a BLANK copy of the database - putting the url into the URL field

4) RUN THE SCRIPT

5) When it is finished - you are left with a clean, simple database that contains: A record for each UNIQUE URL you had. Each record contains: the URL name in the urlLIST field, and the number of hits in the countLIST field - exactly what you need to...

6) ...slice and dice this to your heart's content to make Top 10, Top 20, Top 1000, whatever - just sort and make a nice report layout.

7) BE SURE TO CLEAR THE DB BEFORE DOING ANOTHER SET !!

8) Anything is possible in FMP - ENJOY!
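For the final "sort and report" step, the idea is simply to sort by hit count, highest first, and keep the first N. A short Python sketch, assuming the counts are already available as (url, count) pairs; the helper names `top_n` and `format_report` are made up, and the numbers are the ones from the original question:

```python
def top_n(hits, n=10):
    """Return the n (url, count) pairs with the most hits, highest first."""
    return sorted(hits, key=lambda pair: pair[1], reverse=True)[:n]


def format_report(ranked):
    """Turn ranked pairs into lines like '1. http://www.yahoo.com (345 hits)'."""
    return [
        f"{rank}. {url} ({count} hits)"
        for rank, (url, count) in enumerate(ranked, start=1)
    ]


hits = [
    ("http://www.lycos.com", 211),
    ("http://www.yahoo.com", 345),
    ("http://fmforums.com/ubb/cgi-bin/ultimatebb.bgi", 205),
]
report = format_report(top_n(hits, n=2))
```

In FMP itself the equivalent would be a descending sort on countLIST plus a list layout; the sketch only shows the ordering and numbering.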

August 15, 2001

You said tab delimited but then said to export to a "temp.csv" file. Isn't that a comma-delimited format? Just a detail, but it may cause some confusion later in the process.
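One possible reading of the tab/comma mix, which the author has not confirmed: the exported field already contains text like "url","count", so a tab export of that single field writes the line unchanged, and the comma import is what splits it in two. A tiny Python sketch of the difference, using a made-up sample line:

```python
import csv
import io

# One exported line, as the GO script appears to build it (sample values are made up).
line = '"www.yahoo.com","3"'

# Treated as tab-delimited, the line has no tabs, so it stays a single field.
as_tab = line.split("\t")

# Treated as comma-delimited, the same line splits into url and count.
as_csv = next(csv.reader(io.StringIO(line)))
```

Here `as_tab` has one element and `as_csv` has two, which would explain why the file is exported one way and imported the other.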