googleSearch

Rob 7 Collins · June 21, 2005

[ EDIT: please see latest attachment post ]

I've been fretting over building a search mechanism that allows me to search every table in a tree of tables in the relationships graph. Finally, I've got that functionality. The attached file demonstrates it and allows easy alteration to work on any tree / results from any table.

I'm posting this not because it provides some superior functionality, but because it was so darn hard to figure out how to accomplish ... its a rather large set of layouts, tables, scripts, ... various components that work together to provide a simple means of generalizing the search functionality.

googlSearch.v2.zip

Lee Smith · June 21, 2005

Why is it asking for a "Password"?

Ender · June 21, 2005

Hi Rob,

A couple comments:

It would be helpful to have a it return to the search screen if there were no records found (maybe a message too.) It would also help to have a little more sample data and an example search that shows what's special about this technique.

Although I have never thought searches across all tables to be useful, I can image easier ways to do it. Perhaps you can give an overview and explanation of your method so we don't have to sift through the code. Clearly you have put a lot of effort into this, so there must be some reason for the complexity.

Rob 7 Collins · June 21, 2005

Why is it asking for a "Password"?

Sorry. It uses Get (AccountName) so it is set up user aware. "admin" has no password so if you come to the login screen, just change your username to that and hit enter.

Rob 7 Collins · June 21, 2005

It would be helpful to have a it return to the search screen if there were no records found (maybe a message too.) It would also help to have a little more sample data and an example search that shows what's special about this technique.

It would, yes! I'll put that on the to do list, beccause I'm halting once I run out of found records instead of calling an exiting script.

Although I have never thought searches across all tables to be useful, I can image easier ways to do it. Perhaps you can give an overview and explanation of your method so we don't have to sift through the code. Clearly you have put a lot of effort into this, so there must be some reason for the complexity.

about "useful": Imagine a tree like Resources->Volunteers, Volunteers->Sponsors, Volunteers->Associations. Volunteers, Sponsors, and Associations have multiple addresses so a supporting table for addresses are associated with them. A user of this system might know they need to find the Resources record for some John Doe and only partial address information ... but doesn't know if John Doe is a volunteer or a Sponsor. If you've only got to search a small set of tables like Volunteers and Sponsors, it's not so bad. But if you've got many of those, or many locations for addresses, or a large addresses table, your user is often times going to give up and say "can't find it" before searching. The database I'm building requires as a primary function this sort of search functionality.

Before I go on about the overview ... Queue, you're a really great developer, you're wisdom has informed me a lot while I've been here. Perhaps there is a better way. I want to know that way.

I spent about two weeks searching for, posting several threads asking for advice / solutions, but no one seemed to have one. So I did it myself.

An overview: The method is really simple. It builds multicriteria searches for each ANDed term, and it builds a multicritiria search for every ORed search together. They constrain the found set sequentially.

The search is performed on a single layout with just plain fields for each related field... it turns out that even if the layout won't show more than the first related record in that field, the find will still match against all those records.

You don't have to have a concatenation field in each table (which can be a very serious performance / size issue, FMP has a static memory allocation for databases). And if you want to use this functionality several times, on many trees, and/or with many results table, you only need to change the scriptparameter you start the search with, and build the search layout with all those fields.

I ran into a lot of problems with iteration and Find mode along the way ... several simpler ideas eventually failed because of Find mode (making data in other tables inaccessable or failing to constrain value lists) and the fact that you can't specify table fields dynamically in script (which IMHO should be considered a bug).

comment · June 21, 2005

Hi Rob,

Referring only to your example: had you put Volunteers, Sponsors and Associations all in a single Contacts table, you would only need to search for John Doe once. Dividing similar items into three tables results in having to look for the same needle in three stacks of hay. It is hard to imagine someone would search Resources for either a John Doe or a film projector - so perhaps a better example is called for.

Rob 7 Collins · June 21, 2005

[ EDIT: pardon my first write ... I was surprised by the film projectors and people. ]

Hi comment

It was a conceptual example

...it mostly comes into play with data mining. When you've got logical differences of data type for the target users, that don't need to apply from a data-abstraction point of view, across ranges of related datas.

Also, there's design efficiency. A person and a sponsor needn't be so different. You can have one table, with type fields as necessary to distinguish information. If you do that from the start, each time you go to access that data from the point of view of the user, who expects people and sponsors to be different, you've got to script your filter. You also end up with overly-large and cumbersome tables. With four layouts and maybe 12 filters its no big deal, but that's not a real database... that's a glorified spreadsheet. With numerous layouts and filters, with schema changes coming and going over time, it adds up to significant maintenance and performance requirements.

comment · June 21, 2005

[shrug] All I meant to say is it's difficult to imagine what this might be good for, and the example given is not helping.

Rob 7 Collins · June 21, 2005

[Ender, re complexity]

A lot of the complexity of the example is because of the database to which the functionality applies.

Large, varied trees; sessions-ready (transactions-ready, probably my next big project); and following naming protocols and well-documented (in layouts, there are layouts just for divisions, in tables there are tables just for divisions in the tables view and the relationships graph, in scripts there are scripts just to visually categorize the scripts).

All these make for a largish framework from which to start.

The materials themselves are just three, very simple tables, their layouts, a value list, many scripts the developer never need change, and one they do need to. Also, the layout on which to search, which the developer will need to change. Most of it justs needs to exist, doesn't require alteration (but the code isn't optimized, its worth looking at).

I'm attaching the first changes to the sample:

The help layout now lets the body of the help infos be accessable in browse mode (I'd kept it uneditable thinking the scrollbar would still work).
On no found results, the script now defaults to allerting the user with a dialog and returning them to the search layout.
The database automatically logs in as admin.

googlSearch.v2.zip

Rob 7 Collins · June 21, 2005

[shrug] All I meant to say is it's difficult to imagine what this might be good for, and the example given is not helping.

Please understand, the actual uses for this often deal with sensative information: I'm reticent to use a real example because, in discussion, I have to keep reviewing what I'm writing to ensure I'm allowed to send it. The fact that the places you might need this are complicated leads to my oversimplified examples.

So, I'll bite the bullet and give a real example that isn't entirely sensative. I have a group of users that make a periodical catalog. They keep track of providers, which offer programs, which have catalog entries. All catalog entry changes are kept track of: only one version of a catalog entry is selected for a given catalog, and reversion needs to be fully supported. Entry information and the program description are intimately related, but seperate datas. Programs, Providers, and Catalogs all have contact information (sundry supporting tables like addresses, etc). Programs and providers both have contacts. Catalog contact information has to be kept seperate from the normal supporting tables because of messy issues with layout in the final catalog.

When it's time to build a catalog, the users have to start a new catalog in the database, and add catalog entries to it. They have hundreds of such entries. They need to locate those entries by variable information they will remember from any of the providers, the programs, the catalog entries, the contacts, or the catalog entry contact information. Also, any time they need ot work with any of those datas, or send mailings to those contacts, or such, they need to find the proper record from information they might remember anywhere in this tree of tables.

Ender · June 21, 2005

I don't think it's important for us to know the specifics of the solution you designed this for. With databases, there's usually similar structures describing very different things.

A search is usually made on fields in the table where you wish to see the resulting records, but can also include related parent or child fields. It's not clear to me why you wouldn't simply search these related parent or child fields in addition to the current table. But then I still don't get your algorithm. What are your scripts doing?? (And don't start talking about "large, varied trees" again--I'm no gardener!) What's the algorithm? Can you put it in pseudocode?

Rob 7 Collins · June 21, 2005

A search is usually made on fields in the table where you wish to see the resulting records, but can also include related parent or child fields. It's not clear to me why you wouldn't simply search these related parent or child fields in addition to the current table.

The example shows the simplest case I could think of. There is a many-to-many map between Middle and Bottom records. From Middle records it is easy to create new bottom records and the necessary associations without the script. The script makes it easy to associate a Bottom record to an unassociated Middle record... from the bottom record you go to the search, locate some middle record, from criteria anywhere within the tree (the middle record of this other top record, etc), and then associate the resultant record to the bottom record.

I can see it is a little too simple to see the real usefulness of this method, but it does demonstrate a use of it. A more thorough result is anything you can dream up about data mining: You've some set of records, you want to draw analytical conclusions about the data that doesn't derive from the organization of the information. You run searches to derrive counts, and compare those counts to the totals of the results record. The searches span partial trees of information, so your totals tell you the degree of association between information is certain graphs of the database and a given table. Then you compare those records in the results table to the "omitted set" in fmp-speak, and decide what is significantly different about the records that showed up in your search results. That what data mining is.

But then I still don't get your algorithm. What are your scripts doing?? (And don't start talking about "large, varied trees" again--I'm no gardener!) What's the algorithm? Can you put it in pseudocode?

Ok, enough of the gardening. ... you know, with SQL, none of this would be necessary. With C#, the code would be ~80 lines. In fact, the AND and the OR branches could all be done in the same branch, with another variable. But I find ScripParameters tiring to use and try to minimize the use of them. Here's the pseudo-code:

in: varSearchLayout varResultsLayout



( tblAND , tblOR ) = parse_quotes ( ANDs , ORs )



if ( count ( tblAND ) ) # do AND search

  varFieldCount = count_fields ( varSearchLayout )

  foreach valuelist_on_tblAND AS criterion do

    ON varSearchLayout in find mode:

      foreach 1..varFieldCount AS incr do

        foreach incr..0 AS subincr do

          go to next field

        endfor #subincr

        set field ( criterion )

        new record

      endfor #incr

    constrain found set

  endfor #criterion

endif



if ( 

  count ( tblOR ) && 

  count ( found_set ( varSearchLayout ) ) ) # do OR search



  varFieldCount = count_fields ( varSearchLayout )

  foreach valuelist_on_tblAND AS criterion do

    ON varSearchLayout in find mode:

      foreach 1..varFieldCount AS incr do

        foreach incr..0 AS subincr do

          go to next field

        endfor #subincr

        set field ( criterion )

        new record

      endfor #incr

  endfor #criterion

  constrain found set

endif



custom_results = new script ( ) #defaults to handle zero results and point user to varResultsLayout

custom_results->run_once( varSearchLayout varResultsLayout )

Ender · June 21, 2005

Hmm, Sorry I asked. Pseudocode is supposed to make the algorithm easier to understand, like,

1. Do this

2. Set That

3. Loop through records until whatever

Yours looks like C or FORTRAN. There's a reason I didn't become a programmer.

Rob 7 Collins · June 21, 2005

Hmm, Sorry I asked.

: blush : That made me smile.

Maybe this is better:

get the layout to search and the layout to show results on.
remove the quotes from the search criterion fields, and put the AND criteria in the AND table and the OR criteria in the OR table.
if the AND table has records:
(first AND subscript) count the fields on the search table.
-- then go into find mode on the search layout
(second AND subscript) iterate the value list on the AND table.
(third AND subscript) for each field (from the first script), iterate through the layout with Go to Next Field until that field has focus. Then Set Record to the search criterion (from second subscript). Then create a new record.
(second subscript after each time the third has run) Constrain Found Set

The OR loop is similar, but it waits to execute the Constrain Found Set until after all the search criteria have been added to the multi-criteria find. Afterwards, I call a cleanup script to clean up the tables, and a results script for you to customize so you can do meaningful things with the search results ... in the example, I move the found set over to a different table occurance.

Ender · June 22, 2005

That's better.

I have a couple thoughts about you algorithm:

First, as long as you're doing all that work with the OR conditions, maybe there's a way to add a hit score. Such as: for each criteria that a record matches on, add a point, then rank the results by the total score for the matched record.

Next, you might consider relationships and a GTRR step for finding matching records. This would involve an exploded-key for each field that's being searched, but could improve overall performance. In this manner a Find is not actually done, but each condition of the search would still be added to your search table, and then the IDs of the matching related records would be retrieved and assembled.

Finally, I wonder if it would work to take the IDs of the found records, along with information about which table they were found in, and create records in a "Search Results" table, which could show information about the record that was found, what table it was from, and the hit score. I don't really like this idea, as adding and deleting records tends to be a slow operation, but maybe it will inspire you (hmm, maybe only show the top 10 hits??). A better solution would be to have everything in one table, with a simple search on a concatonated field.

Rob 7 Collins · June 22, 2005

First, as long as you're doing all that work with the OR conditions, maybe there's a way to add a hit score. Such as: for each criteria that a record matches on, add a point, then rank the results by the total score for the matched record.

Creating the hit score per table would be doable...but it would be fairly slow (first create records for each hit, then add the results per resultTable record). On the plus side, the sort is easy too.

I can think of another way from what I just described, that might be considerably faster: if all the data table's key fields are (n)*primes for top-level and unique (Nth prime results * otherprimes) composites for low-level, it can be arranged where simple comparative operations on the hit score list would reveal a "score" for the results table.

Next, you might consider relationships and a GTRR step for finding matching records. This would involve an exploded-key for each field that's being searched, but could improve overall performance. In this manner a Find is not actually done, but each condition of the search would still be added to your search table, and then the IDs of the matching related records would be retrieved and assembled.

I'm definitely intrigued. I am hazy about this...GTRR is match-sensative based on what? Please tell me more, I've never worked with explosives.

Finally, I wonder if it would work to take the IDs of the found records, along with information about which table they were found in, and create records in a "Search Results" table, which could show information about the record that was found, what table it was from, and the hit score. I don't really like this idea, as adding and deleting records tends to be a slow operation, but maybe it will inspire you (hmm, maybe only show the top 10 hits??). A better solution would be to have everything in one table, with a simple search on a concatonated field.

For my purposes I can't use concatenations. Too performance & footprint-heavy. But a working results table could have a million and one uses, like: I sell various things and keep track of that with this solution. I want to know the characteristic, not the product, with the greatest plurality in any (buyer's general) location. I configure these results to be expressive in two tables like the above, and run a search. Presto.

Sign In

googleSearch

Recommended Posts

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Link to comment

Share on other sites

Important Information