SQL and the Found Set, Part 2: RecordID List and Hyperlist
Background: ID Lists
I have (at least) one thing to follow up on from my ExecuteSQL using the Found Set post from a little while back: seeing whether I could optimize the RecordID parsing (or at least make it faster) using some of the techniques in Todd’s Hyperlist demo. This is not the slow part of the process (feeding the large set of values into the SQL IN clause is), but it still seemed worth pursuing for two reasons. First, to get my head around Hyperlist and the underlying theory that appending strings (lists) is a costly process. Second, there could be some practical value down the road in having a fast way of getting RecordIDs.
The RecordID and Hyperlist method does (in theory) work on the Server, as it can save Snapshots to the Documents folder. The ExecuteSQL using the Found Set idea may also have some limited practical value in certain situations, although in 99% of the cases Hyperlist performs significantly faster. There may be a point where the overhead of adding a lot of additional columns to the process is lighter using the ExecuteSQL method, but that is just a loose theory. It does have, for me anyway, significant educational value.
For gathering the RecordID List, we’re “scraping” an exported snapshot using InsertFromURL. This works great and is basically instantaneous. Unlike using a Web Viewer for the scrape, InsertFromURL can be called directly after the Snapshot is exported. With the Web Viewer, you need to Pause for it to refresh, and even then it doesn’t always work. I’ve banged on the InsertFromURL method a bunch with no Pauses and never had a problem!
Reading the RecordIDs is easy, but they come in a format that mixes single values and range expressions:
41223 41228 41231-41332
and of course, we have no native way of interpreting that in our available tools. My first approach to this was to convert this format to something I could use in SQL:
WHERE ROWID IN ( 41223,41228 ) OR ROWID BETWEEN 41231 AND 41332
and that does work, although it can be very slow, depending on the size of the IN clause and the number of BETWEEN clauses. With this RecordIDList technique I’m going for a straight list, so I can still pass it to the SQL as a single IN, and who knows? Maybe it will work better than the combination of clauses. A straight list also seems like something that may have some value outside of this SQL application.
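To make the two targets concrete, here is a minimal sketch of both conversions. It is written in Python rather than FileMaker calculation syntax, purely so it can be run and checked; the function names (`parse_tokens`, `mixed_where_clause`, `flat_id_list`) are hypothetical and not part of the original demo file. Note that the sketch repeats the column name before BETWEEN, which SQL requires.

```python
def parse_tokens(expr):
    """Turn '41223 41228 41231-41332' into (start, end) pairs.

    A single value n is represented as (n, n). Assumes well-formed,
    whitespace-separated input with non-negative integer IDs.
    """
    pairs = []
    for token in expr.split():
        start, _, end = token.partition("-")
        pairs.append((int(start), int(end or start)))
    return pairs


def mixed_where_clause(expr, column="ROWID"):
    """First approach: one IN for the singles plus one BETWEEN per range."""
    pairs = parse_tokens(expr)
    singles = [str(a) for a, b in pairs if a == b]
    parts = []
    if singles:
        parts.append(f"{column} IN ( {','.join(singles)} )")
    parts.extend(f"{column} BETWEEN {a} AND {b}" for a, b in pairs if a != b)
    return "WHERE " + " OR ".join(parts) if parts else ""


def flat_id_list(expr):
    """Second approach: expand every range into a straight list of IDs."""
    ids = []
    for start, end in parse_tokens(expr):
        ids.extend(range(start, end + 1))  # ranges are inclusive
    return ids
```

For the sample expression above, `flat_id_list` yields 104 IDs (two singles plus the 102 values in 41231-41332), which is the straight list that would then feed a single IN clause.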
Applying (and tweaking) Hyperlist
I think the most striking thing with how Hyperlist works is the large $expression variable that Todd builds to combine the record values into chunks of 100:
Substitute( "GetNthRecord( fieldName; rec ) & $Sep & GetNthRecord( fieldName; rec + 1 ) & $Sep & GetNthRecord( fieldName; rec + 2 ) & $Sep & GetNthRecord( fieldName; rec + 3 ) & $Sep & GetNthRecord( fieldName; rec + 4 ) & $Sep & etc. to 100."
For those of us who feel so clever having “mastered” recursion and the difference between tail and stack recursion, this seems strange. While there might be a “more clever” way of doing this, the results clearly show us that there’s not a faster one. As my appreciation for this method grew, I started thinking of it like Jefferson’s Auto-Pen, which is quite an elegant tool when you look at it.
My problem is slightly different: I need to create a list not from a group of records, but from a range expression. What I used is actually quite similar to Hyperlist:
"$sc_c & $sc_Sep & $sc_c + 1 & $sc_Sep & $sc_c + 2 & $sc_Sep & $sc_c + 3 & $sc_Sep & etc. to 10."
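As a rough illustration of the unrolled expression above, here is a Python sketch of expanding one range ten values per step. It is not the FileMaker code: `expand_range_chunked`, `CHUNK`, and the comma default for `sep` are all assumptions made for the example, and Python's string handling has different costs than FileMaker's, so this shows the structure only, not the timing.

```python
CHUNK = 10  # values emitted per step, mirroring "etc. to 10"


def expand_range_chunked(start, end, sep=","):
    """Expand the inclusive range start..end into a list of sublist strings.

    Each full step writes CHUNK consecutive values in one go (the "unrolled"
    expression), instead of appending one value at a time to a growing string.
    """
    sublists = []
    c = start
    # Full chunks: c, c + 1, ..., c + CHUNK - 1 joined in a single step.
    while c + CHUNK - 1 <= end:
        sublists.append(sep.join(str(c + i) for i in range(CHUNK)))
        c += CHUNK
    # Whatever is left over is shorter than one chunk.
    if c <= end:
        sublists.append(sep.join(str(v) for v in range(c, end + 1)))
    return sublists
```

For the range 41231-41332 (102 values) this produces ten full sublists and one leftover sublist of two values; the sublists still have to be combined into a single list afterwards.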
I only did chunks of 10, probably partly out of laziness, but also because it made more sense: the range expressions will almost always cover fewer than 10 values, with the most common exception being a full normalized found set. Even then, 10 is still wicked fast compared to the appending method. I know that Todd did some testing and found that 100 is the sweet spot, but 10 will do in a pinch!
The next part of Hyperlist is the combining of the lists of 100 you just created. Todd uses a very cool algorithm for this, one that I imagine may have a name. It combines the strings two into one on each pass, so the whole set is merged in roughly log2 iterations. I was so enamored of the Auto-Pen idea at this point that I thought it could be applied to the re-combining problem as well. I came up with this second $expression variable:
"$sc_subList [ $sc_c ] & $sc_subList [ $sc_c + 1 * $sc_v ] & $sc_subList [ $sc_c + 2 * $sc_v ] & $sc_subList [ $sc_c + 3 * $sc_v ] & etc. to 10."
This would combine my lists 10 at a time, so just log10 iterations instead of log2. Of course, since you’re combining more lists per iteration, there could be no benefit at all, but I can tell you this is now “Hyper” level fast. I will try to come up with an apples-to-apples test of the combining methods…soon. However, I will say my primary reason for wanting to use this method is the chill that I imagine my Middle School teacher’s heart feels every time I talk about or use anything vaguely “logarithmic.”
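The difference in iteration counts can be sketched in Python as follows. This is an illustration under assumptions, not the demo file's logic: `merge_in_groups` is a hypothetical name, and it joins neighbouring sublists into a fresh list on each pass, whereas the expression above indexes a repeating variable with a stride of `$sc_v`. It also says nothing about which group size is faster in FileMaker; it only counts passes.

```python
def merge_in_groups(sublists, group_size, sep=","):
    """Join `group_size` neighbouring strings per pass until one remains.

    Returns (combined_string, number_of_passes). With group_size=2 this is
    the pairwise merge; with group_size=10 it is the ten-at-a-time variant.
    """
    current = list(sublists)
    passes = 0
    while len(current) > 1:
        current = [
            sep.join(current[i:i + group_size])
            for i in range(0, len(current), group_size)
        ]
        passes += 1
    return (current[0] if current else ""), passes


if __name__ == "__main__":
    chunks = [str(n) for n in range(1000)]   # stand-ins for 1000 sublists
    _, pairwise_passes = merge_in_groups(chunks, 2)
    _, tenway_passes = merge_in_groups(chunks, 10)
    print(pairwise_passes, tenway_passes)    # 10 passes vs. 3 passes
```

Both group sizes produce the same combined list in the same order; only the number of passes differs, and each ten-way pass does more concatenation work than a pairwise one.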
<a href="http://seedcodenext.wordpress.com/2013/04/15/sql-and-the-found-set-part-2-recordid-list-and-hyperlist/">Source</a>