PatternCount() vs. Position() revisited


Ender

This topic is 6460 days old. Please don't post here. Open a new topic instead.

Since this discussion last year:

http://www.fmforums.com/forum/showtopic.php?tid/151574/

I've been under the impression that Position() is faster than PatternCount() for determining whether a string is present within a larger source text field. Intuitively this makes sense: PatternCount() needs to check the entire source text field, while Position() only needs to find one occurrence.
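To make the intuition concrete, here is a minimal Python sketch of a naive left-to-right scan. It is not how FileMaker implements either function. The names `position` and `pattern_count` are hypothetical stand-ins; the sketch ignores case handling and the start/occurrence parameters of FileMaker's Position(), and counting non-overlapping matches is an assumption of the sketch.

```python
def position(text: str, search: str) -> int:
    """Return the 1-based index of the first match, or 0 if not found.
    Stops scanning at the first match."""
    if not search:
        return 0
    m = len(search)
    for i in range(len(text) - m + 1):
        if text[i:i + m] == search:
            return i + 1  # early exit: the rest of the text is never read
    return 0


def pattern_count(text: str, search: str) -> int:
    """Return the number of non-overlapping matches.
    Always scans to the end of the text."""
    if not search:
        return 0
    m = len(search)
    count, i = 0, 0
    while i <= len(text) - m:
        if text[i:i + m] == search:
            count += 1
            i += m  # skip past the match (non-overlapping count)
        else:
            i += 1
    return count


if __name__ == "__main__":
    source = "the quick brown fox " * 3
    # As a presence test, both give the same yes/no answer.
    print(position(source, "fox") > 0, pattern_count(source, "fox") > 0)  # True True
```

Used as a presence test, `position(text, s) > 0` and `pattern_count(text, s) > 0` always agree; only the amount of scanning differs.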

But when I tried setting up a test of this, I was unable to notice a difference between them.

In both FM6 and FM8, my tests show equally fast evaluations of a Set Field[] step that stores the Position() or PatternCount() result for the search string within the text field.

In FM8, a text field can hold a huge amount of text. I ran my test with 50,000, 100,000, and 200,000 words and saw no difference in elapsed time (about a second for each).

In FM6, the limit for a text field is much smaller (I got about 11,000 words in before it maxed out). In this test, the results for both Position() and PatternCount() were nearly instantaneous (less than a second).

I'm guessing that if we were to build custom functions that scan through the text to behave like Position() and PatternCount(), the theoretical speed difference would come into play. But why not here? Are my tests flawed? Is the difference noticeable in other versions or on other OSs? Or are these functions somehow optimized with an internal search algorithm that makes them fast enough to work with large text fields?
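One way to see where the theoretical difference would show up is to count work instead of time. This Python sketch uses a hypothetical `window_checks` helper to count how many candidate positions a naive scan compares, with and without stopping at the first match. It assumes a simple character-by-character scan, which a built-in search routine may well improve on.

```python
def window_checks(text: str, search: str, stop_at_first: bool) -> int:
    """Count the candidate windows a naive left-to-right scan compares.
    stop_at_first=True models a Position()-style presence test;
    stop_at_first=False models a PatternCount()-style full count."""
    m = len(search)
    if m == 0 or m > len(text):
        return 0
    checks, i = 0, 0
    while i <= len(text) - m:
        checks += 1
        if text[i:i + m] == search:
            if stop_at_first:
                return checks
            i += m  # non-overlapping: resume after the match
        else:
            i += 1
    return checks


if __name__ == "__main__":
    filler = "lorem " * 50_000
    cases = {"needle first": "needle " + filler, "needle absent": filler}
    for label, text in cases.items():
        first = window_checks(text, "needle", stop_at_first=True)
        full = window_checks(text, "needle", stop_at_first=False)
        print(f"{label}: stop-at-first {first}, full scan {full}")
```

With the search string at the very start, the stop-at-first scan compares one window while the full count still walks nearly the whole text; with the string absent, both compare exactly the same number of windows. Under this model, a test whose search string is missing or near the end of the field would show no difference between the two approaches.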

Attached is my test file.

WordSearch.fp5.zip

Hi Mike,

While on 7.v2, I ran a test comparing them, but the test was unstructured. I used a field with 10,000 words, on a copy of my LineItems table (500,000 records). The difference, if I recall, was approximately 10 seconds. Not much really, but 10 seconds is a long time to a user waiting on the system.

How many records did you use for your test?

From LaRetta's description, it sounded like her tests showed differences when looping through a large record set. I don't know how often this sort of thing happens in a real solution (it doesn't seem very efficient), but based on this I'm now running these and other tests while looping through a large record set.

I think I see why comment hadn't posted this yet. There seem to be a lot of other things that have a greater effect on performance than merely whether the test uses Position() or PatternCount(). It's turning into more of a can of worms than half a cat.

I should have something more definitive in a day or two.

There seem to be a lot of other things that have a greater effect on performance than merely whether the test is Position() or PatternCount().

So true, Mike. That is why the tests must be identical, each using a backup of the same file. I even reboot my system between the two tests, because I've run tests using the same open file and the second results were skewed by system resources, FM indexing on the same file, etc. The only observation I could make in my ONE test is that the larger the record set, the more obvious the difference between the two. ONE test does NOT a theory make, but it SUGGESTS there is a difference, although slight. Since FM can't time in nanoseconds, only large record sets can display these differences in countable seconds.
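The methodology in this reply (identical data for both tests, many records so a tiny per-call difference adds up, and repeated runs to reduce interference) can be sketched outside FileMaker. This Python harness is only an analogy: `str.find` plays the role of a Position()-style test, `str.count` a PatternCount()-style test, and the helper names and record sizes are made-up values.

```python
import timeit


def make_records(n_records: int, words_per_field: int, needle: str,
                 at_start: bool = False) -> list:
    """Build hypothetical 'records', each a single text field.
    The needle goes at the start or the end of every field."""
    body = "lorem " * words_per_field
    field = needle + " " + body if at_start else body + needle
    # Same data for every record, so both tests see identical input.
    return [field] * n_records


def loop_find(records: list, needle: str) -> int:
    # Position()-style presence test: str.find can stop at the first match.
    return sum(1 for r in records if r.find(needle) >= 0)


def loop_count(records: list, needle: str) -> int:
    # PatternCount()-style presence test: str.count always scans to the end.
    return sum(1 for r in records if r.count(needle) > 0)


def best_of(func, records: list, needle: str, repeat: int = 5) -> float:
    # Run each test several times and keep the fastest (least-disturbed) run.
    return min(timeit.repeat(lambda: func(records, needle),
                             number=1, repeat=repeat))


if __name__ == "__main__":
    needle = "needle"
    for at_start in (False, True):
        for n in (1_000, 10_000, 50_000):
            records = make_records(n, 50, needle, at_start)
            t_find = best_of(loop_find, records, needle)
            t_count = best_of(loop_count, records, needle)
            where = "start" if at_start else "end"
            print(f"{n:>6} records, needle at {where}: "
                  f"find {t_find:.4f}s, count {t_count:.4f}s")
```

Taking the minimum of several runs is a common way to filter out interference from other activity on the machine, and toggling `at_start` separates the case where an early exit can help from the case where both tests must read every field to the end.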
