
Rules to follow to have good performance




When a newly created solution is populated with just a few records, every operation is performed very quickly.

When the database grows in size, bad design can cause performance to drop.

What are the MAIN RULES to follow in order to build a database today that will still run fast tomorrow?

What are FileMaker's limits (apart from the number of connected users and the number of hosted files), assuming good database design?

Thanks in advance.

Paolo.


Hi Paolo,

This could become an interesting thread. Good!

One thing that really slows down performance is a value list that takes its values from a field. In our company database we (still) use this to select Contacts (Clients & Suppliers) in our LineItems file. Especially the Clients list, which has about 1,500 entries, takes a long time to appear, mostly when the already selected client has a name that begins with a letter near the end of the alphabet.

Using 'Insert from Index' works a lot faster, but it does not look very nice...

I'll see if I can come up with more ideas.

Best regards,

Ernst


Hi Paolo,

Ernst is right. This could become a really good thread.

Following Ernst's comments: I was also using a value list to select the correct product from among 50,000+ product records.

I started with a conditional value list, then ended up with a big filtered multi-key portal. This dramatically reduced the time compared to what was needed with the value list.

Even more surprising, and I'd like some of you to run the test if you have time...

Two layouts:

One with only conditional value lists.

One with both conditional value lists and portals using exactly the same relationships as the value lists.

The value lists refresh really quickly when the portal is on the layout!

Of course, there is no need for such a structure, but I still find it strange.


The "rule" for optimal speed is: break the rules for normalized db design. I.e., you get much better performance when you duplicate data all over the place rather than relying on relationships. This means you have to be really careful to script all the data entry etc. so it all stays in sync.


Regarding Anatoli's post:

Do you mean a scripted calculation across many records in a database? These can indeed be slooow i.m.o.

Calculated fields, on the other hand, can be fast (and remain fast when you have a lot of records) when you can store the calculation.

Unstored calculations can become terribly slow when the calculation uses many related records and/or fields. An increasing number of records can then slow down the operation of a database considerably.

Regarding Ugo's post:

A filtered portal is indeed a whole lot faster than a value list.

I found that a SORTED filtered portal can become slow when it has a lot of records to display. But this seems to be largely independent of the database size.

I tried the two-layouts test with my parts database (5,000 records, local) and did not find too much difference between the two; they were both fast.

But I can imagine that this will be different over the network and with 10 times as many records. I'll try that too when I have a chance.

Best regards,

Ernst


Hi all,

Another related topic would be whether to choose lookup fields or related calculations...

I'd go for a lookup, plus some kind of control over when a re-lookup is needed.

Ernst, I should have mentioned that the conditional value list displayed 4 value lists before arriving at the final filtered values.

You can see the "final" portal solution in the Sample Section (Multi-key in action), though with only 50 records. So you'll understand why the conditional approach takes longer.

No network in my case either.


A much worse problem is calculated fields across records. That is so bad that, instead of just storing the result, users are literally waiting for minutes in some cases.

You mean "unstored calculated fields"?

I'm having trouble with them, especially with an unindexed search over a non-stored calculated field.

Right now it takes a few seconds over 700 records, but I expect the database to grow to 10,000 records or more in the next 6-12 months. That would mean over a minute for the search.

I could work around this by using a standard field instead of the calculated field, plus a set of scripts that update the standard field whenever the parameters that lead to the final result change. But that is a very complicated way to do something simple.

That's why I don't like Fitch's suggestion to break relational rules.

That's bad design, and bad design, in my opinion, is always a bad choice, even if it leads to better performance.


Ciao Paolo,

You really decided to make this thread a "must see"!

At least I'd like it to be...

I'm having trouble with them, especially with an unindexed search over a non-stored calculated field.

Right now it takes a few seconds over 700 records, but I expect the database to grow to 10,000 records or more in the next 6-12 months. That would mean over a minute for the search.

A calculation might be unstored because it references globals, relationships or certain status functions.

There isn't any topic I could find about this, but in my opinion it can be decisive to use different methods according to:

- the real cause of the calc not being indexed

- the current operation performed.

Here are my "proper rules" when developing a solution. I'm not sure they apply in every case, and I'd like some comments on them:

A. When the calc is unstored because it references a relationship:

1. When the calc is used for another relationship, use Ray's calc to index the calculation and maintain a separate update script.

2. When the calc is not used for another relationship but could be needed for a find (see the sketch after these rules):

- create an unstored calculation with a boolean result

- use this calc as the sort key for another relationship

- display the results of the find in a portal using that relationship

3. When the find is needed before any print reports:

- Go to Related Record using the above relationship

- omit all the 0's from the boolean calc

B. When the calculation is unstored because it references a global:

1. If the global is a "constant", use Ray's calc.

2. If the global is a variable, script an update or use the same method as above.
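To make rule A.2 a bit more concrete, here is a minimal sketch with hypothetical file and field names; it is only one way to read the rule, not necessarily Ugo's exact setup:

In Products.fp5:

g_SearchText (global text): the value the user wants to find

c_Constant (calculation, number result, = 1): an indexed constant key

c_Match (unstored calculation, number result) = PatternCount ( ProductName ; g_SearchText ) > 0

Self-join relationship "AllProducts": Products::c_Constant = Products::c_Constant, sorted by c_Match in descending order.

A portal based on "AllProducts" then floats the matching products to the top as soon as g_SearchText changes, without an unindexed find; and, as in rule A.3, you can Go to Related Record through the same relationship and then omit the records whose c_Match is 0.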

I hope this isn't too theoretical, but I have the feeling that a "good" FM solution may depend on the developer's ability to strike the right balance between:

- updating scripts

- finds on unindexed fields

- relationships sorted by an unstored calc

- indexing via lookups (Ray's method)

That's why I don't like Fitch's suggestion to break relational rules.

That's bad design, and bad design, in my opinion, is always a bad choice, even if it leads to better performance.

There is little chance that a solution will be 100% relational anyway. You would still have some unindexed calcs or keys.

Besides, it depends on what we'd call "breaking relational rules". Sometimes you'd just switch a key to an indexed one, or use repeating fields to store some calc results just to speed up other calcs... but there is still a relational design. The "key" is there, but we won't use it.

In some cases, you just won't have the choice, unless you're OK with your users taking a long coffee break while waiting for their search to be performed.

Knowing Fitch, he'll surely add something here...


Thank you, Ugo!

But what's Ray's calc?

BTW,

my calc can't be stored because it references a related field.

The relationship is 1-to-n: records in A are related to records in B.

1A - nB

The calculated field is in B; the referenced field is in A.

I think I need to replicate the referenced field of A in B (and break the rules ;-) ).

I would use a "Replace" script step to do the replication every time the field in A changes.

Is there a better way to do that replication?


Ciao Paolo,

Well, I was sure you knew that. Good for new posters browsing, though.

Ray's calc ----> Articles section, "How to index non-indexable fields".

Given an unstored text calculation "c_unstored", you need:

3 additional fields:

- serial number (auto-entered)

- Modification Time (auto-entered)

- c_LookupKey = (ModTime * 0) + serial number

1 relationship:

the Lookup relationship:

- left side: c_LookupKey

- right side: serial number

and a last field, your indexed field:

- t_yourindexedcalc, a lookup of "c_unstored" through the Lookup relationship.

Now, any time something in the record is changed, the lookup is triggered automatically (limited, though, to the record you're currently browsing).

So t_yourindexedcalc can then be used as the key for the relationship.

I cannot say how this would work in your case, but it seems to me it may suit your need.

Could you expand on the calc a little, even if this wasn't the main issue of this thread?


Bad design is a bad choice, I'm with you there. But better performance = bad design?... well, that's just a matter of priorities. If your number one priority is good performance, then good design means optimizing that any way you can. And if that means redundant data, then that's what you do. It's common practice, with accounting systems in particular.

Whatever works = good design.

Whatever works better = better design.

Not to mention, users don't give a f--k about proper relational design. They just want something that works and doesn't make them feel stupid. So we may be talking about two different things. What I care about is the user experience. Anyone who is interested in that should go straight here:

http://www.joelonsoftware.com/uibook/chapters/fog0000000057.html


Hmmm. The discussion is going in a lot of different directions...

Regarding Fitch's last post: I think that improving performance by having multiple instances of the same data (or doing other, theoretically forbidden, things with the data) can improve the user experience. And that's indeed what counts.

But one should be sure that it does not result in increased 'buggy' behaviour, because that really puts users off.

So if one has to use 'bad design' tricks to get better performance, one should also have the discipline to put time into developing check routines that can be run periodically, or whenever it is found necessary.

That's where I usually fail, by the way, because that's the boring bit of developing. But luckily my databases don't fly airplanes...

Regards,

Ernst.


You mean "unstored calculated fields"?

I'm having trouble with them, especially with an unindexed search over a non-stored calculated field.

Right now it takes a few seconds over 700 records, but I expect the database to grow to 10,000 records or more in the next 6-12 months. That would mean over a minute for the search.

I could work around this by using a standard field instead of the calculated field, plus a set of scripts that update the standard field whenever the parameters that lead to the final result change. But that is a very complicated way to do something simple.

That's why I don't like Fitch's suggestion to break relational rules.

That's bad design, and bad design, in my opinion, is always a bad choice, even if it leads to better performance.

I believe in only one model of database.

The rules are simple:

The database is a container for data.

None of the calculations is done across records.

There is no unindexed field where a search can be made.

The values are always, and without any exception, stored in only a single place.

I am sorry guys, but your "improvements" and twisting of the rules are leading to more and more problems in the future.

I could work around this by using a standard field instead of the calculated field, plus a set of scripts that update the standard field whenever the parameters that lead to the final result change. But that is a very complicated way to do something simple.

So what? Are you complaining that you have to program? Are you a programmer?

The "rule" for optimal speed is: break the rules for normalized db design.

What if your databases have to handle a couple of million records? What if you "break the rules" in 100 places? How will you manage that? How will the next person in line manage?

All my clients have tried other FM "programmers".

My programs run fast in everyday use. Nothing should take more than 1 second.

The other programmers don't have rules, and they have more pleasant graphics on their layouts. Their performance sucks in a BIG way. Period.

My finished solutions work with zero maintenance.

The others are called almost every day to do this and that, to optimize that and repair this.

I wonder why?


Hello Anatoli,

Can you please explain what you mean by:

>>None of the calculations is done across records.<<

Thanks and regards,

Ernst


Hi Ernst,

My guess here is that Anatoli avoids any calculation based upon relationships.

I find this reasonable: even if it is not as dynamic as a relationship, it surely speeds up the calcs, and finds can be performed quickly, even through relationships, since every field involved is indexed.

Say, for example, you're maintaining an Inventory.

You can either:

- have a field in the Product db which is a calculation based upon Sum(LineItemPO::Product_ID) - Sum(LineItemInvoice::Product_ID)

- or use a script that sets the value of this field to Sum(LineItemPO::Product_ID) - Sum(LineItemInvoice::Product_ID) (see the sketch below)

I personally used the first method at first, then turned to the second with a lot of "satisfaction". But I must admit I still have a bunch of related calculations across records in my db.
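For what it's worth, here is a minimal sketch of the second method as a single script step; the field names are hypothetical, and it assumes the sums are really meant to run over quantity fields through the Product_ID relationships:

Set Field [ Products::OnHand_Static ; Sum ( LineItemPO::Quantity ) - Sum ( LineItemInvoice::Quantity ) ]

OnHand_Static is then a plain, indexable number field, so finds and relationships on it stay fast; the price is that the script has to be re-run whenever the line items change.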

I'd like Anatoli to confirm my guess, and also these 2 assumptions of mine:

1. To keep your db somewhat dynamic, you'll surely need a bunch of plug-ins for some event scripts.

2. Scripting forces a lot of refresh steps and bounces the design (that's my problem, actually).


Hi Ugo,

I've been using method 1 in a similar situation, in databases with roughly 5,000 - 50,000 records on a network with 8 users, and the results remained usable.

The financial database that's used on the same network, and that had to calculate all sorts of running totals, became slow after a few months of adding 'booking' records. So we had to fall back to method 2 for that one, calculating some sort of 'grand totals' from time to time.

The main drawbacks of method 2 are, in my opinion:

- it's not automatic

- if you do this for a selection of records at once (i.e. in a script) on a network, there is a chance that other users are modifying the same records. With FileMaker Server you can trigger a script at a moment when, hopefully, nobody else is using the database.

About the 'event plugins': I've used plugins in the past, but I stay away from them nowadays, even if this means that solutions are not as streamlined as they could be. But that's another discussion...

I'm also curious to see what Anatoli actually meant.

And by the way, what do YOU mean by:

>>Scripting forces a lot of refresh steps and bounces the design<<?

Best regards,

Ernst.


Hi Ernst,

>>Scripting forces a lot of refresh steps and bounces the design<<?

As you said, using a script is not as automatic as a relationship, and this has some effects on the interface when triggering scripts.

Thanks for your input about your method, even if it was only an example.

By the way, method 2 could have been more straightforward with a script that sets the new "inventory level" to "inventory level - global" or "inventory level + global", where the global obviously stores the new PO or invoice quantity.

Am I wrong in saying that setting the new "inventory level" through a "calculation from the relationship within the script" would avoid the issue you pointed out?


It is very, very simple. What do we need from a stock database?

a) We must trace all transactions, and transactions must be accountable and transparent.

b) Multiple users shouldn't affect precision.

c) Users MUST get the stock level in a split second.

You can get the stock level with cross-record calculations, like:

Start: 0

Rec. 10

Sold 5

Rec. 5

Sold 5

Result on hand = 5

If you have more than 50k records, everyone is waiting for the on-hand value.

The correct way:

Start: 0 -- on hand 0 (static value in product card)

Rec. 10 -- on hand 10 (static value in product card)

Sold 5 -- on hand 5 (static value in product card)

Rec. 5 -- on hand 10 (static value in product card)

Sold 5 -- on hand 5 (static value in product card)

There is only a minor problem with that. A full transaction has these steps (a rough script sketch follows below):

Write a new record and populate it with the value.

LOCK the product card.

Add to or subtract from the product card the transaction value.

RELEASE the product card.

The Lock and Release must be programmed, and maybe also checked, like: if starting value = value now, then add or subtract the transaction and release the lock.

This is the way I've programmed everything, in my first Pascal database, in dBase II, and now in FileMaker.
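Purely as an illustration of the Lock/Release idea (the file, field and relationship names are hypothetical, and this is certainly not Anatoli's actual code), such a transaction might be scripted roughly like this in the Transactions file, with a numeric Lock field and an OnHand field on the product card, related through "Products by Product_ID":

New Record/Request (the transaction record, with Qty and Product_ID already set)

If [ Products by Product_ID::Lock = 1 ]

Show Message [ "The product card is locked - please try again" ]

Halt Script

End If

Set Field [ Products by Product_ID::Lock ; 1 ] (LOCK the product card)

Set Field [ Products by Product_ID::OnHand ; Products by Product_ID::OnHand - Qty ] (subtract for a sale; add for a receipt)

Set Field [ Products by Product_ID::Lock ; 0 ] (RELEASE the product card)

The extra check Anatoli mentions would mean remembering the starting OnHand value and only applying the change if it is still the same once the lock is obtained.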


Bad design is a bad choice, I'm with you there. But better performance = bad design?... well, that's just a matter of priorities. If your number one priority is good performance, then good design means optimizing that any way you can. And if that means redundant data, then that's what you do. It's common practice, with accounting systems in particular.

Whatever works = good design.

Whatever works better = better design.

Not to mention, users don't give a f--k about proper relational design. They just want something that works and doesn't make them feel stupid. So we may be talking about two different things. What I care about is the user experience. Anyone who is interested in that should go straight here:

Good design = best performance = usually more thinking and more work.

Redundant data = bad design, horrible maintenance and, in the future, an unmanageable system...

Users like best performance and a system without maintenance.


Anatoli, I like your "model" of a database very much, and I agree with it:

1> The database is a container for data.

2> None of the calculations is done across records.

3> There is no unindexed field where a search can be made.

4> The values are always, and without any exception, stored in only a single place.

Especially 1 and 4.

I can break 2 and 3 on the condition that the operation is not "interactive" but "batch" (i.e. without the user having to wait for it to complete in front of the screen).

When I say that "I don't like easy things accomplished through complex solutions", I mean that I always look for the most straightforward (and rule-compliant) solution to every problem.

Because this way it is more likely that my solutions will be affordable, readable, reusable and bug-free.

When I started this thread I was wondering whether there is a way to work around the FM limitation on storing some calc fields. If it's not possible, then I'll have to program a little more...

Someone said something that sounded like this:

"When I develop a theory I never think about beauty. But when I finish, if it's not beautiful, I know it's not a good theory."

I think this applies to programming too.

(BTW, taking your stock example: when you store the on-hand value [because everyone is waiting for the on-hand value] you are breaking rule nr. 4; the sum of A+B+C is a replication of part of the data in A, B and C.)


RE: you are breaking rule nr. 4, the sum of A+B+C is a replication of part of the data in A, B and C

I don't think so. It is just a result stored in another database. It will not change if someone changes value "B", and that can be used for an integrity check. And it is handled by a procedure.

What I am against is putting the same values in more databases just to gain speed, as was mentioned. That will not be consistent, because values may change during an update, or it will become a longer process with difficult locking.


"What I am against, is to put the some values in more databases just to gain speed..."

If speed is the priority, why not? It requires the same care as your example, where your sum is not a calculated field, i.e. you don't allow changing the factors of that sum except via a script.

I agree that storing values in multiple locations is poor design in principle. But in practice, it is sometimes useful or even necessary.


If I knew your code and structure, then I could comment.

IMHO, speed is always a priority, but scattering data all over the place will not provide much gain.

How will you keep track of what is where, and of what should be updated and when?

I really cannot imagine a scenario where it would help.


  • 1 year later...

I have a long-running FM loader script that runs pretty slowly. According to MenuMeters and the built-in Activity Monitor, the CPU is only 33% to 50% active, there's no network activity, there's no paging, and there's very little disk I/O (less than 1/sec).

Any idea why no physical resource appears to be maxed out?


  • 5 months later...

Some of this comes down to the difference between a transactional database system and a business intelligence/knowledge system. They have terribly different requirements.

I more often see FMP used as the former than the latter. Indeed, it's used primarily as a RAD (rapid application development) tool to create data-driven applications, especially for small workgroups. In that sort of scenario, speed IS a priority.

However, for lookups/searching and knowledge work, speed becomes considerably less important. If I'm generating a large report, or a list run for mailings, or anything like that, I can put up with a delay in creating that report.

Many enterprises create data warehouses with flattened AND normalized versions of their data so that they are both easy to search and easy to report on. However, these systems are rarely used to CHANGE data, just to extract it. Lacking any sort of replication tool or an equivalent of SQL Server's DTS packages, FileMaker is ill suited to populating a data warehouse (ODBC notwithstanding). However, if your system needs to be fast for some users and rich with data and searchability for others, there may be some benefit to a sort of export/import routine feeding a separate database that exists solely for searching.

In THAT database you can render all your calculations as hard fields, whereas in the live system there are obvious benefits to keeping them as calculations.

As for searching, I usually try to drive searches with a script, so that I can control the search interface. I then include a two-tabbed view (which switches layouts when a tab is pressed): one tab contains basic search criteria, and the other contains advanced criteria, including unstored calc fields, so that when users go there, they KNOW their search will be more time-consuming.
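As a rough illustration of that scripted approach (the layout, field and global names here are hypothetical, not taken from the post), the "advanced" search script might look something like:

Go to Layout [ "Advanced Search" ]

Enter Find Mode [ ]

Set Field [ Contacts::LastName ; g_NameCriterion ] (an indexed, "basic" criterion)

Set Field [ Contacts::c_UnstoredStatus ; g_StatusCriterion ] (the slow, unstored-calc criterion)

Perform Find [ ]

The criteria are typed into global fields beforehand, so they survive the switch into Find mode, and the script can warn the user that the second criterion makes the find unindexed and therefore slower.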


  • 2 weeks later...

A method I use for creating fully indexed results of calculations involving globals, relationships, etc. is to store the result of the calculation in a static text field which updates itself, using an auto-enter calculation with 'Do not replace existing value' unchecked:

staticField auto-enter calc = Evaluate ( "calcField" ; calcField )
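To make that concrete, a minimal sketch with hypothetical field names (and with the caveat that, as far as I know, the auto-enter only fires when a field of the same record is edited, not when a global or related data changes):

c_Total (unstored calculation, unstored because it references a global) = Quantity * Price * g_TaxFactor

Total_Static (auto-enter calculation, 'Do not replace existing value' unchecked) = Evaluate ( "c_Total" ; c_Total )

Total_Static is a plain stored field, so it can be indexed, searched and used as a relationship key. Editing Quantity or Price on a record refreshes it; changing only g_TaxFactor does not, so that case would still need a scripted update, as discussed earlier in the thread.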

