
Startup Restoration White Paper


This topic is 1638 days old. Please don't post here. Open a new topic instead.


  • 3 months later...

I just upgraded several servers to FMS 18, and I decided to disable this feature.

 

My reasoning:

  • I have not had a single one of my database servers crash in over 24 months.   Startup restoration feels like a solution to a problem I'm not having.
  • There are warnings about startup restoration causing performance hits
  • It's a new feature, which always opens the possibility of there being some sort of subtle bugs

 

Everyone's needs are different of course.  But I would argue that if you are having any server crashes whatsoever, you really should address that problem first.  Do you need a better UPS?  Is your server hardware flaky?  Should you use a different OS?

 

 

On second thought:  is "transaction logging" the same thing as "startup restoration"?   If so, then this discussion https://community.filemaker.com/en/s/question/0D50H00006tgeVdSAI/fms18-any-definitive-documentationevidence-of-multicore-use  suggests that turning TL off might actually hurt performance.  I'm a little confused.  


A UPS will only protect you so long. And they have a finite life, including their batteries. Hardware doesn't give you notice when it's going to fail. A different OS can't prevent a power failure, or UPS or hardware failure.

You make backups don't you? That's a form of insurance. So is startup recovery. You can use your backups to do the recovery, but that means you might have to re-enter data since your last backup. With startup restoration that is reduced to maybe the last few minutes.

Just because you haven't had any issues for two years doesn't mean it is going to stay that way. As hardware ages it becomes more prone to failure, so as time progresses you're more likely to encounter failures. And when one happens, you'll be thankful for the backups. And if it happens in the middle of the day, people will get grumpy if they have to re-enter data since the last backup. If startup recovery reduces that to just the last few minutes, you'll be a hero.


On 9/21/2019 at 10:40 AM, xochi said:

On second thought:  is "transaction logging" the same thing as "startup restoration"?   If so, then this discussion https://community.filemaker.com/en/s/question/0D50H00006tgeVdSAI/fms18-any-definitive-documentationevidence-of-multicore-use  suggests that turning TL off might actually hurt performance.  I'm a little confused.  

It is not the same thing, but both features ('data restoration' and 'better parallel processing') are tied together.  If you disable Data Restoration, you forgo the 'better parallel processing' that was introduced with FMS18 and your server will behave like FMS17 when it comes to performance.

 

 


 

@WimDecorte: 

Just to be absolutely clear - you used the term "Data Restoration" but I'm talking about "Startup Restoration" as described here:

https://fmhelp.filemaker.com/help/18/fms/en/index.html#page/FMS_Help/hostdb-startup-restoration.html

 

Are you saying that if I run the command

fmsadmin set serverprefs StartupRestorationEnabled=false

 

then I may lose some benefits of parallel processing in FMS18?

 

It's strange that Claris mentions the performance penalties, but not the possible benefits.

 

 

 

 

On 9/21/2019 at 3:52 PM, OlgerDiekstra said:

A UPS will only protect you so long. And they have a finite life, including their batteries.

Our UPS has 36 hours of battery life and 10 year lifespan batteries.  It's not really a limiting factor.

On 9/21/2019 at 3:52 PM, OlgerDiekstra said:

You make backups don't you? That's a form of insurance. So is startup recovery. You can use your backups to do the recovery, but that means you might have to re-enter data since your last backup. With startup restoration that is reduced to maybe the last few minutes.

Of course - we have multiple layers (local, cold onsite, and cloud offsite). We have progressive backups enabled.  Progressive backups (on the several FMS servers I checked) are backing up our database every 5 minutes.    So at worst we'd lose 5 minutes of data.  

I'm just not seeing the value proposition here for Startup Restoration.  It sounds like it's only going to be useful in one particular circumstance: you have a software or hardware crash which takes down FileMaker Server but does NOT corrupt the startup restoration log, the server is in such incredibly high use that 5 minutes of lost data entry is not acceptable, and you feel comfortable letting your server reboot automatically after a crash without human intervention.

 

I can see how this may be useful for some situations, but this also assumes that there are no new risks in doing this. What if there's a bug in Startup Restoration which corrupts your database silently?   That would not be good.

 

 

 

 

 

 

Edited by xochi
the forum auto merged two posts which did not make sense.

That is correct: when you run that command you also disable the better parallel processing.

And yes: Data Restoration, Startup Restoration and Transaction Logging all refer to the same feature where FMS keeps a running log of all changes, to be applied to the files when there is a crash.

The problem with the better parallel processing is that it is hard to make any kind of blanket statements about its effects.  That link to community.filemaker.com demonstrates it very well.

There are way too many variables involved to predict the outcome without in-depth knowledge of how the solution is designed, what the load on it is and what kind of resources are available in the server machine.
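To make the "running log of all changes" idea above concrete, here is a minimal Python sketch of a generic redo log that is replayed after a crash. It illustrates the general transaction-log technique only; the class and function names are hypothetical, and nothing here is taken from how FileMaker Server actually implements startup restoration.

```python
from dataclasses import dataclass, field


@dataclass
class LoggedStore:
    """Toy store: 'data' stands in for the database file as last flushed,
    'log' for the running log of changes not yet applied to that file."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        # Record the change in the log first; the file is untouched for now.
        self.log.append((key, value))

    def flush(self):
        # Apply all logged changes to the file, then empty the log.
        for key, value in self.log:
            self.data[key] = value
        self.log.clear()


def restore_on_startup(data_on_disk, surviving_log):
    """Replay the surviving log on top of the last flushed state."""
    restored = dict(data_on_disk)
    for key, value in surviving_log:
        restored[key] = value
    return restored
```

Every write costs an extra log append, which is the overhead side of the trade-off discussed in this thread; the benefit is that changes made since the last flush can be recovered, provided the log itself survives the crash.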


14 hours ago, xochi said:

Our UPS has 36 hours of battery life and 10 year lifespan batteries.  It's not really a limiting factor.

36 hrs when all systems are full throttle or idling? 10 year lifespan would be 10 year warranty right? Last time I checked you'll get your batteries replaced if they fail before that. They won't refund lost data or time because of a battery failure. And while they are rated for 10 years, that doesn't mean they can't fail before then.

A few years ago we had two 55TB NAS units, each with 2 shelves and 22 3TB disks in them. Purchased new a year and a half prior. RAID 6. For no apparent reason, both systems, though physically separated, developed the same symptoms. One after another, the year-and-a-half-old disks (Seagate Constellations) started failing. Within a week or so I was scrambling trying to get data off them. Both at almost the same time.

Two near new systems, which were supposed to last many more years.

As it turned out the cause was (most likely) a flooding event in Taiwan (or Thailand) at the Seagate factory in 2011. Seagate opted to sell disks of inferior quality to customers.

Freak incident, for sure, but never assume it can't happen. Just because something is rated for something, doesn't mean nothing will ever happen.

Just sayin.


19 hours ago, xochi said:

 

I'm just not seeing the value proposition here for Startup Restoration.

 

Think of it from the other angle: is there value for your solution and your deployment in the Better Parallel Processing?

If there is then there is a theoretical downside to it through an increased chance of something going wrong on one of the threads (more threads on more cores than FMS17), which Claris mitigates through the Transaction Logging.

So the core feature to consider here is not the Transaction Logging / Startup Restoration but the Better Parallel Processing.


3 hours ago, Wim Decorte said:

Think of it from the other angle: is there value for your solution and your deployment in the Better Parallel Processing?

 

Definitely - which is why it's weird that there don't seem to be any official statements about this. The only Claris Inc documentation I can find talks about performance problems, not benefits.   Are there any official docs about this?


No; for the reason stated earlier. The thread from community.filemaker.com that is mentioned earlier is a prime example.  For the better parallel processing to even have a chance of kicking in there has to be:

- spare processing capacity to be used.  The Mac mini used in the thread simply didn't have any spare processing capacity available

- enough load on the server, see the graphs I posted in that thread.  A single user or a handful of users may not be enough for the parallel processing to kick in to a degree that is noticeable

- the solution's design and nature need to be such that it would tax the server in the right ways.

 

Because of this, Claris - or anybody else - cannot claim that better parallel processing will actually have a performance impact. It's all about not setting unrealistic expectations, which is what the guy in the community thread started off with.


A. We guarantee better performance

B. Startup Restoration may hurt performance because it adds extra transactional logging.

C. Startup Restoration might help performance because it enables parallel processing features.

 

A: I'm not asking for this, please don't misunderstand.

B: is what Claris provides currently.  

C seems to be missing from the official documentation. 

 

 

Edited by xochi

Each version of FMS throughout the years has worked on providing better performance.  FMS18 is no different and for this version it comes in the form of better parallel processing.  Almost never does FMI/Claris tout the potential increase in performance.  This is no different.  And the reasons are always the same: too many variables to make any kind of claim.

The performance feature is enabled by default and it comes bundled with transaction logging. So your C is not an accurate statement. Startup Restoration is never going to help performance.  It is always going to add extra load to your server compared to older versions.  BPP may help performance and it is BPP that triggers TL, not the other way around.

 

BPP will only have an effect if there is spare capacity over and beyond the extra load added by TL.  A solution for instance that does not add or modify a lot of data but does a lot of finds and reads, will not add much in the way of TL load and will likely have a better chance of seeing performance improvements provided that there are enough users doing it at the same time on a server that has enough processors and cores to spread the load across.
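The trade-off described above can be put into a deliberately crude Python model. Everything in it is an assumption made for illustration: the function name, the idea that read work divides evenly across min(users, cores), and the flat write overhead factor are all hypothetical and are not measurements or documented FMS behavior.

```python
def estimated_minutes(read_minutes, write_minutes, users, cores,
                      tl_write_overhead, logging_on):
    """Toy estimate of elapsed time for a workload.

    Assumptions (hypothetical, for illustration only):
    - with logging on, read work is spread over min(users, cores) threads
    - with logging on, each write costs (1 + tl_write_overhead) times as much
    - with logging off, all work runs at the baseline serial cost
    """
    if not logging_on:
        return read_minutes + write_minutes
    parallelism = max(1, min(users, cores))
    return read_minutes / parallelism + write_minutes * (1 + tl_write_overhead)


# Read-heavy, many concurrent users: the model predicts a gain.
busy = estimated_minutes(90, 10, users=8, cores=4,
                         tl_write_overhead=0.5, logging_on=True)

# Write-heavy, single user: the model predicts a net loss.
batch = estimated_minutes(20, 80, users=1, cores=4,
                          tl_write_overhead=0.5, logging_on=True)
```

With these made-up inputs, both workloads cost 100 minutes with logging off; the read-heavy multi-user case drops well below that, while the single-user write-heavy case rises above it. Real results depend on the many variables listed earlier in the thread.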


27 minutes ago, Wim Decorte said:

So your C is not an accurate statement.

Wim, I'm not sure we are communicating.  Let me try again so that we are crystal clear:

 

In FMS18, we have 3 "features":   

SR - startup restoration

TL - transaction logging

BPP - better parallel processing

 

By default, SR is ON, TL is ON, and BPP is ON.  

The only control we users have is to enable or disable SR (we can't control TL or BPP individually).   The three are linked:  Disabling SR also disables TL and BPP.

 

Claris Inc documentation notes that leaving SR on may hurt performance, and suggests turning it off - here:

 

Quote

 

Disabling startup restoration 

Startup restoration is enabled by default. However, the process of creating the restoration log can impact performance. If performance is a concern, you can disable startup restoration using the CLI command:

 

https://fmhelp.filemaker.com/help/18/fms/en/index.html#page/FMS_Help/hostdb-startup-restoration.html
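The command itself did not survive in the quote above; earlier in this thread it is given as `fmsadmin set serverprefs StartupRestorationEnabled=false`. As a small sketch for anyone scripting this, the Python helper below only assembles that command line and does not run anything. The helper name is hypothetical, and passing `true` to turn the feature back on is an assumption rather than something quoted from the documentation.

```python
import shlex


def startup_restoration_command(enabled):
    """Build the fmsadmin argument list for the StartupRestorationEnabled pref."""
    # "false" is the value quoted in this thread; "true" is assumed to be its counterpart.
    value = "true" if enabled else "false"
    return ["fmsadmin", "set", "serverprefs", f"StartupRestorationEnabled={value}"]


# Show the command as it would be typed in a shell.
print(shlex.join(startup_restoration_command(False)))
```

Keeping the arguments as a list makes it straightforward to hand them to `subprocess.run` later without worrying about shell quoting.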

 

 

As noted at the top of this thread, BPP can in some cases give worse performance, but in some cases better performance, sometimes dramatically so: see https://community.filemaker.com/en/s/question/0D50H00006tgeVdSAI/fms18-any-definitive-documentationevidence-of-multicore-use  which shows a 15x speedup in one situation!

 

So, my statement:

Quote

C. Startup Restoration might help performance because it enables parallel processing features

 

Is true, and my point remains: Official Claris documentation says that disabling SR might help performance, but makes no mention of the fact that disabling SR might cause a dramatic slowdown in some situations.  

 

My opinion is that this needs better documentation.

 

 

 

Edited by xochi

1 minute ago, xochi said:

 

In FMS18, we have 3 "features":   

SR - startup restoration

TL - transaction logging

BPP - better parallel processing

 

No, SR and TL are the same thing, they are not separate features.  SR works by keeping a log.

 

Other than that:  not trying to change your opinion, only trying to provide nuance / context / perspective.  I'll leave it at that.


Ok, I think we are in agreement here on the key facts.  😄

"Transaction Logging" is probably a better term to use because it conveys more information and is an established term of art: https://en.wikipedia.org/wiki/Transaction_log  

Although Claris does point out that their version of TL is not ACID compliant.

 


  • 5 weeks later...

Update: after enabling startup restoration (transaction logging), we found that several of our long-running batch processes were about 50% to 75% slower - an operation which used to take about 100 minutes was now taking 170 minutes.  We didn't notice any performance improvements, but our usage scenario doesn't typically have multiple users doing big Finds, so it's not clear we would have seen improvements anyway.

On balance for us, this is a net loss, so we are going to turn it back off.
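As a quick check of the figures in this update, the arithmetic behind "100 minutes became 170 minutes" can be written out in a few lines of Python. The function name is just for illustration.

```python
def percent_slower(before_minutes, after_minutes):
    """Extra elapsed time as a percentage of the original run time."""
    return (after_minutes - before_minutes) / before_minutes * 100


# The batch job quoted above: about 100 minutes before, 170 minutes after.
slowdown = percent_slower(100, 170)  # 70.0, i.e. the job takes 70% longer
```

That 70% sits inside the "about 50% to 75% slower" range reported for the other batch processes.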

FYI, regarding my concerns that Startup Restoration could possibly introduce new bugs, here's one example (yet to be verified) which is worrying:   https://community.filemaker.com/en/s/question/0D50H00007F4JmMSAV/deadlockedendless-queries-started-by-disconnected-users-on-fms-18v2-with-startup-restoration-enabled

Edited by xochi

2012 Mac Mini 4/8 core i7 with 16GB RAM, internal SSD.  The process showing the major slowdown is in a table where we have one field which triggers auto-enter calcs in other fields, causing many other fields to update in that table.  Many of these fields pull data from related records across somewhat complex relationships that may involve 2 or more keys that are sorted.   

Basically, it's a combination of "read lots of data" and "write lots of data".     

I would presume the "write lots of data" is where the slowdown is happening, but that's just a guess.

Edited by xochi
clarity

That's exactly where it happens; data writes are more expensive with Transaction Logging on, and since it is a single user (a server-side schedule, I'm assuming) there is limited benefit in the parallel processing.  So that's where the net loss comes from in your scenario.

