Steven H. Blackwell Posted November 16, 2010

Increasingly, professional FileMaker developers and IT administrators responsible for FileMaker Server are encountering some interesting issues with file sizes and backup schedules. Prudent organizations that run frequent backups (with or without file verification) and that also have large file sets running into the hundreds of megabytes, or even larger, are experiencing delays and the infamous "coffee cup" on their networks when FileMaker Server executes a backup schedule.

Some developers and administrators have tried staggering the backups by selecting only some files for one backup and then selecting other files for the next one. While this reduces the size of a specific backup, it is not an ideal situation. Indeed, it seems to result in constant, ongoing delays rather than less frequent but more pronounced ones. Additionally, while there is always a possibility of the set of files being out of synchronization with itself if the backup is large, that prospect would seem to increase with staggered backups.

Still others have attempted to employ RAM disks as the target for backups from FileMaker Server. This procedure too strikes me as risky, and it would appear to require running FileMaker Server with a user logged into the server, something not considered a best practice.

So, I am wondering what approaches professional FileMaker developers and IT administrators have employed to manage these issues. What are any of you doing to mitigate the impact of a rigorous and disciplined backup policy on database performance or network performance?

Steven
Paul de Halle Posted November 16, 2010

Hi Steven

We've definitely encountered this issue on both Macintosh and Windows servers once the file sizes get to around a few gigabytes. One client in particular has a single-file, multi-table solution of about 2GB (no images). This takes around 3-4 minutes to back up, even on a fast server with fast drives. In itself this doesn't sound too long; however, the users are constantly on the phone to their clients and need zero downtime. They don't want to have to stop mid-sentence and say "hang on 4 minutes whilst our system backs up". For the 50+ users, working an 8-hour day, that's over 26 hours of downtime per day, which works out if backups run hourly: 50 users x 8 backups x 4 minutes is about 26.7 hours. It's amazing how it adds up.

So we came up with a solution utilising a plug-in that we were already looking at for different purposes: fmDataGuard. Originally we were planning to implement this purely as a security measure, to track each data change made by users and offer the ability to roll back if needed. But then we thought of using it to offer instant backups, no matter how large the file got.

Traditionally with fmDataGuard, you create a second file (one actually comes with it) where your audit data is stored, so if your primary file is corrupted you still have your data in the audit-log file. What we designed was an audit-log archive file that matches the primary audit-log file. Every night, the primary audit-log dumps all its data into the audit-log archive and then deletes all its records, once validation has confirmed that the data transferred OK. This runs as a scheduled FileMaker Server script. FM Server then backs up all three files, which can take as long as it likes, as this is the middle of the night when no users are accessing the system.

Then through the day we back up ONLY the primary audit-log file, which only holds data for that day and never really grows beyond around 20MB, so FM Server backs it up instantly with zero downtime for the users.

Now, in the event of the main file being corrupted, we restore last night's backup of that file along with the latest audit-log backup, say from around 30 minutes ago, and run a roll-forward, which is a built-in command of fmDataGuard: one line of code.

So the users get their data backed up as often as you like with zero downtime, no matter how large the file gets, and with very little if any data loss. We've now incorporated this method into several of our larger clients' systems and it works very well.

Would love to hear other people's thoughts on this subject, and thanks, Steven, for posing the questions.

Cheers
Paul
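The restore procedure Paul describes (last night's full backup plus a replay of the day's audit log) can be illustrated with a small Python sketch. This is only a conceptual model under stated assumptions: the `AuditEntry` shape, the `roll_forward` function and the sample data are all hypothetical and are not fmDataGuard's actual schema or API, where the roll-forward is a single built-in command.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    """One logged change. Hypothetical shape, not fmDataGuard's real schema."""
    seq: int                        # order in which the change was made
    action: str                     # "create", "update" or "delete"
    record_id: int
    fields: Optional[dict] = None   # new field values (None for deletes)


def roll_forward(snapshot: dict, audit_log: list) -> dict:
    """Replay audit entries on top of a restored snapshot, oldest first."""
    # Work on a copy so the restored backup itself is left untouched.
    restored = {rid: dict(fields) for rid, fields in snapshot.items()}
    for entry in sorted(audit_log, key=lambda e: e.seq):
        if entry.action == "create":
            restored[entry.record_id] = dict(entry.fields or {})
        elif entry.action == "update":
            restored.setdefault(entry.record_id, {}).update(entry.fields or {})
        elif entry.action == "delete":
            restored.pop(entry.record_id, None)
        else:
            raise ValueError(f"unknown action: {entry.action!r}")
    return restored


# Last night's full backup of the main file (illustrative data).
nightly_backup = {
    1: {"name": "Acme", "phone": "555-0100"},
    2: {"name": "Globex", "phone": "555-0101"},
}

# Today's small audit log, the only file backed up during the day.
todays_log = [
    AuditEntry(1, "update", 1, {"phone": "555-0199"}),
    AuditEntry(2, "create", 3, {"name": "Initech", "phone": "555-0102"}),
    AuditEntry(3, "delete", 2),
]

current = roll_forward(nightly_backup, todays_log)
```

The point of the design is that the large file is only copied at night, while the data needed to reconstruct today's state stays small enough to back up without users noticing.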
HOnza Posted November 16, 2010

We have the same issue with backups, with the primary data file of our own IS being around 1GB. Unfortunately, we cannot use the audit-log approach, mainly because our IS historically contains a lot of backend scripts with many record commits, so applying fmDataGuard to all tables would slow the whole system down too much.

We originally backed up our system every 10 minutes, which was unacceptable with every backup taking about 2 minutes. Now we back up the whole system once an hour and only check the consistency of backups twice a day.

We were considering switching the server storage to SSD, or even running the whole system, including backups, from a RAM disk and using a separate low-priority background process to copy the backups to hard disk. The latter would also have the benefit of speeding up the whole solution, but as far as I know it is not possible to create a RAM disk larger than 2GB, which is a bit limiting.

I don't know why a RAM disk would require a user to be logged in. I am sure that at least on Mac OS X Server the BSD subsystem should allow creating and mounting a RAM disk for a server process.

Then there is one alternative which was demoed at DevCon: eConnectix (http://econnectix.com/products/xpd/xfm/index.html). Not a cheap solution, but the demo was quite impressive. I really did see a 2GB database backed up in 1 second and restored from backup within 1 minute, as they claim in the technical specification.
Wim Decorte Posted November 16, 2010

A topic near and dear to my heart; I did a presentation about this at this year's DevCon.

If pure backup volume is an issue, then using a solid-state disk will certainly help; those are blazingly fast. Jason Erickson had a demo unit at his WorldSync booth at DevCon. Unfortunately, in a strict corporate environment it may be a hard sell until SSDs become more standard and vetted.

For critical applications the issue is one of "high availability". Even with a backup every half hour you can lose up to half an hour's worth of data, plus whatever downtime it takes you to restore from a backup.

We're solving this with fmDataGuard and more recently SyncDek (both from WorldSync). For us it solves two problems:
- keeping a very robust audit log
- creating a high-availability setup

Our deployment basically consists of two servers: one is live and one is standby. All data edits (new records, changes, deletes) are tracked by either fmDataGuard or SyncDek and collected in an audit log file on the standby server. There these edits are rolled forward into the standby copy of the solution every 5 minutes.

We also have a very clever dynamic DNS setup with all the file references using DNS names. If the live server goes down we can do a couple of things depending on the scenario:
1- just reboot the live server and forget about its files; we just copy a set over from the standby server. This allows us to recover from a disaster in about 15 minutes
2- flip the dynamic DNS settings over so that the servers switch roles. Standby becomes the live system, giving us time to resurrect the old live system
3- consolidate everything on one server

All of these come with documented step-by-step procedures on how to handle the various scenarios.

The whole aim of this consolidated approach was to:
- minimize downtime
- minimize data loss

We're using SyncDek on the busiest system since it allows for server-side data collection and rolling forward, and we're doing that on the standby server to lessen the impact on the production server. For the less busy systems we're using fmDataGuard.

I wouldn't think of deploying a solution without using one of these products.

In addition to all of these FM-centric precautions, the users need to think of a plan to be able to do their work without the system for a while (business continuity plans).
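Scenario 2 above (flipping the DNS name so the servers switch roles) can be modelled in a few lines of Python. This is a rough sketch only: the `FailoverPair` class, the `handle_outage` helper and all hostnames are hypothetical, and the actual change to the DNS record would be made through whatever dynamic DNS provider is in use, which is not shown here.

```python
from dataclasses import dataclass


@dataclass
class FailoverPair:
    """Two servers behind one DNS name. All names here are illustrative."""
    service_name: str   # the DNS name used in every file reference
    live: str           # host currently serving users
    standby: str        # host receiving the rolled-forward edits

    def resolve(self) -> str:
        """Host that the service name should currently point to."""
        return self.live

    def switch_roles(self) -> None:
        """Standby becomes live; the old live server becomes the standby."""
        self.live, self.standby = self.standby, self.live


def handle_outage(pair: FailoverPair, live_is_healthy: bool) -> str:
    """Return the host clients should reach, switching roles if needed."""
    if not live_is_healthy:
        pair.switch_roles()
    return pair.resolve()


pair = FailoverPair(
    service_name="fm.example.com",
    live="fms-a.example.com",
    standby="fms-b.example.com",
)

# The live server has gone down, so the standby takes over.
active_host = handle_outage(pair, live_is_healthy=False)
```

Because clients only ever reference the service name, no file references need to change when the roles are swapped; the old live server simply becomes the machine to be rebuilt.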
Adam Dempsey Posted November 16, 2010

We have about 100 files, with a few being > 1GB, split over 2 servers running backups every 30 minutes; the files are then copied to a remote server by an external script.

We recently upgraded from FMS10 to FMS11, which decreased backup times from up to 10 minutes down to around 4. We currently run FMS on Win2003 on virtual machines, and we've noticed that backup performance was definitely much better when we were running FMS10 on Xserves.

I'm currently being asked to further improve the backup times, as for those few minutes each hour all of our users are pretty much stuck. We already back up only some files every 30 minutes and others only daily.

Another thing I'm going to look into is archiving older data into a secondary file, particularly in our main data files which are over a GB, so that the active data is backed up more often than the static, archived data, which can just be backed up daily.

Looking forward to reading the replies from other developers and hoping to get some ideas on how we can improve our backups :-)
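The idea of backing up active data often and archived data daily amounts to a tiered schedule. The Python sketch below shows one way to decide which files are due at a given moment. It is an illustration only: the tier names, the intervals, the `files_due` helper and the file names are assumptions, and in practice this would be configured as separate FileMaker Server backup schedules rather than scripted like this.

```python
from datetime import datetime, timedelta

# Assumed intervals: active files every 30 minutes, archive files daily.
INTERVALS = {
    "active": timedelta(minutes=30),
    "archive": timedelta(days=1),
}


def files_due(files: dict, last_backup: dict, now: datetime) -> list:
    """Return the names of files whose backup interval has elapsed."""
    due = []
    for name, tier in files.items():
        last = last_backup.get(name)
        # A file that has never been backed up is always due.
        if last is None or now - last >= INTERVALS[tier]:
            due.append(name)
    return sorted(due)


# Hypothetical file names and tiers.
files = {"Orders.fp7": "active", "Orders_Archive.fp7": "archive"}
last_backup = {
    "Orders.fp7": datetime(2010, 11, 16, 11, 30),
    "Orders_Archive.fp7": datetime(2010, 11, 16, 2, 0),
}

due_at_noon = files_due(files, last_backup, datetime(2010, 11, 16, 12, 0))
due_next_night = files_due(files, last_backup, datetime(2010, 11, 17, 2, 0))
```

With this split, the daytime backups only touch the smaller active file, so the pause users experience shrinks along with it.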
Steven H. Blackwell (Author) Posted November 16, 2010

"We also have a very clever dynamic DNS setup with all the file references using DNS names. If the live server goes down we can do a couple of things depending on the scenario:
1- just reboot the live server and forget about its files, we just copy a set over from the standby server. This allows us to recover from a disaster in about 15 minutes
2- flip the Dynamic DNS settings over so that the servers switch roles. Standby becomes the live system, giving us time to resurrect the old live system"

Clever, very clever. Thanks for all the information so far.

Steven
HOnza Posted November 16, 2010

Well-done failover solution, Wim! Also, the "business continuity plan" mention is very important.
IdealData Posted January 1, 2011

I've been running a Mac Mini with an SSD for over 6 months after upgrading from a G5 Xserve. The SSD is certainly very fast, and a backup of the 2.4GB of data over 40 files takes around a minute. Still, this gives me some problems...

During backup the databases get paused; however, I have some scripted routines that are producing bad results. At first I thought it was because I sometimes develop "live", but I stopped doing that when I realised I was causing the files to pause and produce bad results. The frequency of the bad results has now diminished so much that I know I must have been the culprit; however, I still get one or two per month, so I'm considering that it is the backups being run during the day that cause the pauses and the bad results.

We don't seem to get any "coffee cup" situations.
Wim Decorte Posted January 2, 2011

"During backup the databases get paused, however I have some scripted routines that are producing bad results."

Interesting. Can you expand a bit on what the bad results are? Incomplete records? Does error trapping give you any specific error numbers?

As an aside: I would have expected the SSD to make a bigger impact on the backup speed. Our main solution is about 3GB and doing an old-fashioned backup (pause/copy/resume) takes about a minute and a half.

Wim
IdealData Posted January 3, 2011

The bad results are records that don't get written. The script is a loop that duplicates a set of records (could be hundreds). Although I have error capture on, I'm not actually monitoring the errors; I only turn it on to avoid dialogs if there is an error. 99.9% of the time it's all okay. Maybe I will take the error trapping a little further.

The SSD has made a massive difference in backup time compared to the older Xserve; it runs around 4 times faster. Obviously, the bigger a file, the longer it is paused for. It so happens that the file in question is the largest, around 1GB, so it stands the highest chance of errors.
Wim Decorte Posted January 4, 2011

"The bad results are records that don't get written. The script is a loop to duplicate a set of records (could be hundreds). Although I have error capture on I'm not actually monitoring the errors, I'm only doing this to avoid dialogs if there is an error. 99.9% of the time it's all okay. Maybe I will take the error trapping a little further."

Yes, do some error handling and let us know what you're getting. That should narrow the search significantly.
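The difference between merely suppressing error dialogs and actually monitoring errors in a duplication loop can be sketched in Python. This is a language-neutral illustration, not a FileMaker script: `duplicate_all`, `fake_duplicate` and the error code 301 are placeholders chosen for the example, and in a real FileMaker script the equivalent check would read the last error after each duplicate step.

```python
def duplicate_all(records, duplicate_one):
    """Duplicate each record, collecting failures instead of hiding them.

    duplicate_one(record) is assumed to return 0 on success or a
    non-zero error code on failure.
    """
    failures = []
    for index, record in enumerate(records):
        code = duplicate_one(record)
        if code != 0:
            # Keep enough detail to find the record and the cause later.
            failures.append((index, record, code))
    return failures


copies = []


def fake_duplicate(record):
    """Stand-in for the real duplicate step; fails for one record."""
    if record == "B":
        return 301  # placeholder error code, for illustration only
    copies.append(record + " copy")
    return 0


failures = duplicate_all(["A", "B", "C"], fake_duplicate)

for index, record, code in failures:
    print(f"record {index} ({record!r}) was not duplicated: error {code}")
```

Logging the position and error code of each failed record, with a timestamp in practice, would make it possible to line the failures up against the backup schedule and confirm or rule out the pause as the cause.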
David Jondreau Posted March 1, 2012

"We also have a very clever dynamic DNS setup with all the file references using DNS names. If the live server goes down we can do a couple of things depending on the scenario:
1- just reboot the live server and forget about its files, we just copy a set over from the standby server. This allows us to recover from a disaster in about 15 minutes
2- flip the Dynamic DNS settings over so that the servers switch roles. Standby becomes the live system, giving us time to resurrect the old live system"

Can you tell me about your dynamic DNS service? I've been playing around with DynDNS but can't figure out if they offer what I want (even after e-mailing sales).

We're looking for something where we can put a slightly branded name (dj.dnsservice.com) in our Opener files that points to our primary server (123.server.com), and if something happens to that server we can switch to 456.server.com (where we keep hourly backups).

We've done a little work with DynDNS, installing an application on the client machines, but that's not a scalable option.

Thanks, DJ