How to ease container data backup




My solution uses a lot of PDF files stored externally in container fields, in open (non-secure) storage managed by FileMaker. The stored files never change their content; it's just their number that increases day by day. I have more than 100,000 of them, about 12 GB.

The Automatic Backup option in FMS copies the whole Database folder where my PDFs are, which makes backups slow, less secure, and space-demanding: 12 GB per copy.

I'm thinking of moving the container data to an external folder and converting the container into a calculated field. I know this requires extra control over file and folder locations, but on the other hand it would speed up and simplify the backup process considerably.

The question is: should I store the PDF files in the "Additional database folder" that FMS allows you to set up? If so, would the FMS backup copy them anyway? Otherwise, should I store the PDFs in some other folder and organize my own backup system? Would FMS be OK with this? (Of course it requires FMS read/write permissions on that folder.)


1 hour ago, naio said:

The Automatic Backup option in FMS copies the whole Database folder where my PDFs are, which makes backups slow, less secure, and space-demanding: 12 GB per copy.

It does NOT use 12 GB per backup. It's a common misconception.

The OS will report the full backup set as being 12 GB, and it is, because each backup is always a complete set. But FMS uses hard-linking under the hood, meaning that any file that has not changed since the last backup does not occupy new disk space.

There is a very easy way to test this. Run a backup and check the free disk space. Wait a minute, run a new backup, and check the free disk space again: you will see it did not go down by 12 GB. It went down only by the size of whatever got created or modified between the two backup runs.
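In shell terms, the test looks roughly like this (the volume and folder paths are examples, not necessarily yours):

```
# Note the free space, run an FMS backup schedule, then check again:
df -h /Volumes/Backups
# ... run a backup, wait for it to finish ...
df -h /Volumes/Backups    # drops only by what changed, not by 12 GB

# du counts each inode once, so this reports the true on-disk size
# of the whole backup folder, hard links included:
du -sh "/Library/FileMaker Server/Data/Backups"
```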

@Steven H. Blackwell and I wrote a series of white papers on this subject when this feature was first introduced; you'll find them here on this forum to download. I strongly suggest you read them: understanding the mechanism is crucial when you have this many files.

1 hour ago, naio said:

I'm thinking of moving the container data to an external folder and converting the container into a calculated field. I know this requires extra control over file and folder locations, but on the other hand it would speed up and simplify the backup process considerably.

Very likely, it will NOT.  It would just make your solution unnecessarily complex.

Edited by Wim Decorte

You are right regarding server resources, but my concern is maintaining the integrity of those hard links across the whole backup chain: first to a local server and then to the cloud. I'll need to do some testing before trusting it completely.

Thanks for your help.


When you copy files to another machine or volume, hard links are not maintained, because they are relevant only to the volume they live on. But you can pick any backup set and sync or copy it to the cloud, and it will remain exactly that backup set.

The crucial distinction you need to make is between the concept of a 'file' as seen by the OS and the inode it represents on the physical storage.

When you ask the OS (directly or through a utility) to copy a *file*, the OS is smart enough to go fetch the content from the relevant inode on the disk.

If you trust your Time Machine then you are already trusting hard links in a very major way.

Think of it this way: every file you see in your file system is in fact a hard link. Often it is the only link to the actual disk space, but sometimes it is not.

I would very strongly discourage you from coming up with your own container storage scheme just to overcome some doubt you have about how these things work. If your concern is storage in the places you copy backups to, away from the FMS, then solve that problem there, not by making your FM solution unnecessarily complex and brittle.

For instance, you can sync very efficiently to AWS S3, copying just the files that were added or changed since the last sync. Then do effective life-cycle maintenance of the backup sets using the S3 tool set.


So I understand that hard links are only useful in the first instance of the backup, that is, within the server's Backups folder.

In my system I first copy from the server to a local NAS and then sync with Dropbox, so the files take their full disk space (and transfer time) as soon as they leave the host.

I'll follow your advice and keep container data managed by FM.

I'm a bit lost about the solution you suggest, using S3 to create a new backup set with only the new PDF files; if you can shed some light on it, I'd appreciate it.

Thanks in any case.

 


The AWS CLI for S3 has the native ability to do a sync instead of a full copy. That means I don't waste time or disk space on S3 by copying files it already has that have not changed. I then ask S3 to create a new date/time-stamped folder in my bucket and copy all the files into it. That's blazingly fast and doesn't tax my FMS at all. Plus, I can set a retention policy so that it keeps enough backup sets but archives old ones.
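Roughly, the flow is something like this; the bucket name and prefixes here are made up for illustration:

```
# 1) Upload only new or changed files to a staging prefix:
aws s3 sync "/Backups/Latest" s3://my-fms-backups/staging/

# 2) Server-side copy the staging prefix into a date-stamped backup set;
#    the objects are copied inside S3, nothing is re-uploaded:
aws s3 cp s3://my-fms-backups/staging/ \
          "s3://my-fms-backups/sets/$(date +%F)/" --recursive

# 3) Retention is then a bucket lifecycle rule (expire or archive old
#    sets), e.g. via `aws s3api put-bucket-lifecycle-configuration`.
```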

I would suggest you do the same for the copy to your NAS: depending on your OS, use either rsync or robocopy (each native to its OS) so that the copy to the NAS does not process the same unchanged files over and over.
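For example (host names and paths are placeholders):

```
# macOS / Linux: copy only new or changed files, mirror deletions:
rsync -a --delete /Backups/Latest/ nas:/volume1/fms-backups/

# Windows: robocopy's /MIR does the equivalent mirror copy:
robocopy "D:\Backups\Latest" "\\NAS\fms-backups" /MIR
```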


Indeed I use rsync with the NAS, but with the -a option, which I think ignores hard links. I should test it without that.

My NAS has an app to sync with S3, but I'm not sure whether it preserves hard links; I actually use Dropbox, which I don't think does either. I've never used the S3 CLI and I'm not sure it's even possible from my device.

How can I tell if a file uses hard links or if it's fully saved? The file size is not an indicator.


3 minutes ago, naio said:

How can I tell if a file uses hard links or if it's fully saved? The file size is not an indicator.

That's not a good way to think about it.

See this current discussion for a lot of background info: https://community.claris.com/en/s/question/0D50H0000811zVESAY/server-backup-daily-backup-vs-own-backup-plan

You don't need to worry about preserving hard links when syncing to your NAS or S3. The sync mechanism itself will skip files that have not changed, which is the only thing you're after.
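For what it's worth, the link count is plain file metadata, so if you ever do want to look, on a Unix-like system:

```
# A link count greater than 1 means other directory entries share
# the same data on disk:
ls -li file.pdf          # the number right after the permissions
stat -c %h file.pdf      # Linux
stat -f %l file.pdf      # macOS
```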


  • 3 weeks later...

So I discarded both separating the container files from the solution and keeping many full copies of the solution folder. Just in case someone is interested:

Syncing the whole FMS Backups folder, with its many copies of the solution, to S3 or Dropbox using the NAS sync utility proved impossible: the software disregarded hard links, and syncing more than a million files was too much for it to handle; the app crashed on every attempt.

So I sync only the latest backup to the NAS via rsync (that's the only backup I keep locally) and then use a backup utility to store it in S3. The backup utility takes advantage of hard links, so the daily increase in stored size is minimal. Now I can restore from any of the backups I keep in S3.
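(For anyone wanting the same effect without a dedicated utility: this is just an illustration, not the tool I used, but plain rsync can build such hard-link snapshots with --link-dest; paths are made up.)

```
TODAY=$(date +%F)
# Unchanged files become hard links into the previous snapshot,
# so each dated folder looks complete but costs almost no space:
rsync -a --delete \
      --link-dest=/nas/snapshots/latest \
      /Backups/Latest/ "/nas/snapshots/$TODAY/"
# Point 'latest' at the new snapshot for the next run:
ln -sfn "/nas/snapshots/$TODAY" /nas/snapshots/latest
```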

As usual, thanks @Wim Decorte for your help.

 

