
Referenced PDF storage



Our solution references PDFs that are stored on a NAS drive. Each PDF is stored by its DocID in a directory that is calculated from the DocID, so that we end up with about 100 PDFs per directory. It's been working great for 3 yrs.

However, we don't have a way to split this directory structure across more than one volume. Perhaps we'll need to in the future? Not sure. Aren't we heading for trouble and won't we eventually reach volume capacity?

This is a system that stores applications to a competition and their associated PDFs, so logically we can split documents by competition (which has an ID), and we're starting to think about doing so. Our path would then include a new directory ahead of our existing PDF folder structure, e.g. /Comp15/1000/1000.pdf. That way we could store a path prefix in the Comp record and distribute our PDFs across volumes.
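Roughly, the path calculation we're imagining looks like this (a minimal Python sketch only; the bucket rule and the function name are illustrative, not our actual code):

import os

# Assumption for illustration: DocIDs are bucketed into directories of about 100,
# with the competition prefix taken from the Comp record.
def pdf_path(volume_root, comp_prefix, doc_id, bucket_size=100):
    bucket = (int(doc_id) // bucket_size) * bucket_size
    return os.path.join(volume_root, comp_prefix, str(bucket), f"{doc_id}.pdf")

print(pdf_path("/Volumes/NAS1", "Comp15", 1000))
# -> /Volumes/NAS1/Comp15/1000/1000.pdf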

My question, for those who manage lots of referenced files: what's best practice? I've looked at FM12 containers (no plans for that here, since we need access to the PDFs via CWP), but I notice they don't split the container structure, nor offer a way to do so. What will happen after five years of storing containers?

Advice welcome,

Barbara


Barbara,

I am using SuperContainer to do something similar, and I have done other projects where files are stored on different server-side volumes.

Within SuperContainer I split things up by year/month, like 2012/01, 2012/02, etc., so new files are added to that folder regardless of any other logic. The thought is that if we need to archive documents from last year or earlier, we can easily move those files to a different drive.
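Something like this (just a sketch; the base path is made up):

from datetime import date
import os

# Year/month folder rule: new files land in the current month's folder
# (2012/01, 2012/02, ...) regardless of any other logic.
def monthly_folder(base_dir, today=None):
    today = today or date.today()
    return os.path.join(base_dir, f"{today.year:04d}", f"{today.month:02d}")

print(monthly_folder("/Volumes/NAS1/attachments", date(2012, 1, 15)))
# -> /Volumes/NAS1/attachments/2012/01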

Since you are using FMP as a method to look at REFERENCED files, this is more challenging. I assume a HUMAN READABLE hierarchy on the drive is desired, and that the files are added to the directory FIRST and a record or reference is added to FMP second?

Consider this: using SuperContainer, and perhaps ScriptMaster too, you could make FMP the SOURCE of all the attachments. If you STILL need a hosted share point for users to navigate, you could build a system that generates directories and structure on the fly as needed and creates a duplicate of each file in that location.

Should you be reaching capacity on your drive, you could route all new documents to a new share point with a larger-capacity drive, or use some other approach, such as splitting projects by name or status onto the other volume.

In one project, as soon as the project went INACTIVE, I would have a script run through and MASS move the content from one server to the other, keeping the hierarchy intact.
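Roughly like this (a sketch only; the mount points and the "inactive" trigger are assumptions):

import os
import shutil

# Move a whole project tree from one mounted volume to another,
# preserving the relative hierarchy under the destination root.
def mass_move(src_root, dest_root):
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dest_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            shutil.move(os.path.join(dirpath, name), os.path.join(target_dir, name))

# mass_move("/Volumes/ActiveNAS/Project42", "/Volumes/ArchiveNAS/Project42")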

Hope this helps.


As Ocean West said, take a look at SuperContainer. You're a media manager. You must know about it (even if it 'only' inspires you).

If you want to do it yourself you need to:

1. create a TAB-text export of the PDF table with:

- the ID

- the current location

- the competition ID

2. create a script (python or perl grade) that reads the export, calculates and creates the new path, COPIES (see step 9) each file to the new path and creates a TAB-text file with ID TAB newpath

3. Import the output of step 2 into a new FileMaker file Newlocations.fp7

4. temporarily create a relationship in Newlocations: Newlocations::ID=PDFtable::ID

5. create a new text field in PDFtable "newlocation"

6. use "Replace Field content" so that PDFtable::newlocation = Newlocations::newpath on all records

7. write a FMP script that takes the path from PDFtable::newlocation, puts it in a variable and imports the file

8. A FMP record loop which uses the previous script to re-import all PDFs

9. write a script (perl/python) that compares the files at oldlocation and newlocation for equality; this could be merged with step 2

This is a sketch from memory. The COPY part in step 2 is there because nobody gets it right the first time; you need several runs until all the oddities are handled correctly, and this process should run atomically, all or nothing. You don't want an unrecoverable error in the middle of the move. I can't say if it's best practice in general, but it works pretty well for me.

I think steps 2 and 9 could be made with ScriptMaster. If you're able to run Python, I can help you with that.
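For example, steps 2 and 9 could look roughly like this (a sketch only; the column order, the bucket rule, and the file names are assumptions you would adapt):

import csv
import filecmp
import os
import shutil

NEW_ROOT = "/Volumes/NAS2"   # assumption: root of the new volume
BUCKET = 100                 # assumption: roughly 100 PDFs per directory

def new_path(doc_id, comp_id):
    bucket = (int(doc_id) // BUCKET) * BUCKET
    return os.path.join(NEW_ROOT, f"Comp{comp_id}", str(bucket), f"{doc_id}.pdf")

# Read the TAB export (ID, current location, competition ID), COPY each file
# to its new path, verify old vs. new byte for byte, and write "ID TAB newpath".
with open("pdf_export.tab", newline="", encoding="utf-8") as src, \
     open("new_locations.tab", "w", newline="", encoding="utf-8") as out:
    reader = csv.reader(src, delimiter="\t")
    writer = csv.writer(out, delimiter="\t")
    for doc_id, old_location, comp_id in reader:
        target = new_path(doc_id, comp_id)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(old_location, target)                        # step 2: copy, don't move
        if not filecmp.cmp(old_location, target, shallow=False):  # step 9: verify
            raise RuntimeError(f"Copy mismatch for DocID {doc_id}")
        writer.writerow([doc_id, target])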

I hope I could give you an idea.

-karsten


Thank you both for your replies.

We use SuperContainer on a different project, and so I am familiar with it. It is not used on this project, as we went the shared volume route.

In a way, the customer wants to be able to access the files outside of FM. The hierarchy that exists now is not "friendly" in that it's

1000/1000.pdf

2000/2000.pdf

The doc record exists in FM and then the file is uploaded.

However, the discussion I suppose I wanted to start was: does it make sense to split docs up? As I mentioned, the customer does think in competition chunks; however, the reality is that a small minority of the uploaded PDFs are not competition-specific, and so that model falls apart. Also, the customer needs access to past years' competition docs, so archiving isn't really a viable option.

My brief research seems to indicate that there is technology for what's called "automated storage tiering." From my reading, the division of the media is not handled by FM per se, but rather by the software running the disk array.



Perhaps you should ask yourself:

- What do the users do with the PDFs, and can the database provide that?

- If they find their PDFs by browsing the server why do you need a database?

- How would they find non-competition files in the database/on the server?

A growing server is an administrative triviality: copying files and replacing some hardware, which you should be doing every couple of years anyway.


There are always back-end measures you can add to provide adequate and redundant storage, such as RAID, NAS, or other connected volumes, to accommodate any storage-size requirement.

The only caveat is that a user-viewable/modifiable structure can invalidate the contents of any database, as there isn't really any safeguard ensuring that additions, deletions, and modifications take place in the file directory AND in the database.

And this logic also requires that the drives be mounted or mapped for each user.

A simpler approach would be to enforce that the database is the SOURCE of all documents: a document can't be added, deleted, or otherwise downloaded unless a transaction event occurs. Once you have the data in the database, you can of course view things in any sorted fashion (show me all clients from California that are completed, etc.).

Whereas with a rigid OS file structure, you only have ONE shot at how it's organized.

To leverage the old-school thought process, you may wish to investigate DocuBin ( http://www.360works....ent-management/ ). It allows you to interact with SuperContainer documents in much the same way users are used to navigating an OS or SHARE hierarchy.


Thank you again for your advice. I'll certainly reference your thoughts in our discussion.

Just want to point out that we are not looking to access/edit any docs on the NAS drive directly. All views/edits/uploads are done via PHP or through FM. Our concern is the time/space required for backup, and growth.

I suppose, in my ideal world, FM would see what looks like one contiguous volume that is in fact spread across many volumes. Otherwise, I am forced to come up with some logic for splitting the hierarchy. The idea of splitting by competition, as I said, has been suggested, but doesn't really hold up. (By the way, the non-competition docs would be email attachments to a person. These emails are "activity" records in FM, created on a person form, so there is no direct relation to a competition. That is why these docs fall through the cracks.)
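If we do end up writing our own splitting logic, it might be as simple as something like this (just a sketch to illustrate; the mount points are made up):

import os

VOLUME_ROOTS = ["/Volumes/NAS1", "/Volumes/NAS2"]   # hypothetical mount points

# Pick the volume from the DocID itself so the full path stays computable
# without a lookup table, then bucket as we do today.
def volume_for(doc_id):
    return VOLUME_ROOTS[int(doc_id) % len(VOLUME_ROOTS)]

def pdf_path(doc_id, bucket_size=100):
    bucket = (int(doc_id) // bucket_size) * bucket_size
    return os.path.join(volume_for(doc_id), str(bucket), f"{doc_id}.pdf")

print(pdf_path(1000))   # -> /Volumes/NAS1/1000/1000.pdf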

