
FMSA 8v2 OS X 10.4.4 - Nightly Crashes


This topic is 6589 days old. Please don't post here. Open a new topic instead.


Every night during the week, my server crashes between 5 pm and 6 pm (i.e., quitting time). On weekends, no one is logged in and the server does not crash. Occasionally the server has quit in the middle of the day, typically on a Friday. Once it has quit for the day, it stays up until the next evening.

I am not recovering files. I always pull from the latest backup.

When the server crashes, I have been watching Activity Monitor. The “fmserverd” process goes away but the other two FM processes remain. To restart the server, I have been replacing the crashed files with my backups and starting the Admin app. The “fmserverd” process then starts up and typically works fine for another 24 hours.

Does anyone have any ideas on what could be causing this? I have tried all of the standard debugging techniques to find the source of this problem but have not been successful.

Specs:

-Mac OS X Server 10.4.4 with XRAID, 3 GB of RAM, gigabit Ethernet; data on the RAID is backed up to the system drive every 20 minutes, and the crashes have no relationship to the backups

-Clients: cross-platform, all running FM8v2 and also running FM6 to access another FM Server.

-Latest version of FM Server and web engine, hosting 24 files (some over 2 GB), 60 users max, custom web publishing only, using FX/PHP

-FM files: half of the files are multi-table, built from scratch in FM8, with data imported from FM6 and from merge files. One file has over 900,000 records. The other half of the files, all small, were converted from FM6.


Thank you for the input.

My server is on a dedicated circuit in our server room with a UPS. I have started the server monitor software which should detect any issues with power as well.

There are no jobs that I know of running on the server except the backups. The crashes range from about 5:15 to 6:05, whereas I would expect a cron job to hit at the same time every day.

I have checked my logs daily to see if there is a pattern. If there is one, the server crashes before anything is recorded to the FM logs.

The server crashed again tonight at 6:05.


Scott:

Because you're seeing this behavior at a particular time each day, you should examine what's happening at that time. You said you are backing up to a RAID every 20 minutes - are those system backups or FMServer backups? Do you have a daily backup in FMServer that is scheduled for 5:15, for example? I have noticed that FMServer backups can vary widely from their actual scheduled time, and it is possible that it may be colliding with an OS-level backup.

If that's not the case, there is always the possibility that it is not something internal to the server, but something external. To detect that, you could change the time setting on the server, and observe if the crashes occur between 5-6 pm in real time, or according to the server's clock.

-Stanley


Please, when you do figure this one out, post the results. I love mysteries like this!

Some more ideas:

  • Make sure you check ALL the logs:
    1. System.log
    2. Console.log
    3. Event.log (filemaker)
    4. Web server logs (both the apache logs, and the filemaker web logs)
    5. Are there any additional PHP logs?
    6. Find the crash log for fmserverd

  • I like the vacuum cleaner theory. Even if you are plugged into a UPS, are you sure the UPS is working? It might be under-specced for the load, the batteries may be shot, or you may be plugged into the surge-protection-only outlets. Or perhaps the computer is fine, but the router is not on the UPS, and when the router dies it causes FM to crash?

  • Add a logging facility to your solution that tracks user login and logout. Nothing fancy: just create a log table that records "User X, computer name Y, logged in (out) from IP address Z at [timestamp]". Perhaps you can figure out whether it's one particular user, and whether they are doing something odd?
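Once such a log exists, the check is simple: for each crash time, list who was still logged in, then keep only the sessions present at every crash. The sketch below does that in Python rather than FileMaker, purely to illustrate the logic. It assumes each log row is (user, computer, IP, "in"/"out", timestamp); the row layout, the names, and the sample data are all made up.

```python
from datetime import datetime

# Hypothetical log rows: (user, computer, ip, event, timestamp), event is "in" or "out".
def sessions_open_at(entries, when):
    """Return the (user, computer, ip) sessions still logged in at `when`."""
    state = {}
    for user, computer, ip, event, ts in sorted(entries, key=lambda e: e[4]):
        if ts > when:
            break  # rows are time-ordered, so nothing later matters
        state[(user, computer, ip)] = (event == "in")
    return {key for key, logged_in in state.items() if logged_in}

def common_sessions(entries, crash_times):
    """Sessions that were open at every recorded crash time."""
    sets = [sessions_open_at(entries, t) for t in crash_times]
    return set.intersection(*sets) if sets else set()

# Made-up sample data for illustration only.
log = [
    ("alice", "MAC-01", "10.0.0.5", "in", datetime(2006, 3, 6, 8, 0)),
    ("bob", "PC-07", "10.0.0.9", "in", datetime(2006, 3, 6, 8, 30)),
    ("bob", "PC-07", "10.0.0.9", "out", datetime(2006, 3, 6, 17, 0)),
    ("alice", "MAC-01", "10.0.0.5", "out", datetime(2006, 3, 6, 18, 30)),
    ("alice", "MAC-01", "10.0.0.5", "in", datetime(2006, 3, 7, 8, 0)),
    ("bob", "PC-07", "10.0.0.9", "in", datetime(2006, 3, 7, 9, 0)),
    ("bob", "PC-07", "10.0.0.9", "out", datetime(2006, 3, 7, 16, 0)),
]
crashes = [datetime(2006, 3, 6, 18, 5), datetime(2006, 3, 7, 17, 15)]
suspects = common_sessions(log, crashes)
```

With real data, a session that shows up in `suspects` night after night would be the first place to look.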


Thank you for the input. I will check into these and let you know.

The only errors I have found so far are "PHP undefined offset" errors. These seem to be minor.

Today, I am moving all the databases to a new box and turning off the web hosting during the fateful hour.


Check whether the server has virus software or backup software (other than FMS scheduled backups) that is scanning or copying the hosted FMP files. In either case, configure the software to ignore the hosted files (or, in the case of the virus software, disable it completely).

I've heard two stories from different people that both involved a vacuum/toaster/kettle. Both took a lot of time and money to diagnose, so it's really worth looking at the power issue -- or at least raising it as a possibility -- early in the debugging process before swapping out hardware.

One was a computer centre in remote NSW that every second night would drop its router configurations. It was a cleaner's kettle plugged into the same circuit.

The other was waaay more scary, and it came from an ex-intensive-care-unit nurse at a major hospital. One of the ICU beds was shutting down for about 90 seconds every night. During this time the patients were essentially dead, since the breathing machine was stopped, but the period was short enough that no real harm was done to them. It turned out to be the cleaner unplugging the ICU machine to free up a power outlet for their vacuum.


I will test for power issues.

The "new" hardware had no impact except to make it very slow. The RAID really makes a big difference. When I say it had no impact, I mean the server crashed repeatedly on the "new" hardware.

I am also trying to test whether corruption in one of the tables could cause the server to crash. When I do a mass delete (10,000 records), the server will sometimes crash. But if I delete the same records one by one with a script, they delete fine. I have more testing to do on this theory to make sure it is a repeatable event. Is there a surefire way to find a corrupted record?
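If the mass-delete crash does turn out to be repeatable, bisection finds a single offending record with far fewer trials than going one by one: for 10,000 records, at most 15 mass deletes. The Python below is only a sketch of that search. `crashes_on_delete` is a hypothetical callback standing in for the manual step (restore a fresh copy of the file, mass-delete just that subset, note whether the server crashed). It assumes exactly one bad record and a deterministic crash, neither of which is confirmed yet.

```python
def find_bad_record(record_ids, crashes_on_delete):
    """Narrow a crashing mass delete down to one record by bisection.

    `crashes_on_delete(subset)` is hypothetical: it should restore a fresh copy
    of the file, mass-delete only `subset`, and return True if the server crashed.
    Assumes exactly one bad record and a repeatable crash.
    """
    candidates = list(record_ids)
    if not crashes_on_delete(candidates):
        return None  # the full set does not reproduce the crash
    while len(candidates) > 1:
        mid = len(candidates) // 2
        if crashes_on_delete(candidates[:mid]):
            candidates = candidates[:mid]
        else:
            # With exactly one bad record, it must be in the other half.
            candidates = candidates[mid:]
    return candidates[0]
```

If the crash is intermittent, each trial would need repeating a few times before trusting a "no crash" answer.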
