unresolved interruptions


This topic is 4675 days old. Please don't post here. Open a new topic instead.

I'm not sure whether I should be starting a new post, but a significant problem I had a couple of months ago (http://fmforums.com/...__fromsearch__1), frequent interruptions (every 5-10 minutes, though sometimes the pattern is erratic), has come into sharper focus. I need a strategy that will lead to a diagnosis, especially since I am committed to an August 1 deadline for delivering an expanded version of my database to a significantly larger group of first-time users, who will very shortly be committing to FM software on their end.

My IT person has put a lot of time into diagnosing and researching the problem but hasn't been able to figure out what's wrong. He now believes that the problem is not with the server but with the file itself: he hosted 5 other FM databases from other sources and had them in use over a period of two days, and my database was the only one that produced interruptions. It may be that my file has been affected by the deployment process, which the other files haven't gone through (or have gone through only in part). I have passed on to him all of the suggestions that respondents made in the original post, and he has assured me that he has accounted for all of the relevant ones. Specifically, on the following issues:

Close OS-level file sharing: "the database files aren't shared, they are deployed to a staging folder where they are copied into the active data directory after the server has verified that the database is closed and (as an additional level of protection) the database engine has been stopped."

Only close the files through the Admin Console or through the command line interface on the server itself, wait for confirmation that each file has closed properly, and only then move the file. That means waiting for ALL clients to disconnect properly: "We are using the fmsadmin command line tool to properly close the files and then a script to move the database in."

Make sure that anti-virus and disk indexing software isn't touching the hosted FMP files: "I can confirm we have no AV on the server and that indexing has been explicitly disabled for the filemaker folder."

He also says that he has read up on FM best practices and is following all of them.

(Note that I don't have access to the Console because my IT person feels that the kind of access that would give me to the server isn't consistent with our university's best practices on security. He made the point that hosting sites don't generally offer access to the Console. They script the opening and closing of files in the same way he does: using the fmsadmin command line tool.)

I should add that, as someone had suspected, there was an inconsistent file in place. I compacted it, checked it for consistency, and put it back up; the problem seemed solved for almost a day but came back the next day.

My IT person suggested I post a snippet of the error messages on a recent log to see if they suggest anything to anyone on the Forum.

A question: Is it possible that something in my code is causing the problem? None of it is written for the server, except for one scheduled script, from which I just noticed and removed a Commit Records step and a Show Custom Dialog step. If so, what should I look for?

Next step for a problem like this one? Hire a consultant?

Screen shot 2011-07-20 at 1.35.52 PM.png.zip

Sorry to not get back sooner. I seem not to be getting the usual automatic email notices.

Disconnection might be a more accurate term. I get a message saying that communication with the host has been interrupted and can't be re-established. I believe it's exactly what happens when a computer goes to sleep and is woken up later, except that there's no apparent reason for the loss of connection. That ends the session. The interruptions can occur as often as every five minutes, though there can be longer stretches with none. Then, just when everything seems to be OK, they start happening frequently again.

In all of my testing I've been connected by ethernet cable. The problem has occurred on both Mac OS and Windows, from more than one location, both on the same LAN as the server and off it. Though I remain the principal user until August 1, other users testing the database experience the problem as well.

You are saying that the users are seeing a "communication with the host has been interrupted" for all of those disconnects? Sometimes you can see these server side logged events if the FMP client crashes, which would be a totally different kind of troubleshooting.

Is the university using VOIP or anything else that might be setting strict network QoS rules?

You are saying that the users are seeing a "communication with the host has been interrupted" for all of those disconnects?

Yes, I'm attaching a recent error message. They're always the same.

Is the university using VOIP or anything else that might be setting strict network QoS rules?

I'm going to ask my IT support person for the answer to the server-related questions. I'll get back soon with an answer, or he'll join the post directly.

Thanks, Steven, Vaughan and Wim, for taking this up. If I can't resolve this issue, my database will not be usable.

FM error message.png.zip

It could be something as simple as a bad ethernet patch cable. Are any other services running on the server machine?

A lot more information is needed.

Good Morning Everyone,

I am the admin who hosts the application that jjjjp is referring to, and I'm happy to provide some of the technical details you are asking for in an effort to help us out. The server in question is a Windows Server 2008 R2 machine running on a three-node VMware vSphere 4.1 cluster. Each node is connected with dual redundant gigabit channels to two independent switches, each of which is connected with gigabit fibre directly to the campus backbone. We don't have any QoS enabled, and the cluster in question hosts many other services, including a series of SQL databases, Active Directory servers, file servers, terminal servers and web servers, none of which are experiencing the kind of disconnects that the FileMaker server is experiencing. On the server end the disconnects are showing up as error 51, which as far as I can tell doesn't exist in the standard list of fmserver error codes.

I am happy to entertain networking problems as a source of the disconnections, but this would mean (to me) that fmserver/fmpro connections are *significantly* more sensitive to network latency/throughput/packet loss/etc. than other applications that rely on a constant connection, such as terminal servers.

I can provide any log information that people feel would be useful and gather metrics on any system level counters - anyone have any thoughts?

It could be something as simple as a bad ethernet patch cable. Are any other services running on the server machine?

A lot more information is needed.

The VM running fmserver isn't running any other services beyond those required by the OS. One of the advantages of our VM-only environment is that in almost one hundred percent of cases we dedicate a server to a given application. The patch cable is possible, but the interruptions aren't happening in any other application, and as the VM floats from host to host there shouldn't be any common physical pathways.

OK so FMS is running in a dedicated VM.

Back to basics: make sure the VM has enough RAM. Ensure that the FMS hosted files are NOT being shared at the OS level and NOTHING is touching them: virus scanners, backup programs etc.

Ensure FMS is configured optimally, especially the cache. Make sure FMS is patched to the latest version.

Take the hosted files, save compacted copies of them, and put those copies back into production. Use the remote admin tool to upload the files back to the server so that permissions are set correctly.

If problems persist, get in a FM professional. I'm in Sydney and Melbourne.

Just to be sure there is nothing I should be checking on my (the programmer's) end: Is there something I might have done in my scripting or in my configuring of my database that could possibly be the cause of these frequent interruptions?

Is there something I might have done in my scripting or in my configuring of my database that could possibly be the cause of these frequent interruptions?

Not with normal FMP script steps. FMP scripts that run OS-level scripts: perhaps?

I'm not sure exactly what qualifies as an OS-level script. My one scheduled script sends out email reminders via SMTP using the Send Mail script step, but I take it this is not what you're referring to.

No, that's OK as long as the steps are all server compatible, but even incompatible scripts should not cause interruptions.

Unless the script goes into an infinite loop or something... what would happen then?

I have something quite similar...

FMS 11 on a Mac mini running OS X 10.6, but I also have a mix of physical and virtual machine clients running OS X, XP and Win7.

The virtual clients are hosted on Parallels Server 4 Mac Bare Metal on Xserve hardware, and can be accessed via RDC or LogMeIn depending on location.

The virtual XP and Win7 clients get the "communication with the host was interrupted" frequently, but at random intervals. OS X VM clients NEVER get disconnected!! Physical clients don't get disconnected on any platform.

My workaround so far is to host a file on FMS that a timed script on the VM clients opens and closes every minute. That seems to sort out the problem.

HTH.

On the server end the disconnects are showing up as error 51, which as far as I can tell doesn't exist in the standard list of fmserver error codes.

OK, that is a good clue.

The client is supposed to send the host a "tickle" if there has been no other network activity to the host for 60 seconds.

If the host doesn't hear from a client for 125 seconds and doesn't have anything to send to the client, it will disconnect the client since this means the client has failed to send two "tickle" messages.

51 means "Client tickled out", the 125 second timeout was reached with no contact from client.

So if you're seeing 51 errors, it means the client is trying to send a "hello, I'm still here" tickle, but it's not being received by the server, so the server eventually disconnects the client after 125 seconds.

Steven
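The timeout rule described above can be modelled in a few lines, which also makes it easy to see why a single lost tickle is harmless while two in a row are not. This is a minimal sketch, not FileMaker's implementation: it assumes the 60 s tickle interval and 125 s host timeout quoted in this thread, a client that is otherwise idle, and a host with nothing of its own to send. The names `disconnect_time`, `lost` and `horizon` are hypothetical.

```python
# Figures quoted in this thread; not taken from FileMaker documentation.
TICKLE_INTERVAL = 60   # client sends a tickle after 60 s with no other traffic
HOST_TIMEOUT = 125     # host drops a client it has not heard from for 125 s


def disconnect_time(lost, horizon):
    """Return when the host would drop the client (error 51), or None.

    lost    -- set of send times (seconds) of tickles that never reach the host
    horizon -- how many seconds of idle session to simulate

    Assumes the client is otherwise idle, so tickles go out at 60, 120, 180, ...
    and that the host has nothing of its own to send.
    """
    last_heard = 0  # session starts with contact at t = 0
    t = TICKLE_INTERVAL
    while t <= horizon:
        deadline = last_heard + HOST_TIMEOUT
        if t > deadline:
            # The host's timer expired before this tickle was even sent.
            return deadline
        if t not in lost:
            last_heard = t  # tickle arrived; the host resets its timer
        t += TICKLE_INTERVAL
    deadline = last_heard + HOST_TIMEOUT
    return deadline if deadline <= horizon else None


print(disconnect_time({120}, 600))       # one tickle lost: prints None
print(disconnect_time({180, 240}, 600))  # two consecutive tickles lost: prints 245
```

In the second call the last tickle the host receives is the one sent at 120 s, so the drop happens at 120 + 125 = 245 s: under these assumptions, the host's silence window is 125 s, measured from the last tickle that actually arrived.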

I am happy to entertain networking problems as a source of the disconnections, but this would mean (to me) that fmserver/fmpro connections are *significantly* more sensitive to network latency/throughput/packet loss/etc. than other applications that rely on a constant connection, such as terminal servers.

It most definitely is. Unlike any of the others you mentioned, FMS and connected FMP clients are in CONSTANT communication, not just the typical request/response you would see from an AD or web server. The traffic consists of a lot of small packets. That pattern is quite distinctive, and it makes FMS particularly sensitive to network issues such as VOIP QoS, flaky cabling, faulty network switch equipment...

No, that's OK as long as the steps are all server compatible, but even incompatible scripts should not cause interruptions.

Unless the script goes into an infinite loop or something... what would happen then?

Good questions, I would add: any server-side plugins in use?

OK, these are interesting things to follow up on. Steven, does this mean that from the server's perspective the client has not been in communication for 250 seconds? Is there anything special about these tickle packets? Are they a specific size, or do they contain parity information which, if lost, makes the system treat the tickle packet as not received? Is there a timeout for an individual tickle packet aside from the timeout between packets?

No server-side plugins, and no OS scripts are being triggered from within the application itself. Monitoring the FileMaker Pro executable on a remote client showed no dropped packets (so unless the tickle packets are special, they aren't being dropped either).

Is anyone else running fully virtualized FMS installations? Have you had any problems with various virtualized network adaptor types? VMXNET 2 vs 3 vs emulated E1000? Aside from just general network monitors are there any applications or logs that might give more visibility into the FMS networking activity and stack? I'm finding the amount of information it provides a little lean for diagnostics.
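One low-tech way to get more visibility than FMS itself provides is a small reachability probe run from a client machine. The sketch below, in Python and not a FileMaker tool, repeatedly opens a fresh TCP connection to port 5003 (the port FileMaker Pro clients use to reach FileMaker Server) and prints a timestamp whenever reachability changes, so the output can be lined up against the error 51 entries in the server's log. It only tests whether new connections can be opened; it does not observe the existing FileMaker session or its tickles. `probe`, `monitor` and their parameters are hypothetical helpers.

```python
import socket
import time
from datetime import datetime

FMS_PORT = 5003  # TCP port FileMaker Pro clients use to connect to FileMaker Server


def probe(host, port=FMS_PORT, timeout=3.0):
    """Return True if a new TCP connection to host:port opens within `timeout` s."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def monitor(host, interval=10.0, port=FMS_PORT, rounds=None):
    """Probe every `interval` seconds; print a timestamp when reachability changes.

    Runs forever unless `rounds` limits the number of probes.
    """
    was_up = None
    n = 0
    while rounds is None or n < rounds:
        up = probe(host, port)
        if up != was_up:
            stamp = datetime.now().isoformat(timespec="seconds")
            state = "reachable" if up else "UNREACHABLE"
            print(f"{stamp} {host}:{port} {state}")
            was_up = up
        n += 1
        time.sleep(interval)
```

A run would look like `monitor("fms.example.edu", interval=15)`, where the hostname is a placeholder. If the probe never fails while clients are still being tickled out, that would suggest the problem lies less in basic reachability and more in how long-lived, mostly idle connections are handled along the path, which is where questions such as the virtual NIC type come in.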

Tickles are sent every 60 seconds. If two consecutive tickles are missed, the disconnect occurs 5 seconds later. This presumes the server has not itself sent out some update information, IIRC.
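Worked through with the figures quoted in this thread, and assuming the client is otherwise idle and the server has nothing to send, this arithmetic also answers the 250-second question: the host measures one window of silence from the last contact it received, not two full timeouts.

```latex
\begin{aligned}
t_{\text{disconnect}} - t_{\text{last heard}}
  &= \underbrace{60\,\text{s}}_{\text{tickle 1 missed}}
   + \underbrace{60\,\text{s}}_{\text{tickle 2 missed}}
   + 5\,\text{s} \\
  &= 125\,\text{s}
\end{aligned}
```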

I am not a big fan of virtual servers. But if you use one, be sure that the FMS VM is not starved for resources. The virtual machine needs at least 4 GB of RAM, and a good-sized hard disk as well.

Steven
