
This topic is 7643 days old. Please don't post here. Open a new topic instead.


Posted

Hello All:

First off, I know my FM Server, so it isn't something obviously stupid, like having AppleTalk turned on or something like that. Now, the problem:

I've got a client with an FM Server system, which I set up, and which has run fine since installation about six months ago. Suddenly it is doing something I've never seen before. After a panicked call from the owner I went in today, and every single client machine was showing the squiggly line for a cursor - you know, the one that means it is communicating with the host, like it shows when first loading a database. Every machine was completely locked up. FM Server showed some weird info: out of 15 guests, only 5 were showing names. The others had no names, but their connect times, etc. were still showing. Everything was in italics, so someone had attempted to quit FM Server, probably in a panic after the first sign of trouble. (Nobody will admit to this, of course, as it is strictly verboten to touch the FM Server, and says so all over the server room, in several languages.)

I restarted the FM Server, rebooted all the client machines, and everything worked fine, for about five minutes. Then a PC went into this same tailspin, and a Mac displayed a "lost communication with host" error, so I went through the whole reboot process again, only this time I restarted the network hardware, too (a 24-port Asante Ethernet switch and a DSL modem). The lights on the switch had looked a bit odd to me, flashing a bit too much in sync. I also opened and closed each db in FMP to make sure that the files weren't showing any signs of corruption, or of having been improperly closed. Now everything works again, and I suspect the switch was at fault. Can anyone suggest what may have been the matter?

The details on the system are:

FM Server: Beige G3 running Mac OS 8.6, FileMaker Server 5, no other applications open, etc.

Clients: PCs, G3s, G4s, all running FMP 5.5 over TCP/IP

Switch: Asante 24-port 10/100 switch. Hasn't misbehaved since purchase.

Thanks in advance.

-Stanley

Posted

Hi Stanley,

Quick answer... check to make sure no one has looped a drop (patch) cable from one Ethernet port back into another.

Explanation... I have been designing, implementing, and managing networks for 20+ years. Since 100 & 1000 Mbit Ethernet has become popular, I have seen problems with 'loop-backs'. Someone (a user) will 'inadvertently' take a drop cable from a computer and connect it to another Ethernet port... in essence... both ends of the drop cable are plugged into two live ports. This is usually in a room, under a desk where it is not easy to see. This causes a 'flood' of packets (a broadcast storm) on your network... nobody can do anything. In fact, if it goes on long enough, the users seem to have problems getting the mouse or keyboard to respond... almost as if the workstation had locked up.

This happens with both Mac and WinTel computers and doesn't seem to happen on 10 Mbit Ethernet... not sure why.

Trying to find the culprit... locate the 'loop-back' drop cable, disconnect it, then restart all switches, routers, servers, and workstations. If you have many switches... disconnect each switch from the remainder of the network and you should see immediate results on one side of the switch or the other. This can be fun on a network with 1,000 workstations, 30 servers, 20 switches, wireless bridges, routers, numerous buildings, 200+ rooms & offices, and WAY TOO MANY KEYS!
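On a network with many switches, the disconnect-and-watch approach above amounts to a binary search: reconnect half of the suspect segments, see whether the flood comes back, and keep whichever half still misbehaves. The sketch below assumes exactly one segment holds the loop; `storm_present` is a hypothetical stand-in for the manual check (reconnect those segments, watch the switch lights and clients), not a real tool.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def find_faulty_segment(
    segments: Sequence[T],
    storm_present: Callable[[List[T]], bool],
) -> T:
    """Halve the suspect set until one segment remains.

    storm_present(connected) is a hypothetical check: with only the
    segments in `connected` attached to the core switch, does the
    packet flood return? Assumes a non-empty list and exactly one loop.
    """
    suspects = list(segments)
    while len(suspects) > 1:
        mid = len(suspects) // 2
        first_half = suspects[:mid]
        if storm_present(first_half):
            suspects = first_half          # loop is in the reconnected half
        else:
            suspects = suspects[mid:]      # loop must be in the other half
    return suspects[0]


# Simulated usage: pretend the looped cable hangs off "switch-7".
switches = [f"switch-{i}" for i in range(20)]
culprit = find_faulty_segment(switches, lambda connected: "switch-7" in connected)
print(culprit)
```

With 20 switches this needs about five checks instead of up to twenty, which matters when each check means walking to a wiring closet.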

Another thing... shut down your servers and see if workstations can successfully connect to the Internet (www). You could connect a notebook directly to your DSL router and see if it connects to the Internet. This will help determine whether the problem is internal or external.
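The internal-versus-external test above comes down to two observations. A minimal sketch, assuming the two results are gathered by hand or with a simple TCP probe; the function names and the choice of port 80 are illustrative assumptions, not part of the original advice.

```python
import socket


def reachable(host: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def classify_fault(workstation_ok: bool, notebook_on_router_ok: bool) -> str:
    """Interpret the two checks: LAN workstations vs. a notebook plugged into the DSL router."""
    if not notebook_on_router_ok:
        return "external"  # even a direct connection fails: suspect DSL router or ISP
    if not workstation_ok:
        return "internal"  # router reaches the Internet, so the LAN side is at fault
    return "no fault reproduced"
```

For example, `classify_fault(reachable("example.com"), True)` run from a workstation would point at the internal network if the workstation probe fails while the notebook on the router succeeds.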

Hope this helps... Good Luck! Let us know what you found.

Bob Kundinger


Posted

Okay, guys, thanks for the input.

The IP addresses are configured manually and none of them are duplicates - but I'll check and make sure some "clever" person hasn't altered one of them.
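With manually assigned addresses, one quick sanity check is to collect each machine's configured IP into a list and look for repeats. A minimal sketch, assuming you have typed the assignments into a dictionary; the machine names below are hypothetical. Using `ipaddress` also rejects malformed entries.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List


def find_duplicate_ips(assignments: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each IP used by more than one machine to the sorted machine names."""
    by_ip: Dict[str, List[str]] = defaultdict(list)
    for machine, ip in assignments.items():
        # ip_address() raises ValueError on a malformed entry
        normalized = str(ipaddress.ip_address(ip.strip()))
        by_ip[normalized].append(machine)
    return {ip: sorted(names) for ip, names in by_ip.items() if len(names) > 1}


# Hypothetical inventory of manually configured clients.
inventory = {
    "mac-1": "192.168.1.10",
    "pc-3": "192.168.1.10",
    "pc-4": "192.168.1.11",
}
print(find_duplicate_ips(inventory))
```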

The internet connection worked throughout on the one computer on the network which does not run FileMaker, but I wondered if there couldn't be some issue with the DSL router anyway - it is the one part of the network which I didn't install, and which I've never had anything to do with. It is also the newest part of the network, having been installed about two months ago.

As far as I can tell there are no loopbacks - but I'll have a look at that, too.

As of right now (Wednesday night) the network is still up and running with no problems, 36 hours after the event. I'm wondering if simply rebooting the computers & restarting/resetting the switch & router cured whatever was taking place. If so, then it could happen again. If that is the case, I'd like to figure it out, in order to prevent it.

Thanks again

Stanley

Posted

Hello again Stanley,

Is the network still functioning properly?

To figure out what happened... you need to do only ONE 'fix' at a time... this helps determine what ONE thing on your network was the culprit. If you 'fix' more than one thing at a time... you will never know which was the culprit.

That said... it is hard when all the users are 'breathing down your neck' and 'nothing works'. You just have to be patient, unemotional, and set the 'down-time' expectations with the users. This takes longer, but pays off in the long run when problems recur.

Unfortunately... old problems never recur... only new problems crop up.

Hope this helps... Good Luck!

Bob Kundinger


Posted

Bob:

Well, it worked for a few days, and now it's on the fritz again. I was trying to figure it all out when one user showed me that when he performs a "show all" on the biggest db in the system (a 670 MB image database), scrolling slows to a crawl (a sign that the user entering the images isn't shrinking them like he's supposed to) and the error ("communication with host lost...") comes up on one particular record. I couldn't stick around to diagnose any further (I'm on JURY DUTY and it was lunch, so I just had a minute) but tomorrow I'll go and see what the deal really is.
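If the images are not being shrunk before they go into the database, one way to see how far off they are is to export them to a folder and list the ones over a size budget. A minimal sketch, assuming the images have already been exported as files; the 500 KB limit is a hypothetical figure, since the thread gives no target size.

```python
from pathlib import Path
from typing import Dict, List, Mapping, Tuple

# Hypothetical budget for a "shrunk" image; adjust to the shop's own rule.
LIMIT_BYTES = 500 * 1024


def sizes_in_folder(folder: str, pattern: str = "*.jpg") -> Dict[str, int]:
    """Map each matching file name in folder to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(folder).glob(pattern) if p.is_file()}


def oversized(sizes: Mapping[str, int], limit: int = LIMIT_BYTES) -> List[Tuple[str, int]]:
    """Return (name, size) pairs larger than limit, largest first."""
    big = [(name, size) for name, size in sizes.items() if size > limit]
    return sorted(big, key=lambda item: item[1], reverse=True)
```

Usage would be `oversized(sizes_in_folder("exported_images"))`, where the folder name is only an example.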

I'm wondering if a very large or corrupted (or both) JPEG could screw up the networking this badly, perhaps by overloading the server? I'm not a TCP/IP or ethernet hotshot, but I'd think that packet-switching should merely slow the system down in an overload situation, rather than just shut off communication in general.

That said, this is the first time someone has shown me the behavior as it happens, so I have my first real clue to what's going on. As far as the rest - it is indeed hard to maintain a scientific approach to debugging a network when the client is going ape in the server room, but I do try...

I'll post again if I can figure this out because, as you can see by the neighboring thread, Lucky is having the exact same problem.

-Stanley

Posted

Stanley,

Sorry, I don't have experience with clients with large image DB systems.

You might be on the right track with the large and/or corrupted images. I know from experience that corrupted images on layout cause system crashes... then file corruption... ETC.

Bob

Posted

Bob:

Thanks for all the input. I'm going over in the morning to look at this stuff, and hopefully I'll sort it out then. I was hoping, though, that you could (as a network guy) tell me if a corrupted JPEG could cause the kind of network 'blackout' where all the machines drop communication with the host? I guess maybe only a FileMaker engineer would really know, because to me the only way this happens is if FileMaker Server is the one choking on the corrupted JPEG, but is that possible? I thought FM Server just shot stuff out from drive to network without trying to digest it... am I wrong?

I am just curious, apart from trying to solve my problem...

Thanks again

-Stanley

Posted

Your symptoms sound similar to a problem I dealt with for months before finding a bad Ethernet cable or two. Consider having your network cabling certified; it costs a few dollars but saved me hours of work tracking down the two bad patch cables.

Posted

Cabling, huh? Well, I'm going over there tomorrow, so I'll test all the cabling.

Thanks

-Stanley

Posted

After all that, it turned out to be a corrupt JPEG in the main image catalog DB (this is a custom print shop, so every machine in the place accesses this DB) which was doing the damage. I don't understand it entirely, but one of the users showed me that accessing this one particular record immediately caused the "communication lost..." error to appear. After the record was deleted, everything returned to normal.
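To hunt for other damaged images before they cause the same trouble, a cheap first pass is to check each exported JPEG for its start-of-image marker (bytes FF D8) and end-of-image marker (FF D9). A minimal sketch, assuming the images have been exported to a folder; it is only a heuristic that catches truncated files and will not detect damage inside the compressed image data. Trailing zero padding after the end marker is tolerated.

```python
from pathlib import Path
from typing import List

SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def looks_like_complete_jpeg(data: bytes) -> bool:
    """Heuristic: True if data starts with SOI and ends with EOI (ignoring zero padding)."""
    if not data.startswith(SOI):
        return False
    return data.rstrip(b"\x00").endswith(EOI)


def suspect_files(folder: str, pattern: str = "*.jpg") -> List[str]:
    """Names of matching files in folder that fail the marker check, sorted."""
    return sorted(
        p.name
        for p in Path(folder).glob(pattern)
        if p.is_file() and not looks_like_complete_jpeg(p.read_bytes())
    )
```

A file flagged here is worth opening by hand; a file that passes can still be corrupt internally.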

Nice to find a simple solution - I was afraid I'd have to essentially rebuild the entire system.

-Stanley
