January 10, 2003 This happened last summer on a different machine: Server cannot be seen in Hosts list. Server is working (Dell PowerEdge 600/Win 2K Server/FM Server 5.0), and you can force it by specifying the IP address, but the host is gone by name. This loss did not affect anyone using the files at the time, but anyone trying to open a file hosted by that machine was SOL unless he knew the IP address.
January 10, 2003 A lot depends on your network configuration. The process of locating the FM Server relies on a broadcast. Routers typically don't pass on a broadcast, although bridges and repeaters normally do. Has something changed on your network that might be blocking a broadcast?
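Since broadcast discovery only reaches machines in the same broadcast domain, a quick first check is whether a client and the server sit in the same subnet. The following is a minimal sketch, not anything FileMaker ships: the function names are made up for illustration, and it assumes that one IPv4 subnet equals one broadcast domain, which ignores VLANs and routers configured to forward broadcasts.

```python
import ipaddress


def same_broadcast_domain(client_ip: str, server_ip: str, netmask: str) -> bool:
    """Return True if the server address lies inside the client's IPv4 subnet.

    Assumption: one subnet is one broadcast domain (no VLAN tricks,
    no broadcast forwarding on the routers).
    """
    client_net = ipaddress.ip_network(f"{client_ip}/{netmask}", strict=False)
    return ipaddress.ip_address(server_ip) in client_net


def broadcast_address(ip: str, netmask: str) -> str:
    """Return the directed broadcast address of the subnet containing ip."""
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    return str(network.broadcast_address)


if __name__ == "__main__":
    # Example addresses only; substitute your own client and server.
    print(same_broadcast_domain("192.168.1.20", "192.168.1.5", "255.255.255.0"))
    print(broadcast_address("192.168.1.20", "255.255.255.0"))
```

If the check comes back False, a client would normally have to specify the server by IP address, which matches what the first post describes.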
January 10, 2003 If stopping and restarting the FileMaker service or rebooting the server doesn't return it to the hosts list, then trevorg is probably right: something must have changed in your network.
January 13, 2003 Author I restarted the server and it came back again. I did not try to specify the name. The problem is solved for now, but not resolved.
January 14, 2003 I guess the best approach is to set up an "opener" file and copy it to the user machines. Then they can simply forget the hassle with the server.
January 14, 2003 Author Our launcher files fail to see the server to open and run scripts in this case.
July 3, 2003 Hi gang, I have had the same or a very similar problem. We traced the problem to two different causes:
a. We had a CLIENT PC in bad shape. IPCONFIG showed bogus IP addresses. A reboot w/ CHKDSK fixed the problem. Anytime this user clicked on his HOSTS button the entire domain would go blank.
b. Any client PC with multiple (more than 1) network cards makes our server behave as described (host list gone).
* We are running FMP3.0v6 on W2K and XP. -Ted
July 3, 2003 RE: * We are running FMP3.0v6 on W2K and XP. It is not officially supported. That is an old, obsolete version.
July 11, 2003 Newbies This has been happening to us for a long time (years), very intermittently. Now it is occurring once or twice a week. We have two W2K FMP 5.5 servers hosting about 160 files between them. Our site has about 120 clients, mostly on FMP 6.0v4, but some in the shop on various earlier versions. Sometimes both servers disappear from the list, sometimes only one. Usually if one disappears, the other one will also disappear within a few hours. After rebooting the servers, everything is fine (until the next time). Over the course of time I have replaced the servers, upgraded FM Server and replaced our hubs and switches, and I can't get rid of the problem. I'm getting ready to set up a sniffer on the network to capture net traffic. I tried a paid call to FileMaker, but their official response was that there were "no known issues". Because of the dynamic nature of our active databases, opener files would be hard to manage. Does anyone have any other ideas?
July 11, 2003 RE: I tried a paid call to FileMaker, but their official response was that there were "no known issues". Bullsh**. Only the server IP will work 100%. But it looks to me as though FMS version 5 has fewer such problems than FM 5.5.
July 13, 2004 Newbies Just in case anyone else is searching on this thread, I've found a couple of things that contribute to the hosts disappearing. We have a group of remote users that connected to us through a pair of Cisco routers. I blocked access to FileMaker through the routers and forced them to use FileMaker through a terminal server at our location, and the problem "almost" disappeared. As Ted S. mentioned previously, a client with dual network cards can cause problems too. We purchased a new laptop this spring for one of our sales reps. It had a regular NIC and a Wi-Fi adapter built in. He uses the NIC while at the office and the Wi-Fi while on the road. Soon after he got the laptop, we started losing the hosts list in FileMaker whenever he would search the list to open a file. I had to show him how to disable the Wi-Fi card in Windows while he was in the office. We are currently on FM Server 5.5; clients are all 6.0v4. We just started testing FMP 7, and I don't know yet if it has the same issues.
July 15, 2004 I'm having the exact same problem, but we have no remote users; everything is on the LAN. We're using Win2K Server and FileMaker Server 5.5. This disappearance happens once in a while. After rebooting, the host reappears, but then it is gone again a while later. I can't keep doing this because it's starting to happen way too often. Does anyone have a solution to this problem? I was thinking of redoing the server, but since that isn't going to fix the problem, I won't.
September 19, 2004 This has been happening to us for the past month or so ever since we switched our server environment from NT to Windows 2003. Prior to this, under NT, the FileMaker Server service would just suddenly stop, whereas now under Windows 2003, the service is still running but the host is not visible. Anybody who is currently connected can still use the databases (and it seems that they can still open other databases hosted on the server), but no one else can connect. In both scenarios, restarting the service fixes the problem until the next occurrence, which can be anywhere from a few minutes to about one month later.
Our situation is eerily similar to that reported by Ted S in the Hosts Problem thread: we are running 3.0v6 on XP and Server 3.0v4 on Windows 2003, and we've been in the process of phasing out FileMaker for the past few years. However, the company has recently made a commitment to upgrade our FileMaker environment. This will take a while, though. I realize the versions we are currently running are no longer supported, but until we get to 7.0, I am desperate to fix this lost host problem, as this is undermining the credibility of some wonderful solutions that have been built so far using FileMaker.
Like Ted S, I've performed file recoveries, studied the FM server logs to try to establish some sort of pattern of when this occurs (I've kept log histories for the past two years dating back to the NT days), and pinged the server during outages with good results. The Server team have even rebuilt the server, but this has not solved the problem. It just seems to be a totally random thing. I was excited to see some potential causes identified in this thread which would seem to explain the randomness, but I'm not sure how to test for these (I'm not a server or network person, just a business user who loves FileMaker). Our FileMaker solutions have about 1000 clients, many of whom use laptops and are widely dispersed geographically.
I will pursue the dual network card and router theories with some network people, but I would be interested in hearing some insights into the following questions regarding some of the other theories:
1. Are there methods for finding bad client PCs on the network? I really wish I could have the eureka experience described by Ted S in his YIKES post in the Hosts Problem thread!
2. This one may be more for Ted S: what is a bogus IP address and how can these be found? Can anyone explain (to a non-techie like me) how the presence of bogus IP addresses on a client machine could result in losing the host on the network for everybody?
3. What role, if any, can the FileMaker Server log file play in tracking down the problem? For example, in the log files, I can see the name of the last person who was able to make a connection and I can see the names of all persons who were connected at the time that the problem occurred. Will this help at all, or, as in Ted S's case, are we looking for someone who just simply was not able to connect and who took the host down with them?
4. We use a lot of HP printers (e.g., HP 5si LaserJets), and I've heard that there are potential conflicts with certain HP printer drivers. Is it possible that the problem occurs when a connected user prints something from a hosted FileMaker database? If so, what is the mechanism of this type of failure?
Thank you very much!
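On the question of what a "bogus" IP address looks like: the thread never says what Ted S actually saw, so the following is only one possible reading. A common case on Windows clients is a self-assigned 169.254.x.x (link-local) address, which usually means the machine never got a DHCP lease. Assuming you can collect a list of client addresses by some other means (for example from IPCONFIG output), a sketch like this flags the obviously suspect ones. The function name and the categories are illustrative choices, not a standard tool.

```python
import ipaddress


def suspicious_addresses(addresses):
    """Return (address, reason) pairs for addresses that look misconfigured.

    The categories are a guess at what "bogus" might mean; an address that
    passes these checks can still be wrong for your particular network.
    """
    flagged = []
    for text in addresses:
        try:
            addr = ipaddress.ip_address(text)
        except ValueError:
            flagged.append((text, "not a valid IP address"))
            continue
        if addr.is_unspecified:
            flagged.append((text, "unspecified address"))
        elif addr.is_link_local:
            flagged.append((text, "link-local, probably no DHCP lease"))
        elif addr.is_loopback:
            flagged.append((text, "loopback address"))
    return flagged


if __name__ == "__main__":
    for address, reason in suspicious_addresses(["10.0.0.5", "169.254.12.7"]):
        print(address, "->", reason)
```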
January 2, 2005 It doesn't appear to be related to the DNS alias. I've reconfigured all openers and relationships to use IP address, and the problem still occurs. Also, I've tried connecting with a wireless NIC activated on the client, but this did not replicate the problem. I do have a clearer understanding of the symptoms of the problem, but unfortunately still do not know why it is happening.
The problem does appear to be related to the UDP portion of the client/server exchange. UDP packets are exchanged during the initial discovery phase. Following successful discovery, all subsequent exchanges use TCP. When the problem occurs, TCP continues to work fine for everyone who had already established a connection. However, any attempts to establish new connections fail, as the UDP exchange does not work properly.
We performed network analysis upon occurrence of the problem and found that UDP packets from the client were reaching the server and were not generating any ICMP 'destination unreachable' messages. This suggests that the UDP packets are reaching port 5003 and that the port is open. However, no return UDP packet is seen from the server. A comparison of the UDP packets from the client in a working versus broken state revealed that, with the exception of bytes representing the source port and checksum portions of the packet, the packets are identical, including the data or payload portions. Differences in the checksum and source port bytes are to be expected, so it does not appear to be a case of the client UDP packet being corrupted. But this is a comparison after the occurrence of the problem. It might still be possible that one or more corrupt UDP packets were received when everything was working fine, and that the receipt of corrupt UDP packets is what is causing the problem in the first place. But I'm not sure how to test for this. The Netstat and PortQuery utilities also suggest that UDP port 5003 is still open.
But I'm wondering whether it is possible that FileMaker Server (FMS) has become unbound from UDP port 5003. Does anyone know of a mechanism that would cause FMS to unbind from UDP port 5003 but remain bound to TCP port 5003? At some level, I think the problem has to do with a difference between Windows 2003 Server and Windows NT. Maybe if consecutive corrupt packets are received, Windows 2003 somehow unbinds the process/service from that port, whereas this would not occur on Windows NT? Any suggestions will be most appreciated.
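The symptom described above (TCP still fine, UDP requests arriving but unanswered) can be checked from any client with two small probes. This is a rough sketch under a big assumption: the UDP payload that FileMaker clients send during discovery is not documented in this thread, so `payload` must be bytes copied from a capture of a working client. An arbitrary payload getting no reply proves nothing. The function names are made up for illustration.

```python
import socket


def udp_answers(host, port, payload, timeout=2.0):
    """Send one UDP datagram and report whether any reply arrives in time."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(payload, (host, port))
        try:
            sock.recvfrom(4096)
            return True
        except (socket.timeout, ConnectionError):
            # Timeout: nothing came back. ConnectionError: the OS reported
            # an ICMP error for the datagram we sent.
            return False


def tcp_accepts(host, port, timeout=2.0):
    """Report whether a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical server address; 5003 is the port named in the thread.
    server = "192.168.1.5"
    captured_payload = b"replace with bytes from a working client trace"
    print("UDP reply:", udp_answers(server, 5003, captured_payload))
    print("TCP open: ", tcp_accepts(server, 5003))
```

A result of "TCP open but no UDP reply" to a known-good payload would match the broken state described in this post.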
January 3, 2005 I have seen similar situations on 2 client machines, not server, but it may be a starting point. The client machines could not see or connect to any FM server. The problem ended up being a corrupt IP stack. The solution was to remove the network card and all networking DLLs and APIs and reinstall a new network card and IP stack. I suspect that the network cards were bad, thereby corrupting the IP stack, but all utilities showed that the card was OK. We sent the cards in for replacement (3Com) under a lifetime warranty.
January 4, 2005 Zardoz, I'm impressed with your research. Nice work! Do me a favor and post your findings here. I'll be interested because although our system has been running flawlessly for a couple of months now, I suspect it won't last and it would be nice to "bottom out" on this.
January 4, 2005 Just to add to this. When mine does the same thing, I notice the IP mask from the router has changed from a local to an external one. All my PCs go down. When I unplug the router and then plug it back in, the IPs reset and everything goes great again. My case might be different, but when you lose your connection, check to see if you can access your mail or other resources. My local network stays active, as in, I can access other PCs from My Network Places. That stays running.
January 6, 2005 Newbies I had the same problem, running Server 5.0 on Windows 2000 Advanced Server, hosting 60 files. What I started doing was restarting the whole server, not just the service, once a week, and the problem does not appear any more, but I know that it's not the best solution.
January 7, 2005 Thanks everyone for the feedback. I'll review your suggestions with my network and server contacts to see what we can try. What we're doing now is attempting to capture a few network traces around the time that the problem occurs to see if there are common patterns of packet activity into UDP port 5003 just prior to the problem. We captured a few such traces yesterday, but I don't yet have the results--there is a lot of data to wade through. If we find anything significant, I'll be sure to post it here.
January 7, 2005 While I wait for the results of our network traces, there are a couple of things I'm wondering about:
1. Is this problem occurring for anybody in FMS 7 on W2K3 server?
2. To troubleshoot this problem, has anyone tried running the older versions of FMP and FMS in compatibility mode with an appropriate earlier version of Windows? I was just reviewing the troubleshooting information for W2K3 server and it suggests that, for "an application that stops responding, quits or behaves improperly," to try running it in compatibility mode with the version of Windows for which it was originally designed. In our case, I imagine we would try running FMS 3 in compatibility mode with Windows NT. This selection is made by right-clicking the application, choosing properties and picking the appropriate option on the compatibility tab. I've seen similar suggestions in other threads in this forum to run older FMP clients in compatibility mode on the newer OSes. In our case, XP is our OS and we would run FMP 3 in compatibility mode with Windows NT. Not sure if this has anything to do with the problem, but I might give it a try if the network traces don't yield anything useful.
January 10, 2005 This subject came up in a thread last October. It was happening to me as well. The problem is frustratingly not reproducible; it seems to happen only at random. I thought I had an angle on it, wherein some kinds of load on the files (huge replace operations) seemed to trigger it every time. However, the host vanishings *also* happened during the day under normal load. My problem happened every other day for two frustrating weeks, then seemed to go away on its own. It hasn't returned (knock wood) but I'm following this thread with interest. FYI, here is my post from October, outlining my experience:
----------------------------------------------------------------
I'm having the same problem and have been watching this thread to see if something comes up. How big are your files? Mine total up to about half a gig, with 50,000 records in the most populated file (the others average around 10,000 to 20,000 records, fifty files in all). I'm running Win2000 server, with FM 5.5 server running as a service. Nothing else runs on that box. I've noticed that when my files vanish, I can log directly into the server and see everything apparently working normally. FM 5.5 thinks it is still serving the files. Only a reboot seems to fix the problem. I do know that RAM usage is not the issue. During a cache flush after heavy usage, RAM usage spikes up to a max of 47%. If it was over 90%, I'd be worried about it.
I've been scrutinizing my event logs, trying to figure out what triggers the problem. One thing that happened recently is that I had 73 files being served. I thought I'd pull off and archive about 20 that didn't really need to be there, hoping to lighten the load. I did this and left myself a trap. I have my FM files set up to do a few hours of processing in the middle of the night. A master updater script performs a set of about thirty external scripts, opening files and running updates one after another.
My trap was that a couple of my files had calc fields on the default layout that referred to now-archived old files. So I'd come in, find an error halt about not finding the file (I'm doing this in a Win2000 client machine running FM 6). It gives me a nav window. To get the processing done, I direct FM through the nav window to where the old files are now archived. The processing picks up and finishes. This happened two days in a row (before I'd found and killed all those old calc fields). Both times, right after pointing FM to the archived files, after processing finished, the hosts vanished. This morning, after processing normally without needing redirection, no host vanishing. The problem here is that this may have something to do with it, but it isn't all there is. I've had host vanishings happen three times in the middle of the afternoon without anything like the above happening at all. __________________________________________ Steve Brown
January 12, 2005 Thanks Steve. I agree this is a very frustrating and seemingly random problem that so far has proven impossible to replicate. Many times I think I've solved it only to see it happen again. As you state, the only things that work are restarting the service or restarting the server. This allows clients to once again "discover" the server. Our files are not particularly large, and I don't believe there are any heavy calculation or replace operations going on. The problem does occur at any time of the day. Two quick updates to report.
1. Very preliminary results from the network traces indicate that the server is receiving ICMP "destination unreachable" messages around the time of the problem. This would mean that the server is attempting to send packets but the packets cannot be delivered to the destination(s). At this time, I'm not even certain whether these are TCP or UDP packets or whether they have anything to do with FMS. I'll attempt to find out more details as soon as possible.
2. Late last week, the problem was occurring 4-5 times per day, but it hasn't occurred in the past 32 hours ever since I restarted FMS 3 in compatibility mode with Windows NT. Far too early to tell if this is something to get excited about....
January 14, 2005 Well, using compatibility mode does not solve this problem. However, a review of the network traces has shed some significant light. We now have three network traces that show the transition from a working state to a broken state, and I have had a chance to examine one of them in some detail so far. In this trace, the break occurs during a seemingly normal attempt by a client to "discover" the FMS.
In a normal discovery exchange (normal at least for FMS 3), the client sends two consecutive UDP packets to FMS (port 5003). This occurs when a client double-clicks an opener file or clicks the hosts button (or specifies a host by name or IP address). When the first UDP packet is received, FMS sends a UDP packet back to the client. When the second UDP packet from the client arrives, FMS sends another UDP packet to the client. After this mutual exchange of two UDP packets, which presumably confirms that the FMS is available and listening (this produces the list of available databases in the hosts window for a client using File/Open/Hosts), the client clicks a database and establishes a TCP connection (or, if an opener file is used, a TCP connection is then made to the specified database) and all further exchanges occur via TCP. The FMS logfile (FMSRVLOG.txt) captures the moment of the TCP connection; it does not capture the UDP-driven discovery phase.
What's happening at the moment of the break is that, after FMS has sent the second UDP packet to the client, the server is receiving two consecutive ICMP "destination unreachable" messages, which are saying that the two UDP packets sent from the server via port 5003 are not reaching their destination. This client is then unable to establish a TCP connection to a database. What's interesting is what is seen on the trace after the receipt of these two ICMP messages. The trace shows continued exchange of TCP packets for clients who had already established a TCP connection.
However, for every subsequent client who attempts to "discover" FMS, only the two inbound UDP packets from the client into port 5003 on the server are seen; there are no return UDP packets going from FMS to the client. It seems as if FMS is no longer even attempting to respond to any more UDP packets after the two ICMP messages were received. There is still more work to be done, as I understand that ICMP messages have an inherent meaning of their own which can reveal the reason why the message has been generated. And I want to confirm that this is what is happening on the other two traces and get some additional traces to review as further confirmation. Also, it would be interesting to determine if there are any commonalities among the clients for whom these ICMP messages are being generated (i.e., are they from the same subnet, using the same router(s), etc.). But this does potentially explain why remediation attempts that focus on something to do with specific databases do not resolve the problem--the problem appears to occur even before a connection is established to any database. I hope to have some additional information in the next few days.
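The pattern described in this post (replies stop for everyone once an ICMP message arrives) is easy to check mechanically once a trace has been reduced to a list of events. The sketch below assumes you have already exported the trace into simple records by some other means; the `Event` type, its `kind` labels, and the function name are invented for illustration and do not correspond to any capture tool's format.

```python
from dataclasses import dataclass


@dataclass
class Event:
    time: float   # seconds since the start of the trace
    kind: str     # "udp_request", "udp_reply" or "icmp_port_unreachable"
    client: str   # address of the client involved


def summarize_break(events):
    """Summarize discovery traffic after the first ICMP message, if any.

    Returns None when the trace contains no ICMP port-unreachable event.
    """
    icmp = next((e for e in events if e.kind == "icmp_port_unreachable"), None)
    if icmp is None:
        return None
    later = [e for e in events if e.time > icmp.time]
    return {
        "icmp_time": icmp.time,
        "icmp_client": icmp.client,
        "requests_after": sum(e.kind == "udp_request" for e in later),
        "replies_after": sum(e.kind == "udp_reply" for e in later),
    }


if __name__ == "__main__":
    trace = [
        Event(0.0, "udp_request", "10.0.0.7"),
        Event(0.1, "udp_reply", "10.0.0.7"),
        Event(0.2, "icmp_port_unreachable", "10.0.0.7"),
        Event(5.0, "udp_request", "10.0.0.9"),
        Event(5.1, "udp_request", "10.0.0.9"),
    ]
    print(summarize_break(trace))
```

A summary with requests after the ICMP event but zero replies is the signature described above. Running the same check over every trace also collects the `icmp_client` addresses, which helps with the question of whether the triggering clients have anything in common.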
January 21, 2005 Hi, I'm happy to have found this thread. Indeed, for the past few weeks I have been losing my host and getting the message (translated from the French message "la communication avec l'utilisateur principal a"): communication with the host has
January 21, 2005 That error code (-70) is not a FileMaker error, it is a Windows error code. Good luck getting anything from Microsoft.
January 22, 2005 I have now examined five network traces that show network traffic into and out of the server (W2K3 in our case) when the FMS (3.0v4 in our case) host disappears. A clear pattern has emerged. In all five cases, the host disappearance appears to be associated with the receipt, by the server, of an Internet Control Message Protocol (ICMP) packet from a client (FM 3.0v6 in our case) who has just completed the mutual exchange of two UDP packets with FMS. The first trace (described in an earlier post) turns out to be a bit anomalous in that the client sent two ICMP packets and was unable to initiate a TCP conversation with the FMS. In all four of the other traces, the client sends only one ICMP packet and is able to initiate and establish a TCP connection with the FMS. However, the end result in all five traces is the same: after the host server receives the ICMP packet, FMS does not respond to any further UDP packets from any FileMaker clients until the FMS service is restarted.
In all five cases, the ICMP packet is a Type 3--Code 3, which translates as "Destination Unreachable--Port Unreachable". When a destination unreachable--port unreachable ICMP packet is received, it signifies to the receiving computer that a packet that it sent was able to traverse the entire network path successfully to the destination computer but was unable to be delivered to the specified port on that destination computer because there was no process listening on that port. The ICMP message identifies the packet that was unable to be delivered. In the first trace, there were two ICMP messages--one identified the first UDP packet sent by the server to the client, the other identified the second UDP packet sent by the server to the client. In the other four traces, there is only one ICMP message, and in each case, the ICMP message identifies the second UDP packet sent by the server as the one that could not be delivered to the specified port on the client.
There is lots of information to summarize, and I'll get to more of it later this weekend.
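For anyone who wants to pull the same information out of raw capture bytes: an ICMP destination-unreachable message carries the IP header of the undeliverable packet plus the first 8 bytes of its payload, which for UDP is the whole UDP header. That is how the message "identifies" the packet. The sketch below decodes that layout as defined in RFC 792; it expects the ICMP message itself, starting at the ICMP type byte, with any outer Ethernet and IP headers already stripped. The function name and the returned dictionary keys are illustrative.

```python
import socket
import struct


def parse_port_unreachable(icmp: bytes):
    """Decode an ICMP Type 3 / Code 3 message that quotes a UDP datagram.

    Returns a dict describing the undelivered datagram, or None if the
    message is not a port-unreachable report about a UDP packet.
    """
    if len(icmp) < 8:
        raise ValueError("ICMP message shorter than its 8-byte header")
    icmp_type, icmp_code = icmp[0], icmp[1]
    if (icmp_type, icmp_code) != (3, 3):
        return None
    inner = icmp[8:]                      # quoted IP header + 8 payload bytes
    if len(inner) < 20:
        raise ValueError("quoted IP header is truncated")
    header_len = (inner[0] & 0x0F) * 4    # IHL field, in 32-bit words
    protocol = inner[9]
    if protocol != 17 or len(inner) < header_len + 4:
        return None                       # 17 is the IP protocol number for UDP
    src_port, dst_port = struct.unpack("!HH", inner[header_len:header_len + 4])
    return {
        "sender_ip": socket.inet_ntoa(inner[12:16]),   # who sent the lost packet
        "target_ip": socket.inet_ntoa(inner[16:20]),   # who rejected it
        "sender_port": src_port,
        "closed_port": dst_port,
    }
```

In the traces described above, one would expect `sender_port` to be 5003 (the server's reply) and `closed_port` to be whatever ephemeral port the client had used for discovery.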
January 22, 2005 So what seems to be happening in the other four traces is that the second UDP packet from FMS cannot be delivered because the FMP client is no longer listening on the UDP port. The FMP client seems to be listening on the UDP port long enough to receive the first UDP packet from the FMS, but not long enough to receive the second UDP packet. The inability of the second UDP packet from the server to find the specified port on the client computer generates the ICMP message back to the server. In the sense that the client in each of these four cases is able to establish a TCP connection, successful receipt of the second UDP packet from FMS does not seem to matter--a TCP connection can be established based on receipt of the first UDP packet from the server. However, successful receipt of the second UDP packet seems to be critical in order to avoid the generation of the ICMP message and to ensure continued UDP response from FMS for subsequent clients. As soon as one client anywhere on the network fails to receive the second UDP packet from FMS, the host "disappears" and no new connections are possible.
This raises at least two questions. First, why is the FMP client not staying around long enough on the UDP port to receive the second UDP packet? Second, why is the receipt of the ICMP message on the host server proving fatal for the UDP functionality of FMS but not for the TCP functionality of FMS? The traces do not appear to provide any answers to these questions, but I've got some theories which I'll share next time.
January 25, 2005 I'm not confident that all the pieces of the puzzle are in place yet, but an answer to why the host disappears is starting to crystallize. The explanation will be lengthy, so please bear with me. This might take a few posts to complete.
The answer appears to lie in Windows Sockets (Winsock) programming considerations. An important bit of information is found in Microsoft KnowledgeBase article number 245442 entitled "INFO: Winsock Ignores ICMP Port Unreachable Control Messages". This article states that ICMP port unreachable messages are ignored by the Winsock layer in Windows NT Server and in other pre-W2K versions of Windows. In other words, when an ICMP port unreachable message is received by one of these earlier versions of Windows, the winsock layer does not perform any handling on it--i.e., it does not translate the error into a winsock error message and does not pass any information about the ICMP message to the Winsock UDP application in the application layer (i.e., to FMS). FMS can carry on as if nothing unusual happened.
The article states that if the ICMP port unreachable message is received when the Winsock UDP application (i.e., FMS) is waiting for a response from the remote host (i.e., the client), the receipt of the ICMP message goes undetected by the Winsock UDP application, and the application will continue to wait for a response from the client. The application will be in what is referred to as a "blocked" state--it cannot perform any other UDP operations until the expected response from the client is received. Other socket programming resources that I've consulted suggest that a blocked UDP application would continue to receive incoming UDP requests from clients and would queue them in a UDP receive buffer. It would not be able to respond to any of them until the block is cleared (i.e., until the expected response is received).
This is interesting, and it might provide a mechanism to explain occurrences of the disappearing hosts problem in Windows NT, but it does not seem to be consistent with our scenario for a couple of reasons. Firstly, according to the network traces, the problem is occurring after FMS has sent the second UDP response to the client. After FMS has sent the second UDP response, nothing in the traces indicates that FMS is expecting any further UDP response from the client. In a normal UDP transaction between FMS and a client, all that is exchanged are two UDP packets each way, and then the client initiates a TCP connection. There is never a third UDP packet sent from the client to FMS. So FMS does not appear to be waiting for another UDP response from the client. Since it is not waiting for any further UDP response, I don't believe that, in the Windows NT Server environment, the scenario of the server receiving an ICMP port unreachable message in response to the second UDP packet from the server would cause FMS to enter a blocked state. Secondly, I believe that everyone who has posted here regarding this problem is using something later than Windows NT (typically W2K or W2K3) for their server operating system. This disappearing host problem does not appear to be much of an issue on Windows NT. Although FMS is not expecting a further UDP response from the client after sending the second UDP packet, it is certainly not expecting to receive an ICMP port unreachable message, either. In the W2K and W2K3 server environments, though, this is exactly what is happening--the Winsock layers in these newer server operating system environments are passing information about the ICMP port unreachable message back to FMS. This is at the heart of the disappearing hosts problem, and I'll provide more details in my next post.
January 26, 2005 The Microsoft article mentioned earlier (#245442) provides another important clue as to what is going on. It states that a feature has been added on Windows 2000 to unblock a Winsock UDP application that has received an ICMP port unreachable message. In this new feature, receipt of the ICMP message is handled by the winsock layer and translated as a WSAECONNRESET error message (message #10054), which is then passed along to the Winsock UDP application for processing, presumably under the assumption that the Winsock UDP application contains error trapping code to deal appropriately with receipt of this error message. Remember that this is new behaviour at the winsock layer in Windows 2000; previously under Windows NT, ICMP port unreachable messages were simply ignored at the winsock layer.
The WSAECONNRESET message type is described in the Windows Sockets 2 API (ftp://ftp.microsoft.com/bussys/winsock/winsock2/WSAPI22.DOC) as follows: "An existing connection was forcibly closed by the remote host. This normally results if the peer application on the remote host is suddenly stopped, the host is rebooted, or the remote host used a 'hard close'."
This new feature was likely added as a fix to the problem of a Winsock UDP application waiting for, and consequently blocking on, a client response that will never come because the client has closed the connection and returned an ICMP port unreachable message to the server. This fix would be very useful to a Winsock UDP application that was expecting another UDP response from the client. Being informed, via receipt of the WSAECONNRESET error message, that the client has closed the UDP connection and will not be sending any further responses would enable the Winsock UDP application to carry on if it was adequately prepared for this eventuality. However, this would not be a good thing for a Winsock UDP application that is not expecting a response from the client.
I don't believe that FMS 3 (and possibly some later versions of FMS as well) is programmed to expect UDP responses from the client. The network traces suggest that the client actually fires off its two UDP packets right off the bat without waiting for any response from the FMS. In other words, the sending of the second UDP packet from the client is not contingent on receiving a response from FMS to the first client UDP packet. This is evidenced by what is seen in the traces after the problem occurs (i.e., after the host disappears): two UDP packets are seen arriving from each client who is trying to connect, but no response is seen going back from the FMS to the client. This suggests that FMS is programmed to respond to each UDP request received from a client but is not expecting a response from the client in return--the client has already sent all of the UDP packets that it is going to send. When the client returns an ICMP port unreachable message to the server, the winsock layer on post-NT servers thinks that this is something that would be useful for the Winsock UDP application to know, and consequently captures this as a WSAECONNRESET error message and forwards it to the Winsock UDP application. However, FMS is not programmed to expect or handle this error in this context, and consequently the UDP portion of the FMS application hangs upon receipt of this error. Restarting the FMS presumably clears out the WSAECONNRESET error message. Microsoft must have realized, or been informed, that this new feature was causing a problem for certain Winsock UDP applications because it issued a fix as documented in KnowledgeBase article # 263823 entitled "WinSock Recvfrom() Now Returns WSAECONNRESET Instead of Blocking or Timing Out." The intent of this fix was to provide WinSock UDP applications with a technique for obtaining the original Windows NT behaviour when an ICMP port unreachable message is received. 
To get this technique to work, it was necessary to rewrite the WinSock UDP application specifically for Windows 2000. By the time this fix was issued (the Last Review date is 2003/09/11), I suspect that some versions of FMS, including version 3, were no longer supported by FileMaker and that consequently, rewriting these versions of FMS to incorporate this fix was just not an option. So this potentially explains what is happening at the server end of things, but what is happening at the client end that causes the ICMP port unreachable message to be generated in the first place? Next time, I'll discuss a theory for this.
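For readers wondering what "error trapping code" for this condition might look like, here is a minimal Python sketch. It is not FileMaker's code and makes no claim about how FMS is written. The names `serve_discovery`, `make_reply` and `max_packets` are hypothetical, and Python's `ConnectionResetError` stands in for WinSock error 10054 (WSAECONNRESET), which is how that error surfaces on a UDP `recvfrom()` under Windows. The idea is simply that the responder treats the reset as non-fatal and goes back to listening.

```python
def serve_discovery(sock, make_reply, max_packets):
    """Answer up to max_packets datagrams, skipping connection-reset errors.

    sock is any object with recvfrom() and sendto(); make_reply is a
    hypothetical callback that builds the response bytes for a request.
    Returns the number of reset errors that were ignored.
    """
    resets = 0
    answered = 0
    while answered < max_packets:
        try:
            data, addr = sock.recvfrom(2048)
        except ConnectionResetError:
            # An ICMP port unreachable for an earlier reply can surface
            # here on Windows 2000 and later. Count it and keep listening
            # instead of letting it stop the loop.
            resets += 1
            continue
        sock.sendto(make_reply(data), addr)
        answered += 1
    return resets
```

A responder written this way would keep answering later clients after one client's port went away, which is the behaviour the posts above suggest is missing.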
February 3, 200520 yr Newbies Zardoz!! I can confirm exactly what you have posted. I wish I had found your posting a year ago. I have been wrestling with this issue for almost 2 years on and off. A real pain in the caboose! I have a Win2K server fully patched from MS. I host maybe 20 files to about 60 users. Users are both local and remote. I have the remotes trained to select the server by IP. The locals panic when they don't see the server immediately after clicking on "hosts". This problem hit randomly, as you describe, from the workstation level. My nightly scripts on the server stop and start the FM Server service. A band-aid, and a terrible one at that. I have FM7 Server ready and plan to upgrade within the next 30 days. I haven't seen any listings of a ver7 system with these problems yet. I have opened trouble tickets with FM on this issue. And still there are "no known issues" with this. Great SUPPORT!
February 3, 200520 yr Zardoz mentioned at the end of his last post that he had a theory as to what is happening on the client to make this problem occur. I'll be curious to see what he comes up with. Zardoz has done excellent work so far! We had an outage again as recently as last week when one of our traveling salesmen tried to fire up FMP while he was both physically plugged into our network and had his wireless adapter enabled. In this scenario we were able to make the host list disappear at will.
February 7, 200520 yr Thanks wga79403 and Ted S. One of the perplexing aspects of this disappearing hosts problem is its apparent randomness. But there is nothing random about what is happening at the server end. All it takes is one ICMP port unreachable message for the host to disappear. So the randomness must be associated somehow with the client end. I don't have any definitive explanations for this randomness, but the observations based on the traces suggest a couple of possibilities. According to our network traces, we are seeing two scenarios at the client end. In one trace, a client who was attempting to connect to FileMaker Server (FMS) closed the UDP port before receiving any UDP packets from the FMS. The client sent two UDP packets from this UDP port to port 5003 on the FMS, but after sending these two UDP packets, the FileMaker Pro (FMP) application on the client did not listen on the UDP port long enough to receive any UDP responses from the FMS. In the other four traces, FMP listened on the client's UDP port long enough to receive the first UDP response from FMS, but did not listen long enough to receive the second UDP response. We use opener files for most, if not all, of our databases. In the first case, my guess is that the client double-clicked an opener file and FMP launched long enough to send the two UDP packets to the FMS but then something happened to cause FMP to quit suddenly--either the client's computer shut down or, more likely, FMP encountered a problem in Windows XP and had to quit. I'm not sure what the nature of the problem would have been, but every so often when I'm working in FMP 3.0v6 in XP, I suddenly see an XP error message window popping up on the screen stating that FileMaker Pro has encountered a problem and had to quit. Sometimes it's when I'm printing and sometimes it's when I'm reordering scripts in ScriptMaker. There seem to be some basic incompatibilities between FMP 3.0v6 and XP that surface once in a while. 
Anyway, this particular client has successfully connected many other times without bringing the host down. In the other four traces, my best guess, as unsatisfying as it may seem, is that this is simply a function of random network delays. The network traces reveal that the normal exchange between client and server goes like this (this is what is seen at the server end, but remember that the client appears to send both UDP packets at the outset):
1. incoming UDP packet from client to server
2. outgoing UDP packet from server to client
3. incoming UDP packet from client to server
4. outgoing UDP packet from server to client
5. incoming TCP packet from client to server (client initiates TCP connection)
6. outgoing TCP acknowledgement packet from server to client
This is followed by exchange of many TCP packets until client initiates shutdown of TCP connection. At the moment of the breakdown, this is what is typically seen at the server:
1. incoming UDP packet from client to server
2. outgoing UDP packet from server to client
3. incoming UDP packet from client to server
4. outgoing UDP packet from server to client
5. incoming TCP packet from client to server (client initiates TCP connection)
6. outgoing TCP acknowledgement packet from server to client
7. incoming ICMP port unreachable packet from client to server (always related to second UDP packet from server to client)
This is followed by exchange of many TCP packets until client initiates shutdown of TCP connection. However, once the FMS receives the ICMP message, no UDP responses are seen from the server to any subsequent client who attempts to connect. Subsequent connection attempts look like this:
1. incoming UDP packet from client to server
2. incoming UDP packet from client to server
Normally, the FMP client waits long enough to receive the second UDP packet from the server before closing the UDP port and initiating a TCP connection. 
However, it appears as if the FMP client can initiate a TCP connection based on receipt of just one UDP packet from the server. Perhaps FMP is programmed to wait only a certain reasonable length of time for the second UDP packet to arrive from the server. If the first UDP packet from the server is received successfully, but there is congestion or some other problem on the network that delays the arrival of the second UDP packet beyond this reasonable time, FMP still unbinds from and closes the UDP port and initiates a TCP connection. When the second UDP packet arrives, the client UDP port is no longer open and thus an ICMP port unreachable message is sent back to the server. Again, the clients associated with these four traces were different clients in each case, and all four of these clients have connected successfully on many other occasions without bringing the host down. The second packets sent each way appear to be redundant--they are identical to the first packet. UDP is considered a fast but unreliable protocol, so maybe the second identical packet is programmed in for redundancy purposes in case the first packet does not make it. In all five network traces, the ICMP messages were generated by remote clients--clients who are in regional locations in other cities/towns/provinces as opposed to in our head office building. In the first trace, where two ICMP messages were sent, the client did not appear in the FMSERVLOG.txt file because no TCP connection was established. This would be an example of a client who is unable to connect and who takes the host down with them and leaves no trace in the logfile. In the other four traces, the clients who generated the ICMP messages do appear in the FMSERVLOG.txt file--these are the last clients who are shown in the log file as establishing a connection to a database. 
All subsequent entries in the log file are typically people closing connections to databases, although it is possible to see clients who have already established a TCP connection establishing connections with other databases on the server. So the logfile can be useful, but since it is possible that a client can take the host down without appearing in the logfile, you can never be certain from examining the logfile alone whether the last person to establish a connection to FMS is the one who generated the ICMP message; the ICMP message could be the result of the next "invisible" person who attempted to connect but who experienced a problem with FMP before receiving either of the two UDP responses from FMS. We have six servers running FMS 3.0v4 on W2K3 and we have only seen the disappearing hosts problem on two of these servers--on the two servers that host databases that are used by regional clients. I don't know much about networks, but it seems reasonable to expect that these random network delays might be more likely to occur in those parts of the network where the distance is greatest. Our data is not conclusive on this, however. Sometimes the logfile shows that the last person who established a connection is a head office client. This happens maybe 20% of the time. The other 80% of the time it is a regional client who is the last person to connect according to the logfile. Without a network trace, it is impossible to know whether the head office client is the one who generated the ICMP message or whether there was some other invisible client who brought the host down a short time later. Maybe it is possible that random network delays could be affecting packet delivery speeds within the head office as well. Does anyone have any thoughts on what might cause random delays in packet delivery? 
It would be interesting if someone could confirm how FMP is programmed to act once a UDP response is received from FMS--does it wait a predetermined amount of time for the second UDP response from FMS before closing the UDP port? Next time, I'll try to put a concise explanation together that summarizes what appears to be happening that causes the host to disappear. There is one possible remedial step to address this problem--blocking or filtering ICMP port unreachable messages from reaching the server running FMS--that I'll discuss next time as well.
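For anyone comparing their own traces against the three packet sequences described in this post, the distinction can be written down as a small Python sketch. This is only an illustration: the function name `classify` and the `(direction, protocol)` tuple format are assumptions for the example and do not come from any capture tool, so a real trace would have to be reduced to this form by hand or by script.

```python
def classify(events):
    """Classify one connection attempt as seen at the server.

    events is a list of (direction, protocol) tuples, where direction is
    "in" or "out" and protocol is "UDP", "TCP" or "ICMP".
    """
    udp_in = sum(1 for e in events if e == ("in", "UDP"))
    udp_out = sum(1 for e in events if e == ("out", "UDP"))
    icmp_in = sum(1 for e in events if e == ("in", "ICMP"))

    if udp_in and udp_out == 0:
        return "broken"    # server no longer answers discovery packets
    if icmp_in:
        return "breaking"  # the exchange that makes the host disappear
    if udp_in and udp_out:
        return "normal"
    return "unknown"


normal = [("in", "UDP"), ("out", "UDP"), ("in", "UDP"), ("out", "UDP"),
          ("in", "TCP"), ("out", "TCP")]
print(classify(normal))                      # normal
print(classify(normal + [("in", "ICMP")]))   # breaking
print(classify([("in", "UDP"), ("in", "UDP")]))  # broken
```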
February 9, 200520 yr Before summarizing, there are a few corrections to make. I'm not satisfied with the explanation so far of what is happening at the client end, so I went back to the five traces to see if there was anything else there that might confirm whether a delay is occurring and that might possibly explain the source of the delay. I also performed a few traces on my own client machine to get a better picture of what is seen at the client end during a connection attempt. Although nothing conclusive turned up, some more interesting pieces of the puzzle came to light that suggest some fine-tuning is needed to the theory about what is happening at the client end. In addition, some errors surfaced regarding information that I previously reported. Specifically, I looked at the following packet intervals in a normal state and when the break occurs: 1. The interval between the arrival of the 1st and 2nd UDP packets from the client. 2. The interval between the departure of the 1st and 2nd UDP response packets from FMS to the client. In a normal state (i.e., when no ICMP message is generated), the interval between the two UDP packets from the client ["client interval"] is consistently between 0.990403-1.014624 seconds. In other words, they are seen to arrive at the FMS about 1 second apart. So it looks like FMP (at least version 3.0v6) is programmed to send the two UDP packets 1 second apart. The interval between the two outgoing UDP response packets from FMS to the client ["FMS interval"] closely mimics the interval of the incoming packets--they are seen leaving the FMS about 1 second apart. The interval between receipt of a UDP packet and the response to the packet ["FMS response time"] is consistently between 0.000049-0.000068 seconds--FMS seems to respond very quickly to each UDP packet received. I compared these intervals with those seen at the moment of the break in the four traces in which one ICMP message was generated. 
In all four traces, the FMS response time was within the range above, and the FMS interval closely mimicked the client interval. In two of the traces, the client intervals at the moment of the break were within the range seen in a normal state--i.e., very close to 1 second. However, in the other two traces, the client intervals were 1.921551 seconds and 2.734555 seconds, meaning that the intervals between receipt of the first and second UDP packets from the clients were 0.92 and 1.73 seconds greater than normal, respectively. The traces on my client machine actually revealed that the two UDP packets are not sent right off the bat--the traces show a response being received from FMS before the second UDP packet is sent to the FMS. However, I believe it is still true that the sending of the second UDP packet by the client is not contingent upon receiving a response from FMS, as, in a broken state, two UDP packets are seen arriving at the FMS from each client without any UDP response from FMS. It looks like FMP is programmed to send two UDP packets 1 second apart regardless of whether a response is received to the first one. The trace on my machine showed intervals of 0.000499 seconds and 0.000410 seconds between transmission of my first UDP packet and receipt of the FMS response and between transmission of my second UDP packet and receipt of the FMS response. Both response times are well under 1 second. I also captured a trace on my machine in the broken state (i.e., after FMS stopped responding to UDP packets). This trace confirmed that there is no response coming in from FMS--all that is seen are the two outgoing UDP packets from my machine to FMS, and they were sent 1 second apart. I also tried a trace from my machine using the File/Open/Hosts/SpecifyHost route as opposed to using an opener file. I was trying to get some insights into how or when the transition from UDP to TCP occurs, thinking that this might somehow be part of the problem. 
So I captured a trace up to the point of seeing the list of databases in the hosts window, thinking that I should not see any TCP packets in the trace because I did not double-click a database in the list to open it. However, the trace showed lots of TCP packets between my computer and FMS. An examination of the data portion of the TCP packets sent from FMS to me revealed the names of all of the hosted databases. So it looks like the list of databases actually comes across in the TCP packets, not in the UDP packets as previously stated. In rereading some WinSock information, it looks like the UDP Recvfrom() call simply returns the address from which the response packet was received--in other words, it is used to confirm or discover that the FMS is listening, but does not capture the list of available databases. So, putting all of this new information together, I've got a slightly revised theory as to what is happening at the client end to cause the generation of the ICMP port unreachable message, which I'll hopefully get to later tonight or tomorrow.
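The interval arithmetic in this post is easy to reproduce. The short Python sketch below uses only the figures quoted above (the normal client-interval range and the two out-of-range intervals); the function name `excess_delay` and the choice of 1.0 second as the nominal spacing are assumptions made for the illustration.

```python
# Normal client-interval range reported in the post, in seconds.
NORMAL_MIN, NORMAL_MAX = 0.990403, 1.014624


def excess_delay(client_interval, nominal=1.0):
    """Return how far an interval exceeds the nominal 1-second spacing.

    Intervals inside the observed normal range count as zero excess.
    """
    if NORMAL_MIN <= client_interval <= NORMAL_MAX:
        return 0.0
    return round(client_interval - nominal, 2)


for interval in (1.003, 1.921551, 2.734555):
    print(interval, excess_delay(interval))
```

The last two values come out as 0.92 and 1.73 seconds, matching the delays quoted for the two traces with out-of-range client intervals.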
February 10, 200520 yr WinSock sources suggest that the typical sequence of library calls for a UDP session is as follows (parameters can be specified in parentheses for each call):
socket(), bind(), sendto() and/or recvfrom(), close()
The socket is created, bound to a port and then UDP datagrams are sent to and/or received from a port on another host (e.g., a server) until the socket is closed. An application may call sendto and recvfrom as many times as desired. There are many other library calls available, but these are probably the ones that we're most interested in. Based on observations from the network traces, it looks like the sequence of calls used by FMP 3.0v6 is something like this:
socket(), bind(), sendto(), recvfrom(), sendto(), recvfrom(), close()
If at least one of the recvfrom calls returns valid data (i.e., the IP address of the FMS), then, prior to or after the UDP socket is closed, a transition is made to TCP, whereby a new socket call to open a TCP session is made. The TCP socket uses a different port on the client machine. This seems to be the next sequentially available port number on the client machine, such that if the UDP session used port 1535, then the TCP socket would bind to port 1536 or the next available port number. If neither of the UDP recvfrom calls returns valid data, then the connection attempt times out, the socket closes and nothing displays in the hosts window. This timeout seems to occur after approximately 3 seconds. When I go File/Open/Hosts/SpecifyHosts and type in an invalid FMS IP address, FMP appears to try for about 3 seconds to establish a connection before returning a blank hosts window. Three seconds also appears to be about the length of time it takes for a successful connection to be made and a list of available databases to display. This is all speculative, but I'm guessing that the first recvfrom call waits for about 1 second to receive a response from FMS. 
Regardless of whether a response is received, control in the FMP UDP program passes to the second sendto call after 1 second, and FMP sends a second UDP packet. Control then passes to the second recvfrom call, which waits for a specified time interval (likely also 1 second). At the end of this interval, if a response to at least one of the recvfrom calls has been received, then a TCP socket is created and a TCP packet is sent to FMS (this packet is seen arriving at the FMS at almost exactly 1 second after the second UDP packet), whereas if no response has been received to either recvfrom call, then the connection attempt times out and the UDP socket closes. Based on these speculations, there are three possible scenarios that could result in generating an ICMP port unreachable message: (1) the first recvfrom call times out before receiving a response from FMS; (2) the second recvfrom call times out before receiving a response from FMS; or (3) both recvfrom calls time out before receiving a response from FMS. If one of the recvfrom calls times out before receiving a response, then one of the incoming UDP response packets from FMS will have nowhere to go and will generate an ICMP port unreachable message. If both of the recvfrom calls time out before receiving a response, then both of the incoming UDP response packets from FMS will have nowhere to go and will generate ICMP port unreachable messages. In the five traces, we saw one occurrence of both UDP response packets generating the ICMP message and four occurrences of the second UDP response packet generating the ICMP message. But we do not have any occurrences of an ICMP message being generated solely by the first UDP response packet. I suspect this is because the first UDP response packet has two shots at being received successfully by the client. If it is delayed and misses the first recvfrom call, then it still has a shot at making it in time for the second recvfrom call. 
If it misses the second recvfrom call, then the second UDP response packet is also going to miss both calls, and two ICMP messages will be generated. If the first response packet makes the second recvfrom call, then the second UDP response packet will have nowhere to go and will generate an ICMP message. We see clear evidence of a delay happening in two of the four network traces in which one ICMP message is generated. In one trace, the second UDP packet from the client arrived at the FMS 1.92 seconds after the first packet; it was delayed by almost a full second. In the other trace, the second UDP packet from the client arrived at the FMS 2.73 seconds after the first packet, a delay of 1.73 seconds. Without seeing a corresponding trace on the client machines, it is not possible to know for sure which recvfrom call was filled by the first UDP response packet. But given the significant delays in receiving the second client packets at the FMS, there is a good chance that the second FMS response packets did not arrive before the timeout period expired for the second recvfrom call. Although the second client packet in the other two traces arrived at the FMS within the normal interval (i.e., approximately one second after the first client packet), it is possible that the first UDP packet from the client was delayed and/or that one or both of the FMS response packets were delayed in reaching the client. Without seeing client traces, it is impossible to know for certain. In both of these traces, the ICMP message is seen arriving back at the FMS between 1.19 and 1.28 seconds after the second FMS response packet was sent. These seem like rather lengthy round trips, given that in the traces on my client machine, the interval between sending my UDP packet to the FMS and receiving a response was in the 0.000410 to 0.000499 seconds range. 
The relatively lengthy interval between transmission of the second response packet and receipt of the ICMP packet could indicate some general slowness on the network at this time. Regarding the fifth network trace in which two ICMP messages were generated, the second client packet did arrive at the FMS within the normal 1 second interval after the first packet. However, the two ICMP packets arrived 2.98 and 2.48 seconds, respectively, after FMS had sent the corresponding response packets. This indicates that there might have been some significant network slowness impacting delivery of the two FMS response packets. In this trace, it is possible that the FMP did not crash at all on the client machine and that it was actually a case of the two response packets not reaching the client before both recvfrom calls timed out. This client is seen on the trace attempting another connection 128 seconds later. None of this is conclusive, but, in the context of the UDP implementation used by FMP, the random network delay theory seems to offer a few plausible scenarios under which ICMP port unreachable messages can be generated. Summary to follow.
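The speculated client-side call sequence in this post (socket, bind, sendto, recvfrom, sendto, recvfrom, close) can be sketched in Python as follows. This is a guess at the shape of the logic based on the traces, not FMP's actual code: the function name `discover`, the probe payload and the 1-second default wait are all assumptions, and the sketch does not reproduce the fixed 1-second spacing between the two outgoing packets. The socket is passed in so that the caller decides how it is created and bound.

```python
import socket


def discover(sock, server, wait=1.0, payload=b"?"):
    """Send two probes to server and return the responder's address.

    Returns None if neither recvfrom() call gets a reply within `wait`
    seconds. The socket is always closed before returning.
    """
    sock.settimeout(wait)
    found = None
    try:
        for _ in range(2):
            sock.sendto(payload, server)
            try:
                _data, addr = sock.recvfrom(2048)
                found = found or addr
            except (socket.timeout, ConnectionResetError):
                # No reply in time (or a reset reported by the OS):
                # fall through to the next probe.
                pass
    finally:
        # Any reply that arrives after this point finds the port closed,
        # and the client's OS answers with ICMP port unreachable.
        sock.close()
    return found
```

A caller might create the socket with `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`, bind it to an ephemeral port, and pass the server's address and port 5003 as `server`. The comment in the `finally` block marks the window the post is describing: a response delayed past the second wait has nowhere to go.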
February 10, 200520 yr To summarize, disappearance of the FileMaker Server (FMS) from the hosts window is associated with the receipt by post-NT servers of an Internet Control Message Protocol (ICMP) port unreachable message. The ICMP message is generated by a client during the UDP discovery phase of an attempt at connecting to a FMS. A randomly occurring network delay causes the delivery of one or both UDP response packets from the FMS to the client to fail to reach the FileMaker Pro (FMP) client within the timeout period associated with FMP's WinSock recvfrom calls. By the time the delayed packet reaches FMP, the UDP socket has closed and an ICMP message is returned to FMS. The ICMP message is translated by the WinSock layer in post-NT servers into a WinSock error message (error message # 10054--WSAECONNRESET) which is then passed on to the UDP runtime portion of FMS. FMS is not programmed to expect an error message in this context and therefore cannot clear this message. During this time, the FMS service is still running, port 5003 is still open on the server and TCP sockets continue to function normally. However, FMS cannot respond to any more UDP packets from clients until this message is cleared, and the only way to clear the message is to restart the FileMaker Server service. All it takes is one ICMP message to produce this result, and all it takes to generate one ICMP message is for one UDP response packet from the FMS to fail to reach the FMP client within the required timeout interval associated with the WinSock recvfrom call. Useful references include Microsoft KnowledgeBase articles 245442 and 263823. As far as possible solutions go, one strategy would be to block these ICMP messages from reaching the server running FMS. I believe ICMP messages can be blocked at routers and/or firewalls. 
There may be some reluctance to do this, however (I don't think my network and server contacts are too keen on trying this), as ICMP messages provide tremendous network troubleshooting information. If there is a way to block a specific type of ICMP message (e.g., type 3 code 3) that is related to a packet sent from a specific port (e.g. 5003), then this might be more acceptable, but I don't know if this level of specificity is possible. Another option might be to block UDP altogether since the problem appears to be associated with the UDP discovery phase. FileMaker TechInfo article 106727 mentions the alternative of selectively blocking UDP packet routing for port 5003 but allowing TCP. I don't really understand this alternative, but I would be interested to know if anyone has tried it and whether, without UDP, discovery still works. Because the problem seems to hit so randomly, I don't imagine that there are any settings that can be tweaked on network devices (routers, switches, etc.) that would make this random network problem go away. But if anyone has any suggestions in this area, please respond. Other options might include reverting back to running FMS on Windows NT Server or, of course, upgrading to more recent versions of the FileMaker client and server products. But these are not always feasible options for everybody. Regarding more recent versions of FileMaker products, I notice a lot of people reporting this problem with Server 5 or 5.5. What Windows Server operating system are these FMS server products certified to run under? And has anyone seen or heard of this problem occurring with FMS 7? It appears that remediation attempts that focus on variables within specific databases (e.g., data, scripts, layouts, etc.) or on variables within the Windows server boxes do not work. The problem seems to be strictly a random network occurrence that exploits an incompatibility between post-NT Windows server OSes and certain versions of FMS. 
If anyone has other suggestions for dealing with this problem, please post them here. That's it. Sorry about all of the long-winded posts, but I've attempted to provide all of the relevant supporting data and background information so that the reasoning behind my conclusions and assumptions is somewhat clear. Hopefully I've got at least some of it right. Incidentally, there is a freely available network tracer and analyzer at www.ethereal.com that you can download and install on client and/or server machines and gather some network traces. It would be interesting to know whether anyone else who is experiencing this problem is seeing ICMP port unreachable messages in the traces, or whether there are other failure mechanisms at play.
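On the question of whether a port unreachable message can be tied to a specific source port: an ICMP destination unreachable message quotes the IP header and the first 8 bytes of the datagram that triggered it, so the original UDP source port is present in the message itself. The Python sketch below shows how that could be checked on raw bytes. It is an illustration only; `is_port_unreachable_from` is a hypothetical name, and whether a particular router or firewall can filter on this quoted data is a separate question that this sketch does not answer.

```python
import struct


def is_port_unreachable_from(icmp, src_port=5003):
    """True if icmp is a type 3 code 3 message quoting a UDP datagram
    that was originally sent from src_port.

    icmp must be bytes starting at the ICMP header (no outer IP header).
    """
    if len(icmp) < 8 + 20 + 8:
        return False
    if (icmp[0], icmp[1]) != (3, 3):   # type 3 = unreachable, code 3 = port
        return False
    inner = icmp[8:]                   # quoted IP header of the original packet
    ihl = (inner[0] & 0x0F) * 4        # quoted IP header length in bytes
    if ihl < 20 or inner[9] != 17 or len(inner) < ihl + 8:  # 17 = UDP
        return False
    sport, _dport = struct.unpack("!HH", inner[ihl:ihl + 4])
    return sport == src_port
```

Given a captured ICMP payload, a call such as `is_port_unreachable_from(payload, 5003)` would single out the messages that relate to discovery replies sent from port 5003 and leave other ICMP traffic alone.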
February 12, 200520 yr It looks like the option mentioned above of blocking UDP as described in TechInfo article 106727 might provide a viable solution for those of you using FMP 5.x or 6. TechInfo Article 108618 seems to suggest that the TCP/IP network plugin used by these versions of FMP differs from that used in earlier versions in one significant respect: although they continue to use UDP, successful receipt of UDP packets from FMS is not required in order for the list of hosted databases to be displayed. I don't fully understand everything that 108618 is saying about this, but it certainly sounds like something worth trying. Even though successful receipt of UDP packets may not be required in FMP 5.x or 6, I suspect that an ICMP port unreachable message will be sent back to FMS for each UDP packet that is not successfully received by the FMP client. So to take advantage of this difference in the network plugin, it will be necessary to block UDP packets altogether from reaching FMS as described in 106727. If I understand this correctly, for those of us using FMP versions earlier than 5.x, it sounds like this is not a viable option for us, as the displaying of the list of hosted databases seems to depend upon successful UDP receipt by the FMP client. With these earlier FMP versions, blocking UDP packets from reaching the FMS would result in clients never seeing the list of hosted databases.
February 24, 200520 yr Just a quick update. We are in the very early stages of testing FMS 7 Advanced on W2K3. For fun, I ran some network traces on my client machine (using FM 7 Developer on XP SP1) to capture traffic to and from FMS 7 Advanced in two scenarios: (1) connecting to a database via the TCP network; and (2) connecting to a web-published database (Instant Web Publishing) via Internet Explorer 6.0 (SP1). In both cases, the first packets seen are TCP, not UDP. After the initial exchange of TCP packets in the TCP network scenario, the session continues with a mix of TCP and General Inter-ORB Protocol (GIOP) packets. In the web-publishing scenario, the session continues with TCP and HTTP packets. There are no UDP packets exchanged at any time in either scenario. UDP appears to be out of the equation entirely in 7, which suggests that the disappearing host problem may not occur in 7. We haven't done any testing over our WAN yet with our regional clients, but I'll try to provide updates as they become available.
March 28, 200520 yr Newbies Would clearing the application log help with the errors? (something I heard)
April 8, 200520 yr Thanks for the suggestion, Rundvelt. I'm not familiar with what types of problems would be solved by clearing the application log, but I would be interested in any further details that you can provide. Regarding the application log, when the problem occurs with FMS 3.0v4, there is no message of any kind (error or otherwise) generated in any of the server event logs, including the application log. I think this is because FMS is still running (as evidenced by the continued communication with established TCP connections) and the problem that has occurred in the WinSock UDP layer is not the type of problem that would generate a recordable error in an event log. When the WSAECONNRESET error message is received, my guess is that the UDP portion of FMS enters a suspended or paused state rather than entering a state that would generate an error event--i.e., it is still running but cannot respond further until some action is taken. This would be something like running a FM script in your database that then pauses indefinitely waiting for user input or generates an error message that must be acknowledged before the script can continue. Neither of these types of occurrences are fatal to the FM application and would not be captured as errors in something like an event log, as FM is still running but it cannot do anything until the required user intervention/response occurs. Unfortunately, in the case of the WSAECONNRESET error message, there is no possible user intervention that will address this; the intervention can only come from within the UDP application itself, and it appears that FMS 3.0v4 has not been programmed to deal with this situation. Incidentally, when FMS 3.0v4 is stopped and restarted, these stop and start events are captured in the server application log.
June 3, 200520 yr Just a final note to close out this old thread. I've had about 60+ users live on version 7 for about 4 months now and have not had even 1 incident. I believe that this problem was solved in version 7 probably because v7 doesn't utilize UDP (network) protocol unlike earlier versions. Zardoz confirmed this with his testing. There is a lot, I mean A LOT of information in this thread. Zardoz deserves credit for finding and explaining the root cause. To future readers of this thread, don't be frightened-off by the volume of data. If you are interested in reading about the deeply technical aspect then read the lengthy posts; if you are interested in a practical solution, just follow the shorter posts and you will find the answer there.
June 5, 200520 yr I got hit, but for a different reason. I have FM Server 5 on Windows 2000. I just added a second FM Server 5 on Windows XP. Now, one or the other server drops off the "hosts" list (but not both). There appears to be something in Windows XP SP2 that interferes with the hosts broadcast in a "mixed" network (XP, 2000, OS 9, OS X). In my case, removing the second FM Server on XP fixes the problem.
July 5, 200520 yr Hi, could this issue be related to my problem with FMPU on Windows 2003? Is it because of Windows 2003?
July 5, 200520 yr xtrim, I quickly read your posts and it doesn't look like the same problem. You are running web-based and I was 100% fat client. I also did not lose the FM Service, just the listing of hosted files. In fact, there is some evidence that the files continued to be hosted, but since nobody could see them we were under the impression that they were no longer being hosted. I'm sorry, but I can't shed any more light on your problem than the others already have. Good luck.