* Debian Bug#203077: Locks not released on NFS client reboot
From: Chip Salzenberg @ 2003-07-27 16:31 UTC
To: nfs

I confess I don't quite understand this bug report.  Is this person
asking for something that NFS can't do, or is there perhaps some error
in configuration ... ?

Please advise.

The report is against Debian stable (woody), which uses nfs-utils 1.02.

According to Nick Nassar:
> File locks on NFS clients are not released on reboot.  When machines shut
> down unexpectedly for whatever reason, the locks stay there with no
> apparent way to clear them except restarting NFS on the server.
>
> See "I'm having a lock file problem. What do I do?" in
> http://www.gnome.org/projects/gconf/ for an example of the kind of problem
> caused by this.
>
> The machine that I'm specifically having the trouble with is using a stock
> 2.4.20 kernel that I compiled from source.

-- 
Chip Salzenberg - a.k.a. - <chip@pobox.com>
"I wanted to play hopscotch with the impenetrable mystery of existence,
 but he stepped in a wormhole and had to go in early."  // MST3K

-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Neil Brown @ 2003-07-28 0:56 UTC
To: Chip Salzenberg; +Cc: nfs

On Sunday July 27, chip@pobox.com wrote:
> I confess I don't quite understand this bug report.  Is this person
> asking for something that NFS can't do, or is there perhaps some error
> in configuration ... ?
>
> Please advise.

This is something that NFS *can* do, so it's either a config error or
a source-code error...

When the server gets a lock request from a client, it asks the local
statd to monitor that client.  The server-statd contacts the
client-statd and tells it that it wants to know about reboots.  The
client-statd records the address of the server in /var/lib/nfs/sm/ as
a new empty file.  Only if this succeeds does the server grant the lock.

When the client reboots, statd will be restarted and will move all
files from .../sm/ to .../sm.bak/, and will then iterate through those
files, sending a message to statd on the relevant servers telling them
that a reboot has happened.  The server-statd will, on receipt of this
message, tell the server-lockd that the client has rebooted, and lockd
will release all the locks.

I just tried this and it worked (2.4.19/21 kernels and nfs-utils 1.0.5),
so it doesn't seem to be a source-code error.

The most likely config error is not running statd on the client,
i.e. not having nfs-common installed.  But given that the bug was
reported against nfs-common, that seems unlikely in this case.

The next most likely would be tcpwrappers problems.  I notice that the
man page for statd says that "You have to give the clients access to
rpc.statd", but of course you need to give the server access to statd
on the clients as well.
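The client-side reboot step Neil describes (move every record from sm/ to sm.bak/, then notify each recorded server) can be modelled in a few lines. This is a minimal Python sketch, not statd's code: `notify_on_reboot` and the `notify` callback (standing in for the RPC to each server's statd) are hypothetical, and the policy of deleting a record only after its server has been reached is an assumption made for illustration.

```python
import os
import shutil
from typing import Callable, List


def notify_on_reboot(state_dir: str, notify: Callable[[str], bool]) -> List[str]:
    """Model of the client-statd reboot step described above.

    Every record in sm/ (one empty file per monitored server) is moved to
    sm.bak/, then each recorded host is notified.  Hypothetical policy: a
    record is deleted once its host is reached, and kept otherwise.
    Returns the hosts that could not be reached.
    """
    sm = os.path.join(state_dir, "sm")
    sm_bak = os.path.join(state_dir, "sm.bak")
    os.makedirs(sm, exist_ok=True)
    os.makedirs(sm_bak, exist_ok=True)

    # Step 1: move all records out of sm/.
    for name in os.listdir(sm):
        shutil.move(os.path.join(sm, name), os.path.join(sm_bak, name))

    # Step 2: tell each recorded server that this host has rebooted.
    unreachable = []
    for host in sorted(os.listdir(sm_bak)):
        if notify(host):  # stands in for the RPC to that server's statd
            os.remove(os.path.join(sm_bak, host))
        else:
            unreachable.append(host)  # e.g. the message was blocked
    return unreachable
```

If the notification to a server never arrives, that server's lockd is never told to drop the client's locks, which is the symptom in the bug report.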
In order for the server to allow a lock from the client, statd on the
client must allow access from the server.  In order for the client to
be able to revoke locks on reboot, statd on the server must allow
access from the client.  It is possible that the problem is caused by
the server not allowing statd requests from the client.

Looking at the tcp_wrapper stuff used by statd, it looks rather
bogus(*), though I think it is more likely to give away access that it
shouldn't than to restrict access that it should grant.

Anyway, I suggest that the person having the problem tries:

  rpcinfo -u SERVERNAME status
from the client, and

  rpcinfo -u CLIENTNAME status
from the server
and checks that both work.  If either doesn't, I suspect that is the
problem.  If they both work.... I don't know.

(*) good_client in tcp_wrapper.c calls hosts_ctl twice, once with the IP
address and once with the hostname.  If the first succeeds, the second
isn't tried.  So if my hosts.deny says that a specific hostname is
restricted, but doesn't say the IP address is restricted, then access
is granted.

NeilBrown
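The good_client flaw in the footnote is easy to see once the decision logic is written out. A minimal Python sketch: libwrap's `hosts_ctl` is replaced here by a plain `allowed` callable so the logic runs without libwrap, and both function names are hypothetical. The "strict" variant is one possible alternative, not what nfs-utils does.

```python
from typing import Callable

# Stand-in for a hosts_ctl() check: True means "access allowed" for that name.
Check = Callable[[str], bool]


def good_client_as_described(ip: str, hostname: str, allowed: Check) -> bool:
    """Order described in the footnote: IP first, hostname only as a fallback."""
    if allowed(ip):
        return True  # the hostname is never consulted, so a hostname deny is ignored
    return allowed(hostname)


def good_client_strict(ip: str, hostname: str, allowed: Check) -> bool:
    """Alternative: a deny on either the IP or the hostname refuses access."""
    return allowed(ip) and allowed(hostname)
```

With a hosts.deny that names only `badhost.example`, the described order still admits that host when it connects from an unlisted IP address, while the strict version refuses it.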
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Ara.T.Howard @ 2005-01-14 1:51 UTC
To: Neil Brown; +Cc: Chip Salzenberg, nfs

On Mon, 28 Jul 2003, Neil Brown wrote:

> On Sunday July 27, chip@pobox.com wrote:
>> I confess I don't quite understand this bug report.  Is this person
>> asking for something that NFS can't do, or is there perhaps some error
>> in configuration ... ?
>>
>> Please advise.
> SNIP
>
> Anyway, I suggest that the person having the problem tries:
>
>   rpcinfo -u SERVERNAME status
> from the client, and
>
>   rpcinfo -u CLIENTNAME status
> from the server
> and checks that both work.  If either doesn't, I suspect that is the
> problem.  If they both work.... I don't know.
SNIP

i am seeing problems here on my system (which has rebooted and now has stale
locks on server)

client:

  bligh:~ > rpcinfo -u mussel status
  program 100024 version 1 ready and waiting

server:

  mussel:~ > rpcinfo -u bligh status
  rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
  program 100024 is not available

the client rebooted into a new kernel (latest enterprise) and the server has
not.  could this cause this problem?  if not, what other info should i be
looking for?
not sure if this is helpful but:

client:

  bligh:~ > rpcinfo -p
     program vers proto   port
      100000    2   tcp    111  portmapper
      100000    2   udp    111  portmapper
      100024    1   udp  32768  status
      100024    1   tcp  32768  status
      100021    1   udp  32769  nlockmgr
      100021    3   udp  32769  nlockmgr
      100021    4   udp  32769  nlockmgr
      100021    1   tcp  32769  nlockmgr
      100021    3   tcp  32769  nlockmgr
      100021    4   tcp  32769  nlockmgr

server:

  mussel:~ > rpcinfo -p
     program vers proto   port
      100000    2   tcp    111  portmapper
      100000    2   udp    111  portmapper
      100024    1   udp  32768  status
      100024    1   tcp  32768  status
      100011    1   udp    768  rquotad
      100011    2   udp    768  rquotad
      100011    1   tcp    771  rquotad
      100011    2   tcp    771  rquotad
      100003    2   udp   2049  nfs
      100003    3   udp   2049  nfs
      100003    2   tcp   2049  nfs
      100003    3   tcp   2049  nfs
      100021    1   udp  33551  nlockmgr
      100021    3   udp  33551  nlockmgr
      100021    4   udp  33551  nlockmgr
      100021    1   tcp  37939  nlockmgr
      100021    3   tcp  37939  nlockmgr
      100021    4   tcp  37939  nlockmgr
      100005    1   udp    784  mountd
      100005    1   tcp    787  mountd
      100005    2   udp    784  mountd
      100005    2   tcp    787  mountd
      100005    3   udp    784  mountd
      100005    3   tcp    787  mountd

thanks in advance for any help.

kind regards.

-a
--
===============================================================================
| EMAIL   :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE   :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself.  --Shunryu Suzuki
===============================================================================
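Listings like the two above tell you which ports a firewall would have to pass for statd (`status`) and lockd (`nlockmgr`). A small Python sketch that turns `rpcinfo -p` output into records; `parse_rpcinfo_p`, `ports_for` and `Registration` are hypothetical helpers, and the test input is an excerpt of bligh's listing.

```python
from typing import Dict, List, NamedTuple


class Registration(NamedTuple):
    program: int
    version: int
    proto: str
    port: int
    service: str


def parse_rpcinfo_p(text: str) -> List[Registration]:
    """Parse the table printed by `rpcinfo -p HOST`, skipping the header."""
    regs = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: "100024 1 udp 32768 status"
        if len(fields) >= 4 and fields[0].isdigit():
            service = fields[4] if len(fields) > 4 else ""
            regs.append(Registration(int(fields[0]), int(fields[1]),
                                     fields[2], int(fields[3]), service))
    return regs


def ports_for(regs: List[Registration], service: str) -> Dict[str, int]:
    """Map proto -> port for one service, e.g. where statd is listening."""
    return {r.proto: r.port for r in regs if r.service == service}
```

Ports such as 32768 and 32769 are handed out when the daemons register with the portmapper unless they are configured otherwise, so rules written against them may need rechecking after a restart.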
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Neil Brown @ 2005-01-14 2:29 UTC
To: Ara.T.Howard; +Cc: Chip Salzenberg, nfs

On Thursday January 13, Ara.T.Howard@noaa.gov wrote:
>
> i am seeing problems here on my system (which has rebooted and now has stale
> locks on server)
>
..
> server:
>
>   mussel:~ > rpcinfo -u bligh status
>   rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
>   program 100024 is not available
...
>
> client:
>
>   bligh:~ > rpcinfo -p
>      program vers proto   port
>       100000    2   tcp    111  portmapper
>       100000    2   udp    111  portmapper
>       100024    1   udp  32768  status
>       100024    1   tcp  32768  status
...

So bligh, the client, is running statd (the "status" service), but
mussel can not talk to it.  This is a problem.

It would appear that some form of firewall is blocking access to
bligh's statd from mussel, or that bligh's statd is ignoring requests
from mussel.  I don't know which.

NeilBrown
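The diagnosis rests on running the status check in both directions, which is easy to script. A minimal Python sketch, assuming `rpcinfo` is on the PATH and that its exit status is zero only when the call succeeds ("ready and waiting"); `status_reachable` is a hypothetical helper.

```python
import subprocess


def status_reachable(host: str, timeout: float = 10.0) -> bool:
    """Run `rpcinfo -u HOST status` and report whether statd on HOST answered."""
    try:
        result = subprocess.run(
            ["rpcinfo", "-u", host, "status"],
            capture_output=True, text=True, timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # rpcinfo missing from PATH, or no answer in time: treat as unreachable.
        return False
    return result.returncode == 0
```

Run it on each machine naming the other, i.e. `status_reachable("mussel")` on bligh and `status_reachable("bligh")` on mussel; per the output above, the first would succeed and the second fail.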
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Ara.T.Howard @ 2005-01-14 2:57 UTC
To: Neil Brown; +Cc: Chip Salzenberg, nfs

On Fri, 14 Jan 2005, Neil Brown wrote:

> On Thursday January 13, Ara.T.Howard@noaa.gov wrote:
>>
>> i am seeing problems here on my system (which has rebooted and now has stale
>> locks on server)
>>
> ..
>> server:
>>
>>   mussel:~ > rpcinfo -u bligh status
>>   rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
>>   program 100024 is not available
> ...
>>
>> client:
>>
>>   bligh:~ > rpcinfo -p
>>      program vers proto   port
>>       100000    2   tcp    111  portmapper
>>       100000    2   udp    111  portmapper
>>       100024    1   udp  32768  status
>>       100024    1   tcp  32768  status
> ...
>
>
> So bligh, the client, is running statd (the "status" service), but
> mussel can not talk to it.  This is a problem.
>
> It would appear that some form of firewall is blocking access to
> bligh's statd from mussel, or that bligh's statd is ignoring requests
> from mussel.  I don't know which.
>
> NeilBrown

alright - i'll look into this.  as you might have guessed, i'm the developer
and not the sysad, so i can't do much at the moment.  i'm guessing you are
correct on the former count.  i surely wasn't told about any changes, but that
has been known to happen before.  in addition to that, government security
policies get stricter and stricter, and redhat could have done something in
the new kernel that is 'safer'.  i'll investigate and get back to you.

thanks tons for the lead...  i'll post more tomorrow.

-a
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Ara.T.Howard @ 2005-01-14 16:05 UTC
To: Neil Brown; +Cc: Chip Salzenberg, nfs

On Fri, 14 Jan 2005, Neil Brown wrote:

> So bligh, the client, is running statd (the "status" service), but mussel
> can not talk to it.  This is a problem.

are you saying inbound rpc traffic flowing from server -> client MUST not be
blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
rpc traffic client -> server?  sorry if this does not make sense - i'm a bit
out of my domain here...

> It would appear that some form of firewall is blocking access to bligh's
> statd from mussel, or that bligh's statd is ignoring requests from mussel.
> I don't know which.

does that fit with this scenario:

  - after reboot, client/server have stale locks

  - oddly enough though, locking DOES work between client and server

the reason it works (even on the files with stale locks) is that i have built
my own 'leasing' system into all the files i lock.  it basically does

  if get_lock
    refresher = forked_process_touching_file_at_interval
    at_exit{ release_lock_and_kill_refresher }
  else
    if lock_is_too_old
      mv file file.tmp && mv file.tmp file
    end
    retry
  end

although it's quite a bit smarter than that (for instance, it uses an
nfs-safe lockfile to ensure only one node can attempt lock recovery at a
time).  this seems to work because it gives the file a new inode and,
therefore, the stale lock is invalidated - though it obviously still exists.
whenever i attempt this procedure - which is admittedly pretty sketchy - i
send emails to myself detailing the file in question (stale lock), its
inode, etc.
i have only ever seen this happen one time in 8 months, and that was during
brutal testing that did a bunch of kill -9's on things.  that was before
yesterday - yesterday ALL my processes ran this procedure, and this is how i
came to know that the system was fubar.

so, in summary, does your understanding indicate that it should be possible
for locks themselves to work but lock recovery to fail?  is that consistent
with some sort of firewall misconfiguration between server and client?  eg.
is the traffic pattern required different for the two?

many thanks for the insight!

kind regards.

-a
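The leasing idea in the pseudocode (a holder keeps touching the file; a lock whose timestamp is too old is treated as abandoned) can be sketched in Python. This is a simplified stand-in, not Ara's code: it swaps the NFS byte-range lock for a plain lock file created with `O_EXCL`, the helper names are hypothetical, and it leaves out the separate lockfile that serializes recovery, so two contenders could both break the same stale lease. The Linux open(2) page notes that `O_EXCL` is only reliable over NFS with NFSv3 or later on Linux 2.6 or later.

```python
import os
import time


def try_acquire(path: str, max_age: float) -> bool:
    """Create the lock file exclusively; if it exists but has not been
    refreshed for more than max_age seconds, treat the lease as stale."""
    for _ in range(2):  # one normal attempt, one after breaking a stale lease
        try:
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            try:
                age = time.time() - os.stat(path).st_mtime
            except FileNotFoundError:
                continue  # holder released it meanwhile: retry the create
            if age <= max_age:
                return False  # lease is fresh: someone else holds the lock
            try:
                os.remove(path)  # lease expired: break it
            except FileNotFoundError:
                pass
            continue
        os.write(fd, str(os.getpid()).encode())  # record the holder's pid
        os.close(fd)
        return True
    return False


def refresh(path: str) -> None:
    """What the holder's refresher process would do at each interval."""
    os.utime(path, None)


def release(path: str) -> None:
    os.remove(path)
```

A lease only helps with locks the application manages itself; it does nothing about stale NLM locks still held by the server's lockd.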
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Trond Myklebust @ 2005-01-14 19:43 UTC
To: Ara.T.Howard; +Cc: Neil Brown, Chip Salzenberg, nfs

On Fri, 14.01.2005 at 09:05 (-0700), Ara.T.Howard wrote:
> On Fri, 14 Jan 2005, Neil Brown wrote:
>
> > So bligh, the client, is running statd (the "status" service), but mussel
> > can not talk to it.  This is a problem.
>
> are you saying inbound rpc traffic flowing from server -> client MUST not be
> blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
> rpc traffic client -> server?  sorry if this does not make sense - i'm a bit
> out of my domain here...

Bi-directional RPC traffic must be allowed if you plan on using NLM
locking, since it is callback based.  A couple of issues that immediately
spring to mind in the case where the server cannot call the client are:

  - Blocking locks (F_SETLKW) will be hampered since the client
    expects the server's lockd daemon to call it back as soon as any
    conflicting locks have been released and the lock granted...

  - Server reboot recovery will be broken, since the server's
    rpc.statd daemon will be incapable of notifying the clients that
    their locks have been lost and need to be recovered.

Cheers,
  Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
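The first point turns on the difference between F_SETLK (fail at once) and F_SETLKW (wait until granted). A minimal local sketch using Python's `fcntl.lockf`, which issues F_SETLK when `LOCK_NB` is given and F_SETLKW otherwise; the helper names are hypothetical. On a local filesystem the kernel wakes the waiter itself; over NFS, per the explanation above, the wakeup depends on the server's lockd being able to call the client back.

```python
import fcntl
import os


def try_lock_nonblocking(path: str) -> bool:
    """F_SETLK-style attempt: report failure at once instead of waiting.

    POSIX locks are per process, so call this from a process other than
    the holder; within the holder's own process it would always succeed."""
    with open(path, "r+") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # issues F_SETLK
        except OSError:  # EACCES or EAGAIN: a conflicting lock is held
            return False
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True


def lock_blocking(f) -> None:
    """F_SETLKW-style request: sleep until the lock is granted.

    Over NFS the client waits for the server's lockd to call back with
    the grant, which needs server -> client RPC traffic."""
    fcntl.lockf(f, fcntl.LOCK_EX)  # issues F_SETLKW
```

A non-blocking attempt needs only client -> server traffic for the request and its reply, which fits Ara's observation that plain locking kept working while the server could not reach the client.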
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Ara.T.Howard @ 2005-01-14 19:50 UTC
To: Trond Myklebust; +Cc: Neil Brown, Chip Salzenberg, nfs

On Fri, 14 Jan 2005, Trond Myklebust wrote:

> On Fri, 14.01.2005 at 09:05 (-0700), Ara.T.Howard wrote:
>> On Fri, 14 Jan 2005, Neil Brown wrote:
>>
>>> So bligh, the client, is running statd (the "status" service), but mussel
>>> can not talk to it.  This is a problem.
>>
>> are you saying inbound rpc traffic flowing from server -> client MUST not be
>> blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
>> rpc traffic client -> server?  sorry if this does not make sense - i'm a bit
>> out of my domain here...
>
> Bi-directional RPC traffic must be allowed if you plan on using NLM
> locking, since it is callback based.  A couple of issues that immediately
> spring to mind in the case where the server cannot call the client are:
>
>   - Blocking locks (F_SETLKW) will be hampered since the client
>     expects the server's lockd daemon to call it back as soon as any
>     conflicting locks have been released and the lock granted...
>
>   - Server reboot recovery will be broken, since the server's
>     rpc.statd daemon will be incapable of notifying the clients that
>     their locks have been lost and need to be recovered.
>
> Cheers,
>   Trond
> --
> Trond Myklebust <trond.myklebust@fys.uio.no>

thanks trond!  i don't know if you remember - but i was complaining about
F_SETLKW performance a while back.  ;-)

sounds like this IS the problem...

i poked around the nfs-faq and how-to and didn't see this...  if i didn't
miss it (quite possible), may i suggest it be added?  i certainly will
volunteer, but am unsure of the procedure.

kind regards.
-a
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Ara.T.Howard @ 2005-01-18 20:59 UTC
To: Trond Myklebust; +Cc: Neil Brown, Chip Salzenberg, nfs

On Fri, 14 Jan 2005, Trond Myklebust wrote:

> On Fri, 14.01.2005 at 09:05 (-0700), Ara.T.Howard wrote:
>> On Fri, 14 Jan 2005, Neil Brown wrote:
>>
>>> So bligh, the client, is running statd (the "status" service), but mussel
>>> can not talk to it.  This is a problem.
>>
>> are you saying inbound rpc traffic flowing from server -> client MUST not be
>> blocked by the firewall and that it is NOT sufficient to allow ONLY inbound
>> rpc traffic client -> server?  sorry if this does not make sense - i'm a bit
>> out of my domain here...
>
> Bi-directional RPC traffic must be allowed if you plan on using NLM
> locking, since it is callback based.  A couple of issues that immediately
> spring to mind in the case where the server cannot call the client are:
>
>   - Blocking locks (F_SETLKW) will be hampered since the client
>     expects the server's lockd daemon to call it back as soon as any
>     conflicting locks have been released and the lock granted...
>
>   - Server reboot recovery will be broken, since the server's
>     rpc.statd daemon will be incapable of notifying the clients that
>     their locks have been lost and need to be recovered.
>
> Cheers,
>   Trond
> --
> Trond Myklebust <trond.myklebust@fys.uio.no>

to anyone following this thread: this was indeed the problem - we had a
firewall rule in place that allowed only one-way traffic.  if you are having
lock recovery issues, look here!

cheers.
-a
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Dan Stromberg @ 2005-01-14 17:47 UTC
To: Neil Brown; +Cc: strombrg, Ara.T.Howard, Chip Salzenberg, nfs

On Fri, 2005-01-14 at 13:29 +1100, Neil Brown wrote:
> On Thursday January 13, Ara.T.Howard@noaa.gov wrote:
> >
> > i am seeing problems here on my system (which has rebooted and now has stale
> > locks on server)
> >
> ..
> > server:
> >
> >   mussel:~ > rpcinfo -u bligh status
> >   rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
> >   program 100024 is not available
> ...
> >
> > client:
> >
> >   bligh:~ > rpcinfo -p
> >      program vers proto   port
> >       100000    2   tcp    111  portmapper
> >       100000    2   udp    111  portmapper
> >       100024    1   udp  32768  status
> >       100024    1   tcp  32768  status
> ...
>
>
> So bligh, the client, is running statd (the "status" service), but
> mussel can not talk to it.  This is a problem.
>
> It would appear that some form of firewall is blocking access to
> bligh's statd from mussel, or that bligh's statd is ignoring requests
> from mussel.  I don't know which.
>
> NeilBrown

I'm actually seeing a lot of problems on *ix systems where a service is
registered, but then the corresponding daemon doesn't actually service
requests.

My rpc-health script allowed me to identify a lot of such problems
fairly quickly:

http://dcs.nac.uci.edu/~strombrg/rpc-health.html

...so I guess the upshot is "It isn't necessarily a firewall problem".
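The script itself is only linked here, so the following is a hypothetical reconstruction of the idea, not Dan's code: for each program/version/transport that `rpcinfo -p HOST` lists, ping that exact registration with `rpcinfo -u` (UDP) or `rpcinfo -t` (TCP), and call it BAD if the ping fails. This Python sketch only builds the probe commands; `probe_commands` is a hypothetical name.

```python
from typing import List, Set, Tuple


def probe_commands(host: str, rpcinfo_p_output: str) -> List[List[str]]:
    """For every (program, version, proto) row of `rpcinfo -p HOST`, build the
    rpcinfo call that pings that registration over that transport."""
    cmds: List[List[str]] = []
    seen: Set[Tuple[str, str, str]] = set()
    for line in rpcinfo_p_output.splitlines():
        f = line.split()
        if len(f) < 4 or not f[0].isdigit():
            continue  # header or blank line
        prog, vers, proto = f[0], f[1], f[2]
        if proto not in ("tcp", "udp") or (prog, vers, proto) in seen:
            continue
        seen.add((prog, vers, proto))
        flag = "-t" if proto == "tcp" else "-u"
        cmds.append(["rpcinfo", flag, host, prog, vers])
    return cmds
```

Each command could then be run (for example with `subprocess.run`) and a non-zero exit reported as BAD, which separates "registered but not answering" from "not registered at all".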
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Ara.T.Howard @ 2005-01-14 18:19 UTC
To: Dan Stromberg; +Cc: Neil Brown, Chip Salzenberg, nfs, thomas.r.carey, Mark O Sleeper

On Fri, 14 Jan 2005, Dan Stromberg wrote:

> On Fri, 2005-01-14 at 13:29 +1100, Neil Brown wrote:
>> On Thursday January 13, Ara.T.Howard@noaa.gov wrote:
>>>
>>> i am seeing problems here on my system (which has rebooted and now has stale
>>> locks on server)
>>>
>> ..
>>> server:
>>>
>>>   mussel:~ > rpcinfo -u bligh status
>>>   rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
>>>   program 100024 is not available
>> ...
>>>
>>> client:
>>>
>>>   bligh:~ > rpcinfo -p
>>>      program vers proto   port
>>>       100000    2   tcp    111  portmapper
>>>       100000    2   udp    111  portmapper
>>>       100024    1   udp  32768  status
>>>       100024    1   tcp  32768  status
>> ...
>>
>>
>> So bligh, the client, is running statd (the "status" service), but
>> mussel can not talk to it.  This is a problem.
>>
>> It would appear that some form of firewall is blocking access to
>> bligh's statd from mussel, or that bligh's statd is ignoring requests
>> from mussel.  I don't know which.
>>
>> NeilBrown
>
> I'm actually seeing a lot of problems on *ix systems where a service is
> registered, but then the corresponding daemon doesn't actually service
> requests.
>
> My rpc-health script allowed me to identify a lot of such problems
> fairly quickly:
>
> http://dcs.nac.uci.edu/~strombrg/rpc-health.html
>
> ...so I guess the upshot is "It isn't necessarily a firewall problem".

nice!
it is showing (mussel=server, bligh=client):

mussel:

  ~ > ./rpc-health bligh
  rpcinfo: can't contact portmapper: RPC: Remote system error - No route to host

bligh:

  ~ > ./rpc-health mussel
  Program portmapper/100000, Proto tcp, Version 2 is OK
  Program portmapper/100000, Proto udp, Version 2 is OK
  Program status/100024, Proto udp, Version 1 is OK
  Program status/100024, Proto tcp, Version 1 is BAD <========
  Program rquotad/100011, Proto udp, Version 1 is OK
  Program rquotad/100011, Proto udp, Version 2 is OK
  Program rquotad/100011, Proto tcp, Version 1 is BAD <========
  Program rquotad/100011, Proto tcp, Version 2 is BAD <========
  Program nfs/100003, Proto udp, Version 2 is OK
  Program nfs/100003, Proto udp, Version 3 is OK
  Program nfs/100003, Proto tcp, Version 2 is OK
  Program nfs/100003, Proto tcp, Version 3 is OK
  Program nlockmgr/100021, Proto udp, Version 1 is OK
  Program nlockmgr/100021, Proto udp, Version 3 is OK
  Program nlockmgr/100021, Proto udp, Version 4 is OK
  Program nlockmgr/100021, Proto tcp, Version 1 is BAD <========
  Program nlockmgr/100021, Proto tcp, Version 3 is BAD <========
  Program nlockmgr/100021, Proto tcp, Version 4 is BAD <========
  Program mountd/100005, Proto udp, Version 1 is OK
  Program mountd/100005, Proto tcp, Version 1 is BAD <========
  Program mountd/100005, Proto udp, Version 2 is OK
  Program mountd/100005, Proto tcp, Version 2 is BAD <========
  Program mountd/100005, Proto udp, Version 3 is OK
  Program mountd/100005, Proto tcp, Version 3 is BAD <========

so apparently our system is severely misconfigured!  i'm guessing all the
BADs for tcp are o.k. but that the 'no route to host' is not a good thing.
sound accurate?

btw. here is a small patch:

[ahoward@mussel ahoward]$ diff -u rpc-health.org rpc-health
--- rpc-health.org      2005-01-06 17:48:53.000000000 -0700
+++ rpc-health  2005-01-14 11:11:20.000000000 -0700
@@ -1,7 +1,9 @@
-#!/dcs/bin/bash2
+#!/usr/bin/env bash
 
 #set -x
 
+PATH=$PATH:/usr/sbin:/sbin # for rpcinfo
+
 function usage
 {
    echo Usage "$0" hostname 1>&2

kind regards.

-a
* Re: Debian Bug#203077: Locks not released on NFS client reboot
From: Dan Stromberg @ 2005-01-14 21:20 UTC
To: Ara.T.Howard; +Cc: strombrg, Neil Brown, Chip Salzenberg, nfs, thomas.r.carey, Mark O Sleeper

On Fri, 2005-01-14 at 11:19 -0700, Ara.T.Howard wrote:
> nice!

Why thank you.  :)

> it is showing (mussel=server, bligh=client):
>
> mussel:
>
>   ~ > ./rpc-health bligh
>   rpcinfo: can't contact portmapper: RPC: Remote system error - No route to host

That sounds like your portmapper is blocked somehow.

> bligh:
>
>   ~ > ./rpc-health mussel
>   Program portmapper/100000, Proto tcp, Version 2 is OK
>   Program portmapper/100000, Proto udp, Version 2 is OK
>   Program status/100024, Proto udp, Version 1 is OK
>   Program status/100024, Proto tcp, Version 1 is BAD <========
>   Program rquotad/100011, Proto udp, Version 1 is OK
>   Program rquotad/100011, Proto udp, Version 2 is OK
>   Program rquotad/100011, Proto tcp, Version 1 is BAD <========
>   Program rquotad/100011, Proto tcp, Version 2 is BAD <========
>   Program nfs/100003, Proto udp, Version 2 is OK
>   Program nfs/100003, Proto udp, Version 3 is OK
>   Program nfs/100003, Proto tcp, Version 2 is OK
>   Program nfs/100003, Proto tcp, Version 3 is OK
>   Program nlockmgr/100021, Proto udp, Version 1 is OK
>   Program nlockmgr/100021, Proto udp, Version 3 is OK
>   Program nlockmgr/100021, Proto udp, Version 4 is OK
>   Program nlockmgr/100021, Proto tcp, Version 1 is BAD <========
>   Program nlockmgr/100021, Proto tcp, Version 3 is BAD <========
>   Program nlockmgr/100021, Proto tcp, Version 4 is BAD <========
>   Program mountd/100005, Proto udp, Version 1 is OK
>   Program mountd/100005, Proto tcp, Version 1 is BAD <========
>   Program mountd/100005, Proto udp, Version 2 is OK
>   Program mountd/100005, Proto tcp, Version 2 is BAD <========
>   Program mountd/100005, Proto udp, Version 3 is OK
>   Program mountd/100005, Proto tcp, Version 3 is BAD <========
>
> so apparently our system is severely misconfigured!  i'm guessing all the
> BADs for tcp are o.k. but that the 'no route to host' is not a good thing.
> sound accurate?

Those BADs are probably instances of services that are registered, but
do not respond to a minimalist, "ping-like" RPC procedure.  They may
actually be troublesome.  Or not.  :)

> btw. here is a small patch:

Thanks.  I've incorporated something along these lines now.
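Summarising such a report by transport makes the pattern easier to see: in bligh's run against mussel every BAD entry is TCP, while the portmapper and nfs over TCP are OK. A small Python sketch that does the grouping; `bad_by_proto` is a hypothetical helper, and the regular expression assumes exactly the line format shown in the report.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches lines such as:
#   Program status/100024, Proto tcp, Version 1 is BAD <========
LINE = re.compile(r"Program ([^/\s]+)/(\d+), Proto (\w+), Version (\d+) is (OK|BAD)")


def bad_by_proto(report: str) -> Dict[str, List[str]]:
    """Group the BAD entries of an rpc-health style report by transport."""
    bad: Dict[str, List[str]] = defaultdict(list)
    for m in LINE.finditer(report):
        name, _prog, proto, vers, verdict = m.groups()
        if verdict == "BAD":
            bad[proto].append(f"{name} v{vers}")
    return dict(bad)
```

An empty `"udp"` group next to a long `"tcp"` group points at something transport-specific between the two hosts rather than at daemons that are not running.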
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-14 2:29 ` Neil Brown
` (2 preceding siblings ...)
2005-01-14 17:47 ` Dan Stromberg
@ 2005-01-21 0:24 ` Ara.T.Howard
2005-01-21 0:52 ` Trond Myklebust
3 siblings, 1 reply; 25+ messages in thread
From: Ara.T.Howard @ 2005-01-21 0:24 UTC
To: Neil Brown; +Cc: Chip Salzenberg, nfs, Mark O Sleeper, thomas.r.carey

On Fri, 14 Jan 2005, Neil Brown wrote:

> So bligh, the client, is running statd (the "status" service), but mussel
> can not talk to it. This is a problem.
>
> It would appear that some form of firewall is blocking access to bligh's
> statd from mussel, or that bligh's statd is ignoring requests from mussel.
> I don't know which.

so i thought we had this figured - but it seems we do not. here is what we
are (still) seeing:

client > obtain_lock
server > cat /proc/locks   # shows client pid
client > reboot
client > obtain_lock       # fails
server > cat /proc/locks   # shows OLD client pid

so lock recovery is still not working.
our firewalls are as follows:

server iptables:

*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-N NFS
-N ICMP
-A INPUT -i lo -j ACCEPT
-A INPUT -p icmp --icmp-type any -j ICMP
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
-A INPUT -m state --state NEW -p udp -m udp -j NFS
-A INPUT -m state --state NEW -p tcp -m tcp --dport 2049 -j NFS
-A INPUT -m state --state NEW -p tcp -m tcp --dport 111 -j NFS
-A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A ICMP -s 0/0 -j ACCEPT
-A ICMP -j REJECT --reject-with icmp-host-prohibited
-A NFS -s 10.1.0.0/16 -j ACCEPT
-A NFS -j REJECT --reject-with icmp-host-prohibited
-A SSH -s 10.1.0.0/16 -j ACCEPT
-A SSH -j REJECT --reject-with icmp-host-prohibited
COMMIT

client iptables:

filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-N NFS
-N ICMP
-A INPUT -i lo -j ACCEPT
-A INPUT -p tcp --dport 111 -j NFS
-A INPUT -p udp --dport 111 -j NFS
-A INPUT -p tcp --dport 32768 -j NFS
-A INPUT -p udp --dport 32768 -j NFS
-A INPUT -p tcp --dport 32769 -j NFS
-A INPUT -p udp --dport 32769 -j NFS
-A INPUT -s 10.1.0.0/16 -j ACCEPT
-A INPUT -p icmp --icmp-type any -j ICMP
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
-A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A ICMP -s 0/0 -j ACCEPT
-A ICMP -j REJECT --reject-with icmp-host-prohibited
-A SSH -s 10.1.0.0/16 -j ACCEPT
-A SSH -j REJECT --reject-with icmp-host-prohibited
-A NFS -s 10.1.0.0/16 -j ACCEPT
-A NFS -j REJECT --reject-with icmp-host-prohibited
COMMIT

if i understand correctly (and i realize this is off list) this should be
allowing everything between server and client.
we added the hole between client and server to confirm that the firewall is
not the problem. we still, however, see the problem.

btw - we are 'clearing' the lock in question by doing

mv file_with_stale_lock foobar && mv foobar file_with_stale_lock

to give it a fresh inode. this leaves the record in /proc/locks but allows us
to continue testing (we can again get the lock). is there a better way to do
this that cleans out /proc/locks?

is anything obvious here?

some other bits of info:

- both the server and client have two network cards (frontdoor/backdoor).
  nfs runs entirely on the backdoor. the holes we opened up were on both
  client and server for both frontdoor/backdoor.

- all names live in dns (server, server.b)

- we are seeing this kind of thing (not only associated with lock recovery)
  in /var/log/messages

  rpc.statd[1734]: Received erroneous SM_UNMON request from <client> for <server>

  i gather this is caused by some name confusion...

so. where to go from here? i can reproduce a 'dead' lock at will by simply
rebooting a client while holding a lock. if i understand correctly the server
should be notified by the client of any locks it held before halting on the
subsequent reboot? can this communication be logged verbosely somehow? is
there an easier way to cause the notification of old locks to the server?
perhaps something like 'service nfslock restart', or is rebooting the only
way?

sorry for the false positive earlier.

kind regards.

-a
--
===============================================================================
| EMAIL :: Ara [dot] T [dot] Howard [at] noaa [dot] gov
| PHONE :: 303.497.6469
| When you do something, you should burn yourself completely, like a good
| bonfire, leaving no trace of yourself.  --Shunryu Suzuki
===============================================================================
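To see which /proc/locks entries still point at a given file on the server, it can help to parse the file rather than eyeball it. A minimal sketch, assuming the usual Linux layout `id: [->] TYPE MODE ACCESS PID MAJOR:MINOR:INODE START END` (older or newer kernels may differ slightly); `parse_proc_locks` and `locks_on_inode` are hypothetical helpers, not part of any NFS tool.

```python
from typing import List, NamedTuple, Optional


class LockEntry(NamedTuple):
    lock_id: int
    blocked: bool        # True for "->" lines: a waiter, not a holder
    kind: str            # POSIX, FLOCK, ...
    mode: str            # ADVISORY or MANDATORY
    access: str          # READ or WRITE
    pid: int
    device: str          # "major:minor" exactly as printed by the kernel
    inode: int
    start: int
    end: Optional[int]   # None when the kernel prints EOF


def parse_proc_locks(text: str) -> List[LockEntry]:
    """Turn the text of /proc/locks into LockEntry records."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        blocked = len(fields) > 1 and fields[1] == "->"
        rest = fields[2:] if blocked else fields[1:]
        if len(rest) < 7:
            continue  # unexpected layout; skip rather than guess
        kind, mode, access, pid, dev_ino, start, end = rest[:7]
        major, minor, inode = dev_ino.split(":")
        entries.append(LockEntry(int(fields[0].rstrip(":")), blocked, kind,
                                 mode, access, int(pid), f"{major}:{minor}",
                                 int(inode), int(start),
                                 None if end == "EOF" else int(end)))
    return entries


def locks_on_inode(entries: List[LockEntry], inode: int) -> List[LockEntry]:
    """Granted (non-waiting) locks recorded against one inode number."""
    return [e for e in entries if e.inode == inode and not e.blocked]
```

On the server, something like `locks_on_inode(parse_proc_locks(open("/proc/locks").read()), os.stat(path).st_ino)` would list the holders recorded for `path`; as observed above, for locks taken over NFS the pid shown is the client's.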
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-21 0:24 ` Ara.T.Howard
@ 2005-01-21 0:52 ` Trond Myklebust
2005-01-21 18:32 ` Ara.T.Howard
0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2005-01-21 0:52 UTC
To: Ara.T.Howard
Cc: Neil Brown, Chip Salzenberg, nfs, Mark O Sleeper, thomas.r.carey

On Thu, 20.01.2005 at 17:24 (-0700), Ara.T.Howard wrote:

> server iptables:
>
> *filter
> :INPUT ACCEPT [0:0]
> :FORWARD ACCEPT [0:0]
> :OUTPUT ACCEPT [0:0]
> -N NFS
> -N ICMP
> -A INPUT -i lo -j ACCEPT
> -A INPUT -p icmp --icmp-type any -j ICMP
> -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
> -A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
> -A INPUT -m state --state NEW -p udp -m udp -j NFS
> -A INPUT -m state --state NEW -p tcp -m tcp --dport 2049 -j NFS
> -A INPUT -m state --state NEW -p tcp -m tcp --dport 111 -j NFS
> -A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
> -A INPUT -j REJECT --reject-with icmp-host-prohibited
> -A ICMP -s 0/0 -j ACCEPT
> -A ICMP -j REJECT --reject-with icmp-host-prohibited
> -A NFS -s 10.1.0.0/16 -j ACCEPT
> -A NFS -j REJECT --reject-with icmp-host-prohibited
> -A SSH -s 10.1.0.0/16 -j ACCEPT
> -A SSH -j REJECT --reject-with icmp-host-prohibited
> COMMIT

Where is the rule to accept incoming rpc.statd connections?

Cheers,
Trond
--
Trond Myklebust <trond.myklebust@fys.uio.no>
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-21 0:52 ` Trond Myklebust
@ 2005-01-21 18:32 ` Ara.T.Howard
2005-01-21 18:40 ` Dan Stromberg
0 siblings, 1 reply; 25+ messages in thread
From: Ara.T.Howard @ 2005-01-21 18:32 UTC
To: Trond Myklebust
Cc: Neil Brown, Chip Salzenberg, nfs, Mark O Sleeper, thomas.r.carey

On Thu, 20 Jan 2005, Trond Myklebust wrote:

> On Thu, 20.01.2005 at 17:24 (-0700), Ara.T.Howard wrote:
>
>> server iptables:
>>
>> *filter
>> :INPUT ACCEPT [0:0]
>> :FORWARD ACCEPT [0:0]
>> :OUTPUT ACCEPT [0:0]
>> -N NFS
>> -N ICMP
>> -A INPUT -i lo -j ACCEPT
>> -A INPUT -p icmp --icmp-type any -j ICMP
>> -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
>> -A INPUT -m state --state NEW -p tcp -m tcp --dport 22 -j SSH
>> -A INPUT -m state --state NEW -p udp -m udp -j NFS
>> -A INPUT -m state --state NEW -p tcp -m tcp --dport 2049 -j NFS
>> -A INPUT -m state --state NEW -p tcp -m tcp --dport 111 -j NFS
>> -A INPUT -m state --state NEW,INVALID -j REJECT --reject-with icmp-host-prohibited
>> -A INPUT -j REJECT --reject-with icmp-host-prohibited
>> -A ICMP -s 0/0 -j ACCEPT
>> -A ICMP -j REJECT --reject-with icmp-host-prohibited
>> -A NFS -s 10.1.0.0/16 -j ACCEPT
>> -A NFS -j REJECT --reject-with icmp-host-prohibited
>> -A SSH -s 10.1.0.0/16 -j ACCEPT
>> -A SSH -j REJECT --reject-with icmp-host-prohibited
>> COMMIT
>
> Where is the rule to accept incoming rpc.statd connections?
>
> Cheers,
> Trond

sorry, we edited out the critical info.
on the client we have

[root@bligh root]# rpcinfo -p
   program vers proto   port
    100000    2   tcp    111  portmapper
    100000    2   udp    111  portmapper
    100024    1   udp  32768  status
    100024    1   tcp  32768  status
    100021    1   udp  32769  nlockmgr
    100021    3   udp  32769  nlockmgr
    100021    4   udp  32769  nlockmgr
    100021    1   tcp  32769  nlockmgr
    100021    3   tcp  32769  nlockmgr
    100021    4   tcp  32769  nlockmgr

[root@bligh root]# grep NFS /etc/sysconfig/iptables
-N NFS
-A INPUT -p tcp --dport 111 -j NFS
-A INPUT -p udp --dport 111 -j NFS
-A INPUT -p tcp --dport 32768:32769 -j NFS
-A INPUT -p udp --dport 32768:32769 -j NFS
-A NFS -s 10.1.186.70/32 -j ACCEPT
-A NFS -j REJECT --reject-with icmp-host-prohibited

this did not work. just to be safe we added

on the server - where ip is the client's ip

-A INPUT -s 10.1.186.71/32 -j ACCEPT

on the client, where ip is the server's ip

-A INPUT -s 10.1.186.54/32 -j ACCEPT

to the top of our ruleset, before ANY denies. still no go. on shutdown/reboot
of the client there are no error messages whatsoever. however, we are seeing
lots of these in /var/log/messages

...
Jan 21 10:55:50 moby rpc.statd[1985]: Received erroneous SM_UNMON request from moby.ngdc.noaa.gov for 10.1.186.62
Jan 21 10:55:50 moby rpc.statd[1985]: Received erroneous SM_UNMON request from moby.ngdc.noaa.gov for 10.1.186.67
...

where the ips are those of various clients that are successfully performing
locking. what is this about?

as i said before, both the server and all clients are multihomed with nfs
running only on the backdoor. the frontdoor/backdoor have names like name,
name.b respectively. these names are all in dns. is there any chance this
could be related to lock recovery failure?

our sysad has suggested starting a tcpdump in the nfslock init.d script to see
what's happening - any other suggestions?

kind regards.
-a
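Trond's question can be checked mechanically by comparing the ports in `rpcinfo -p` with the `--dport` rules. A rough sketch, assuming rules in the `iptables-save` format shown above; it deliberately ignores `-s`, `-m state` and whatever a jumped-to chain does, so it only finds registered ports that no INPUT rule mentions at all. The function names are hypothetical.

```python
import re
from typing import List, Set, Tuple

PROTO = re.compile(r"-p\s+(tcp|udp)\b")
DPORT = re.compile(r"--dport\s+(\d+)(?::(\d+))?")
DENY = re.compile(r"-j\s+(REJECT|DROP)\b")


def dport_ranges(rules: str) -> List[Tuple[str, int, int]]:
    """(proto, low, high) for each non-deny INPUT rule that names tcp or udp."""
    ranges = []
    for line in rules.splitlines():
        line = line.strip()
        if not line.startswith("-A INPUT") or DENY.search(line):
            continue
        proto = PROTO.search(line)
        if proto is None:
            continue  # e.g. "-s 10.1.186.54/32 -j ACCEPT" is not a port rule
        dport = DPORT.search(line)
        if dport:
            low = int(dport.group(1))
            high = int(dport.group(2) or low)
        else:
            low, high = 0, 65535  # "-p udp" without --dport covers every port
        ranges.append((proto.group(1), low, high))
    return ranges


def registered_ports(rpcinfo_output: str) -> Set[Tuple[str, int, str]]:
    """(proto, port, name) for each data row of `rpcinfo -p` output."""
    ports = set()
    for line in rpcinfo_output.splitlines():
        f = line.split()
        if len(f) >= 5 and f[0].isdigit():
            ports.add((f[2], int(f[3]), f[4]))
    return ports


def uncovered(rpcinfo_output: str, rules: str) -> List[Tuple[str, int, str]]:
    """Registered (proto, port, name) triples that no INPUT port rule mentions."""
    ranges = dport_ranges(rules)
    return sorted(
        (proto, port, name)
        for proto, port, name in registered_ports(rpcinfo_output)
        if not any(p == proto and lo <= port <= hi for p, lo, hi in ranges)
    )
```

Against the server rules quoted earlier, a statd registered on tcp port 32768 would be reported, because only tcp ports 22, 111 and 2049 are named there (all new udp traffic goes to the NFS chain).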
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-21 18:32 ` Ara.T.Howard
@ 2005-01-21 18:40 ` Dan Stromberg
2005-01-21 18:59 ` Ara.T.Howard
0 siblings, 1 reply; 25+ messages in thread
From: Dan Stromberg @ 2005-01-21 18:40 UTC
To: Ara.T.Howard
Cc: strombrg, Trond Myklebust, Neil Brown, Chip Salzenberg, nfs, Mark O Sleeper, thomas.r.carey

A sniffer is your friend. ethereal is the best I've encountered.

If you fire up an ethereal to sniff packets from your client on the server,
and/or vice versa, you will likely get an idea of what's wrong fairly
quickly.

BTW, last I heard, NFS on linux could be firewalled, but it required
starting up some daemons with some magic options to hard code them to
specific ports, rather than allowing portmap/rpcbind to move them around.

On Fri, 2005-01-21 at 11:32 -0700, Ara.T.Howard wrote:
> our sysad has suggested starting a tcpdump in the nfslock init.d script to see
> what's happening - any other suggestions?
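If the daemons are pinned to fixed ports, the firewall holes can be generated rather than written by hand. A small sketch; the port numbers are hypothetical, and the daemons would have to be started to match (for example via rpc.statd's `-p`, rpc.mountd's `-p`, or lockd's `nlm_tcpport`/`nlm_udpport` parameters, where the installed versions support them; check the local man pages). `nfs_accept_rules` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical port choices; each daemon must really be started on these ports.
PINNED_PORTS: Dict[str, int] = {
    "portmapper": 111,
    "nfs": 2049,
    "status": 4000,    # rpc.statd (hypothetical port)
    "nlockmgr": 4001,  # lockd (hypothetical port)
    "mountd": 4002,    # rpc.mountd (hypothetical port)
}


def nfs_accept_rules(ports: Dict[str, int], source: str,
                     chain: str = "INPUT") -> List[str]:
    """iptables-restore lines accepting tcp and udp from `source` to each port."""
    rules = []
    for name, port in sorted(ports.items(), key=lambda item: item[1]):
        # Whole-line comments are accepted in iptables-restore input.
        rules.append(f"# {name}")
        for proto in ("tcp", "udp"):
            rules.append(f"-A {chain} -s {source} -p {proto} -m {proto} "
                         f"--dport {port} -j ACCEPT")
    return rules
```

`print("\n".join(nfs_accept_rules(PINNED_PORTS, "10.1.0.0/16")))` yields lines that would have to sit above the final REJECT rules, since iptables stops at the first rule with a terminating target.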
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-21 18:40 ` Dan Stromberg
@ 2005-01-21 18:59 ` Ara.T.Howard
2005-01-21 19:05 ` Dan Stromberg
0 siblings, 1 reply; 25+ messages in thread
From: Ara.T.Howard @ 2005-01-21 18:59 UTC
To: Dan Stromberg
Cc: Trond Myklebust, Neil Brown, Chip Salzenberg, nfs, Mark O Sleeper, thomas.r.carey

On Fri, 21 Jan 2005, Dan Stromberg wrote:

>
> A sniffer is your friend. ethereal is the best I've encountered.
>
> If you fire up an ethereal to sniff packets from your client on the server,
> and/or vice versa, you will likely get an idea of what's wrong fairly
> quickly.

probably the next step.

> BTW, last I heard, NFS on linux could be firewalled, but it required
> starting up some daemons with some magic options to hard code them to
> specific ports, rather than allowing portmap/rpcbind to move them around.

unless i am mistaken, by adding

>> on the server - where ip is the client's ip
>>
>> -A INPUT -s 10.1.186.71/32 -j ACCEPT
>>
>> on the client, where ip is the server's ip
>>
>> -A INPUT -s 10.1.186.54/32 -j ACCEPT

we effectively did NOT firewall ANYTHING between client and server, so
portmapping shouldn't have made any difference, right?

cheers.

-a
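The reasoning above relies on iptables being first-match: once a packet hits a rule whose target is ACCEPT or REJECT, later rules are never consulted. A toy evaluator makes that concrete. It assumes a NEW connection arriving on a hypothetical `eth0` and models only `-s`, `-p`, `--dport`, `-i`, `--state` and jumps to user chains. It is not a substitute for the sniffer, only a way to reason about rule order; all names are hypothetical.

```python
import ipaddress
import shlex
from typing import Dict, List, Optional

# Options this toy model understands; others (-m, --reject-with, ...) are skipped.
KEYED = ("-s", "-p", "--dport", "-i", "--state", "-j")


def parse_rules(text: str) -> Dict[str, List[dict]]:
    """Group '-A CHAIN ...' lines by chain, keeping only the KEYED options."""
    chains: Dict[str, List[dict]] = {}
    for line in text.splitlines():
        args = shlex.split(line)
        if len(args) < 2 or args[0] != "-A":
            continue
        rule, i = {}, 2
        while i < len(args):
            if args[i] in KEYED and i + 1 < len(args):
                rule[args[i]] = args[i + 1]
                i += 2
            else:
                i += 1
        chains.setdefault(args[1], []).append(rule)
    return chains


def matches(rule: dict, src: str, proto: str, dport: int, iface: str) -> bool:
    """Does `rule` match a NEW connection with these properties?"""
    if "-i" in rule and rule["-i"] != iface:
        return False
    if "--state" in rule and "NEW" not in rule["--state"].split(","):
        return False
    if "-s" in rule:
        net = "0.0.0.0/0" if rule["-s"] == "0/0" else rule["-s"]
        if ipaddress.ip_address(src) not in ipaddress.ip_network(net, strict=False):
            return False
    if "-p" in rule and rule["-p"] != proto:
        return False
    if "--dport" in rule:
        low, _, high = rule["--dport"].partition(":")
        if not int(low) <= dport <= int(high or low):
            return False
    return True


def verdict(chains: Dict[str, List[dict]], chain: str, src: str, proto: str,
            dport: int, iface: str = "eth0") -> Optional[str]:
    """First terminating target reached while walking `chain`, following jumps."""
    for rule in chains.get(chain, []):
        if not matches(rule, src, proto, dport, iface):
            continue
        target = rule.get("-j")
        if target in ("ACCEPT", "REJECT", "DROP"):
            return target
        if target in chains:
            result = verdict(chains, target, src, proto, dport, iface)
            if result is not None:
                return result
    return None  # fell off the end: the chain policy would apply
```

With the server rules quoted earlier, `verdict(chains, "INPUT", "10.1.186.71", "tcp", 32768)` comes out REJECT. Prepending `-A INPUT -s 10.1.186.71/32 -j ACCEPT` turns it into ACCEPT, which is consistent with the expectation stated above.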
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-21 18:59 ` Ara.T.Howard
@ 2005-01-21 19:05 ` Dan Stromberg
2005-01-21 19:10 ` Ara.T.Howard
0 siblings, 1 reply; 25+ messages in thread
From: Dan Stromberg @ 2005-01-21 19:05 UTC
To: Ara.T.Howard
Cc: strombrg, Trond Myklebust, Neil Brown, Chip Salzenberg, nfs, Mark O Sleeper, thomas.r.carey

On Fri, 2005-01-21 at 11:59 -0700, Ara.T.Howard wrote:
> >> on the server - where ip is the client's ip
> >>
> >> -A INPUT -s 10.1.186.71/32 -j ACCEPT
> >>
> >> on the client, where ip is the server's ip
> >>
> >> -A INPUT -s 10.1.186.54/32 -j ACCEPT
>
> we effectively did NOT firewall ANYTHING between client and server so
> portmapping shouldn't have made any difference right?

Likely, but the sniffer should give you an empirical indication, which
is better than what our theoretical discussion might give you.
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-21 19:05 ` Dan Stromberg
@ 2005-01-21 19:10 ` Ara.T.Howard
0 siblings, 0 replies; 25+ messages in thread
From: Ara.T.Howard @ 2005-01-21 19:10 UTC
To: Dan Stromberg
Cc: Trond Myklebust, Neil Brown, Chip Salzenberg, nfs, Mark O Sleeper, thomas.r.carey

On Fri, 21 Jan 2005, Dan Stromberg wrote:

> On Fri, 2005-01-21 at 11:59 -0700, Ara.T.Howard wrote:
>>>> on the server - where ip is the client's ip
>>>>
>>>> -A INPUT -s 10.1.186.71/32 -j ACCEPT
>>>>
>>>> on the client, where ip is the server's ip
>>>>
>>>> -A INPUT -s 10.1.186.54/32 -j ACCEPT
>>
>> we effectively did NOT firewall ANYTHING between client and server so
>> portmapping shouldn't have made any difference right?
>
> Likely, but the sniffer should give you an empirical indication, which
> is better than what our theoretical discussion might give you.

that's the right attitude - never trust anything! ;-)

we'll look further as you recommend.

-a
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-14 1:51 ` Ara.T.Howard
2005-01-14 2:29 ` Neil Brown
@ 2005-01-14 3:01 ` Trond Myklebust
2005-01-14 14:53 ` Ara.T.Howard
1 sibling, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2005-01-14 3:01 UTC
To: Ara.T.Howard; +Cc: Neil Brown, Chip Salzenberg, nfs

On Thu, 13.01.2005 at 18:51 (-0700), Ara.T.Howard wrote:

> server:
>
> mussel:~ > rpcinfo -u bligh status
> rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
> program 100024 is not available

If the portmapper on the server is not responding to the client (as the
above error message appears to indicate) then that would explain your
problem.

Cheers,
Trond
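What `rpcinfo -u bligh status` reports as a "Port mapper failure" is, roughly, a PMAPPROC_GETPORT call (procedure 3 of portmapper program 100000, version 2) to port 111 on bligh that got no usable reply. A minimal sketch of that call over UDP, following the ONC RPC message layout (RFC 5531) and the portmapper protocol (RFC 1833) with AUTH_NULL credentials. It is not how rpcinfo is implemented, and the function names are hypothetical.

```python
import os
import socket
import struct

PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT = 100000, 2, 3
IPPROTO_UDP = 17


def build_getport_call(xid: int, prog: int, vers: int,
                       prot: int = IPPROTO_UDP) -> bytes:
    """XDR-encode an RPC CALL to PMAPPROC_GETPORT with AUTH_NULL credentials."""
    header = struct.pack(">10I",
                         xid, 0,          # transaction id, msg_type = CALL
                         2,               # RPC protocol version
                         PMAP_PROG, PMAP_VERS, PMAPPROC_GETPORT,
                         0, 0,            # credential: AUTH_NULL, empty body
                         0, 0)            # verifier: AUTH_NULL, empty body
    mapping = struct.pack(">4I", prog, vers, prot, 0)  # port is ignored on input
    return header + mapping


def parse_getport_reply(data: bytes, xid: int) -> int:
    """Return the port from an accepted reply (0 means 'not registered')."""
    reply_xid, msg_type, reply_stat = struct.unpack_from(">3I", data, 0)
    if reply_xid != xid or msg_type != 1 or reply_stat != 0:
        raise ValueError("not an accepted reply to this call")
    _flavor, verf_len = struct.unpack_from(">2I", data, 12)
    offset = 20 + ((verf_len + 3) // 4) * 4   # skip the padded verifier body
    accept_stat, port = struct.unpack_from(">2I", data, offset)
    if accept_stat != 0:
        raise ValueError(f"call not executed, accept_stat={accept_stat}")
    return port


def getport(host: str, prog: int, vers: int, timeout: float = 3.0) -> int:
    """Ask the portmapper on `host` where prog/vers listens over UDP."""
    xid = struct.unpack(">I", os.urandom(4))[0]
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.sendto(build_getport_call(xid, prog, vers), (host, 111))
        data, _ = s.recvfrom(4096)
    return parse_getport_reply(data, xid)
```

`getport("bligh", 100024, 1)` would return statd's UDP port, 0 if statd is not registered, or raise `socket.timeout` when nothing answers, which is the situation in the error quoted above.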
* Re: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-14 3:01 ` Trond Myklebust
@ 2005-01-14 14:53 ` Ara.T.Howard
0 siblings, 0 replies; 25+ messages in thread
From: Ara.T.Howard @ 2005-01-14 14:53 UTC
To: Trond Myklebust; +Cc: Neil Brown, Chip Salzenberg, nfs

On Thu, 13 Jan 2005, Trond Myklebust wrote:

> On Thu, 13.01.2005 at 18:51 (-0700), Ara.T.Howard wrote:
>
>> server:
>>
>> mussel:~ > rpcinfo -u bligh status
>> rpcinfo: RPC: Port mapper failure - RPC: Unable to receive
>> program 100024 is not available
>
> If the portmapper on the server is not responding to the client (as the
> above error message appears to indicate) then that would explain your
> problem.

here's the thing - nothing has changed except a kernel upgrade on the client.
it rebooted to a new kernel and now the locks are stale. the client is running
the latest enterprise kernel and the server the next-latest one, since it has
not (yet) been rebooted.

i'm waiting for our sysads to get here to work on it... i'd reboot now except
i'm afraid i'll lose any debugging state...

-a
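The reboot path Neil described earlier in the thread can be modeled as a toy simulation. In that description, the client's statd keeps an empty file per monitoring server under /var/lib/nfs/sm/, moves the files to sm.bak/ when it restarts, and notifies each server. In the sketch, `notify` is a hypothetical stand-in for the real SM_NOTIFY RPC. Leaving unreachable servers in sm.bak/ is an assumption made for illustration, not a description of statd's actual retry logic.

```python
import os
import shutil
import tempfile
from typing import Callable, List, Set


def record_monitor(state_dir: str, server: str) -> None:
    """Client side of monitoring: remember `server` as an empty file in sm/."""
    sm = os.path.join(state_dir, "sm")
    os.makedirs(sm, exist_ok=True)
    open(os.path.join(sm, server), "w").close()


def notify_on_boot(state_dir: str, notify: Callable[[str], bool]) -> List[str]:
    """Move sm/* to sm.bak/, then notify each recorded server.

    Returns the servers that could not be notified (left behind in sm.bak/).
    """
    sm = os.path.join(state_dir, "sm")
    bak = os.path.join(state_dir, "sm.bak")
    os.makedirs(sm, exist_ok=True)
    os.makedirs(bak, exist_ok=True)
    for name in os.listdir(sm):
        shutil.move(os.path.join(sm, name), os.path.join(bak, name))
    failed = []
    for name in sorted(os.listdir(bak)):
        if notify(name):
            os.remove(os.path.join(bak, name))  # server told to drop our locks
        else:
            failed.append(name)                 # its locks stay held
    return failed


def simulate_reboot(servers: List[str], reachable: Set[str]) -> List[str]:
    """Record each server, 'reboot', and return those that never heard about it."""
    with tempfile.TemporaryDirectory() as state_dir:
        for server in servers:
            record_monitor(state_dir, server)
        return notify_on_boot(state_dir, lambda s: s in reachable)
```

In this model, a server that cannot be reached at reboot never learns that the client's old locks are dead, which is the symptom described in this thread.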
* RE: Debian Bug#203077: Locks not released on NFS client reboot
@ 2005-01-14 21:23 Lever, Charles
2005-01-14 21:32 ` Ara.T.Howard
0 siblings, 1 reply; 25+ messages in thread
From: Lever, Charles @ 2005-01-14 21:23 UTC
To: Ara.T.Howard; +Cc: nfs
> i poked around the nfs-faq and how-to and didn't see this...
> if i didn't miss it (quite possible) may i suggest it be added?
so far this has not been a common issue, but i will consider it.
* RE: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-14 21:23 Lever, Charles
@ 2005-01-14 21:32 ` Ara.T.Howard
0 siblings, 0 replies; 25+ messages in thread
From: Ara.T.Howard @ 2005-01-14 21:32 UTC
To: Lever, Charles; +Cc: nfs

On Fri, 14 Jan 2005, Lever, Charles wrote:

>> i poked around the nfs-faq and how-to and didn't see this...
>> if i didn't miss it (quite possible) may i suggest it be added?
>
> so far this has not been a common issue, but i will consider it.

great. it may not be common but it is, for a developer, __extremely__
difficult to debug. i guarantee every government lab will have this issue
without knowing it because of increased security policies.

cheers.

-a
* RE: Debian Bug#203077: Locks not released on NFS client reboot
@ 2005-01-14 21:46 Lever, Charles
2005-01-15 5:44 ` Ara.T.Howard
0 siblings, 1 reply; 25+ messages in thread
From: Lever, Charles @ 2005-01-14 21:46 UTC
To: nfs
> >> i poked around the nfs-faq and how-to and didn't see this... if i
> >> didn't miss it (quite possible) may i suggest it be added?
> >
> > so far this has not been a common issue, but i will consider it.
>
> great. it may not be common but it is, for a developer,
> __extremely__ difficult to debug. i guarantee every
> government lab will have this issue without knowing it
> because of increased security policies.
general question, then: what can we add to the client or server
implementation to make it easier to diagnose?
* RE: Debian Bug#203077: Locks not released on NFS client reboot
2005-01-14 21:46 Lever, Charles
@ 2005-01-15 5:44 ` Ara.T.Howard
0 siblings, 0 replies; 25+ messages in thread
From: Ara.T.Howard @ 2005-01-15 5:44 UTC
To: Lever, Charles; +Cc: nfs

On Fri, 14 Jan 2005, Lever, Charles wrote:

>>>> i poked around the nfs-faq and how-to and didn't see this... if i
>>>> didn't miss it (quite possible) may i suggest it be added?
>>>
>>> so far this has not been a common issue, but i will consider it.
>>
>> great. it may not be common but it is, for a developer,
>> __extremely__ difficult to debug. i guarantee every
>> government lab will have this issue without knowing it
>> because of increased security policies.
>
> general question, then: what can we add to the client or server
> implementation to make it easier to diagnose?

the hostname of the node holding a lock would be awesome. unless i am
mistaken (normally ;-)) fcntl will only tell you the pid of the process
holding the lock, not the hostname. this info in /proc/locks would be great
too. if that were easy to get at, it would be easy for an application to
detect stale locks.

i guess general (userland) meta-data from the nfs server on files would be
great (who has it locked, for how long, etc) - but i realize this is severely
limited by the vfs layer.

cheers.

-a
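As noted above, fcntl reports the holder's pid but not its hostname. One common application-level workaround (an assumption here, not something NFS provides) is for the lock holder to write its hostname and pid into the lock file, so that a process on the same host can tell whether the holder is still alive. A minimal sketch with hypothetical helper names:

```python
import os
import socket
from typing import Callable, Optional


def lock_owner_record() -> str:
    """What a lock holder could write into its lock file: 'hostname pid'."""
    return f"{socket.gethostname()} {os.getpid()}\n"


def pid_alive(pid: int) -> bool:
    """Check whether a local process exists, without signalling it."""
    try:
        os.kill(pid, 0)  # signal 0 performs only the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True      # exists, but belongs to another user
    return True


def is_stale(record: str, local_host: Optional[str] = None,
             alive: Callable[[int], bool] = pid_alive) -> Optional[bool]:
    """True/False when the holder ran on this host; None if it cannot be judged here."""
    host, pid = record.split()
    if host != (local_host or socket.gethostname()):
        return None      # a remote holder would have to be checked on its own host
    return not alive(int(pid))
```

A record naming another host cannot be judged locally, and pids are reused after a reboot, so a real implementation would also need something like a boot identifier.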
Thread overview: 25+ messages
[not found] <Pine.OSF.4.56.0307271137050.10355@grover.WPI.EDU>
2003-07-27 16:31 ` Debian Bug#203077: Locks not released on NFS client reboot Chip Salzenberg
2003-07-28 0:56 ` Neil Brown
2005-01-14 1:51 ` Ara.T.Howard
2005-01-14 2:29 ` Neil Brown
2005-01-14 2:57 ` Ara.T.Howard
2005-01-14 16:05 ` Ara.T.Howard
2005-01-14 19:43 ` Trond Myklebust
2005-01-14 19:50 ` Ara.T.Howard
2005-01-18 20:59 ` Ara.T.Howard
2005-01-14 17:47 ` Dan Stromberg
2005-01-14 18:19 ` Ara.T.Howard
2005-01-14 21:20 ` Dan Stromberg
2005-01-21 0:24 ` Ara.T.Howard
2005-01-21 0:52 ` Trond Myklebust
2005-01-21 18:32 ` Ara.T.Howard
2005-01-21 18:40 ` Dan Stromberg
2005-01-21 18:59 ` Ara.T.Howard
2005-01-21 19:05 ` Dan Stromberg
2005-01-21 19:10 ` Ara.T.Howard
2005-01-14 3:01 ` Trond Myklebust
2005-01-14 14:53 ` Ara.T.Howard
2005-01-14 21:23 Lever, Charles
2005-01-14 21:32 ` Ara.T.Howard
-- strict thread matches above, loose matches on Subject: below --
2005-01-14 21:46 Lever, Charles
2005-01-15 5:44 ` Ara.T.Howard