* [NLM] 2.6.27.14 breakage when grace period expires
From: Frank van Maarseveen @ 2009-02-11 11:23 UTC
To: Linux NFS mailing list

I'm sorry to inform you but... it seems that there is a problem in the
NLM subsystem similar to the one reported previously, but this time it is
triggered when the grace time expires after a reboot.

Client and server run 2.6.27.14 + previous fix, NFSv3.

On the client there are three shells running:

    while :; do lck -w /mnt/foo 2; done

The "lck" program is the same as posted before: it obtains an exclusive
write lock and then, in the above invocation, waits 2 seconds (there's
probably an "fcntl" command equivalent). After an orderly server reboot
+ grace time expiration, one of the above command loops reports:

    lck: fcntl: No locks available

and all three get stuck. After ^C-ing all "lck" loops the server still
shows an entry in /proc/locks, which keeps the file locked indefinitely.
Maybe two loops are sufficient to reproduce the issue or maybe you need
more, I don't know.

Interestingly, during the grace time at least one of the "lck" processes
should have re-obtained the lock, but it didn't show up in /proc/locks
on the server.

Interestingly (#2), after removing the file on the server (i.e. no
sillyrename) the now free inode is still locked according to /proc/locks.
Even stopping/starting /etc/init.d/nfs-kernel-server plus
"echo 3 >/proc/sys/vm/drop_caches" did not remove the lock (it did
re-enter grace).

--
Frank
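The "lck" source is not included in this thread. The following minimal C sketch shows what an `lck -w FILE SECONDS` invocation presumably does, for illustration only. It takes an exclusive POSIX (fcntl) write lock on the whole file, waits until the lock is granted, holds it, and then releases it. The function name `hold_write_lock` is hypothetical. On an NFSv3 mount a request like this is carried by NLM. An `ENOLCK` failure is what `perror("fcntl")` prints as "fcntl: No locks available".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for "lck -w FILE SECONDS" (the real lck source is
 * not part of this thread): take an exclusive POSIX write lock on the whole
 * file, waiting until it is granted, hold it for `seconds`, then release it.
 * Returns 0 on success, or -1 with errno set.  When fcntl() fails with
 * ENOLCK, perror("fcntl") prints "fcntl: No locks available".
 */
int hold_write_lock(const char *path, unsigned int seconds)
{
    assert(path != NULL);

    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;    /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "through end of file": the whole file */

    /* F_SETLKW blocks until the lock is granted or a signal interrupts it. */
    if (fcntl(fd, F_SETLKW, &fl) < 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }

    sleep(seconds);         /* hold the lock, like "lck -w /mnt/foo 2" */

    fl.l_type = F_UNLCK;    /* explicit unlock; close() would also drop it */
    (void)fcntl(fd, F_SETLK, &fl);
    close(fd);
    return 0;
}
```

A hypothetical `main` could call `hold_write_lock(argv[1], (unsigned)atoi(argv[2]))` and report failures with `perror("fcntl")`. Running that in three shell loops would roughly approximate the reproduction described above.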
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: J. Bruce Fields @ 2009-02-11 20:35 UTC
To: Frank van Maarseveen
Cc: Linux NFS mailing list

On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> I'm sorry to inform you but... it seems that there is a similar problem
> in the NLM subsystem as reported previously but this time it is triggered
> when the grace time expires after a reboot.
>
> Client and server run 2.6.27.14 + previous fix, NFSv3.
>
> On the client there are three shells running:
>
> while :; do lck -w /mnt/foo 2; done
>
> The "lck" program is the same as posted before and it obtains an exclusive
> write lock then waits 2 seconds in above invocation (there's probably an
> "fcntl" command equivalent). After an orderly server reboot + grace time

How are you rebooting the server?

--b.
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Frank van Maarseveen @ 2009-02-11 20:37 UTC
To: J. Bruce Fields
Cc: Frank van Maarseveen, Linux NFS mailing list

On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote:
> > [...]
> > "fcntl" command equivalent). After an orderly server reboot + grace time
>
> How are you rebooting the server?

"reboot"

--
Frank
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: J. Bruce Fields @ 2009-02-11 20:39 UTC
To: Frank van Maarseveen
Cc: Linux NFS mailing list

On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote:
> On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote:
> > How are you rebooting the server?
>
> "reboot"

Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
server is actually sending the reboot notification to the client, and
that the client is trying to reclaim?  (Wireshark should make this all
fairly clear.  But capture the traffic with tcpdump -s0 -wtmp.pcap and
send it to me if you're having trouble interpreting it.)

--b.
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Frank van Maarseveen @ 2009-02-11 20:57 UTC
To: J. Bruce Fields
Cc: Frank van Maarseveen, Linux NFS mailing list

On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote:
> Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
> server is actually sending the reboot notification to the client, and
> that the client is trying to reclaim?  (Wireshark should make this all
> fairly clear.  But capture the traffic with tcpdump -s0 -wtmp.pcap and
> send it to me if you're having trouble interpreting it.)

I can't try it right now, but tomorrow I can. However, I'm pretty sure
at least the reboot notification is there, because:

1) The issue also happens in a totally different NFS server setup, which
   by definition invokes sm-notify in a script. This is the real use case.

2) If not, I would expect different behavior from what I saw anyway.
   A lost reboot notification is always possible, but in that case the
   client(s) might end up holding more locks than the server, not the
   other way around as it is right now.

I'll make a capture.

--
Frank
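Whether the client and the server agree on which locks are held can be checked by reading /proc/locks on each machine. The following minimal C sketch parses one such line. It assumes the common Linux layout `ID: CLASS MODE TYPE PID MAJ:MIN:INODE START END`, with device numbers printed in hex and blocked waiters marked with `->`. The struct and function names are hypothetical, and the sample line in the comment is illustrative rather than taken from this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One /proc/locks entry (field names are ours, not the kernel's). */
struct proc_lock {
    int id;                 /* ordinal shown before the colon */
    int waiter;             /* 1 for a "->" line: a request blocked on lock `id` */
    char kind[16];          /* POSIX, FLOCK, ... */
    char mode[16];          /* ADVISORY or MANDATORY */
    char type[16];          /* READ or WRITE */
    long pid;
    unsigned int maj, min;  /* device numbers, printed in hex */
    unsigned long inode;
    char start[24];
    char end[24];           /* a byte offset or "EOF" */
};

/*
 * Parse a line such as
 *   "1: POSIX  ADVISORY  WRITE 3362 08:06:1572875 0 EOF"
 * Returns 1 on success, 0 if the line does not have the expected shape.
 */
int parse_proc_lock(const char *line, struct proc_lock *out)
{
    assert(line != NULL && out != NULL);
    memset(out, 0, sizeof *out);

    int n = -1;  /* %n stores how many characters "ID:" plus spaces used */
    if (sscanf(line, "%d: %n", &out->id, &n) != 1 || n < 0)
        return 0;

    const char *p = line + n;
    if (strncmp(p, "->", 2) == 0) {  /* blocked-waiter entry */
        out->waiter = 1;
        p += 2;
    }

    int got = sscanf(p, "%15s %15s %15s %ld %x:%x:%lu %23s %23s",
                     out->kind, out->mode, out->type, &out->pid,
                     &out->maj, &out->min, &out->inode,
                     out->start, out->end);
    return got == 9;
}
```

You can run the parser over every line of /proc/locks on both machines and compare the WRITE entries by device and inode. This is one way to spot a lock that only one side believes is held.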
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Frank van Maarseveen @ 2009-02-12 14:28 UTC
To: J. Bruce Fields
Cc: Linux NFS mailing list

On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote:
> Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the
> server is actually sending the reboot notification to the client, and
> that the client is trying to reclaim?  (Wireshark should make this all
> fairly clear.  But capture the traffic with tcpdump -s0 -wtmp.pcap and
> send it to me if you're having trouble interpreting it.)

I have a capture with comments below. It raised so many questions that I
decided to do some more testing, trying to figure out what it looks like
when the locking works. This issue now appears to predate the fuse
changes and is also present when both client and server run 2.6.24.4.
I decided to stick with the traffic capture for 2.6.27.14 + previous fix,
as discussed earlier. The full capture is available at
http://www.frankvm.com/tmp/2.6.27.14-nlm-grace.pcap. It's about 33k, was
started on the server from the init scripts right after the reboot, and
is filtered on the client's IP address.

Exported by wireshark (filter: nfs or stat or nlm) and condensed:

  #        time  src     prot
  1    0.000000  client: NFS V3 GETATTR Call (Reply In 42), FH:0x0308030a
  2    0.000018  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
  5    0.000583  server: ICMP Destination unreachable (Port unreachable)
  6    0.000589  server: ICMP Destination unreachable (Port unreachable)
  7    1.891277  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
  8    1.891320  server: ICMP Destination unreachable (Port unreachable)
  9    5.827053  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 10    5.827119  server: ICMP Destination unreachable (Port unreachable)
 11   14.626501  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 12   14.626587  server: ICMP Destination unreachable (Port unreachable)
 15   15.726426  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 16   15.726505  server: ICMP Destination unreachable (Port unreachable)
 17   17.926284  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 18   17.926368  server: ICMP Destination unreachable (Port unreachable)
 25   22.326006  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 26   22.326090  server: ICMP Destination unreachable (Port unreachable)
 35   30.022271  client: NLM V4 UNLOCK Call (Reply In 36) FH:0xcafa61cc svid:114 pos:0-0
 36   30.029511  server: NLM V4 UNLOCK Reply (Call In 35) NLM_DENIED_GRACE_PERIOD
 37   30.029660  client: NLM V4 LOCK Call (Reply In 39) FH:0xcafa61cc svid:116 pos:0-0
 38   30.029691  client: NLM V4 LOCK Call (Reply In 40) FH:0xcafa61cc svid:115 pos:0-0
 39   30.029884  server: NLM V4 LOCK Reply (Call In 37) NLM_DENIED_GRACE_PERIOD
 40   30.029914  server: NLM V4 LOCK Reply (Call In 38) NLM_DENIED_GRACE_PERIOD
 41   31.125403  client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a
 42   31.127499  server: NFS V3 GETATTR Reply (Call In 1) Directory mode:0755 uid:0 gid:0
 43   31.127942  client: NFS V3 GETATTR Call (Reply In 45), FH:0x0308030a
 45   31.129378  server: NFS V3 GETATTR Reply (Call In 43) Directory mode:0755 uid:0 gid:0
 47   31.129958  server: STAT V1 NOTIFY Call (Reply In 48)
 48   31.130301  client: STAT V1 NOTIFY Reply (Call In 47)

Reboot notification ok.

 51   35.029968  client: NLM V4 UNLOCK Call (Reply In 54) FH:0xcafa61cc svid:114 pos:0-0
 52   35.030003  client: NLM V4 LOCK Call (Reply In 55) FH:0xcafa61cc svid:116 pos:0-0
 53   35.030016  client: NLM V4 LOCK Call (Reply In 56) FH:0xcafa61cc svid:115 pos:0-0
 54   35.030085  server: NLM V4 UNLOCK Reply (Call In 51) NLM_DENIED_GRACE_PERIOD
 55   35.030126  server: NLM V4 LOCK Reply (Call In 52) NLM_DENIED_GRACE_PERIOD
 56   35.030153  server: NLM V4 LOCK Reply (Call In 53) NLM_DENIED_GRACE_PERIOD

These are the three contending client processes. I don't see a lock
registration for svid:114, only UNLOCK calls, which fail with
NLM_DENIED_GRACE_PERIOD. This goes on for a while. Neither the server
nor the client shows any lock in /proc/locks at this point.

166  115.028376  client: NLM V4 LOCK Call (Reply In 168) FH:0xcafa61cc svid:115 pos:0-0
167  115.028394  client: NLM V4 LOCK Call (Reply In 169) FH:0xcafa61cc svid:116 pos:0-0
168  115.028440  server: NLM V4 LOCK Reply (Call In 166) NLM_DENIED_GRACE_PERIOD
169  115.028465  server: NLM V4 LOCK Reply (Call In 167) NLM_DENIED_GRACE_PERIOD
170  120.027233  client: NLM V4 UNLOCK Call (Reply In 171) FH:0xcafa61cc svid:114 pos:0-0
171  120.027337  server: NLM V4 UNLOCK Reply (Call In 170) NLM_DENIED_GRACE_PERIOD
172  120.028234  client: NLM V4 LOCK Call (Reply In 175) FH:0xcafa61cc svid:116 pos:0-0
173  120.028258  client: NLM V4 LOCK Call (Reply In 174) FH:0xcafa61cc svid:115 pos:0-0
174  120.030601  server: NLM V4 LOCK Reply (Call In 173)
175  120.030656  server: NLM V4 LOCK Reply (Call In 172) NLM_BLOCKED

This doesn't add up. There hasn't been a successful unlock for svid:114
(see #213 for that), but one of the locks is still granted.

176  120.030781  client: NLM V4 LOCK Call (Reply In 177) FH:0xcafa61cc svid:115 pos:0-0
177  120.030849  server: NLM V4 LOCK Reply (Call In 176)

Strange: an identical lock request, but with a different RPC xid (i.e. no
packet duplication).

178  120.031078  client: NFS V3 GETATTR Call (Reply In 179), FH:0xcafa61cc
179  120.031154  server: NFS V3 GETATTR Reply (Call In 178) Regular File mode:0644 uid:363 gid:1500
180  120.033973  client: NFS V3 ACCESS Call (Reply In 181), FH:0x0308030a
181  120.034030  server: NFS V3 ACCESS Reply (Call In 180)
182  120.034223  client: NFS V3 LOOKUP Call (Reply In 183), DH:0x0308030a/loc
183  120.034285  server: NFS V3 LOOKUP Reply (Call In 182), FH:0x81685ca0
184  120.034472  client: NFS V3 ACCESS Call (Reply In 185), FH:0x0308030c
185  120.034526  server: NFS V3 ACCESS Reply (Call In 184)
186  120.034722  client: NFS V3 ACCESS Call (Reply In 187), FH:0x0308030c
187  120.034776  server: NFS V3 ACCESS Reply (Call In 186)
188  120.034922  client: NFS V3 LOOKUP Call (Reply In 189), DH:0x0308030c/locktest
189  120.034993  server: NFS V3 LOOKUP Reply (Call In 188), FH:0xcafa61cc
190  120.035172  client: NFS V3 ACCESS Call (Reply In 191), FH:0xcafa61cc
191  120.035230  server: NFS V3 ACCESS Reply (Call In 190)
193  122.032218  client: NLM V4 UNLOCK Call (Reply In 195) FH:0xcafa61cc svid:115 pos:0-0
194  122.032253  client: NLM V4 LOCK Call (Reply In 197) FH:0xcafa61cc svid:119 pos:0-0
195  122.032343  server: NLM V4 UNLOCK Reply (Call In 193)
197  122.032794  server: NLM V4 LOCK Reply (Call In 194) NLM_BLOCKED
201  122.033767  server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
202  122.034066  client: NLM V4 GRANTED_MSG Reply (Call In 201)
205  122.034665  client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
206  122.034753  server: NLM V4 GRANTED_RES Reply (Call In 205)
207  122.036312  client: NFS V3 GETATTR Call (Reply In 208), FH:0xcafa61cc
208  122.036394  server: NFS V3 GETATTR Reply (Call In 207) Regular File mode:0644 uid:363 gid:1500
209  122.036611  client: NLM V4 LOCK Call (Reply In 210) FH:0xcafa61cc svid:120 pos:0-0
210  122.036674  server: NLM V4 LOCK Reply (Call In 209) NLM_BLOCKED
213  125.027091  client: NLM V4 UNLOCK Call (Reply In 214) FH:0xcafa61cc svid:114 pos:0-0
214  125.027194  server: NLM V4 UNLOCK Reply (Call In 213)
215  125.029487  client: NFS V3 GETATTR Call (Reply In 216), FH:0xcafa61cc
216  125.029570  server: NFS V3 GETATTR Reply (Call In 215) Regular File mode:0644 uid:363 gid:1500
217  125.029836  client: NLM V4 LOCK Call (Reply In 218) FH:0xcafa61cc svid:121 pos:0-0
218  125.029895  server: NLM V4 LOCK Reply (Call In 217) NLM_BLOCKED
224  152.032157  client: NLM V4 LOCK Call (Reply In 225) FH:0xcafa61cc svid:119 pos:0-0
225  152.032283  server: NLM V4 LOCK Reply (Call In 224) NLM_BLOCKED
226  152.035103  client: NLM V4 LOCK Call (Reply In 227) FH:0xcafa61cc svid:120 pos:0-0
227  152.035157  server: NLM V4 LOCK Reply (Call In 226) NLM_BLOCKED
230  155.029676  client: NLM V4 LOCK Call (Reply In 231) FH:0xcafa61cc svid:121 pos:0-0
231  155.029761  server: NLM V4 LOCK Reply (Call In 230) NLM_BLOCKED

To recap the problem: shortly after the grace period expires, one of the
fcntl calls to obtain a write lock fails with

    lck: fcntl: No locks available

After that everything gets stuck, with the server holding a write lock
that has no corresponding client-side lock.

IMO it looks like the client is to blame, even if the server should (or
could) have accepted the UNLOCK during the grace period (I don't know,
I'm not an expert on that one).

--
Frank
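The capture shows the server refusing both LOCK and UNLOCK with NLM_DENIED_GRACE_PERIOD until roughly t=120 s. After that point, the same requests are granted or blocked. The following hypothetical C sketch models that observed gate; it is not lockd's actual code. It assumes that LOCK requests flagged as reclaims are let through, since that is what the grace period is meant to allow. The status numbers follow the NLMv4 `nlm4_stats` enumeration, and `grace_gate` is an illustrative name.

```c
#include <assert.h>
#include <stdbool.h>

/* NLMv4 status codes (nlm4_stats), as numbered in the NLMv4 protocol. */
enum nlm4_stats {
    NLM4_GRANTED = 0,
    NLM4_DENIED = 1,
    NLM4_DENIED_NOLOCKS = 2,
    NLM4_BLOCKED = 3,
    NLM4_DENIED_GRACE_PERIOD = 4,
};

enum nlm_op { OP_LOCK, OP_UNLOCK };

/*
 * Hypothetical model of the grace-period gate seen in the capture: while
 * the server is in grace, anything other than a reclaim LOCK is refused
 * with DENIED_GRACE_PERIOD (UNLOCK included, as in packets 35/36).
 * Outside grace the request goes on to normal lock handling, whose
 * outcome is passed in here as `normal_result`.
 */
enum nlm4_stats grace_gate(enum nlm_op op, bool reclaim, bool in_grace,
                           enum nlm4_stats normal_result)
{
    if (in_grace && !(op == OP_LOCK && reclaim))
        return NLM4_DENIED_GRACE_PERIOD;
    return normal_result;
}
```

Under this model, the svid:114 UNLOCKs (#35, #51, #170) cannot take effect until grace ends. This matches the first successful one at #213/#214. Whether UNLOCK ought to be exempt from the gate is the open question raised above.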
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Trond Myklebust @ 2009-02-12 15:16 UTC
To: Frank van Maarseveen
Cc: J. Bruce Fields, Linux NFS mailing list

On Thu, 2009-02-12 at 15:28 +0100, Frank van Maarseveen wrote:
> I have a capture with comment below.
> 166  115.028376  client: NLM V4 LOCK Call (Reply In 168) FH:0xcafa61cc svid:115 pos:0-0
> 167  115.028394  client: NLM V4 LOCK Call (Reply In 169) FH:0xcafa61cc svid:116 pos:0-0
> 168  115.028440  server: NLM V4 LOCK Reply (Call In 166) NLM_DENIED_GRACE_PERIOD
> 169  115.028465  server: NLM V4 LOCK Reply (Call In 167) NLM_DENIED_GRACE_PERIOD
> 170  120.027233  client: NLM V4 UNLOCK Call (Reply In 171) FH:0xcafa61cc svid:114 pos:0-0
> 171  120.027337  server: NLM V4 UNLOCK Reply (Call In 170) NLM_DENIED_GRACE_PERIOD
> 172  120.028234  client: NLM V4 LOCK Call (Reply In 175) FH:0xcafa61cc svid:116 pos:0-0
> 173  120.028258  client: NLM V4 LOCK Call (Reply In 174) FH:0xcafa61cc svid:115 pos:0-0
> 174  120.030601  server: NLM V4 LOCK Reply (Call In 173)
> 175  120.030656  server: NLM V4 LOCK Reply (Call In 172) NLM_BLOCKED
>
> This doesn't add up. There hasn't been a successful unlock for svid:114
> (see #213 for that) but still one of the locks is granted.

Has the client attempted to recover the lock for svid:114? If not, then
the server has no knowledge of that lock.

> 176  120.030781  client: NLM V4 LOCK Call (Reply In 177) FH:0xcafa61cc svid:115 pos:0-0
> 177  120.030849  server: NLM V4 LOCK Reply (Call In 176)
>
> Strange: an identical lock request but with a different rpc xid (i.e. no
> packet duplication).

No. That would be the non-blocking lock that is intended as a 'ping' to
see if the server is still alive. It duplicates the blocking lock in all
details except that the 'block' flag is not set.

> [...]
> 193  122.032218  client: NLM V4 UNLOCK Call (Reply In 195) FH:0xcafa61cc svid:115 pos:0-0
> 194  122.032253  client: NLM V4 LOCK Call (Reply In 197) FH:0xcafa61cc svid:119 pos:0-0
> 195  122.032343  server: NLM V4 UNLOCK Reply (Call In 193)
> 197  122.032794  server: NLM V4 LOCK Reply (Call In 194) NLM_BLOCKED
> 201  122.033767  server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
> 202  122.034066  client: NLM V4 GRANTED_MSG Reply (Call In 201)
> 205  122.034665  client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
> 206  122.034753  server: NLM V4 GRANTED_RES Reply (Call In 205)

What happened here? Why did the client refuse the lock for svid 116? Did
the task get signalled? If so, where is the CANCEL request?

> [...]
>
> To recap the problem: one of the fcntl calls to obtain a write lock
> returns
>
> lck: fcntl: No locks available
>
> shortly after the grace period expires. After that everything gets stuck,
> server holding a write lock with no corresponding client side lock.
>
> IMO looks like the client is to blame, even if/when the server
> should/could have accepted UNLOCK during grace (I don't know, I'm not
> an expert on that one).

Possibly... It depends entirely on what happened to cause it to deny the
GRANTED callback...

Trond
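Trond's question turns on when a client answers a GRANTED callback with NLM_DENIED. The following is a much simplified, hypothetical C model, not the Linux client code. The client keeps a table of lock requests it is still waiting on. A callback that matches an active waiter by file handle, svid and range wakes that waiter and is answered with granted. A callback that matches nothing is answered with denied, for instance because the waiting task has already given up. All names in the sketch are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CB_GRANTED = 0, CB_DENIED = 1 };  /* nlm4_granted / nlm4_denied */

/* One blocked lock request the client is waiting on (simplified). */
struct waiter {
    char fh[32];
    int32_t svid;
    uint64_t offset, len;
    bool active;   /* false once the waiting task has given up */
    bool woken;    /* set when a matching GRANTED callback arrives */
};

/* Handle a GRANTED callback: wake the matching active waiter, else refuse. */
int handle_granted(struct waiter *w, size_t n, const char *fh,
                   int32_t svid, uint64_t offset, uint64_t len)
{
    assert(fh != NULL && (w != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (w[i].active && w[i].svid == svid && w[i].offset == offset &&
            w[i].len == len && strcmp(w[i].fh, fh) == 0) {
            w[i].woken = true;
            w[i].active = false;  /* each wait is satisfied only once */
            return CB_GRANTED;
        }
    }
    return CB_DENIED;             /* nobody is waiting for this lock */
}
```

In this model, the NLM_DENIED in #205 would mean that no active waiter for svid:116 existed at that moment. The sketch does not cover what the server should then do with the lock it had already granted.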
* Re: [NLM] 2.6.27.14 breakage when grace period expires [not found] ` <1234451789.7190.38.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2009-02-12 15:36 ` Frank van Maarseveen 2009-02-12 18:17 ` Trond Myklebust 0 siblings, 1 reply; 25+ messages in thread From: Frank van Maarseveen @ 2009-02-12 15:36 UTC (permalink / raw) To: Trond Myklebust Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list On Thu, Feb 12, 2009 at 10:16:29AM -0500, Trond Myklebust wrote: > On Thu, 2009-02-12 at 15:28 +0100, Frank van Maarseveen wrote: > > On Wed, Feb 11, 2009 at 03:39:48PM -0500, J. Bruce Fields wrote: > > > On Wed, Feb 11, 2009 at 09:37:03PM +0100, Frank van Maarseveen wrote: > > > > On Wed, Feb 11, 2009 at 03:35:55PM -0500, J. Bruce Fields wrote: > > > > > On Wed, Feb 11, 2009 at 12:23:18PM +0100, Frank van Maarseveen wrote: > > > > > > I'm sorry to inform you but... it seems that there is a similar problem > > > > > > in the NLM subsystem as reported previously but this time it is triggered > > > > > > when the grace time expires after a reboot. > > > > > > > > > > > > Client and server run 2.6.27.14 + previous fix, NFSv3. > > > > > > > > > > > > On the client there are three shells running: > > > > > > > > > > > > while :; do lck -w /mnt/foo 2; done > > > > > > > > > > > > The "lck" program is the same as posted before and it obtains an exclusive > > > > > > write lock then waits 2 seconds in above invocation (there's probably an > > > > > > "fcntl" command equivalent). After an orderly server reboot + grace time > > > > > > > > > > How are you rebooting the server? > > > > > > > > "reboot" > > > > > > Could you watch the nfs/nlm/nsm traffic on reboot and make sure that the > > > server is actually sending the reboot notification to the client, and > > > that the client is trying to reclaim? (Wireshark should make this all > > > fairly clear. 
But capture the traffic with tcpdump -s0 -wtmp.pcap and > > > send it to me if you're having trouble interpreting it.) > > > > I have a capture with comment below. It raised so many questions > > that I decided to do some more testing, trying to figure out how > > it looks when the locking works. This issue now appears to predate the > > fuse changes and is also present when both client and server run > > 2.6.24.4. I decided to stick with the traffic capture for 2.6.27.14 + > > previous fix as discussed earlier. The full capture is available at > > http://www.frankvm.com/tmp/2.6.27.14-nlm-grace.pcap. It's about 33k and > > was started on the server as part of initscripts, right after the reboot > > and filtered on client IP address. > > > > Exported by wireshark (filter: nfs or stat or nlm) and condensed: > > > > # time src prot > > 1 0.000000 client: NFS V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 2 0.000018 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 5 0.000583 server: ICMP Destination unreachable (Port unreachable) > > 6 0.000589 server: ICMP Destination unreachable (Port unreachable) > > 7 1.891277 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 8 1.891320 server: ICMP Destination unreachable (Port unreachable) > > 9 5.827053 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 10 5.827119 server: ICMP Destination unreachable (Port unreachable) > > 11 14.626501 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 12 14.626587 server: ICMP Destination unreachable (Port unreachable) > > 15 15.726426 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 16 15.726505 server: ICMP Destination unreachable (Port unreachable) > > 17 17.926284 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 18 17.926368 server: ICMP Destination unreachable (Port unreachable) > > 25 22.326006 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 26 22.326090 server: ICMP Destination unreachable (Port unreachable) > > 35 30.022271 client: NLM V4 UNLOCK Call (Reply In 36) FH:0xcafa61cc svid:114 pos:0-0 > > 36 30.029511 server: NLM V4 UNLOCK Reply (Call In 35) NLM_DENIED_GRACE_PERIOD > > 37 30.029660 client: NLM V4 LOCK Call (Reply In 39) FH:0xcafa61cc svid:116 pos:0-0 > > 38 30.029691 client: NLM V4 LOCK Call (Reply In 40) FH:0xcafa61cc svid:115 pos:0-0 > > 39 30.029884 server: NLM V4 LOCK Reply (Call In 37) NLM_DENIED_GRACE_PERIOD > > 40 30.029914 server: NLM V4 LOCK Reply (Call In 38) NLM_DENIED_GRACE_PERIOD > > 41 31.125403 client: NFS [RPC retransmission of #1]V3 GETATTR Call (Reply In 42), FH:0x0308030a > > 42 31.127499 server: NFS V3 GETATTR Reply (Call In 1) Directory mode:0755 uid:0 gid:0 > > 43 31.127942 client: NFS V3 GETATTR Call (Reply In 45), FH:0x0308030a > > 45 31.129378 server: NFS V3 GETATTR Reply (Call In 43) Directory mode:0755 uid:0 gid:0 > > 47 31.129958 server: STAT V1 NOTIFY Call (Reply In 48) > > 48 31.130301 client: STAT V1 NOTIFY Reply (Call In 47) > > > > Reboot notification ok. > > > > 51 35.029968 client: NLM V4 UNLOCK Call (Reply In 54) FH:0xcafa61cc svid:114 pos:0-0 > > 52 35.030003 client: NLM V4 LOCK Call (Reply In 55) FH:0xcafa61cc svid:116 pos:0-0 > > 53 35.030016 client: NLM V4 LOCK Call (Reply In 56) FH:0xcafa61cc svid:115 pos:0-0 > > 54 35.030085 server: NLM V4 UNLOCK Reply (Call In 51) NLM_DENIED_GRACE_PERIOD > > 55 35.030126 server: NLM V4 LOCK Reply (Call In 52) NLM_DENIED_GRACE_PERIOD > > 56 35.030153 server: NLM V4 LOCK Reply (Call In 53) NLM_DENIED_GRACE_PERIOD > > > > The three contending client processes. I don't see a lock registration for > > svid:114, only UNLOCK calls which fail with NLM_DENIED_GRACE_PERIOD. The > > above goes on for a while. Neither the server nor the client shows any lock > > in /proc/locks at this point.
> > > > 166 115.028376 client: NLM V4 LOCK Call (Reply In 168) FH:0xcafa61cc svid:115 pos:0-0 > > 167 115.028394 client: NLM V4 LOCK Call (Reply In 169) FH:0xcafa61cc svid:116 pos:0-0 > > 168 115.028440 server: NLM V4 LOCK Reply (Call In 166) NLM_DENIED_GRACE_PERIOD > > 169 115.028465 server: NLM V4 LOCK Reply (Call In 167) NLM_DENIED_GRACE_PERIOD > > 170 120.027233 client: NLM V4 UNLOCK Call (Reply In 171) FH:0xcafa61cc svid:114 pos:0-0 > > 171 120.027337 server: NLM V4 UNLOCK Reply (Call In 170) NLM_DENIED_GRACE_PERIOD > > 172 120.028234 client: NLM V4 LOCK Call (Reply In 175) FH:0xcafa61cc svid:116 pos:0-0 > > 173 120.028258 client: NLM V4 LOCK Call (Reply In 174) FH:0xcafa61cc svid:115 pos:0-0 > > 174 120.030601 server: NLM V4 LOCK Reply (Call In 173) > > 175 120.030656 server: NLM V4 LOCK Reply (Call In 172) NLM_BLOCKED > > > > This doesn't add up. There hasn't been a successful unlock for svid:114 > > (see #213 for that) but still one of the locks is granted. > > Has the lock for svid:114 been attempted recovered by the client? If > not, then the server has no knowledge of that lock. exactly. Apparently the client tries to unlock an unrecovered lock. > > > 176 120.030781 client: NLM V4 LOCK Call (Reply In 177) FH:0xcafa61cc svid:115 pos:0-0 > > 177 120.030849 server: NLM V4 LOCK Reply (Call In 176) > > > > Strange: an identical lock request but with a different rpc xid (i.e. no > > packet duplication). > > No. That would be the non-blocking lock that is intended as a 'ping' to > see if the server is still alive. It duplicates the blocking lock in all > details except that the 'block' flag is not set. 
> > > 178 120.031078 client: NFS V3 GETATTR Call (Reply In 179), FH:0xcafa61cc > > 179 120.031154 server: NFS V3 GETATTR Reply (Call In 178) Regular File mode:0644 uid:363 gid:1500 > > 180 120.033973 client: NFS V3 ACCESS Call (Reply In 181), FH:0x0308030a > > 181 120.034030 server: NFS V3 ACCESS Reply (Call In 180) > > 182 120.034223 client: NFS V3 LOOKUP Call (Reply In 183), DH:0x0308030a/loc > > 183 120.034285 server: NFS V3 LOOKUP Reply (Call In 182), FH:0x81685ca0 > > 184 120.034472 client: NFS V3 ACCESS Call (Reply In 185), FH:0x0308030c > > 185 120.034526 server: NFS V3 ACCESS Reply (Call In 184) > > 186 120.034722 client: NFS V3 ACCESS Call (Reply In 187), FH:0x0308030c > > 187 120.034776 server: NFS V3 ACCESS Reply (Call In 186) > > 188 120.034922 client: NFS V3 LOOKUP Call (Reply In 189), DH:0x0308030c/locktest > > 189 120.034993 server: NFS V3 LOOKUP Reply (Call In 188), FH:0xcafa61cc > > 190 120.035172 client: NFS V3 ACCESS Call (Reply In 191), FH:0xcafa61cc > > 191 120.035230 server: NFS V3 ACCESS Reply (Call In 190) > > 193 122.032218 client: NLM V4 UNLOCK Call (Reply In 195) FH:0xcafa61cc svid:115 pos:0-0 > > 194 122.032253 client: NLM V4 LOCK Call (Reply In 197) FH:0xcafa61cc svid:119 pos:0-0 > > 195 122.032343 server: NLM V4 UNLOCK Reply (Call In 193) > > 197 122.032794 server: NLM V4 LOCK Reply (Call In 194) NLM_BLOCKED > > 201 122.033767 server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0 > > 202 122.034066 client: NLM V4 GRANTED_MSG Reply (Call In 201) > > 205 122.034665 client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED > > 206 122.034753 server: NLM V4 GRANTED_RES Reply (Call In 205) > > What happened here? Why did the client refuse the lock for svid 116? > > Did the task get signalled? If so, where is the CANCEL request? The task did not get signaled, there is no CANCEL. 
> > > 207 122.036312 client: NFS V3 GETATTR Call (Reply In 208), FH:0xcafa61cc > > 208 122.036394 server: NFS V3 GETATTR Reply (Call In 207) Regular File mode:0644 uid:363 gid:1500 > > 209 122.036611 client: NLM V4 LOCK Call (Reply In 210) FH:0xcafa61cc svid:120 pos:0-0 > > 210 122.036674 server: NLM V4 LOCK Reply (Call In 209) NLM_BLOCKED > > 213 125.027091 client: NLM V4 UNLOCK Call (Reply In 214) FH:0xcafa61cc svid:114 pos:0-0 > > 214 125.027194 server: NLM V4 UNLOCK Reply (Call In 213) > > 215 125.029487 client: NFS V3 GETATTR Call (Reply In 216), FH:0xcafa61cc > > 216 125.029570 server: NFS V3 GETATTR Reply (Call In 215) Regular File mode:0644 uid:363 gid:1500 > > 217 125.029836 client: NLM V4 LOCK Call (Reply In 218) FH:0xcafa61cc svid:121 pos:0-0 > > 218 125.029895 server: NLM V4 LOCK Reply (Call In 217) NLM_BLOCKED > > 224 152.032157 client: NLM V4 LOCK Call (Reply In 225) FH:0xcafa61cc svid:119 pos:0-0 > > 225 152.032283 server: NLM V4 LOCK Reply (Call In 224) NLM_BLOCKED > > 226 152.035103 client: NLM V4 LOCK Call (Reply In 227) FH:0xcafa61cc svid:120 pos:0-0 > > 227 152.035157 server: NLM V4 LOCK Reply (Call In 226) NLM_BLOCKED > > 230 155.029676 client: NLM V4 LOCK Call (Reply In 231) FH:0xcafa61cc svid:121 pos:0-0 > > 231 155.029761 server: NLM V4 LOCK Reply (Call In 230) NLM_BLOCKED > > > > To recap the problem: one of the fcntl calls to obtain a write lock > > returns > > > > lck: fcntl: No locks available > > > > shortly after the grace period expires. After that everything gets stuck, > > server holding a write lock with no corresponding client side lock. > > > > > > IMO looks like the client is to blame, even if/when the server > > should/could have accepted UNLOCK during grace (I don't know, I'm not > > an expert on that one). > > Possibly... It depends entirely on what happened to cause it to deny the > GRANTED callback... 
A little theorizing: If the unlock of a yet unrecovered lock has failed up to that point then the client sure must remember the lock somehow. That might explain the secondary error when a conflicting lock is granted by the server. -- Frank
* Re: [NLM] 2.6.27.14 breakage when grace period expires 2009-02-12 15:36 ` Frank van Maarseveen @ 2009-02-12 18:17 ` Trond Myklebust [not found] ` <1234462647.7190.53.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2009-02-12 18:17 UTC (permalink / raw) To: Frank van Maarseveen; +Cc: J. Bruce Fields, Linux NFS mailing list On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote: > A little theorizing: > If the unlock of a yet unrecovered lock has failed up to that point then > the client sure must remember the lock somehow. That might explain the > secondary error when a conflicting lock is granted by the server. Sorry, but that doesn't hold water. The client will release the VFS 'mirror' of the lock before it attempts to unlock. Otherwise, you could have some nasty races between the unlock thread and the recovery thread... Besides, the granted callback handler on the client only checks the list of blocked locks for a match. Oh, bugger, I know what this is... It's the same thing that happened to the NFSv4 callback server. If you compile with CONFIG_IPV6 or CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then the NLM server will listen on an IPv6 socket, and so the RPC requests come in with their IPv4 address mapped into the IPv6 namespace. The client, on the other hand, is using an IPv4 socket, 'cos you specified an IPv4 address to the mount command. The result is that the call to nlm_cmp_addr() in nlmclnt_grant() always fails... Basically, we need to replace nlm_cmp_addr() with something akin to nfs_sockaddr_match_ipaddr(), which will compare v4 mapped addresses. The workaround should be simply to turn off CONFIG_SUNRPC_REGISTER_V4 if you're not planning on ever using NFS-over-IPv6... Cheers Trond
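Trond's proposed direction is an address comparison that, like nfs_sockaddr_match_ipaddr(), treats an IPv4 address and its IPv4-mapped IPv6 form (::ffff:a.b.c.d) as the same peer. The user-space sketch below only illustrates that idea; `sockaddr_match_ipaddr()` and `v4_matches_mapped()` are hypothetical names, not the kernel's implementation, and ports are deliberately ignored.

```c
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: does sin6 hold sin's IPv4 address in
 * IPv4-mapped form (::ffff:a.b.c.d)? */
static bool v4_matches_mapped(const struct sockaddr_in *sin,
			      const struct sockaddr_in6 *sin6)
{
	/* The IPv4 address occupies the last 4 bytes of a v4-mapped address. */
	return IN6_IS_ADDR_V4MAPPED(&sin6->sin6_addr) &&
	       memcmp(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr.s_addr, 4) == 0;
}

/* Hypothetical address-only comparison (ports ignored) that treats
 * 192.0.2.7 and ::ffff:192.0.2.7 as the same peer. */
bool sockaddr_match_ipaddr(const struct sockaddr *a, const struct sockaddr *b)
{
	if (a->sa_family == AF_INET && b->sa_family == AF_INET) {
		const struct sockaddr_in *x = (const void *)a;
		const struct sockaddr_in *y = (const void *)b;
		return x->sin_addr.s_addr == y->sin_addr.s_addr;
	}
	if (a->sa_family == AF_INET6 && b->sa_family == AF_INET6) {
		const struct sockaddr_in6 *x = (const void *)a;
		const struct sockaddr_in6 *y = (const void *)b;
		return IN6_ARE_ADDR_EQUAL(&x->sin6_addr, &y->sin6_addr);
	}
	/* Mixed families: only equal if the IPv6 side is v4-mapped. */
	if (a->sa_family == AF_INET && b->sa_family == AF_INET6)
		return v4_matches_mapped((const void *)a, (const void *)b);
	if (a->sa_family == AF_INET6 && b->sa_family == AF_INET)
		return v4_matches_mapped((const void *)b, (const void *)a);
	return false; /* unknown address family */
}
```

A plain family-plus-bytes comparison would report 192.0.2.7 and ::ffff:192.0.2.7 as different, which is the kind of mismatch described above for an IPv6 listener receiving IPv4 traffic.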
* Re: [NLM] 2.6.27.14 breakage when grace period expires [not found] ` <1234462647.7190.53.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2009-02-12 18:29 ` Frank van Maarseveen 2009-02-12 19:10 ` Trond Myklebust 0 siblings, 1 reply; 25+ messages in thread From: Frank van Maarseveen @ 2009-02-12 18:29 UTC (permalink / raw) To: Trond Myklebust Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote: > On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote: > > A little theorizing: > > If the unlock of a yet unrecovered lock has failed up to that point then > > the client sure must remember the lock somehow. That might explain the > > secondary error when a conflicting lock is granted by the server. > > Sorry, but that doesn't hold water. The client will release the VFS > 'mirror' of the lock before it attempts to unlock. Otherwise, you could > have some nasty races between the unlock thread and the recovery > thread... > Besides, the granted callback handler on the client only checks the list > of blocked locks for a match. ok, then we have more than one NLM bug to resolve. > > Oh, bugger, I know what this is... It's the same thing that happened to > the NFSv4 callback server. If you compile with CONFIG_IPV6 or > CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then > the NLM server will listen on an IPv6 socket, and so the RPC request > come in with their IPv4 address mapped into the IPv6 namespace. Nope:

$ zgrep IPV6 /proc/config.gz
# CONFIG_IPV6 is not set
$ zgrep SUNRPC /proc/config.gz
CONFIG_SUNRPC=y
CONFIG_SUNRPC_GSS=y
# CONFIG_SUNRPC_BIND34 is not set

And remember this is not a recent regression. -- Frank
* Re: [NLM] 2.6.27.14 breakage when grace period expires 2009-02-12 18:29 ` Frank van Maarseveen @ 2009-02-12 19:10 ` Trond Myklebust [not found] ` <1234465837.7190.62.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2009-02-12 19:10 UTC (permalink / raw) To: Frank van Maarseveen, Mr. Charles Edward Lever Cc: J. Bruce Fields, Linux NFS mailing list On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote: > On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote: > > On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote: > > > A little theorizing: > > > If the unlock of a yet unrecovered lock has failed up to that point then > > > the client sure must remember the lock somehow. That might explain the > > > secondary error when a conflicting lock is granted by the server. > > > > Sorry, but that doesn't hold water. The client will release the VFS > > 'mirror' of the lock before it attempts to unlock. Otherwise, you could > > have some nasty races between the unlock thread and the recovery > > thread... > > Besides, the granted callback handler on the client only checks the list > > of blocked locks for a match. > > ok, then we have more than one NLM bug to resolve. > > > > > Oh, bugger, I know what this is... It's the same thing that happened to > > the NFSv4 callback server. If you compile with CONFIG_IPV6 or > > CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then > > the NLM server will listen on an IPv6 socket, and so the RPC request > > come in with their IPv4 address mapped into the IPv6 namespace. > > Nope: > > $ zgrep IPV6 /proc/config.gz > # CONFIG_IPV6 is not set > $ zgrep SUNRPC /proc/config.gz > CONFIG_SUNRPC=y > CONFIG_SUNRPC_GSS=y > # CONFIG_SUNRPC_BIND34 is not set Sorry, yes... 2.6.27.x should be OK. The lockd v4mapped addresses bug is specific to 2.6.29. Chuck, are you planning on fixing this before 2.6.29-final comes out? 
> And remember this is not a recent regression. It would help if you sent us the full binary tcpdump, instead of just the summary. That should enable us to figure out which of the tests is failing in nlmclnt_grant(). Trond
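Trond notes that the client's GRANTED handler only matches the callback against its list of blocked lock requests, and names nlm_cmp_addr() in nlmclnt_grant() as one of the checks. A simplified model of such a lookup is sketched below to show that any single mismatched field is enough for the client to refuse the callback with NLM_DENIED. The struct, its field list and `model_grant()` are hypothetical; the real nlmclnt_grant() compares kernel structures, and its exact set of tests is an assumption here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reply codes for the model (not the on-the-wire NLM values). */
enum model_grant_res { MODEL_GRANTED, MODEL_DENIED };

/* Hypothetical record of one blocked LOCK request waiting on the client. */
struct model_blocked {
	uint32_t server_ip;   /* IPv4 address of the lock server, host order */
	unsigned char fh[8];  /* truncated file handle, enough for the model */
	uint32_t svid;        /* lock owner id, shown as "svid" in the capture */
	uint64_t start, end;  /* byte range */
	bool granted;         /* set when a GRANTED callback matches */
};

/* Scan the blocked list; every field must match or the callback is refused. */
enum model_grant_res model_grant(struct model_blocked *list, size_t n,
				 uint32_t from_ip, const unsigned char fh[8],
				 uint32_t svid, uint64_t start, uint64_t end)
{
	enum model_grant_res res = MODEL_DENIED;

	for (size_t i = 0; i < n; i++) {
		struct model_blocked *b = &list[i];

		if (b->start != start || b->end != end)
			continue;
		if (b->svid != svid)
			continue;
		if (b->server_ip != from_ip)  /* the address test */
			continue;
		if (memcmp(b->fh, fh, sizeof b->fh) != 0)
			continue;
		b->granted = true;            /* real code would wake the waiter */
		res = MODEL_GRANTED;
	}
	return res;
}
```

Because the default answer is "denied", a GRANTED callback for a request the client no longer considers blocked, or one whose address compares unequal, is refused without any error on the client side.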
* Re: [NLM] 2.6.27.14 breakage when grace period expires [not found] ` <1234465837.7190.62.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2009-02-12 19:16 ` Frank van Maarseveen 2009-02-12 20:24 ` Trond Myklebust 2009-02-12 19:35 ` Chuck Lever 1 sibling, 1 reply; 25+ messages in thread From: Frank van Maarseveen @ 2009-02-12 19:16 UTC (permalink / raw) To: Trond Myklebust Cc: Frank van Maarseveen, Mr. Charles Edward Lever, J. Bruce Fields, Linux NFS mailing list [-- Attachment #1: Type: text/plain, Size: 2007 bytes --] On Thu, Feb 12, 2009 at 02:10:37PM -0500, Trond Myklebust wrote: > On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote: > > On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote: > > > On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote: > > > > A little theorizing: > > > > If the unlock of a yet unrecovered lock has failed up to that point then > > > > the client sure must remember the lock somehow. That might explain the > > > > secondary error when a conflicting lock is granted by the server. > > > > > > Sorry, but that doesn't hold water. The client will release the VFS > > > 'mirror' of the lock before it attempts to unlock. Otherwise, you could > > > have some nasty races between the unlock thread and the recovery > > > thread... > > > Besides, the granted callback handler on the client only checks the list > > > of blocked locks for a match. > > > > ok, then we have more than one NLM bug to resolve. > > > > > > > > Oh, bugger, I know what this is... It's the same thing that happened to > > > the NFSv4 callback server. If you compile with CONFIG_IPV6 or > > > CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then > > > the NLM server will listen on an IPv6 socket, and so the RPC request > > > come in with their IPv4 address mapped into the IPv6 namespace. 
> > > > Nope: > > > > $ zgrep IPV6 /proc/config.gz > > # CONFIG_IPV6 is not set > > $ zgrep SUNRPC /proc/config.gz > > CONFIG_SUNRPC=y > > CONFIG_SUNRPC_GSS=y > > # CONFIG_SUNRPC_BIND34 is not set > > Sorry, yes... 2.6.27.x should be OK. The lockd v4mapped addresses bug is > specific to 2.6.29. Chuck, are you planning on fixing this before > 2.6.29-final comes out? > > > And remember this is not a recent regression. > > It would help if you sent us the full binary tcpdump, instead of just > the summary. That should enable us to figure out which of the tests is > failing in nlmclnt_grant(). I posted the link already. Anyway, see attachment. -- Frank [-- Attachment #2: 2.6.27.14-nlm-grace.pcap --] [-- Type: application/octet-stream, Size: 33708 bytes --]
* Re: [NLM] 2.6.27.14 breakage when grace period expires 2009-02-12 19:16 ` Frank van Maarseveen @ 2009-02-12 20:24 ` Trond Myklebust [not found] ` <1234470251.7190.102.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2009-02-12 20:24 UTC (permalink / raw) To: Frank van Maarseveen Cc: Mr. Charles Edward Lever, J. Bruce Fields, Linux NFS mailing list On Thu, 2009-02-12 at 20:16 +0100, Frank van Maarseveen wrote: > On Thu, Feb 12, 2009 at 02:10:37PM -0500, Trond Myklebust wrote: > > On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote: > > > On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote: > > > > On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote: > > > > > A little theorizing: > > > > > If the unlock of a yet unrecovered lock has failed up to that point then > > > > > the client sure must remember the lock somehow. That might explain the > > > > > secondary error when a conflicting lock is granted by the server. > > > > > > > > Sorry, but that doesn't hold water. The client will release the VFS > > > > 'mirror' of the lock before it attempts to unlock. Otherwise, you could > > > > have some nasty races between the unlock thread and the recovery > > > > thread... > > > > Besides, the granted callback handler on the client only checks the list > > > > of blocked locks for a match. > > > > > > ok, then we have more than one NLM bug to resolve. > > > > > > > > > > > Oh, bugger, I know what this is... It's the same thing that happened to > > > > the NFSv4 callback server. If you compile with CONFIG_IPV6 or > > > > CONFIG_IPV6_MODULE enabled, and also set CONFIG_SUNRPC_REGISTER_V4, then > > > > the NLM server will listen on an IPv6 socket, and so the RPC request > > > > come in with their IPv4 address mapped into the IPv6 namespace. 
> > > > > > Nope: > > > > > > $ zgrep IPV6 /proc/config.gz > > > # CONFIG_IPV6 is not set > > > $ zgrep SUNRPC /proc/config.gz > > > CONFIG_SUNRPC=y > > > CONFIG_SUNRPC_GSS=y > > > # CONFIG_SUNRPC_BIND34 is not set > > > > Sorry, yes... 2.6.27.x should be OK. The lockd v4mapped addresses bug is > > specific to 2.6.29. Chuck, are you planning on fixing this before > > 2.6.29-final comes out? > > > > > And remember this is not a recent regression. > > > > It would help if you sent us the full binary tcpdump, instead of just > > the summary. That should enable us to figure out which of the tests is > > failing in nlmclnt_grant(). > > I posted the link already. Anyway, see attachment.

Yeah... It looks alright. The one thing that looks a bit odd is that the GRANTED lock has a 'caller_name' field that is set to the name of the server. I'm pretty sure we don't care about that, though...

Hmm... I wonder if the problem isn't just that we're failing to cancel the lock request when the process is signalled. Can you try the following patch?

--------------------------------------------------------------------
From: Trond Myklebust <Trond.Myklebust@netapp.com>
NLM/lockd: Always cancel blocked locks when exiting early from nlmclnt_lock

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/lockd/clntproc.c | 9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index 31668b6..f956d1e 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -542,9 +542,14 @@ again:
 		status = nlmclnt_call(cred, req, NLMPROC_LOCK);
 		if (status < 0)
 			break;
-		/* Did a reclaimer thread notify us of a server reboot? */
-		if (resp->status == nlm_lck_denied_grace_period)
+		/* Is the server in a grace period state?
+		 * If so, we need to reset the resp->status, and
+		 * retry...
+		 */
+		if (resp->status == nlm_lck_denied_grace_period) {
+			resp->status = nlm_lck_blocked;
 			continue;
+		}
 		if (resp->status != nlm_lck_blocked)
 			break;
 		/* Wait on an NLM blocking lock */
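The idea behind the patch can be illustrated with a toy model of the retry loop. The patch implies that the cancel-on-exit path keys off the last recorded status still being nlm_lck_blocked, so a request that leaves the loop right after an NLM_DENIED_GRACE_PERIOD reply would never be cancelled unless that status is reset. Everything below (`enum model_stat`, `model_sends_cancel()`, the `M_CALL_FAILED` sentinel standing in for nlmclnt_call() returning < 0) is a hypothetical simplification of fs/lockd/clntproc.c, not the kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model states; M_CALL_FAILED stands for nlmclnt_call()
 * returning < 0 (e.g. an interrupted wait). */
enum model_stat { M_UNSET, M_GRANTED, M_DENIED, M_GRACE, M_BLOCKED, M_CALL_FAILED };

/*
 * Feed the server's replies to successive LOCK calls through a simplified
 * version of the loop and report whether the exit path would send CANCEL.
 * 'patched' selects the behaviour that resets the status on grace replies.
 */
bool model_sends_cancel(const enum model_stat *replies, size_t n, bool patched)
{
	enum model_stat resp = M_UNSET;   /* last status recorded in the reply */

	for (size_t i = 0; i < n; i++) {
		if (replies[i] == M_CALL_FAILED)
			break;                    /* status < 0: resp keeps its old value */
		resp = replies[i];
		if (resp == M_GRACE) {
			if (patched)
				resp = M_BLOCKED; /* the change made by the patch */
			continue;                 /* retry the LOCK call */
		}
		if (resp != M_BLOCKED)
			break;                    /* granted or definitively denied */
		/* real code: wait for GRANTED or a timeout, then re-send LOCK */
	}
	/* Cancel only what still looks like a pending blocked request. */
	return resp == M_BLOCKED;
}
```

In this model, the sequence "grace reply, then a failed call" leaves no CANCEL without the patch and sends one with it. Frank's test involves no signal, though, so this path may not be the one he is hitting.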
* Re: [NLM] 2.6.27.14 breakage when grace period expires [not found] ` <1234470251.7190.102.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2009-02-13 11:04 ` Frank van Maarseveen 0 siblings, 0 replies; 25+ messages in thread From: Frank van Maarseveen @ 2009-02-13 11:04 UTC (permalink / raw) To: Trond Myklebust Cc: Frank van Maarseveen, Mr. Charles Edward Lever, J. Bruce Fields, Linux NFS mailing list On Thu, Feb 12, 2009 at 03:24:11PM -0500, Trond Myklebust wrote: > > Hmm... I wonder if the problem isn't just that we're failing to cancel > the lock request when the process is signalled. Can you try the > following patch? > > -------------------------------------------------------------------- > From: Trond Myklebust <Trond.Myklebust@netapp.com> > NLM/lockd: Always cancel blocked locks when exiting early from nlmclnt_lock > > Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > --- > > fs/lockd/clntproc.c | 9 +++++++-- > 1 files changed, 7 insertions(+), 2 deletions(-) > > > diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c > index 31668b6..f956d1e 100644 > --- a/fs/lockd/clntproc.c > +++ b/fs/lockd/clntproc.c > @@ -542,9 +542,14 @@ again: > status = nlmclnt_call(cred, req, NLMPROC_LOCK); > if (status < 0) > break; > - /* Did a reclaimer thread notify us of a server reboot? */ > - if (resp->status == nlm_lck_denied_grace_period) > + /* Is the server in a grace period state? > + * If so, we need to reset the resp->status, and > + * retry... > + */ > + if (resp->status == nlm_lck_denied_grace_period) { > + resp->status = nlm_lck_blocked; > continue; > + } > if (resp->status != nlm_lck_blocked) > break; > /* Wait on an NLM blocking lock */ Patch tried but didn't make any difference. Note that there isn't any ^C or any other signal involved. 
The client runs three loops in the shell:

	while :; do lck -w /mnt/locktest 2; done

Every "lck" opens the file, obtains an exclusive write lock (waiting if necessary), calls sleep(2), closes the fd (releasing the lock) and exits. The "lck" that ends up unlocking during grace terminates normally, but one of the others gets "fcntl: No locks available" when trying to obtain the lock.

Question: shouldn't the server drop the lock after a sequence like:

201 122.033767 server: NLM V4 GRANTED_MSG Call (Reply In 202) FH:0xcafa61cc svid:116 pos:0-0
202 122.034066 client: NLM V4 GRANTED_MSG Reply (Call In 201)
205 122.034665 client: NLM V4 GRANTED_RES Call (Reply In 206) NLM_DENIED
206 122.034753 server: NLM V4 GRANTED_RES Reply (Call In 205)

?

-- Frank
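The source of "lck" is not part of this message. The following is a minimal sketch of what Frank describes, assuming "-w" means an exclusive write lock that waits if necessary and the number is the hold time in seconds; `lck_write()` is a hypothetical name. It takes a whole-file lock with fcntl(F_SETLKW), which on an NFSv3 mount is normally forwarded to the server as an NLM LOCK request. A failure with ENOLCK is what the C library prints as "No locks available".

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical re-creation of "lck -w FILE SECS": open FILE, take an
 * exclusive whole-file write lock (blocking until it is available),
 * hold it for SECS seconds, then close the fd, which releases the lock.
 * Returns 0 on success or -errno on failure.
 */
int lck_write(const char *path, unsigned int secs)
{
	struct flock fl = {0};
	int fd, err;

	fd = open(path, O_RDWR);
	if (fd < 0)
		return -errno;

	fl.l_type = F_WRLCK;    /* exclusive (write) lock */
	fl.l_whence = SEEK_SET;
	fl.l_start = 0;
	fl.l_len = 0;           /* 0 means "to end of file": the whole file */

	/* F_SETLKW waits for conflicting locks; on NFS it may fail with
	 * ENOLCK ("No locks available"). */
	if (fcntl(fd, F_SETLKW, &fl) < 0) {
		err = errno;
		close(fd);
		return -err;
	}

	sleep(secs);
	close(fd);              /* closing the fd drops the lock */
	return 0;
}
```

A small main() calling lck_write(argv[2], atoi(argv[3])) would match the "lck -w /mnt/locktest 2" invocation used in the shell loops above.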
* Re: [NLM] 2.6.27.14 breakage when grace period expires [not found] ` <1234465837.7190.62.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 2009-02-12 19:16 ` Frank van Maarseveen @ 2009-02-12 19:35 ` Chuck Lever 2009-02-12 19:43 ` Trond Myklebust 1 sibling, 1 reply; 25+ messages in thread From: Chuck Lever @ 2009-02-12 19:35 UTC (permalink / raw) To: Trond Myklebust Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list On Feb 12, 2009, at 2:10 PM, Trond Myklebust wrote: > On Thu, 2009-02-12 at 19:29 +0100, Frank van Maarseveen wrote: >> On Thu, Feb 12, 2009 at 01:17:27PM -0500, Trond Myklebust wrote: >>> On Thu, 2009-02-12 at 16:36 +0100, Frank van Maarseveen wrote: >>>> A little theorizing: >>>> If the unlock of a yet unrecovered lock has failed up to that >>>> point then >>>> the client sure must remember the lock somehow. That might >>>> explain the >>>> secondary error when a conflicting lock is granted by the server. >>> >>> Sorry, but that doesn't hold water. The client will release the VFS >>> 'mirror' of the lock before it attempts to unlock. Otherwise, you >>> could >>> have some nasty races between the unlock thread and the recovery >>> thread... >>> Besides, the granted callback handler on the client only checks >>> the list >>> of blocked locks for a match. >> >> ok, then we have more than one NLM bug to resolve. >> >>> >>> Oh, bugger, I know what this is... It's the same thing that >>> happened to >>> the NFSv4 callback server. If you compile with CONFIG_IPV6 or >>> CONFIG_IPV6_MODULE enabled, and also set >>> CONFIG_SUNRPC_REGISTER_V4, then >>> the NLM server will listen on an IPv6 socket, and so the RPC request >>> come in with their IPv4 address mapped into the IPv6 namespace. >> >> Nope: >> >> $ zgrep IPV6 /proc/config.gz >> # CONFIG_IPV6 is not set >> $ zgrep SUNRPC /proc/config.gz >> CONFIG_SUNRPC=y >> CONFIG_SUNRPC_GSS=y >> # CONFIG_SUNRPC_BIND34 is not set > > Sorry, yes... 2.6.27.x should be OK. 
The lockd v4mapped addresses > bug is > specific to 2.6.29. Chuck, are you planning on fixing this before > 2.6.29-final comes out? I wasn't sure exactly where the compared addresses came from. I had assumed that they all came through the listener, so we wouldn't need this kind of translation. It shouldn't be difficult to map addresses passed in via nlmclnt_init() to AF_INET6. But this is the kind of thing that makes "falling back" to an AF_INET listener a little challenging. We will have to record what flavor the listener is and do a translation depending on what listener family was actually created. -- Chuck Lever chuck[dot]lever[at]oracle[dot]com
* Re: [NLM] 2.6.27.14 breakage when grace period expires 2009-02-12 19:35 ` Chuck Lever @ 2009-02-12 19:43 ` Trond Myklebust [not found] ` <1234467795.7190.70.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2009-02-12 19:43 UTC (permalink / raw) To: Chuck Lever; +Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote: > I wasn't sure exactly where the compared addresses came from. I had > assumed that they all came through the listener, so we wouldn't need > this kind of translation. It shouldn't be difficult to map addresses > passed in via nlmclnt_init() to AF_INET6. > > But this is the kind of thing that makes "falling back" to an AF_INET > listener a little challenging. We will have to record what flavor the > listener is and do a translation depending on what listener family was > actually created. Why? Should we care whether we're receiving IPv4 addresses or IPv6 v4-mapped addresses? They're the same thing... We're already doing the mapping for the NFSv4 callback channel. See nfs_sockaddr_match_ipaddr() in fs/nfs/client.c Trond
* Re: [NLM] 2.6.27.14 breakage when grace period expires [not found] ` <1234467795.7190.70.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> @ 2009-02-12 20:11 ` Chuck Lever 2009-02-12 20:27 ` Trond Myklebust 0 siblings, 1 reply; 25+ messages in thread From: Chuck Lever @ 2009-02-12 20:11 UTC (permalink / raw) To: Trond Myklebust Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote: > On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote: >> I wasn't sure exactly where the compared addresses came from. I had >> assumed that they all came through the listener, so we wouldn't need >> this kind of translation. It shouldn't be difficult to map addresses >> passed in via nlmclnt_init() to AF_INET6. >> >> But this is the kind of thing that makes "falling back" to an AF_INET >> listener a little challenging. We will have to record what flavor >> the >> listener is and do a translation depending on what listener family >> was >> actually created. > > Why? Should we care whether we're receiving IPv4 addresses or IPv6 > v4-mapped addresses? They're the same thing... The problem is the listener family is now decided at run-time. If an AF_INET6 listener can't be created, an AF_INET listener is created instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an AF_INET listener is created, we get only IPv4 addresses in svc_rqst->rq_addr.

So we can do it either way. Taking lockd as an example:

1. Have nlmclnt_init() map AF_INET mount addresses to AF_INET6 iff the lockd listener is AF_INET6, so nlm_cmp_addr() is always dealing with AF_INET6 in this case, or

2. If CONFIG_IPV6 || CONFIG_IPV6_MODULE, unconditionally map AF_INET addresses in nlmclnt_init and for incoming NLM requests (when lockd happens to have fallen back to an AF_INET listener)

Personally I think solution 1. will be less confusing operationally and less invasive code-wise.
I suppose IPv6 purists would prefer keeping the whole stack in AF_INET6, so they would like solution 2. Eventually we could map incoming addresses on AF_INET listeners in the RPC server code, but I prefer to wait until all kernel RPC services have IPv6 support. Since 2.6.29 has the CONFIG_SUNRPC_REGISTER_V4=N workaround, do we need to fix 2.6.29, or can this wait until 2.6.30? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com
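Both options Chuck lists rely on turning an AF_INET address into its IPv4-mapped AF_INET6 form, so that later comparisons only ever see one family. A user-space sketch of that mapping step follows; `map_v4_to_v6()` is a hypothetical helper, not the lockd or SUNRPC code.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: build the IPv4-mapped IPv6 address ::ffff:a.b.c.d
 * for an AF_INET sockaddr, so that code comparing addresses only has to
 * handle AF_INET6.
 */
void map_v4_to_v6(const struct sockaddr_in *sin, struct sockaddr_in6 *sin6)
{
	memset(sin6, 0, sizeof(*sin6));
	sin6->sin6_family = AF_INET6;
	sin6->sin6_port = sin->sin_port;  /* already in network byte order */
	/* Bytes 0-9 stay zero, bytes 10-11 are 0xff, bytes 12-15 hold the IPv4 address. */
	sin6->sin6_addr.s6_addr[10] = 0xff;
	sin6->sin6_addr.s6_addr[11] = 0xff;
	memcpy(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr.s_addr, 4);
}
```

Mapping once, when the address is recorded, keeps each later comparison to a single same-family check of two in6_addr values instead of branching on mixed families every time.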
* Re: [NLM] 2.6.27.14 breakage when grace period expires 2009-02-12 20:11 ` Chuck Lever @ 2009-02-12 20:27 ` Trond Myklebust [not found] ` <1234470457.7190.106.camel-rJ7iovZKK19ZJLDQqaL3InhyD016LWXt@public.gmane.org> 0 siblings, 1 reply; 25+ messages in thread From: Trond Myklebust @ 2009-02-12 20:27 UTC (permalink / raw) To: Chuck Lever; +Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote: > On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote: > > On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote: > >> I wasn't sure exactly where the compared addresses came from. I had > >> assumed that they all came through the listener, so we wouldn't need > >> this kind of translation. It shouldn't be difficult to map addresses > >> passed in via nlmclnt_init() to AF_INET6. > >> > >> But this is the kind of thing that makes "falling back" to an AF_INET > >> listener a little challenging. We will have to record what flavor > >> the > >> listener is and do a translation depending on what listener family > >> was > >> actually created. > > > > Why? Should we care whether we're receiving IPv4 addresses or IPv6 > > v4-mapped addresses? They're the same thing... > > The problem is the listener family is now decided at run-time. If an > AF_INET6 listener can't be created, an AF_INET listener is created > instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an > AF_INET listener is created, we get only IPv4 addresses in svc_rqst->rq_addr. You're missing my point. Why should we care if it's one or the other? In the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it turns out that CONFIG_IPV6 is enabled. IOW: we always compare IPv6 addresses. Trond
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Chuck Lever @ 2009-02-12 20:43 UTC
To: Trond Myklebust
Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list

On Feb 12, 2009, at 3:27 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote:
>> On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
>>> On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
>>>> I wasn't sure exactly where the compared addresses came from. I had
>>>> assumed that they all came through the listener, so we wouldn't need
>>>> this kind of translation. It shouldn't be difficult to map addresses
>>>> passed in via nlmclnt_init() to AF_INET6.
>>>>
>>>> But this is the kind of thing that makes "falling back" to an AF_INET
>>>> listener a little challenging. We will have to record what flavor the
>>>> listener is and do a translation depending on what listener family was
>>>> actually created.
>>>
>>> Why? Should we care whether we're receiving IPv4 addresses or IPv6
>>> v4-mapped addresses? They're the same thing...
>>
>> The problem is the listener family is now decided at run-time. If an
>> AF_INET6 listener can't be created, an AF_INET listener is created
>> instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an
>> AF_INET listener is created, we get only IPv4 addresses in
>> svc_rqst->rq_addr.
>
> You're missing my point. Why should we care if it's one or the other? In
> the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it
> turns out that CONFIG_IPV6 is enabled.
>
> IOW: we always compare IPv6 addresses.

The reason we might care in this case is nlm_cmp_addr() is executed
more frequently than nfs_sockaddr_match_ipaddr().

Mapping the server address in nlmclnt_init() means we translate the
server address once and are done with it. We never have to map
incoming AF_INET addresses in NLM requests, and we don't have the
extra conditionals every time we go through nlm_cmp_addr().

This keeps nlm_cmp_addr() as simple as it can be: it compares only two
AF_INET addresses or two AF_INET6 addresses.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
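For contrast, a sketch of a comparison that, as described in the message above, only ever compares two AF_INET addresses or two AF_INET6 addresses and does no mapping of its own. cmp_same_family() is a hypothetical stand-in modelled on that description, written against user-space socket structures; it is not the kernel's nlm_cmp_addr().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical same-family comparison: returns 1 if both addresses have
 * the same family and the same IP address, 0 otherwise. Ports are ignored
 * and no IPv4 <-> v4-mapped translation is attempted. */
static int cmp_same_family(const struct sockaddr *a, const struct sockaddr *b)
{
    if (a->sa_family != b->sa_family)
        return 0;   /* AF_INET never matches AF_INET6, even if v4-mapped */

    switch (a->sa_family) {
    case AF_INET:
        return ((const struct sockaddr_in *)a)->sin_addr.s_addr ==
               ((const struct sockaddr_in *)b)->sin_addr.s_addr;
    case AF_INET6:
        return memcmp(&((const struct sockaddr_in6 *)a)->sin6_addr,
                      &((const struct sockaddr_in6 *)b)->sin6_addr,
                      sizeof(struct in6_addr)) == 0;
    default:
        break;
    }
    return 0;
}
```

Because the families must match, 192.0.2.1 does not compare equal to ::ffff:192.0.2.1 here. That is why this design relies on the caller converting the server address to the listener's family before any comparison takes place.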
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Trond Myklebust @ 2009-02-12 20:54 UTC
To: Chuck Lever; +Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list

On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
> On Feb 12, 2009, at 3:27 PM, Trond Myklebust wrote:
> > On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote:
> >> On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
> >>> On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
> >>>> I wasn't sure exactly where the compared addresses came from. I had
> >>>> assumed that they all came through the listener, so we wouldn't need
> >>>> this kind of translation. It shouldn't be difficult to map addresses
> >>>> passed in via nlmclnt_init() to AF_INET6.
> >>>>
> >>>> But this is the kind of thing that makes "falling back" to an AF_INET
> >>>> listener a little challenging. We will have to record what flavor the
> >>>> listener is and do a translation depending on what listener family was
> >>>> actually created.
> >>>
> >>> Why? Should we care whether we're receiving IPv4 addresses or IPv6
> >>> v4-mapped addresses? They're the same thing...
> >>
> >> The problem is the listener family is now decided at run-time. If an
> >> AF_INET6 listener can't be created, an AF_INET listener is created
> >> instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an
> >> AF_INET listener is created, we get only IPv4 addresses in
> >> svc_rqst->rq_addr.
> >
> > You're missing my point. Why should we care if it's one or the other? In
> > the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it
> > turns out that CONFIG_IPV6 is enabled.
> >
> > IOW: we always compare IPv6 addresses.
>
> The reason we might care in this case is nlm_cmp_addr() is executed
> more frequently than nfs_sockaddr_match_ipaddr().
>
> Mapping the server address in nlmclnt_init() means we translate the
> server address once and are done with it. We never have to map
> incoming AF_INET addresses in NLM requests, and we don't have the
> extra conditionals every time we go through nlm_cmp_addr().
>
> This keeps nlm_cmp_addr() as simple as it can be: it compares only two
> AF_INET addresses or two AF_INET6 addresses.

I don't see how that changes the general principle. All it means is that
you should be caching v4 mapped addresses instead of ipv4 addresses.
That would allow you to simplify nlm_cmp_addr() even further...

Trond
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Chuck Lever @ 2009-02-12 21:43 UTC
To: Trond Myklebust
Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list

On Feb 12, 2009, at 3:54 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
>> On Feb 12, 2009, at 3:27 PM, Trond Myklebust wrote:
>>> On Thu, 2009-02-12 at 15:11 -0500, Chuck Lever wrote:
>>>> On Feb 12, 2009, at 2:43 PM, Trond Myklebust wrote:
>>>>> On Thu, 2009-02-12 at 14:35 -0500, Chuck Lever wrote:
>>>>>> I wasn't sure exactly where the compared addresses came from. I had
>>>>>> assumed that they all came through the listener, so we wouldn't need
>>>>>> this kind of translation. It shouldn't be difficult to map addresses
>>>>>> passed in via nlmclnt_init() to AF_INET6.
>>>>>>
>>>>>> But this is the kind of thing that makes "falling back" to an AF_INET
>>>>>> listener a little challenging. We will have to record what flavor the
>>>>>> listener is and do a translation depending on what listener family was
>>>>>> actually created.
>>>>>
>>>>> Why? Should we care whether we're receiving IPv4 addresses or IPv6
>>>>> v4-mapped addresses? They're the same thing...
>>>>
>>>> The problem is the listener family is now decided at run-time. If an
>>>> AF_INET6 listener can't be created, an AF_INET listener is created
>>>> instead, even if CONFIG_IPV6 || CONFIG_IPV6_MODULE is enabled. If an
>>>> AF_INET listener is created, we get only IPv4 addresses in
>>>> svc_rqst->rq_addr.
>>>
>>> You're missing my point. Why should we care if it's one or the other? In
>>> the NFSv4 case, we v4map all IPv4 addresses _unconditionally_ if it
>>> turns out that CONFIG_IPV6 is enabled.
>>>
>>> IOW: we always compare IPv6 addresses.
>>
>> The reason we might care in this case is nlm_cmp_addr() is executed
>> more frequently than nfs_sockaddr_match_ipaddr().
>>
>> Mapping the server address in nlmclnt_init() means we translate the
>> server address once and are done with it. We never have to map
>> incoming AF_INET addresses in NLM requests, and we don't have the
>> extra conditionals every time we go through nlm_cmp_addr().
>>
>> This keeps nlm_cmp_addr() as simple as it can be: it compares only two
>> AF_INET addresses or two AF_INET6 addresses.
>
> I don't see how that changes the general principle. All it means is that
> you should be caching v4 mapped addresses instead of ipv4 addresses.
> That would allow you to simplify nlm_cmp_addr() even further...

Operationally we have to support both AF_INET and AF_INET6 addresses in
the cache, because we don't know what kind of lockd listener can be
created until runtime. So, I can't see how we can eliminate the AF_INET
arm in nlm_cmp_addr() unless we unconditionally convert all incoming
AF_INET addresses from putative PF_INET listeners _and_ convert incoming
IPv4 server addresses in NFS mount requests to AF_INET6. Doesn't that
add computational overhead to a fairly common case?

This goes away if we ensure that the address family of the server
address passed to nlmclnt_lookup_host() always matches the protocol
family of lockd's listener sockets. Then address mapping overhead is
entirely removed from the common cases involving PF_INET listeners.

For PF_INET6 listeners, incoming IPv4 addresses are already mapped by
the underlying network layer. Nothing can be done about that. But we can
make sure the address family of the server address passed to
nlmclnt_lookup_host() matches the incoming mapped addresses to eliminate
the need for nlm_cmp_addr() to do the mapping every time it wants to
compare an address.

It should be fairly simple to record the listener's protocol family,
check it against incoming server addresses in nlmclnt_init(), then map
the address as needed.

Having nlm_cmp_addr() do the mapping solves some problems, but at the
cost of extra CPU time every time it is called; each loop iteration in
nlm_lookup_host() for example. All I'm doing is removing a loop
invariant, essentially.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
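The "record the listener's protocol family, then map once" approach can be sketched as follows, again using user-space socket structures. listener_family, struct host_entry, match_listener_family() and lookup_host() are hypothetical names chosen for illustration, not lockd code: the lookup key is converted once, before the loop (the hoisted loop invariant), and each iteration only performs a plain same-family comparison.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical cache entry; addresses are stored in the listener's family. */
struct host_entry {
    struct sockaddr_storage addr;
};

/* Convert 'in' to the listener's family once. Only AF_INET -> AF_INET6
 * (v4-mapped) needs work; returns -1 if no conversion is possible. */
static int match_listener_family(int listener_family,
                                 const struct sockaddr *in,
                                 struct sockaddr_storage *out)
{
    memset(out, 0, sizeof(*out));
    if (in->sa_family == AF_INET && listener_family == AF_INET) {
        memcpy(out, in, sizeof(struct sockaddr_in));
        return 0;
    }
    if (in->sa_family == AF_INET6 && listener_family == AF_INET6) {
        memcpy(out, in, sizeof(struct sockaddr_in6));
        return 0;
    }
    if (in->sa_family == AF_INET && listener_family == AF_INET6) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)in;
        struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *)out;

        sin6->sin6_family = AF_INET6;
        sin6->sin6_port = sin->sin_port;
        sin6->sin6_addr.s6_addr[10] = 0xff;   /* ::ffff:a.b.c.d */
        sin6->sin6_addr.s6_addr[11] = 0xff;
        memcpy(&sin6->sin6_addr.s6_addr[12], &sin->sin_addr, 4);
        return 0;
    }
    /* e.g. an IPv6 server address when only an AF_INET listener exists */
    return -1;
}

/* Same-family comparison only: no mapping inside the loop body. */
static int addr_equal(const struct sockaddr_storage *a,
                      const struct sockaddr_storage *b)
{
    if (a->ss_family != b->ss_family)
        return 0;
    if (a->ss_family == AF_INET)
        return ((const struct sockaddr_in *)a)->sin_addr.s_addr ==
               ((const struct sockaddr_in *)b)->sin_addr.s_addr;
    if (a->ss_family == AF_INET6)
        return memcmp(&((const struct sockaddr_in6 *)a)->sin6_addr,
                      &((const struct sockaddr_in6 *)b)->sin6_addr,
                      sizeof(struct in6_addr)) == 0;
    return 0;
}

/* The key is mapped once, outside the loop; returns NULL if not found. */
static const struct host_entry *lookup_host(const struct host_entry *hosts,
                                            size_t n, int listener_family,
                                            const struct sockaddr *key)
{
    struct sockaddr_storage k;
    size_t i;

    if (match_listener_family(listener_family, key, &k) != 0)
        return NULL;
    for (i = 0; i < n; i++)
        if (addr_equal(&hosts[i].addr, &k))
            return &hosts[i];
    return NULL;
}
```

One case the sketch cannot handle: with only an AF_INET listener, a native IPv6 server address has no IPv4 form, so the lookup simply fails.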
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Trond Myklebust @ 2009-02-12 22:03 UTC
To: Chuck Lever; +Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list

On Thu, 2009-02-12 at 16:43 -0500, Chuck Lever wrote:
> Having nlm_cmp_addr() do the mapping solves some problems, but at the
> cost of extra CPU time every time it is called; each loop iteration in
> nlm_lookup_host() for example. All I'm doing is removing a loop
> invariant, essentially.

nlm_lookup_host() shouldn't need to compare v4 mapped addresses and IPv4
addresses afaics.
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Trond Myklebust @ 2009-02-12 22:02 UTC
To: Chuck Lever; +Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list

On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
> The reason we might care in this case is nlm_cmp_addr() is executed
> more frequently than nfs_sockaddr_match_ipaddr().

Actually, I'm not sure this assertion is correct. The only users of
nlm_cmp_addr() are nlmclnt_grant(), nlm_lookup_host() and
nlmsvc_unlock_all_by_ip().

AFAICS, the only one that needs to be v4 mapped should be nlmclnt_grant,
which is not in a performance critical path...
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Chuck Lever @ 2009-02-12 22:11 UTC
To: Trond Myklebust
Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list

On Feb 12, 2009, at 5:02 PM, Trond Myklebust wrote:
> On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
>> The reason we might care in this case is nlm_cmp_addr() is executed
>> more frequently than nfs_sockaddr_match_ipaddr().
>
> Actually, I'm not sure this assertion is correct. The only users of
> nlm_cmp_addr() are nlmclnt_grant(), nlm_lookup_host() and
> nlmsvc_unlock_all_by_ip().
>
> AFAICS, the only one that needs to be v4 mapped should be
> nlmclnt_grant, which is not in a performance critical path...

So then your proposal is to ensure the two arguments of the
nlm_cmp_addr() callsite in nlmclnt_grant() are both AF_INET6?

That doesn't sound so bad.

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
* Re: [NLM] 2.6.27.14 breakage when grace period expires
From: Trond Myklebust @ 2009-02-12 22:19 UTC
To: Chuck Lever; +Cc: Frank van Maarseveen, J. Bruce Fields, Linux NFS mailing list

On Thu, 2009-02-12 at 17:11 -0500, Chuck Lever wrote:
> On Feb 12, 2009, at 5:02 PM, Trond Myklebust wrote:
> > On Thu, 2009-02-12 at 15:43 -0500, Chuck Lever wrote:
> >> The reason we might care in this case is nlm_cmp_addr() is executed
> >> more frequently than nfs_sockaddr_match_ipaddr().
> >
> > Actually, I'm not sure this assertion is correct. The only users of
> > nlm_cmp_addr() are nlmclnt_grant(), nlm_lookup_host() and
> > nlmsvc_unlock_all_by_ip().
> >
> > AFAICS, the only one that needs to be v4 mapped should be
> > nlmclnt_grant, which is not in a performance critical path...
>
> So then your proposal is to ensure the two arguments of the
> nlm_cmp_addr() callsite in nlmclnt_grant() are both AF_INET6?

Yup... I can't see that the other two callsites need anything like that.
Thread overview: 25+ messages
2009-02-11 11:23 [NLM] 2.6.27.14 breakage when grace period expires Frank van Maarseveen
2009-02-11 20:35 ` J. Bruce Fields
2009-02-11 20:37 ` Frank van Maarseveen
2009-02-11 20:39 ` J. Bruce Fields
2009-02-11 20:57 ` Frank van Maarseveen
2009-02-12 14:28 ` Frank van Maarseveen
2009-02-12 15:16 ` Trond Myklebust
2009-02-12 15:36 ` Frank van Maarseveen
2009-02-12 18:17 ` Trond Myklebust
2009-02-12 18:29 ` Frank van Maarseveen
2009-02-12 19:10 ` Trond Myklebust
2009-02-12 19:16 ` Frank van Maarseveen
2009-02-12 20:24 ` Trond Myklebust
2009-02-13 11:04 ` Frank van Maarseveen
2009-02-12 19:35 ` Chuck Lever
2009-02-12 19:43 ` Trond Myklebust
2009-02-12 20:11 ` Chuck Lever
2009-02-12 20:27 ` Trond Myklebust
2009-02-12 20:43 ` Chuck Lever
2009-02-12 20:54 ` Trond Myklebust
2009-02-12 21:43 ` Chuck Lever
2009-02-12 22:03 ` Trond Myklebust
2009-02-12 22:02 ` Trond Myklebust
2009-02-12 22:11 ` Chuck Lever
2009-02-12 22:19 ` Trond Myklebust