* 2.6.19.3 client locking bug upon server reboot
From: Frank van Maarseveen @ 2007-02-26 14:15 UTC (permalink / raw)
To: Linux NFS mailing list
2.6.19.3, NFS V3, portmap V2, stat V1, nlm, all UDP
one server, three clients:
client1 has the lock
client2 wants the lock (fcntl blocks)
client3 will try to lock right after server has rebooted and started everything
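For readers who want to recreate the three client roles, here is a minimal sketch. It is not the test program used in this report. It assumes Python's fcntl.lockf(), which is a wrapper around fcntl() byte-range locks, and the helper names are made up for illustration.

```python
import errno
import fcntl
import os


def lock_whole_file(path, blocking=True):
    """Open path and take an exclusive lock on the whole file.

    Returns the open file descriptor, or None when blocking is False
    and another process already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    op = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    try:
        # The default start=0, len=0 covers the whole file.
        fcntl.lockf(fd, op)
    except OSError as exc:
        os.close(fd)
        if not blocking and exc.errno in (errno.EACCES, errno.EAGAIN):
            return None
        raise
    return fd


def unlock_and_close(fd):
    """Release the lock taken by lock_whole_file() and close the descriptor."""
    fcntl.lockf(fd, fcntl.LOCK_UN)
    os.close(fd)
```

On a file that lives on the NFS mount, the lock holder (client1) would call lock_whole_file(path), sleep, then call unlock_and_close(fd). The waiter (client2) would make the same blocking call while the lock is still held.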
On the server, /etc/rc2.d/S06xxxx is created to start

    tcpdump -i eth0 -p -w /tmp/log -s 1500 >/dev/null 2>&1 &

upon the next reboot, right after eth0 comes up.

I type alt-sysrq-b on the server after a few "sync" commands. After more
than 5 minutes client1 releases the lock and client3 obtains it. A few
seconds later client3 releases the lock.

Nothing happens: client2 still does not get the lock.

client2 has not tried to obtain the lock at all since the server reboot.
Instead, it hangs in rpc_wait_bit_interruptible(); only kill -9 could kill it.
The log written by tcpdump has been analyzed afterwards using wireshark
(t=<time since start>):
t=10   server->(all 3 clients): portmap getport STAT, all clients reply
       server->(all 3 clients): STAT notify, all clients reply
       client1->server: portmap getport NLM, server replies
       client1->server: NLM lock, server replies
t=15   client3->server: portmap getport NLM, server replies
       client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=20   client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=25   client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=30   client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=35   client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=40   client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=45   client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=50   client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=55   client3->server: NLM lock, server says: NLM_BLOCKED
t=85   client3->server: portmap getport NLM, server replies
t=115  client3->server: NLM lock, server says: NLM_BLOCKED
[...]
t=383  client1->server: portmap getport NLM, server replies
       client1->server: NLM unlock, server replies
       server->client3: portmap getport NLM, client replies
       server->client3: NLM NULL, client replies
       server->client3: NLM GRANTED_MSG
       client3->server: portmap getport NLM, server replies
       client3->server: NLM NULL, server replies
       client3->server: NLM GRANTED_RES, server replies
       client3->server: reply for NLM GRANTED_MSG
t=390  client3->server: portmap getport NLM, server replies
       client3->server: NLM unlock, server replies
Notice the absence of any client2 traffic for t>10. There is no
interesting traffic for client2 around t=10 either, other than what
has been mentioned. This bug is probably reproducible without the third
client, but the above is what happened during the test.
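The "no client2 traffic for t>10" observation can also be checked mechanically. The sketch below is only an illustration: EVENTS is a hand-transcribed subset of the summary above (one tuple per exchange, t as given in the summary), and the function names are made up.

```python
# (t, source, destination), transcribed by hand from the summary above.
EVENTS = [
    (10, "server", "client1"),
    (10, "server", "client2"),
    (10, "server", "client3"),
    (10, "client1", "server"),
    (15, "client3", "server"),
    (55, "client3", "server"),
    (115, "client3", "server"),
    (383, "client1", "server"),
    (383, "server", "client3"),
    (390, "client3", "server"),
]


def last_seen(events):
    """Map each host to the latest time it appears as sender or receiver."""
    seen = {}
    for t, src, dst in events:
        for host in (src, dst):
            seen[host] = max(seen.get(host, t), t)
    return seen


def silent_after(events, hosts, cutoff):
    """Return the hosts with no traffic later than cutoff, sorted by name."""
    seen = last_seen(events)
    return sorted(h for h in hosts if seen.get(h, float("-inf")) <= cutoff)
```

With the events listed here, silent_after(EVENTS, ["client1", "client2", "client3"], 10) returns ["client2"].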
--
Frank
_______________________________________________
NFS maillist - NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs
* 2.6.x client locking bug upon server reboot (2)
From: Frank van Maarseveen @ 2007-03-02 20:15 UTC (permalink / raw)
To: Linux NFS mailing list
On Mon, Feb 26, 2007 at 03:15:17PM +0100, Frank van Maarseveen wrote:
> 2.6.19.3, NFS V3, portmap V2, stat V1, nlm, all UDP
nlm V4
>
> one server, three clients:
>
> client1 has the lock
> client2 wants the lock (fcntl blocks)
> client3 will try to lock right after server has rebooted and started everything
>
> On the server, /etc/rc2.d/S06xxxx is created to start a
>
> tcpdump -i eth0 -p -w /tmp/log -s 1500 >/dev/null 2>&1 &
>
> upon the next reboot right after eth0 becomes up.
>
> I type alt-sysrq-b on the server after a few "sync" commands. After >5
> minutes client1 releases the lock and client3 obtains the lock. A few
> seconds later client3 releases the lock.
>
> nothing happens
>
> client2 did not try to obtain a lock anyhow since server reboot. Instead,
> it hangs in rpc_wait_bit_interruptible(), only kill -9 could kill it.
I tried something like this again, now using 2.6.20.1 plus the nfs-all
patch from Trond for 2.6.20 on the server and on two clients.

The results are disturbing.

Two clients try to lock the same file on the server. After things
settle down, the server is rebooted in a crashy fashion (immediate
reboot, no sync or unmount):

    echo b >/proc/sysrq-trigger

The NFS client that was still waiting in fcntl() to obtain the lock now
returns an error to userspace:

    lck: fcntl: Input/output error

Note that these are NFSv3 hard UDP mounts, so IMO EIO should not happen here.
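As an illustration of how a test program might surface this, here is a small sketch. It is not the source of "lck". The message format only mirrors the shape of the line above, and the function names are made up. The EIO check rests on the usual expectation that a hard mount keeps retrying, while a soft mount gives up and returns an error once its retries run out.

```python
import errno
import fcntl
import os


def perror_style(prog, call, exc):
    """Format an OSError as 'prog: call: strerror', like perror() output."""
    return "%s: %s: %s" % (prog, call, os.strerror(exc.errno))


def is_unexpected_on_hard_mount(exc):
    """True for EIO, which a hard mount is not expected to hand back."""
    return exc.errno == errno.EIO


def blocking_lock(fd, prog="lck"):
    """Wait for an exclusive whole-file lock on fd.

    Returns None on success, or a perror-style message on failure.
    """
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX)
    except OSError as exc:
        return perror_style(prog, "fcntl", exc)
    return None
```

A caller can log the returned message and use is_unexpected_on_hard_mount() to flag the case reported here.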
wireshark trace done on the failing client:
No.        Time  Source  Destination  Protocol  Info
 34    3.611153  client  server       NLM       V4 LOCK Call (Reply In 35) FH:0x73a8a272 svid:2 pos:0-0
 35    3.611372  server  client       NLM       V4 LOCK Reply (Call In 34) NLM_BLOCKED
206   33.601700  client  server       Portmap   V2 GETPORT Call NLM(100021) V:4 UDP
215   38.600612  client  server       Portmap   [RPC retransmission of #206]V2 GETPORT Call NLM(100021) V:4 UDP
392   83.586328  client  server       Portmap   V2 GETPORT Call (Reply In 393) NLM(100021) V:4 UDP
393   83.597514  server  client       Portmap   V2 GETPORT Reply (Call In 392) Port:32768
394   83.597609  client  server       NLM       V4 CANCEL Call (Reply In 395) FH:0x73a8a272 svid:2 pos:0-0
396   83.597847  client  server       NLM       [RPC retransmission of #394]V4 CANCEL Call (Reply In 395) FH:0x73a8a272 svid:2 pos:0-0
398   83.597994  client  server       NLM       [RPC retransmission of #394]V4 CANCEL Call (Reply In 395) FH:0x73a8a272 svid:2 pos:0-0
452   92.524680  server  client       Portmap   V2 GETPORT Call (Reply In 453) STAT(100024) V:1 UDP
453   92.525074  client  server       Portmap   V2 GETPORT Reply (Call In 452) Port:32771
454   92.527572  server  client       STAT      V1 NOTIFY Call (Reply In 457)
457   92.528957  client  server       STAT      V1 NOTIFY Reply (Call In 454)
517  113.588958  client  server       Portmap   V2 GETPORT Call (Reply In 518) NLM(100021) V:4 UDP
518  113.589382  server  client       Portmap   V2 GETPORT Reply (Call In 517) Port:32768
519  113.589466  client  server       NLM       V4 CANCEL Call (Reply In 520) FH:0x73a8a272 svid:2 pos:0-0
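To pick out calls that never got a reply from a pasted packet list like the one above, a rough parser such as the following may help. It is only a sketch: the regular expressions assume the summary-line layout shown here ("(Reply In N)", "[RPC retransmission of #N]"), and the function names are made up.

```python
import re

# frame number, time, source, destination, rest of the line
FRAME = re.compile(r"^\s*(\d+)\s+(\d+\.\d+)\s+(\S+)\s+(\S+)\s+(.*)$")
REPLY_IN = re.compile(r"\(Reply In (\d+)\)")
RETRANS = re.compile(r"\[RPC retransmission of #(\d+)\]")
# "Call" as a message type, not the "(Call In N)" marker found on replies.
IS_CALL = re.compile(r"\bCall\b(?! In\b)")


def parse_frame(line):
    """Parse one summary line into a dict, or return None for other lines."""
    m = FRAME.match(line)
    if m is None:
        return None
    no, t, src, dst, info = m.groups()
    reply = REPLY_IN.search(info)
    retrans = RETRANS.search(info)
    return {
        "no": int(no),
        "time": float(t),
        "src": src,
        "dst": dst,
        "is_call": IS_CALL.search(info) is not None,
        "reply_in": int(reply.group(1)) if reply else None,
        "retransmission_of": int(retrans.group(1)) if retrans else None,
    }


def unanswered_calls(lines):
    """Return frame numbers of calls that carry no '(Reply In N)' marker."""
    frames = [f for f in map(parse_frame, lines) if f is not None]
    return [f["no"] for f in frames if f["is_call"] and f["reply_in"] is None]
```

Fed the frames listed above, unanswered_calls() returns [206, 215]: the portmap GETPORT call and its retransmission.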
--
Frank