* 2.6.19.3 client locking bug upon server reboot
From: Frank van Maarseveen @ 2007-02-26 14:15 UTC
  To: Linux NFS mailing list

2.6.19.3, NFS V3, portmap V2, stat V1, nlm, all UDP

one server, three clients:

client1 has the lock
client2 wants the lock (fcntl blocks)
client3 will try to lock right after the server has rebooted and started all services
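For reference, the kind of blocking lock that client2 is waiting on can be sketched in C. This is a minimal sketch under stated assumptions, not the test program used in this report: the helper names lock_whole_file() and unlock_whole_file() are hypothetical, and an exclusive whole-file lock is assumed. F_SETLKW blocks until the lock is granted; on an NFSv3 mount that uses lockd, the kernel forwards the request to the server as an NLM LOCK call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: take an exclusive lock on the whole file and block
 * (F_SETLKW) until it is granted. Assumption: a whole-file write lock,
 * which may differ from what the original test used. */
int lock_whole_file(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;     /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;            /* 0 means "through end of file" */

    for (;;) {
        if (fcntl(fd, F_SETLKW, &fl) == 0)
            return 0;        /* lock granted */
        if (errno == EINTR)
            continue;        /* interrupted by a caught signal: wait again */
        /* Any other error ends the wait; on the failing client in this
         * thread the error was EIO ("Input/output error"). */
        fprintf(stderr, "fcntl: %s\n", strerror(errno));
        return -1;
    }
}

/* Hypothetical helper: release the lock taken above (F_SETLK, non-blocking). */
int unlock_whole_file(int fd)
{
    struct flock fl;

    memset(&fl, 0, sizeof fl);
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;

    if (fcntl(fd, F_SETLK, &fl) == -1) {
        fprintf(stderr, "fcntl: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}
```

The file descriptor must be opened with write access (e.g. O_RDWR), since a write lock on a read-only descriptor fails with EBADF. Run on two clients against the same file on the NFS mount, the second caller should sleep inside fcntl() until the first one unlocks or closes the file.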

On the server, an init script /etc/rc2.d/S06xxxx is created to start

	tcpdump -i eth0 -p -w /tmp/log -s 1500 >/dev/null 2>&1 &

on the next reboot, right after eth0 comes up.

After a few "sync" commands, I press Alt-SysRq-B on the server. More than
5 minutes later, client1 releases the lock and client3 obtains it. A few
seconds later, client3 releases the lock.

	nothing happens

client2 has not tried to obtain the lock at all since the server reboot.
Instead, it hangs in rpc_wait_bit_interruptible(); only kill -9 could kill it.

The log written by tcpdump was analyzed afterwards using Wireshark
(t=<time since start>):

t=10	server->(all 3 clients): portmap getport STAT, all clients reply
	server->(all 3 clients): STAT notify, all clients reply
	client1->server: portmap getport NLM, server replies
	client1->server: NLM lock, server replies.

t=15	client3->server: portmap getport NLM, server replies
	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=20	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=25	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=30	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=35	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=40	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=45	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=50	client3->server: NLM lock, server says: NLM_DENIED_GRACE_PERIOD
t=55	client3->server: NLM lock, server says: NLM_BLOCKED
t=85	client3->server: portmap getport NLM, server replies

t=115	client3->server: NLM lock, server says: NLM_BLOCKED

[...]

t=383	client1->server: portmap getport NLM, server replies
	client1->server: NLM unlock, server replies
	server->client3: portmap getport NLM, client replies
	server->client3: NLM NULL, client replies
	server->client3: NLM GRANTED_MSG
	client3->server: portmap getport NLM, server replies
	client3->server: NLM NULL, server replies.
	client3->server: NLM GRANTED_RES, server replies.
	client3->server: reply for NLM GRANTED_MSG

t=390	client3->server: portmap getport NLM, server replies
	client3->server: NLM unlock, server replies.

Notice the absence of any client2 traffic for t>10. Around t=10 there is
no interesting client2 traffic whatsoever other than what has been
mentioned above. This bug is probably reproducible without the third
client, but in any case, the above is what happened during the test.
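The NLM replies that show up in the trace (NLM_DENIED_GRACE_PERIOD, NLM_BLOCKED, and the later GRANTED callback) can be decoded with the nlm4_stats values defined for NLM version 4 (RFC 1813, appendix II). The sketch below pairs the relevant statuses with the reaction a client is expected to have. The lock_action enum and classify_lock_reply() are hypothetical names, and folding every other status into a failure is a simplification, not how lockd actually handles them.

```c
/* nlm4_stats values as defined for NLM version 4 (RFC 1813, appendix II). */
enum nlm4_stats {
    NLM4_GRANTED = 0,
    NLM4_DENIED = 1,
    NLM4_DENIED_NOLOCKS = 2,
    NLM4_BLOCKED = 3,
    NLM4_DENIED_GRACE_PERIOD = 4,
    NLM4_DEADLCK = 5,
    NLM4_ROFS = 6,
    NLM4_STALE_FH = 7,
    NLM4_FBIG = 8,
    NLM4_FAILED = 9
};

/* Hypothetical: what a client should do after a reply to a blocking LOCK. */
enum lock_action {
    ACTION_DONE,          /* lock is held; return to the application */
    ACTION_RETRY_LATER,   /* server is in its grace period; resend LOCK later */
    ACTION_WAIT_GRANTED,  /* request queued on the server; wait for GRANTED */
    ACTION_FAIL           /* simplification: report an error */
};

enum lock_action classify_lock_reply(enum nlm4_stats st)
{
    switch (st) {
    case NLM4_GRANTED:
        return ACTION_DONE;
    case NLM4_DENIED_GRACE_PERIOD:
        return ACTION_RETRY_LATER;   /* client3 at t=15..50 */
    case NLM4_BLOCKED:
        return ACTION_WAIT_GRANTED;  /* client3 from t=55 until GRANTED_MSG */
    default:
        return ACTION_FAIL;
    }
}
```

Read this way, client3 behaves as expected. client2, which should be in a state comparable to ACTION_WAIT_GRANTED or ACTION_RETRY_LATER after the reboot, never sends anything at all.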

-- 
Frank

_______________________________________________
NFS maillist  -  NFS@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nfs


* 2.6.x client locking bug upon server reboot (2)
From: Frank van Maarseveen @ 2007-03-02 20:15 UTC
  To: Linux NFS mailing list

On Mon, Feb 26, 2007 at 03:15:17PM +0100, Frank van Maarseveen wrote:
> 2.6.19.3, NFS V3, portmap V2, stat V1, nlm, all UDP

nlm V4 

> 
> one server, three clients:
> 
> client1 has the lock
> client2 wants the lock (fcntl blocks)
> client3 will try to lock right after server has rebooted and started everything
> 
> On the server, /etc/rc2.d/S06xxxx is created to start a
> 
> 	tcpdump -i eth0 -p -w /tmp/log -s 1500 >/dev/null 2>&1 &
> 
> upon the next reboot right after eth0 becomes up.
> 
> I type alt-sysrq-b on the server after a few "sync" commands. After >5
> minutes client1 releases the lock and client3 obtains the lock. A few
> seconds later client3 releases the lock.
> 
> 	nothing happens
> 
> client2 did not try to obtain a lock anyhow since server reboot. Instead,
> it hangs in rpc_wait_bit_interruptible(), only kill -9 could kill it.

I tried something similar again, but this time with 2.6.20.1 plus Trond's
nfs-all patch for 2.6.20 on the server and on two clients.

	The results are disturbing

Two clients try to lock the same file on the server. After things
settle down, the server is crash-rebooted with:

	echo b >/proc/sysrq-trigger

The NFS client that was still waiting in fcntl() to obtain the lock now
returns an error to userspace:

	lck: fcntl: Input/output error

Note that these are NFSv3 hard mounts over UDP. IMO, EIO should not happen here.


Wireshark trace taken on the failing client:
No.     Time        Source Destination    Protocol Info
     34 3.611153    client server         NLM      V4 LOCK Call (Reply In 35) FH:0x73a8a272 svid:2 pos:0-0
     35 3.611372    server client         NLM      V4 LOCK Reply (Call In 34) NLM_BLOCKED
    206 33.601700   client server         Portmap  V2 GETPORT Call NLM(100021) V:4 UDP
    215 38.600612   client server         Portmap  [RPC retransmission of #206]V2 GETPORT Call NLM(100021) V:4 UDP
    392 83.586328   client server         Portmap  V2 GETPORT Call (Reply In 393) NLM(100021) V:4 UDP
    393 83.597514   server client         Portmap  V2 GETPORT Reply (Call In 392) Port:32768
    394 83.597609   client server         NLM      V4 CANCEL Call (Reply In 395) FH:0x73a8a272 svid:2 pos:0-0
    396 83.597847   client server         NLM      [RPC retransmission of #394]V4 CANCEL Call (Reply In 395) FH:0x73a8a272 svid:2 pos:0-0
    398 83.597994   client server         NLM      [RPC retransmission of #394]V4 CANCEL Call (Reply In 395) FH:0x73a8a272 svid:2 pos:0-0
    452 92.524680   server client         Portmap  V2 GETPORT Call (Reply In 453) STAT(100024) V:1 UDP
    453 92.525074   client server         Portmap  V2 GETPORT Reply (Call In 452) Port:32771
    454 92.527572   server client         STAT     V1 NOTIFY Call (Reply In 457)
    457 92.528957   client server         STAT     V1 NOTIFY Reply (Call In 454)
    517 113.588958  client server         Portmap  V2 GETPORT Call (Reply In 518) NLM(100021) V:4 UDP
    518 113.589382  server client         Portmap  V2 GETPORT Reply (Call In 517) Port:32768
    519 113.589466  client server         NLM      V4 CANCEL Call (Reply In 520) FH:0x73a8a272 svid:2 pos:0-0

-- 
Frank
