* Should NLM resends change the xid ??
@ 2016-03-27 23:40 NeilBrown
  2016-03-28 16:04 ` Frank Filz
  2016-03-29 14:40 ` Chuck Lever
  0 siblings, 2 replies; 10+ messages in thread
From: NeilBrown @ 2016-03-27 23:40 UTC
  To: Linux NFS mailing list

I've always thought that NLM was a less-than-perfect locking protocol,
but I recently discovered an aspect of it that is worse than I imagined.

Suppose client-A holds a lock on some region of a file, and client-B
makes a non-blocking lock request for that region.
Now suppose that just before handling that request the lockd thread
on the server stalls - for example due to excessive memory pressure
causing a kmalloc to take 11 seconds (rare, but possible.  Such
allocations never fail, they just block until they can be served).

During these 11 seconds (say, at the 5 second mark), client-A releases
the lock - the UNLOCK request to the server queues up behind the
non-blocking LOCK from client-B.

The default retry time for NLM in Linux is 10 seconds (even for TCP!) so
NLM on client-B resends the non-blocking LOCK request, and it queues up
behind the UNLOCK request.

Now finally the lockd thread gets some memory/CPU time and starts
handling requests:
 LOCK from client-B  - DENIED
 UNLOCK from client-A - OK
 LOCK from client-B - OK

Both replies to client-B have the same XID so client-B will believe
whichever one it gets first - DENIED.

So now we have the situation where client-B doesn't think it holds a
lock, but the server thinks it does.  This is not good.
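The sequence above can be replayed in a few lines.  This is only a toy
model, not lockd: the function name run_server, the xid values 100 and
200, and the tuple layout are all made up for illustration, and the
statuses use the same OK/DENIED wording as the listing above.

```python
def run_server(queue):
    """Handle queued requests strictly in order; return (replies, holder)."""
    holder = "client-A"            # client-A holds the lock at the start
    replies = []                   # (destination, xid, status) in send order
    for op, client, xid in queue:
        if op == "LOCK":
            if holder is None or holder == client:
                holder = client
                status = "OK"
            else:
                status = "DENIED"  # non-blocking request, lock is held
        else:                      # UNLOCK
            if holder == client:
                holder = None
            status = "OK"
        replies.append((client, xid, status))
    return replies, holder


# Requests as they pile up while the lockd thread is stalled.
queue = [
    ("LOCK", "client-B", 100),     # original non-blocking LOCK
    ("UNLOCK", "client-A", 200),   # sent at about the 5 second mark
    ("LOCK", "client-B", 100),     # resend after 10 seconds, same xid
]

replies, server_holder = run_server(queue)

# client-B accepts the first reply that matches its outstanding xid;
# the later reply with the same xid is treated as a duplicate.
client_b_view = next(
    status for dst, xid, status in replies
    if dst == "client-B" and xid == 100
)

print("client-B believes:", client_b_view)
print("server believes the holder is:", server_holder)
```

In this model client-B ends up with DENIED while the server records
client-B as the holder, which is the mismatch described above.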

I think this explains a locking problem that a customer is seeing.  The
application seems to busy-wait for the lock using non-blocking LOCK
requests.  Each LOCK request has a different 'svid' so I assume each
comes from a different process. If you busy-wait from the one process
this problem won't occur.

Having a reply-cache on the server lockd might help, but such things
easily fill up and cannot provide a guarantee.
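As a rough illustration of both halves of that sentence, here is a toy
server with a bounded reply cache keyed by (client, xid).  Everything in
it is hypothetical (run_server, cache_size, the xid values); it is not
how lockd or the NFS duplicate reply cache is implemented.

```python
from collections import OrderedDict


def run_server(queue, cache_size):
    """Toy server: replay cached replies instead of re-executing requests."""
    holder = "client-A"
    cache = OrderedDict()              # (client, xid) -> status, oldest first
    replies = []
    for op, client, xid in queue:
        key = (client, xid)
        if key in cache:
            status = cache[key]        # retransmit: replay the old answer
        else:
            if op == "LOCK":
                if holder in (None, client):
                    holder, status = client, "OK"
                else:
                    status = "DENIED"
            else:                      # UNLOCK
                if holder == client:
                    holder = None
                status = "OK"
            cache[key] = status
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the oldest entry
        replies.append((client, xid, status))
    return replies, holder


queue = [
    ("LOCK", "client-B", 100),
    ("UNLOCK", "client-A", 200),
    ("LOCK", "client-B", 100),         # resend with the same xid
]

# Cache large enough: the resend is answered from the cache.
_, holder_big_cache = run_server(queue, cache_size=8)

# Cache too small: the UNLOCK evicts the LOCK entry before the resend.
_, holder_small_cache = run_server(queue, cache_size=1)

print("holder with roomy cache:", holder_big_cache)
print("holder with full cache:", holder_small_cache)
```

With room in the cache the resend is answered DENIED again and nobody
holds the lock; once the entry has been evicted the resend is executed
as a new request and the original problem is back.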

Having a longer timeout on the client would probably help too.  At the
very least we should increase the maximum timeout beyond 20 seconds
(assuming I am reading the code correctly, the client resend timeout is
based on nlmsvc_timeout, which is set from nlm_timeout, which is
restricted to the range 3-20).

Forcing the xid to change on every retransmit (for NLM) would ensure
that we only accept the last reply, which I think is safe.
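A sketch of that idea for the scenario in this mail, under the
assumption that the client forgets the old xid as soon as it
retransmits.  The Client class, run_server and the xid values are
invented for illustration and say nothing about how sunrpc actually
tracks xids; the sketch also does not cover other cases, such as the
last reply being lost.

```python
class Client:
    """Hypothetical client that picks a fresh xid for every (re)transmit."""

    def __init__(self, name, first_xid):
        self.name = name
        self.next_xid = first_xid
        self.outstanding = None        # only the most recent xid is valid

    def transmit(self, op):
        xid = self.next_xid
        self.next_xid += 1
        self.outstanding = xid         # any earlier xid is now forgotten
        return (op, self.name, xid)

    def receive(self, xid, status):
        if xid != self.outstanding:
            return None                # stale reply to a superseded transmit
        self.outstanding = None
        return status


def run_server(queue, holder):
    """Handle queued requests strictly in order; return (replies, holder)."""
    replies = []
    for op, client, xid in queue:
        if op == "LOCK":
            if holder in (None, client):
                holder, status = client, "OK"
            else:
                status = "DENIED"
        else:                          # UNLOCK
            if holder == client:
                holder = None
            status = "OK"
        replies.append((client, xid, status))
    return replies, holder


client_b = Client("client-B", first_xid=100)
queue = [
    client_b.transmit("LOCK"),         # original request, xid 100
    ("UNLOCK", "client-A", 200),
    client_b.transmit("LOCK"),         # resend, now with xid 101
]
replies, server_holder = run_server(queue, holder="client-A")

accepted = []
for dst, xid, status in replies:
    if dst == "client-B":
        result = client_b.receive(xid, status)
        if result is not None:
            accepted.append(result)

print("client-B accepted:", accepted)
print("server believes the holder is:", server_holder)
```

Here the DENIED reply carries the superseded xid and is dropped, so
client-B only sees the OK that matches the server's final state.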

Thoughts?

Thanks,
NeilBrown


end of thread, other threads:[~2016-03-30 16:07 UTC | newest]

Thread overview: 10+ messages
2016-03-27 23:40 Should NLM resends change the xid ?? NeilBrown
2016-03-28 16:04 ` Frank Filz
2016-03-28 21:58   ` Tom Talpey
2016-03-29 22:35     ` NeilBrown
2016-03-29 14:40 ` Chuck Lever
2016-03-29 22:47   ` NeilBrown
2016-03-29 23:07     ` Chuck Lever
2016-03-30  1:02       ` NeilBrown
2016-03-30 15:53         ` Chuck Lever
2016-03-30 16:07         ` Frank Filz
