Locking patches (generic & nfs)

public inbox for linux-kernel@vger.kernel.org

* Locking patches (generic & nfs)
@ 2002-07-19  8:19 Olaf Kirch
  2002-07-23 14:56 ` [NFS] " Trond Myklebust
  2002-07-25 12:43 ` Bill Davidsen
  0 siblings, 2 replies; 5+ messages in thread
From: Olaf Kirch @ 2002-07-19  8:19 UTC
  To: linux-kernel; +Cc: nfs

[-- Attachment #1: Type: text/plain, Size: 3341 bytes --]

Hi,

I've been investigating an NFS locking problem a customer
of SuSE has had between an OpenServer machine (oh boy)
acting as the NFS client and a Linux box acting as the server.

In the process of debugging this, I came across a number of
bugs in the 2.4.18 kernel.


fs/locks.c:
	When a program locks the entire file, and then does an unlock
	of just the first byte in the file, the kernel will not modify
	the existing lock because of an overflow/signedness problem.
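To make the overflow concrete, here is a minimal user-space sketch of the comparison touched by the fs/locks.c hunk of the patch. It assumes a 64-bit signed offset type standing in for loff_t and models the wrap of "fl_end + 1" explicitly, so the sketch itself has no undefined behaviour. The names offset_t, add_one_wrapping, starts_after_unpatched and starts_after_patched are illustrative only, not kernel code.

```c
#include <assert.h>
#include <limits.h>

typedef long long offset_t;          /* assumption: stand-in for loff_t */
#define OFFSET_MAX LLONG_MAX         /* assumption: largest file offset */

/* Model what "end + 1" yields when it wraps past OFFSET_MAX. */
static offset_t add_one_wrapping(offset_t end)
{
    return (end == OFFSET_MAX) ? LLONG_MIN : end + 1;
}

/* Unpatched test: does fl start entirely after the caller's range? */
static int starts_after_unpatched(offset_t fl_start, offset_t caller_end)
{
    assert(fl_start >= 0 && caller_end >= 0);
    return fl_start > add_one_wrapping(caller_end);
}

/* Patched test: never true when the caller's range runs to OFFSET_MAX. */
static int starts_after_patched(offset_t fl_start, offset_t caller_end)
{
    assert(fl_start >= 0 && caller_end >= 0);
    return fl_start > add_one_wrapping(caller_end)
        && caller_end != OFFSET_MAX;
}
```

With a caller range ending at OFFSET_MAX and an existing lock starting at offset 1, the unpatched test reports "entirely after" although the two ranges overlap; the patched test does not. For ordinary ranges both tests agree.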

fs/lockd/svclock.c, include/linux/lockd/lockd.h:
	Consider the following scenario:
	 client A locks a file
	 client B requests a conflicting lock, and asks
		for "blocking" mode.
		lockd creates a "struct block" and attaches
		it to the existing lock
	 client A unlocks the file
		This causes a call to nlmsvc_notify_blocked,
		which puts the blocked lock onto a list
		of locks which should be retried, setting
		the b_when field to 0.
	The next time lockd comes around to inspecting this
	list, it should notice that the lock can now be granted,
	and send a NLM_GRANTED message to client B.

	However, due to a signedness problem, the lock is
	appended to the *end* of the list, where it's never
	picked up.
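The effect can be reproduced in user space. The sketch assumes that time_before_eq() behaves like a signed comparison of the difference of its arguments (the patch comment for lockd.h notes that b_when is passed into it as a signed long), and it models the blocked list as a plain array of b_when values. insert_position is a hypothetical helper, not a lockd function.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NLM_NEVER (~(unsigned long)0)   /* unpatched definition */

/* Wrap-safe "a is before or equal to b"; gives the same answer as the
 * signed test (long)(b - a) >= 0 on two's-complement machines. */
static int time_before_eq(unsigned long a, unsigned long b)
{
    return (b - a) <= (unsigned long)LONG_MAX;
}

/* Index at which a block expiring at `when` would be linked in, following
 * the unpatched loop: skip every entry that is before-or-equal to `when`. */
static size_t insert_position(const unsigned long *b_when, size_t n,
                              unsigned long when)
{
    size_t i = 0;

    assert(b_when != NULL || n == 0);
    while (i < n && time_before_eq(b_when[i], when))
        i++;
    return i;
}
```

Seen through the signed comparison, NLM_NEVER is -1, i.e. "just before now" for any small `when`. A list holding only NLM_NEVER entries is therefore skipped completely and the new block lands behind them, at the tail.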

fs/lockd/svcproc.c:
	There's an interoperability problem with OpenServer and
	probably other lockd implementations when it comes to
	handling of blocked locks.
	
	The way Linux clients deal with blocked locks goes
	like this

		C->S: lock this range, block if already taken
	 (1)	S->C: blocked
		...
		(some other client removes the conflicting lock)
	 (2)	S->C: the lock has been granted
	 	C->S: ack
	 (3)	C->S: lock this range, block if already taken
		S->C: granted

	At (1), the server records the fact that there's a blocking
	lock request, and uses it at (2) to find out whom to
	notify that the previously blocked request can now
	be granted. When the client then follows up with a
	LOCK call, the server notices that there's a blocked
	lock around and destroys it.

	Now OSR and maybe other lockd implementations do not
	follow up on the GRANTED callback with another LOCK 
	call. According to the NLM spec this is sufficient,
	because the GRANTED callback actually says "the lock
	has been granted". The reason the Linux client does an
	additional LOCK call is for stability (the NLM protocol is
	full of race conditions).

	However, for this to work properly, the Linux lockd must interpret
	the client's response to the GRANTED callback. When receiving
	this "ack" (in fact, it's a GRANTED_RES call), it must look
	up the corresponding blocked lock and take it off the
	list of blocked locks. If it doesn't, server and client
	get out of sync wrt who is blocking on what lock, and
	start timing out.
	(If you want details of exactly what's going wrong, mail me
	for a packet trace.)
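As a rough illustration of the bookkeeping described above, the following sketch keeps a small table of blocked requests keyed by cookie and drops an entry when the matching GRANTED_RES arrives. It is a simplification under stated assumptions: the table layout, the sizes and the names block_add, block_granted_res and block_count are all hypothetical, and the status carried by the real GRANTED_RES call is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BLOCKS 8    /* illustrative table size */
#define COOKIE_LEN 8    /* illustrative cookie length */

struct blocked_lock {
    int in_use;
    unsigned char cookie[COOKIE_LEN];
};

static struct blocked_lock blocked[MAX_BLOCKS];

/* Step (1): remember a blocked request. Returns 0, or -1 if the table is full. */
static int block_add(const unsigned char *cookie)
{
    size_t i;

    assert(cookie != NULL);
    for (i = 0; i < MAX_BLOCKS; i++) {
        if (!blocked[i].in_use) {
            blocked[i].in_use = 1;
            memcpy(blocked[i].cookie, cookie, COOKIE_LEN);
            return 0;
        }
    }
    return -1;
}

/* GRANTED_RES: the client acknowledged the grant, so forget the block.
 * Returns 1 if a matching block was removed, 0 if none was found. */
static int block_granted_res(const unsigned char *cookie)
{
    size_t i;

    assert(cookie != NULL);
    for (i = 0; i < MAX_BLOCKS; i++) {
        if (blocked[i].in_use &&
            memcmp(blocked[i].cookie, cookie, COOKIE_LEN) == 0) {
            blocked[i].in_use = 0;
            return 1;
        }
    }
    return 0;
}

/* Number of requests the server still believes are blocked. */
static size_t block_count(void)
{
    size_t i, n = 0;

    for (i = 0; i < MAX_BLOCKS; i++)
        n += blocked[i].in_use ? 1 : 0;
    return n;
}
```

If block_granted_res is never called for a client that does not follow up with a LOCK call, the stale entry stays in the table, which is the out-of-sync state described above.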

	At any rate, the above means that lockd needs to handle
	GRANTED_RES properly. The functionality to do so is already
	there; it's just the handling of the RPC call itself that
	wasn't there (or has been removed for some reason).

	The second patch does this (though for NFSv2 only; the
	NFSv3 case is analogous).

I'm also attaching the test program I used.

Cheers
Olaf
-- 
Olaf Kirch     |  Anyone who has had to work with X.509 has probably
okir@suse.de   |  experienced what can best be described as
---------------+  ISO water torture. -- Peter Gutmann

[-- Attachment #2: linux-2.4.18-locks.patch --]
[-- Type: text/plain, Size: 2508 bytes --]


	This patch addresses two problems.

	 fs/locks.c:
	 	When a program locked the entire file, and
		then did an unlock of just the first byte
		in the file, the kernel would not modify the
		existing lock because of an overflow/signedness
		problem.

	fs/lockd/*.c:
		Consider the following scenario:
		 client A locks a file
		 client B requests a conflicting lock, and asks
		 	for "blocking" mode.
			lockd creates a "struct block" and attaches
			it to the existing lock
		 client A unlocks the file
		 	This causes a call to nlmsvc_notify_blocked,
			which puts the blocked lock onto a list
			of locks which should be retried, setting
			the b_when field to 0.
		The next time lockd comes around to inspecting this
		list, it should notice that the lock can now be granted,
		and send a NLM_GRANTED message to client B.

		However, due to a signedness problem, the lock is
		appended to the *end* of the list, where it's never
		picked up.

				Olaf Kirch -okir@suse.de

--- linux/fs/lockd/svclock.c.locks	Mon Jun 17 13:32:21 2002
+++ linux/fs/lockd/svclock.c	Mon Jun 17 13:37:36 2002
@@ -62,8 +62,8 @@
 		nlmsvc_remove_block(block);
 	bp = &nlm_blocked;
 	if (when != NLM_NEVER) {
-		if ((when += jiffies) == NLM_NEVER)
-			when ++;
+		if ((when += jiffies) > NLM_NEVER)
+			when = NLM_NEVER;
 		while ((b = *bp) && time_before_eq(b->b_when,when))
 			bp = &b->b_next;
 	} else
--- linux/fs/locks.c.locks	Thu Oct 11 16:52:18 2001
+++ linux/fs/locks.c	Mon Jun 17 13:32:35 2002
@@ -926,8 +926,11 @@
 				goto next_lock;
 			/* If the next lock in the list has entirely bigger
 			 * addresses than the new one, insert the lock here.
+			 *
+			 * be careful if fl_end == OFFSET_MAX --okir
 			 */
-			if (fl->fl_start > caller->fl_end + 1)
+			if (fl->fl_start > caller->fl_end + 1
+			 && caller->fl_end != OFFSET_MAX)
 				break;
 
 			/* If we come here, the new and old lock are of the
--- linux/include/linux/lockd/lockd.h.locks	Thu Nov 22 20:47:20 2001
+++ linux/include/linux/lockd/lockd.h	Mon Jun 17 13:38:51 2002
@@ -89,8 +89,11 @@
 /*
  * This is a server block (i.e. a lock requested by some client which
  * couldn't be granted because of a conflicting lock).
+ *
+ * XXX: Beware of signedness errors. b_when is passed as a signed long
+ * into time_before_eq et al. --okir
  */
-#define NLM_NEVER		(~(unsigned long) 0)
+#define NLM_NEVER		(0x7FFFFFFF)
 struct nlm_block {
 	struct nlm_block *	b_next;		/* linked list (all blocks) */
 	struct nlm_block *	b_fnext;	/* linked list (per file) */

[-- Attachment #3: linux-2.4.18-lockd2.patch --]
[-- Type: text/plain, Size: 1449 bytes --]

diff -ur linux/fs/lockd.orig/svcproc.c linux/fs/lockd/svcproc.c
--- linux/fs/lockd.orig/svcproc.c	Thu Jul 18 10:47:35 2002
+++ linux/fs/lockd/svcproc.c	Thu Jul 18 11:19:32 2002
@@ -344,6 +344,15 @@
 	return stat;
 }
 
+static int
+nlmsvc_proc_granted_res(struct svc_rqst *rqstp, struct nlm_res *argp,
+						void           *resp)
+{
+	dprintk("lockd: GRANTED_RES   called\n");
+	nlmsvc_grant_reply(&argp->cookie, argp->status);
+	return 0;
+}
+
 /*
  * SHARE: create a DOS share or alter existing share.
  */
@@ -546,14 +555,12 @@
 #define nlmsvc_decode_lockres	nlmsvc_decode_void
 #define nlmsvc_decode_unlockres	nlmsvc_decode_void
 #define nlmsvc_decode_cancelres	nlmsvc_decode_void
-#define nlmsvc_decode_grantedres	nlmsvc_decode_void
 
 #define nlmsvc_proc_none	nlmsvc_proc_null
 #define nlmsvc_proc_test_res	nlmsvc_proc_null
 #define nlmsvc_proc_lock_res	nlmsvc_proc_null
 #define nlmsvc_proc_cancel_res	nlmsvc_proc_null
 #define nlmsvc_proc_unlock_res	nlmsvc_proc_null
-#define nlmsvc_proc_granted_res	nlmsvc_proc_null
 
 struct nlm_void			{ int dummy; };
 
@@ -583,7 +590,7 @@
   PROC(lock_res,	lockres,	norep,		res,	void),
   PROC(cancel_res,	cancelres,	norep,		res,	void),
   PROC(unlock_res,	unlockres,	norep,		res,	void),
-  PROC(granted_res,	grantedres,	norep,		res,	void),
+  PROC(granted_res,	res,		norep,		res,	void),
   /* statd callback */
   PROC(sm_notify,	reboot,		void,		reboot,	void),
   PROC(none,		void,		void,		void,	void),

[-- Attachment #4: mlock.c --]
[-- Type: text/plain, Size: 1624 bytes --]

/* Small locking test program. Run 2 or more copies on the client
 * in an NFS mounted directory.
 * Invoke as "mlock N" to test with N files. I tried both N = 2
 * and N = 20 (which produces more conflicts, but the logs are
 * also harder to track). --okir
 */
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAXFILES 255

static int fds[MAXFILES];

static void
lockit(int num, int len, int unlock)
{
    if (lockf(fds[num], unlock? F_ULOCK : F_LOCK, len) < 0) {
      fprintf(stderr, "Failed to %slock tfile%03d: %s\n",
      			unlock? "un" : "", num, strerror(errno));
      exit(1);
    }
}

int
main(int argc, char **argv)
{
    int	nfiles = 0, counter, rounds = 0;

    if (argc >= 2)
        nfiles = atoi(argv[1]);

    if (nfiles <= 0 || nfiles > MAXFILES)
      nfiles = MAXFILES;

    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);

    printf("Opening %d files...  ", nfiles);
    for ( counter = 0; counter < nfiles; counter++ ) {
      char filename[64];

      sprintf( filename, "tfile%03d", counter );
      fds[counter] = open(filename, O_CREAT | O_RDWR, 0644);
      if (fds[counter] < 0) {
	perror(filename);
	return 1;
      }
    }
    printf("done.\n");

    while (1) {
      printf("\r%d", rounds++);

      for (counter = 0; counter < nfiles; counter++)
        lockit(counter, 0, 0);

      /* Change the len argument (0) to 1 in the lockit call below
       * to test for the lock all/unlock first byte bug */
      for (counter = 0; counter < nfiles; counter++)
        lockit(counter, 0, 1);
    }

    exit(0);
}


* Re: [NFS] Locking patches (generic & nfs)
  2002-07-19  8:19 Locking patches (generic & nfs) Olaf Kirch
@ 2002-07-23 14:56 ` Trond Myklebust
  2002-07-23 15:06   ` Olaf Kirch
  2002-07-25 12:43 ` Bill Davidsen
  1 sibling, 1 reply; 5+ messages in thread
From: Trond Myklebust @ 2002-07-23 14:56 UTC
  To: Olaf Kirch; +Cc: linux-kernel, nfs

Hi Olaf,

>>>>> " " == Olaf Kirch <okir@suse.de> writes:

     > --- linux/fs/lockd/svclock.c.locks Mon Jun 17 13:32:21 2002
     > +++ linux/fs/lockd/svclock.c Mon Jun 17 13:37:36 2002
     > @@ -62,8 +62,8 @@
     >  		nlmsvc_remove_block(block);
     >  	bp = &nlm_blocked;
     >  	if (when != NLM_NEVER) {
     > -		if ((when += jiffies) == NLM_NEVER)
     > -			when ++;
     > +		if ((when += jiffies) > NLM_NEVER)
     > +			when = NLM_NEVER;
     >  		while ((b = *bp) && time_before_eq(b->b_when,when))
     >  			bp = &b->b_next;
     >  	} else

I disagree. As it stands, NLM_NEVER == (~(unsigned long)0), and "when"
is unsigned long, so the only thing we need to protect against is if
we hit the 'magic value' NLM_NEVER. Note that the time_before_eq()
comparison ensures that we cope well with jiffy wraparound etc, so the
entry should *not* in fact get put at the end of the list as you
claimed.

With the above change (plus your change to set NLM_NEVER=0x7fffffff),
we end up never retrying locks that just happen to have been put on
the list at a time when the value of 'jiffies' happens to be > 0x7fffffff.
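A small sketch of the two variants of the expiry computation may help here. It models a 32-bit unsigned long with uint32_t so that the result does not depend on the build machine; expiry_original, expiry_patched and the two NLM_NEVER_* macros are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

#define NLM_NEVER_OLD UINT32_C(0xFFFFFFFF)  /* ~0UL on a 32-bit machine */
#define NLM_NEVER_NEW UINT32_C(0x7FFFFFFF)  /* value proposed in the patch */

/* Original code: only step over the magic value itself. */
static uint32_t expiry_original(uint32_t jiffies, uint32_t delay)
{
    uint32_t when = delay;

    assert(delay != NLM_NEVER_OLD);  /* the "never" case takes another branch */
    if ((when += jiffies) == NLM_NEVER_OLD)
        when++;
    return when;
}

/* Patched code: clamp anything above the new NLM_NEVER to NLM_NEVER. */
static uint32_t expiry_patched(uint32_t jiffies, uint32_t delay)
{
    uint32_t when = delay;

    assert(delay != NLM_NEVER_NEW);  /* the "never" case takes another branch */
    if ((when += jiffies) > NLM_NEVER_NEW)
        when = NLM_NEVER_NEW;
    return when;
}
```

Once jiffies exceeds 0x7fffffff, expiry_patched returns NLM_NEVER_NEW for every request, so an "immediate" retry (delay 0) turns into "never". expiry_original only avoids the single magic value and otherwise keeps the real expiry time.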

-

The other fix for fs/locks.c looks reasonable AFAICS (but perhaps
Matthew wants to take a look?)

-

Concerning the fix implementing GRANTED_RES: I fully agree we need
it. I've just never had the time, and it's the sort of thing that
the Connectathon tests don't keep nagging at you with ;-)...

Patrice Dumas recently did some work on implementing this both for
NLMv1,2,3 and NLM4, so I was planning on integrating his changes into
2.4.20.

Cheers,
  Trond


* Re: [NFS] Locking patches (generic & nfs)
  2002-07-23 14:56 ` [NFS] " Trond Myklebust
@ 2002-07-23 15:06   ` Olaf Kirch
  2002-07-23 15:27     ` Trond Myklebust
  0 siblings, 1 reply; 5+ messages in thread
From: Olaf Kirch @ 2002-07-23 15:06 UTC
  To: Trond Myklebust; +Cc: linux-kernel, nfs

On Tue, Jul 23, 2002 at 04:56:32PM +0200, Trond Myklebust wrote:
> I disagree. As it stands, NLM_NEVER == (~(unsigned long)0), and "when"
> is unsigned long, so the only thing we need to protect against is if
> we hit the 'magic value' NLM_NEVER. Note that the time_before_eq()
> comparison ensures that we cope well with jiffy wraparound etc, so the
> entry should *not* in fact get put at the end of the list as you
> claimed.
> 
> With the above change (plus your change to set NLM_NEVER=0x7fffffff),
> we end up never retrying locks that just happen to have been put on
> the list at a time when the value of 'jiffies' happens to be > 0x7fffffff.

But as it is today, all blocked locks get inserted at the end of the
list because time_before_eq does a signed comparison! With the unpatched
code, when you have a blocked lock, and the conflicting lock is removed,
lockd will never send out a GRANTED_MSG, because the blocked lock is at the
end of the list and never picked up.

That's the real reason for changing NLM_NEVER to the largest _signed_
quantity. And if you do that, you need to deal with jiffy wraparound.
Maybe the way I did it is not optimal, I concede. But you can't leave
it at ~0UL.

> Patrice Dumas recently did some work on implementing this both for
> NLMv1,2,3 and NLM4, so I was planning on integrating his changes into
> 2.4.20.

As you can see from the patch, it's not really much you need to add.
The functionality is all there, one only needs to decode the GRANTED_RES
call rather than dropping it.

Olaf
-- 
Olaf Kirch     |  Anyone who has had to work with X.509 has probably
okir@suse.de   |  experienced what can best be described as
---------------+  ISO water torture. -- Peter Gutmann


* Re: [NFS] Locking patches (generic & nfs)
  2002-07-23 15:06   ` Olaf Kirch
@ 2002-07-23 15:27     ` Trond Myklebust
  0 siblings, 0 replies; 5+ messages in thread
From: Trond Myklebust @ 2002-07-23 15:27 UTC
  To: Olaf Kirch; +Cc: linux-kernel, nfs

[-- Attachment #1: Type: text/plain, Size: 644 bytes --]

On Tuesday 23 July 2002 17:06, Olaf Kirch wrote:

> But as it is today, all blocked locks get inserted at the end of the
> list because time_before_eq does a signed comparison! With the unpatched
> code, when you have a blocked lock, and the conflicting lock is removed,
> lockd will never send out a GRANTED_MSG, because the blocked lock is at the
> end of the list and never picked up.

Fair enough: I see the bug now.

So why could we not do something like the following instead? This just ensures 
that we always leave the NLM_NEVER stuff at the end of the list which should 
suffice to keep nlmsvc_retry_blocked() happy.

Cheers,
  Trond

[-- Attachment #2: fix_svclock.dif --]
[-- Type: text/plain, Size: 397 bytes --]

--- linux/fs/lockd/svclock.c.orig	Tue Feb  5 08:52:37 2002
+++ linux/fs/lockd/svclock.c	Tue Jul 23 17:15:12 2002
@@ -64,7 +64,7 @@
 	if (when != NLM_NEVER) {
 		if ((when += jiffies) == NLM_NEVER)
 			when ++;
-		while ((b = *bp) && time_before_eq(b->b_when,when))
+		while ((b = *bp) && time_before_eq(b->b_when,when) && b->b_when != NLM_NEVER)
 			bp = &b->b_next;
 	} else
 		while ((b = *bp))
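The effect of the extra loop condition can be checked with a user-space sketch. It assumes time_before_eq() acts as a signed comparison of the difference of its arguments and models the blocked list as an array of b_when values; insert_position_fixed is a hypothetical helper written for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NLM_NEVER (~(unsigned long)0)

/* Wrap-safe "a is before or equal to b"; gives the same answer as the
 * signed test (long)(b - a) >= 0 on two's-complement machines. */
static int time_before_eq(unsigned long a, unsigned long b)
{
    return (b - a) <= (unsigned long)LONG_MAX;
}

/* Insertion index with the extra condition from the diff above:
 * stop walking as soon as an NLM_NEVER entry is reached. */
static size_t insert_position_fixed(const unsigned long *b_when, size_t n,
                                    unsigned long when)
{
    size_t i = 0;

    assert(b_when != NULL || n == 0);
    while (i < n && time_before_eq(b_when[i], when)
                 && b_when[i] != NLM_NEVER)
        i++;
    return i;
}
```

With this condition a new block is always linked in ahead of any NLM_NEVER entries, while ordinary entries are still kept in expiry order.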


* Re: Locking patches (generic & nfs)
  2002-07-19  8:19 Locking patches (generic & nfs) Olaf Kirch
  2002-07-23 14:56 ` [NFS] " Trond Myklebust
@ 2002-07-25 12:43 ` Bill Davidsen
  1 sibling, 0 replies; 5+ messages in thread
From: Bill Davidsen @ 2002-07-25 12:43 UTC
  To: Olaf Kirch; +Cc: linux-kernel, nfs

On Fri, 19 Jul 2002, Olaf Kirch wrote:

> I've been investigating an NFS locking problem a customer
> of SuSE has had between an OpenServer machine (oh boy)
> acting as the NFS client and a Linux box acting as the server.
> 
> In the process of debugging this, I came across a number of
> bugs in the 2.4.18 kernel.

When NFSv4 gets in the new kernel I invite you to put your eyeballs on
that! Good catch!

Clearly not a lot of users go through this code or it would have shown up
sooner.

-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

