[PATCH] locks: fix possible infinite loop in posix deadlock detection

linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] locks: fix possible infinite loop in posix deadlock detection
       [not found]     ` <20071026224707.GO13033@fieldses.org>
@ 2007-10-28 17:31       ` J. Bruce Fields
  2007-10-28 17:43         ` [RFC, PATCH] locks: remove " J. Bruce Fields
  2007-10-30 15:20         ` [PATCH, RESEND] locks: fix possible infinite loop in " J. Bruce Fields
  0 siblings, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-28 17:31 UTC (permalink / raw)
  To: Linus Torvalds, stable
  Cc: linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

From: J. Bruce Fields <bfields@citi.umich.edu>

I think the real solution is to remove deadlock detection completely;
it's hard to imaagine applications really depend on it anyway.

For now, though, just bail out after a few iterations.

Thanks to George Davis for reporting the problem.

Cc: "George G. Davis" <gdavis@mvista.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
 fs/locks.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0127a28..131aa88 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -696,17 +696,29 @@ EXPORT_SYMBOL(posix_test_lock);
  * Note: the above assumption may not be true when handling lock requests
  * from a broken NFS client. But broken NFS clients have a lot more to
  * worry about than proper deadlock detection anyway... --okir
+ *
+ * However, the failure of this assumption (also possible in the case of
+ * multiple tasks sharing the same open file table) also means there's no
+ * guarantee that the loop below will terminate.  As a hack, we give up
+ * after a few iterations.  We don't bother returning EDEADLK in that case;
+ * the deadlock has probably already happened anyway.
  */
+
+#define MAX_DEADLK_ITERATIONS 10
+
 static int posix_locks_deadlock(struct file_lock *caller_fl,
 				struct file_lock *block_fl)
 {
 	struct file_lock *fl;
+	int i = 0;
 
 next_task:
 	if (posix_same_owner(caller_fl, block_fl))
 		return 1;
 	list_for_each_entry(fl, &blocked_list, fl_link) {
 		if (posix_same_owner(fl, block_fl)) {
+			if (i++ > MAX_DEADLK_ITERATIONS)
+				return 0;
 			fl = fl->fl_next;
 			block_fl = fl;
 			goto next_task;
-- 
1.5.3.4.208.gc990


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 17:31       ` [PATCH] locks: fix possible infinite loop in posix deadlock detection J. Bruce Fields
@ 2007-10-28 17:43         ` J. Bruce Fields
  2007-10-28 18:27           ` Matthew Wilcox
  2007-10-29  8:06           ` Alan Cox
  2007-10-30 15:20         ` [PATCH, RESEND] locks: fix possible infinite loop in " J. Bruce Fields
  1 sibling, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-28 17:43 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

From: J. Bruce Fields <bfields@citi.umich.edu>

We currently attempt to return -EDEALK to blocking fcntl() file locking
requests that would create a cycle in the graph of tasks waiting on
locks.

This is inefficient: in the general case it requires us determining
whether we're adding a cycle to an arbitrary directed acyclic graph.
And this calculation has to be performed while holding a lock (currently
the BKL) that prevents that graph from changing.

It has historically been a source of bugs; most recently it was noticed
that it could loop indefinitely while holding the BKL.

It seems unlikely to be useful to applications:
	- The difficulty of implementation has kept standards from
	  requiring it.  (E.g. SUSv3 : "Since implementation of full
	  deadlock detection is not always feasible, the [EDEADLK] error
	  was made optional.")  So portable applications may not be able to
	  depend on it.
	- It only detects deadlocks that involve nothing but local posix
	  file locks; deadlocks involving network filesystems or other kinds
	  of locks or resources are missed.

It therefore seems best to remove deadlock detection.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
 fs/locks.c |   48 ------------------------------------------------
 1 files changed, 0 insertions(+), 48 deletions(-)

This also solves the problem addressed by the previous patch.  But this
patch would require more discussion, and the problem needs to be fixed
now.

Of course, we shouldn't apply this if there's a chance that real
applications may depend on the existing (imperfect) deadlock detection.

diff --git a/fs/locks.c b/fs/locks.c
index 131aa88..564b85d 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -683,50 +683,6 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 
 EXPORT_SYMBOL(posix_test_lock);
 
-/* This function tests for deadlock condition before putting a process to
- * sleep. The detection scheme is no longer recursive. Recursive was neat,
- * but dangerous - we risked stack corruption if the lock data was bad, or
- * if the recursion was too deep for any other reason.
- *
- * We rely on the fact that a task can only be on one lock's wait queue
- * at a time. When we find blocked_task on a wait queue we can re-search
- * with blocked_task equal to that queue's owner, until either blocked_task
- * isn't found, or blocked_task is found on a queue owned by my_task.
- *
- * Note: the above assumption may not be true when handling lock requests
- * from a broken NFS client. But broken NFS clients have a lot more to
- * worry about than proper deadlock detection anyway... --okir
- *
- * However, the failure of this assumption (also possible in the case of
- * multiple tasks sharing the same open file table) also means there's no
- * guarantee that the loop below will terminate.  As a hack, we give up
- * after a few iterations.  We don't bother returning EDEADLK in that case;
- * the deadlock has probably already happened anyway.
- */
-
-#define MAX_DEADLK_ITERATIONS 10
-
-static int posix_locks_deadlock(struct file_lock *caller_fl,
-				struct file_lock *block_fl)
-{
-	struct file_lock *fl;
-	int i = 0;
-
-next_task:
-	if (posix_same_owner(caller_fl, block_fl))
-		return 1;
-	list_for_each_entry(fl, &blocked_list, fl_link) {
-		if (posix_same_owner(fl, block_fl)) {
-			if (i++ > MAX_DEADLK_ITERATIONS)
-				return 0;
-			fl = fl->fl_next;
-			block_fl = fl;
-			goto next_task;
-		}
-	}
-	return 0;
-}
-
 /* Try to create a FLOCK lock on filp. We always insert new FLOCK locks
  * after any leases, but before any posix locks.
  *
@@ -846,10 +802,6 @@ static int __posix_lock_file(struct inode *inode, struct file_lock *request, str
 			error = -EAGAIN;
 			if (!(request->fl_flags & FL_SLEEP))
 				goto out;
-			error = -EDEADLK;
-			if (posix_locks_deadlock(request, fl))
-				goto out;
-			error = -EAGAIN;
 			locks_insert_block(fl, request);
 			goto out;
   		}
-- 
1.5.3.4.208.gc990


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 17:43         ` [RFC, PATCH] locks: remove " J. Bruce Fields
@ 2007-10-28 18:27           ` Matthew Wilcox
  2007-10-28 18:40             ` Alan Cox
  2007-10-29  8:06           ` Alan Cox
  1 sibling, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2007-10-28 18:27 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Linus Torvalds, linux-kernel, George G. Davis, Andrew Morton,
	linux-fsdevel

On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote:
> We currently attempt to return -EDEALK to blocking fcntl() file locking
> requests that would create a cycle in the graph of tasks waiting on
> locks.
> 
> This is inefficient: in the general case it requires us determining
> whether we're adding a cycle to an arbitrary directed acyclic graph.
> And this calculation has to be performed while holding a lock (currently
> the BKL) that prevents that graph from changing.
> 
> It has historically been a source of bugs; most recently it was noticed
> that it could loop indefinitely while holding the BKL.

It can also return -EDEADLK spuriously.  So yeah, just kill it.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 18:27           ` Matthew Wilcox
@ 2007-10-28 18:40             ` Alan Cox
  2007-10-28 20:11               ` Matthew Wilcox
  2007-10-29  1:13               ` J. Bruce Fields
  0 siblings, 2 replies; 28+ messages in thread
From: Alan Cox @ 2007-10-28 18:40 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: J. Bruce Fields, Linus Torvalds, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel

On Sun, 28 Oct 2007 12:27:32 -0600
Matthew Wilcox <matthew@wil.cx> wrote:

> On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote:
> > We currently attempt to return -EDEALK to blocking fcntl() file locking
> > requests that would create a cycle in the graph of tasks waiting on
> > locks.
> > 
> > This is inefficient: in the general case it requires us determining
> > whether we're adding a cycle to an arbitrary directed acyclic graph.
> > And this calculation has to be performed while holding a lock (currently
> > the BKL) that prevents that graph from changing.
> > 
> > It has historically been a source of bugs; most recently it was noticed
> > that it could loop indefinitely while holding the BKL.
> 
> It can also return -EDEADLK spuriously.  So yeah, just kill it.

NAK. This is an ABI change. It was also comprehensively rejected before
because

- EDEADLK behaviour is ABI
- EDEADLK behaviour is required by SuSv3
- We have no idea what applications may rely on this behaviour.

and also SuSv3 is required by LSB

See the thread
	http://osdir.com/ml/file-systems/2004-06/msg00017.html

so we need to fix the bugs - the lock usage and the looping. At that
point it merely becomes a performance concern to those who use it, which
is the proper behaviour. If you want a faster non-checking one use
flock(), or add another flag that is a Linux "don't check for deadlock"

Alan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 18:40             ` Alan Cox
@ 2007-10-28 20:11               ` Matthew Wilcox
  2007-10-28 21:38                 ` Alan Cox
  2007-10-28 21:50                 ` Trond Myklebust
  2007-10-29  1:13               ` J. Bruce Fields
  1 sibling, 2 replies; 28+ messages in thread
From: Matthew Wilcox @ 2007-10-28 20:11 UTC (permalink / raw)
  To: Alan Cox
  Cc: J. Bruce Fields, Linus Torvalds, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 06:40:52PM +0000, Alan Cox wrote:
> NAK. This is an ABI change. It was also comprehensively rejected before
> because
> 
> - EDEADLK behaviour is ABI

Not in any meaningful way.

> - EDEADLK behaviour is required by SuSv3

What SuSv3 actually says is:

	If the system detects that sleeping until a locked region is
	unlocked would cause a deadlock, fcntl() shall fail with an
	[EDEADLK] error.

It doesn't require the system to detect it, only mandate what to return
if it does detect it.

> - We have no idea what applications may rely on this behaviour.

I've never heard of one that does.

> so we need to fix the bugs - the lock usage and the looping. At that
> point it merely becomes a performance concern to those who use it, which
> is the proper behaviour. If you want a faster non-checking one use
> flock(), or add another flag that is a Linux "don't check for deadlock"

You can't fix the false EDEADLK detection without solving the halting
problem.  Best of luck with that.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 20:11               ` Matthew Wilcox
@ 2007-10-28 21:38                 ` Alan Cox
  2007-10-28 21:45                   ` Jiri Kosina
  2007-10-28 23:38                   ` Matthew Wilcox
  2007-10-28 21:50                 ` Trond Myklebust
  1 sibling, 2 replies; 28+ messages in thread
From: Alan Cox @ 2007-10-28 21:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: J. Bruce Fields, Linus Torvalds, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel

> > - EDEADLK behaviour is ABI
> 
> Not in any meaningful way.

I've seen SYS5 software that relies on it so we should be careful. Again
see the 2004 discussion where the conclusion was that EDEADLK should stay
> 
> > - EDEADLK behaviour is required by SuSv3
> 
> What SuSv3 actually says is:
> 
> 	If the system detects that sleeping until a locked region is
> 	unlocked would cause a deadlock, fcntl() shall fail with an
> 	[EDEADLK] error.
> 
> It doesn't require the system to detect it, only mandate what to return
> if it does detect it.

We should be detecting at least the obvious case.

> > - We have no idea what applications may rely on this behaviour.
> 
> I've never heard of one that does.

Very scientific. I have on SYS5 though not afaik Linux

> > so we need to fix the bugs - the lock usage and the looping. At that
> > point it merely becomes a performance concern to those who use it, which
> > is the proper behaviour. If you want a faster non-checking one use
> > flock(), or add another flag that is a Linux "don't check for deadlock"
> 
> You can't fix the false EDEADLK detection without solving the halting
> problem.  Best of luck with that.

A good question to ask here would be what subset of deadlock loops on
flock does SYS5 Unix error. I also don't see why you need to solve the
halting problem

If SYSV only spots simple AB - BA deadlocks or taking the same lock twice
yourself then that ought to be sufficient for us too.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 21:38                 ` Alan Cox
@ 2007-10-28 21:45                   ` Jiri Kosina
  2007-10-28 23:38                   ` Matthew Wilcox
  1 sibling, 0 replies; 28+ messages in thread
From: Jiri Kosina @ 2007-10-28 21:45 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: J. Bruce Fields, Linus Torvalds, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel, Alan Cox

On Sun, 28 Oct 2007, Matthew Wilcox wrote:

> You can't fix the false EDEADLK detection without solving the halting
> problem.  Best of luck with that.

Could you please elaborate a little bit more on this? I don't see how 
detecting loops in graph relates to solving halting problem.

Of course the halting problem can be transformed to deadlock-detection 
problem, but this relates to static code analysis, right? Not anything we 
are interested in, i.e. tracking things in runtime and detecting loops in 
simple dependency graphs.

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 20:11               ` Matthew Wilcox
  2007-10-28 21:38                 ` Alan Cox
@ 2007-10-28 21:50                 ` Trond Myklebust
  2007-10-28 22:41                   ` Matthew Wilcox
  1 sibling, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2007-10-28 21:50 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alan Cox, J. Bruce Fields, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel


On Sun, 2007-10-28 at 14:11 -0600, Matthew Wilcox wrote:
> On Sun, Oct 28, 2007 at 06:40:52PM +0000, Alan Cox wrote:
> > so we need to fix the bugs - the lock usage and the looping. At that
> > point it merely becomes a performance concern to those who use it, which
> > is the proper behaviour. If you want a faster non-checking one use
> > flock(), or add another flag that is a Linux "don't check for deadlock"
> 
> You can't fix the false EDEADLK detection without solving the halting
> problem.  Best of luck with that.

I can see that it would be difficult to do efficiently, but basically,
this boils down to finding a circular path in a graph. That is hardly an
unsolvable issue...

Trond


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 21:50                 ` Trond Myklebust
@ 2007-10-28 22:41                   ` Matthew Wilcox
  2007-10-28 22:48                     ` Alan Cox
                                       ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Matthew Wilcox @ 2007-10-28 22:41 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Alan Cox, J. Bruce Fields, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 05:50:30PM -0400, Trond Myklebust wrote:
> > You can't fix the false EDEADLK detection without solving the halting
> > problem.  Best of luck with that.
> 
> I can see that it would be difficult to do efficiently, but basically,
> this boils down to finding a circular path in a graph. That is hardly an
> unsolvable issue...

Bzzt.  You get a false deadlock with multiple threads like so:

Thread A of task B takes lock 1
Thread C of task D takes lock 2
Thread C of task D blocks on lock 1
Thread E of task B blocks on lock 2

We currently declare deadlock at this point (unless the deadlock detection
code has changed since I last looked at it), despite thread A being about
to release lock 1.  Oh, and by the way, thread E is capable of releasing
lock 1, so you can't just say "well, detect by thread instead of by task".

So the only way I can see to accurately detect deadlock is to simulate
the future execution of all threads in task B to see if any of them
will release lock 1 without first gaining lock 2.  Which, I believe,
is halting-equivalent.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 22:41                   ` Matthew Wilcox
@ 2007-10-28 22:48                     ` Alan Cox
  2007-10-28 22:55                       ` Matthew Wilcox
  2007-10-28 22:55                     ` Jiri Kosina
                                       ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Alan Cox @ 2007-10-28 22:48 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Trond Myklebust, J. Bruce Fields, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel

> Bzzt.  You get a false deadlock with multiple threads like so:
> 
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2

The spec and SYSV certainly ignore threading in this situation and you
know that perfectly well (or did in 2004)

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 22:41                   ` Matthew Wilcox
  2007-10-28 22:48                     ` Alan Cox
@ 2007-10-28 22:55                     ` Jiri Kosina
  2007-10-28 23:31                       ` Matthew Wilcox
  2007-10-29  2:10                     ` J. Bruce Fields
  2007-10-29  3:26                     ` Trond Myklebust
  3 siblings, 1 reply; 28+ messages in thread
From: Jiri Kosina @ 2007-10-28 22:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Trond Myklebust, Alan Cox, J. Bruce Fields, Linus Torvalds,
	linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

On Sun, 28 Oct 2007, Matthew Wilcox wrote:

> Bzzt.  You get a false deadlock with multiple threads like so:
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2

	A potential for deadlock occurs if a process controlling a locked 
	region is put to sleep by attempting to lock another process' 
	locked region. If the system detects that sleeping until a locked 
	region is unlocked would cause a deadlock, fcntl() shall fail with 
	an [EDEADLK] error.

This is what POSIX says [1], even after being modified with respect to 
POSIX Threads Extension, right?

So it doesn't deal with threads at all, just processess are taken into 
account. Probably for a reason :)

[1] http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 22:48                     ` Alan Cox
@ 2007-10-28 22:55                       ` Matthew Wilcox
  2007-10-28 23:38                         ` Alan Cox
  0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2007-10-28 22:55 UTC (permalink / raw)
  To: Alan Cox
  Cc: Trond Myklebust, J. Bruce Fields, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 10:48:33PM +0000, Alan Cox wrote:
> > Bzzt.  You get a false deadlock with multiple threads like so:
> > 
> > Thread A of task B takes lock 1
> > Thread C of task D takes lock 2
> > Thread C of task D blocks on lock 1
> > Thread E of task B blocks on lock 2
> 
> The spec and SYSV certainly ignore threading in this situation and you
> know that perfectly well (or did in 2004)

The discussion petered out (or that mailing list archive lost articles
from the thread) without any kind of resolution, or indeed interest.
What is your suggestion for handling this problem?  As it is now, the
kernel 'detects' deadlock where there is none, which doesn't seem
allowed by SuSv3 either.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 22:55                     ` Jiri Kosina
@ 2007-10-28 23:31                       ` Matthew Wilcox
  2007-10-29  9:11                         ` Jiri Kosina
  0 siblings, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2007-10-28 23:31 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Trond Myklebust, Alan Cox, J. Bruce Fields, Linus Torvalds,
	linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 11:55:52PM +0100, Jiri Kosina wrote:
> On Sun, 28 Oct 2007, Matthew Wilcox wrote:
> 
> > Bzzt.  You get a false deadlock with multiple threads like so:
> > Thread A of task B takes lock 1
> > Thread C of task D takes lock 2
> > Thread C of task D blocks on lock 1
> > Thread E of task B blocks on lock 2
> 
> 	A potential for deadlock occurs if a process controlling a locked 
> 	region is put to sleep by attempting to lock another process' 
> 	locked region. If the system detects that sleeping until a locked 
> 	region is unlocked would cause a deadlock, fcntl() shall fail with 
> 	an [EDEADLK] error.
> 
> This is what POSIX says [1], even after being modified with respect to 
> POSIX Threads Extension, right?
> 
> So it doesn't deal with threads at all, just processess are taken into 
> account. Probably for a reason :)

Did you have a concrete suggestion, or are you just quoting the spec?

The problem is that it's nonsense -- processes don't sleep, threads do.
I think the key is "would deadlock", not "might deadlock".  Our current
behaviour is clearly in violation of SuSv3.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 21:38                 ` Alan Cox
  2007-10-28 21:45                   ` Jiri Kosina
@ 2007-10-28 23:38                   ` Matthew Wilcox
  2007-10-28 23:44                     ` Alan Cox
  1 sibling, 1 reply; 28+ messages in thread
From: Matthew Wilcox @ 2007-10-28 23:38 UTC (permalink / raw)
  To: Alan Cox
  Cc: J. Bruce Fields, Linus Torvalds, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 09:38:55PM +0000, Alan Cox wrote:
> > It doesn't require the system to detect it, only mandate what to return
> > if it does detect it.
> 
> We should be detecting at least the obvious case.

What is the obvious case?  A task that has never called clone()?

> If SYSV only spots simple AB - BA deadlocks or taking the same lock twice
> yourself then that ought to be sufficient for us too.

You can't deadlock against yourself -- either with POSIX locks or BSD
locks (POSIX simple upgrades/downgrades the lock; each byte in a file
can only be in either read-locked state or write-locked state for a
given process.  BSD locks release the lock and wake all waiters before
attempting to acquire the lock in its new state).  In my other post, I
showed a simple AB-BA deadlock that can't be easily detected.

-- 
Intel are signing my paycheques ... these opinions are still mine
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 22:55                       ` Matthew Wilcox
@ 2007-10-28 23:38                         ` Alan Cox
  2007-10-29  2:29                           ` J. Bruce Fields
  0 siblings, 1 reply; 28+ messages in thread
From: Alan Cox @ 2007-10-28 23:38 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Trond Myklebust, J. Bruce Fields, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel

> > The spec and SYSV certainly ignore threading in this situation and you
> > know that perfectly well (or did in 2004)
> 
> The discussion petered out (or that mailing list archive lost articles
> from the thread) without any kind of resolution, or indeed interest.

I think the resolution was that the EDEADLK stayed.

> What is your suggestion for handling this problem?  As it is now, the
> kernel 'detects' deadlock where there is none, which doesn't seem
> allowed by SuSv3 either

Re-read the spec. The EDEADLK doesn't account for threads, only processes.

Alan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 23:38                   ` Matthew Wilcox
@ 2007-10-28 23:44                     ` Alan Cox
  0 siblings, 0 replies; 28+ messages in thread
From: Alan Cox @ 2007-10-28 23:44 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: J. Bruce Fields, Linus Torvalds, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel

On Sun, 28 Oct 2007 17:38:14 -0600
Matthew Wilcox <matthew@wil.cx> wrote:

> On Sun, Oct 28, 2007 at 09:38:55PM +0000, Alan Cox wrote:
> > > It doesn't require the system to detect it, only mandate what to return
> > > if it does detect it.
> > 
> > We should be detecting at least the obvious case.
> 
> What is the obvious case?  A task that has never called clone()?

Simple AB BA I would have thought obvious. Clone as has been said several
times now is irrelevant as the standard is about *processes* [in the SuS
sense]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 18:40             ` Alan Cox
  2007-10-28 20:11               ` Matthew Wilcox
@ 2007-10-29  1:13               ` J. Bruce Fields
  1 sibling, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-29  1:13 UTC (permalink / raw)
  To: Alan Cox
  Cc: Matthew Wilcox, Linus Torvalds, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 06:40:52PM +0000, Alan Cox wrote:
> On Sun, 28 Oct 2007 12:27:32 -0600
> Matthew Wilcox <matthew@wil.cx> wrote:
> 
> > On Sun, Oct 28, 2007 at 01:43:21PM -0400, J. Bruce Fields wrote:
> > > We currently attempt to return -EDEALK to blocking fcntl() file locking
> > > requests that would create a cycle in the graph of tasks waiting on
> > > locks.
> > > 
> > > This is inefficient: in the general case it requires us determining
> > > whether we're adding a cycle to an arbitrary directed acyclic graph.
> > > And this calculation has to be performed while holding a lock (currently
> > > the BKL) that prevents that graph from changing.
> > > 
> > > It has historically been a source of bugs; most recently it was noticed
> > > that it could loop indefinitely while holding the BKL.
> > 
> > It can also return -EDEADLK spuriously.  So yeah, just kill it.
> 
> NAK. This is an ABI change. It was also comprehensively rejected before
> because
> 
> - EDEADLK behaviour is ABI
> - EDEADLK behaviour is required by SuSv3
> - We have no idea what applications may rely on this behaviour.

On the second point I think you're mistaken; see

	http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html

	"Since implementation of full deadlock detection is not
	always feasible, the [EDEADLK] error was made optional."

The third objection is the one that concerns me most, and is the one I'd
like to understand better.  So I'd be interested in any evidence of
applications that do actually depend on the current behavior.  (Even
hypothetical use cases might be interesting.)  I'm also curious what
other OS's do.

> See the thread
> 	http://osdir.com/ml/file-systems/2004-06/msg00017.html
> 
> so we need to fix the bugs - the lock usage and the looping. At that
> point it merely becomes a performance concern to those who use it, which
> is the proper behaviour. If you want a faster non-checking one use
> flock(), or add another flag that is a Linux "don't check for deadlock"

That's an interesting idea.  The flag might not help as much, since
looking for a cycle in the graph of blocked requests may be more
difficult in the case where we don't know whether the graph is acyclic
to start out with.  (But I don't know--I haven't thought about it much.)

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 22:41                   ` Matthew Wilcox
  2007-10-28 22:48                     ` Alan Cox
  2007-10-28 22:55                     ` Jiri Kosina
@ 2007-10-29  2:10                     ` J. Bruce Fields
  2007-10-29  3:26                     ` Trond Myklebust
  3 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-29  2:10 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Trond Myklebust, Alan Cox, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 04:41:57PM -0600, Matthew Wilcox wrote:
> On Sun, Oct 28, 2007 at 05:50:30PM -0400, Trond Myklebust wrote:
> > > You can't fix the false EDEADLK detection without solving the halting
> > > problem.  Best of luck with that.
> > 
> > I can see that it would be difficult to do efficiently, but basically,
> > this boils down to finding a circular path in a graph. That is hardly an
> > unsolvable issue...
> 
> Bzzt.  You get a false deadlock with multiple threads like so:
> 
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2

Oh neat, I missed that case, thanks for pointing it out.

> We currently declare deadlock at this point (unless the deadlock detection
> code has changed since I last looked at it), despite thread A being about
> to release lock 1.  Oh, and by the way, thread E is capable of releasing
> lock 1, so you can't just say "well, detect by thread instead of by task".
> 
> So the only way I can see to accurately detect deadlock is to simulate
> the future execution of all threads in task B to see if any of them
> will release lock 1 without first gaining lock 2.

Hm.  It's annoying, but I'm not convinced it's *that* annoying.  We're
not trying to predict whether a deadlock could arise as the result of
future behavior.  We're just trying to determine whether granting the
current lock request results in an immediate deadlock consisting purely
of posix file locks.

But yes, I'm assume it's possible, for example, that a thread-exit could
race with a lock request, with the result that we see no deadlock at the
time we handle the lock request, even though at that point the last task
with the ability to solve the problem is already exiting.

Supposing that we're willing to permit the request in such cases and
return EDEADLK only in cases where we're positive there's a deadlock, is
there still some useful subset of cases where we could return EDEADLK?
For example, could we take note of tasks that, when they block on a
lock, have a current->files with reference count one, and only follow
cycles consisting of such blocks?

I'm still not convinced it's worth the trouble, but I suspect you're
overstating the difficulty.

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 23:38                         ` Alan Cox
@ 2007-10-29  2:29                           ` J. Bruce Fields
  2007-10-29  8:08                             ` Alan Cox
  2007-10-29  9:15                             ` Jiri Kosina
  0 siblings, 2 replies; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-29  2:29 UTC (permalink / raw)
  To: Alan Cox
  Cc: Matthew Wilcox, Trond Myklebust, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel

On Sun, Oct 28, 2007 at 11:38:26PM +0000, Alan Cox wrote:
> > > The spec and SYSV certainly ignore threading in this situation and you
> > > know that perfectly well (or did in 2004)
> > 
> > The discussion petered out (or that mailing list archive lost articles
> > from the thread) without any kind of resolution, or indeed interest.
> 
> I think the resolution was that the EDEADLK stayed.
> 
> > What is your suggestion for handling this problem?  As it is now, the
> > kernel 'detects' deadlock where there is none, which doesn't seem
> > allowed by SuSv3 either
> 
> Re-read the spec. The EDEADLK doesn't account for threads, only processes.

Do you have in mind a way to take advantage of that distinction?

The practical requirement I see here is that our deadlock detection
should never give false positives.  If we return EDEADLK to applications
with locking schemes that don't actually deadlock, then we're telling
application designers that avoiding deadlock in itself isn't
sufficient--they also have to know our particular deadlock detection
algorithm, so as to avoid giving even the appearance of deadlock.

And if posix file locks are to be useful to threaded applications, then
we have to preserve the same no-false-positives requirement for them as
well.

But, OK, if we can identify unshared current->files at the time we put a
task to sleep, then a slight modification of our current algorithm might
be sufficient to detect any deadlock that involves purely posix file
locks and processes.  And we can tell people that avoiding deadlock is
their problem as soon as any task with a shared current->files is
involved.  (Ditto, I assume, if nfsd/lockd acquires a lock.)

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 22:41                   ` Matthew Wilcox
                                       ` (2 preceding siblings ...)
  2007-10-29  2:10                     ` J. Bruce Fields
@ 2007-10-29  3:26                     ` Trond Myklebust
  3 siblings, 0 replies; 28+ messages in thread
From: Trond Myklebust @ 2007-10-29  3:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Alan Cox, J. Bruce Fields, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel


On Sun, 2007-10-28 at 16:41 -0600, Matthew Wilcox wrote:
> On Sun, Oct 28, 2007 at 05:50:30PM -0400, Trond Myklebust wrote:
> > > You can't fix the false EDEADLK detection without solving the halting
> > > problem.  Best of luck with that.
> > 
> > I can see that it would be difficult to do efficiently, but basically,
> > this boils down to finding a circular path in a graph. That is hardly an
> > unsolvable issue...
> 
> Bzzt.  You get a false deadlock with multiple threads like so:
> 
> Thread A of task B takes lock 1
> Thread C of task D takes lock 2
> Thread C of task D blocks on lock 1
> Thread E of task B blocks on lock 2
> 
> We currently declare deadlock at this point (unless the deadlock detection
> code has changed since I last looked at it), despite thread A being about
> to release lock 1.  Oh, and by the way, thread E is capable of releasing
> lock 1, so you can't just say "well, detect by thread instead of by task".
> 
> So the only way I can see to accurately detect deadlock is to simulate
> the future execution of all threads in task B to see if any of them
> will release lock 1 without first gaining lock 2.  Which, I believe,
> is halting-equivalent.

As several people have told you, the SUSv3 section on fcntl and
deadlocks reads as follows:

        "A potential for deadlock occurs if a process controlling a
        locked region is put to sleep by attempting to lock another
        process' locked region. If the system detects that sleeping
        until a locked region is unlocked would cause a deadlock,
        fcntl() shall fail with an [EDEADLK] error."

There is no mention there or anywhere else of a need to make exceptions
when dealing with threads. The posix locking model is _process_ based,
and so our deadlock detection only needs to take that into account.
If programmers choose to play tricksy little games with threads, then it
is their responsibility to ensure that the application doesn't get into
a situation where the posix deadlock detection model breaks down.

Trond


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 17:43         ` [RFC, PATCH] locks: remove " J. Bruce Fields
  2007-10-28 18:27           ` Matthew Wilcox
@ 2007-10-29  8:06           ` Alan Cox
  2007-10-30 15:51             ` J. Bruce Fields
  1 sibling, 1 reply; 28+ messages in thread
From: Alan Cox @ 2007-10-29  8:06 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Linus Torvalds, linux-kernel, George G. Davis, Andrew Morton,
	linux-fsdevel

On Sun, 28 Oct 2007 13:43:21 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> From: J. Bruce Fields <bfields@citi.umich.edu>
> 
> We currently attempt to return -EDEALK to blocking fcntl() file locking
> requests that would create a cycle in the graph of tasks waiting on
> locks.
> 
> This is inefficient: in the general case it requires us determining
> whether we're adding a cycle to an arbitrary directed acyclic graph.
> And this calculation has to be performed while holding a lock (currently
> the BKL) that prevents that graph from changing.
> 
> It has historically been a source of bugs; most recently it was noticed
> that it could loop indefinitely while holding the BKL.
> 
> It seems unlikely to be useful to applications:
> 	- The difficulty of implementation has kept standards from
> 	  requiring it.  (E.g. SUSv3 : "Since implementation of full
> 	  deadlock detection is not always feasible, the [EDEADLK] error
> 	  was made optional.")  So portable applications may not be able to
> 	  depend on it.
> 	- It only detects deadlocks that involve nothing but local posix
> 	  file locks; deadlocks involving network filesystems or other kinds
> 	  of locks or resources are missed.
> 
> It therefore seems best to remove deadlock detection.
> 
> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>


NAK. This is an ABI change and one that was rejected before when this was
last discussed in detail. Moving it out of BKL makes a ton of sense, even
adding a "don't check" flag makes a lot of sense. Removing the checking
does not.

I'd much rather see


	if (flags & FL_NODLCHECK)
		posix_deadlock_detect(....)


The failure case for removing this feature is obscure and hard to debug
application hangs for the afflicted programs - not nice for users at all.

Alan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-29  2:29                           ` J. Bruce Fields
@ 2007-10-29  8:08                             ` Alan Cox
  2007-10-29  9:15                             ` Jiri Kosina
  1 sibling, 0 replies; 28+ messages in thread
From: Alan Cox @ 2007-10-29  8:08 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Matthew Wilcox, Trond Myklebust, Linus Torvalds, linux-kernel,
	George G. Davis, Andrew Morton, linux-fsdevel

> And if posix file locks are to be useful to threaded applications, then
> we have to preserve the same no-false-positives requirement for them as
> well.

It isn't useful to threaded applications. The specification requires
this. Which is another reason for having an additional Linux (for now)
flag to say "don't bother"

Alan

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-28 23:31                       ` Matthew Wilcox
@ 2007-10-29  9:11                         ` Jiri Kosina
  0 siblings, 0 replies; 28+ messages in thread
From: Jiri Kosina @ 2007-10-29  9:11 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Trond Myklebust, Alan Cox, J. Bruce Fields, Linus Torvalds,
	linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

On Sun, 28 Oct 2007, Matthew Wilcox wrote:

> > 	A potential for deadlock occurs if a process controlling a locked 
> > 	region is put to sleep by attempting to lock another process' 
> > 	locked region. If the system detects that sleeping until a locked 
> > 	region is unlocked would cause a deadlock, fcntl() shall fail with 
> > 	an [EDEADLK] error.
> > This is what POSIX says [1], even after being modified with respect to 
> > POSIX Threads Extension, right? So it doesn't deal with threads at 
> > all, just processess are taken into account. Probably for a reason :)
> Did you have a concrete suggestion, or are you just quoting the spec?

I was quoting the spec and I thought that the suggestion is implicit -- 
the specification defines what happens only when processess are in place. 
If the application uses POSIX threads in combination with locks, it is 
going to receive undefined behavior.

> The problem is that it's nonsense -- processes don't sleep, threads do. 
> I think the key is "would deadlock", not "might deadlock".  Our current 
> behaviour is clearly in violation of SuSv3.

- either we can add a special fcntl() Linux-specific flag, that will ask 
  the kernel not to perform deadlock detection. Any application that is 
  combining threads and posix locks (and thus IMHO asking for undefined 
  behavior) could use this flag and not receive EDEADLK any more

- or we can add some heuristics here-and-there to track which 
  current->files are shared and which are not, and do not return EDEADLK 
  in the case of shared ->files (could be a little bit tricky)

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-29  2:29                           ` J. Bruce Fields
  2007-10-29  8:08                             ` Alan Cox
@ 2007-10-29  9:15                             ` Jiri Kosina
  2007-10-30 15:35                               ` J. Bruce Fields
  1 sibling, 1 reply; 28+ messages in thread
From: Jiri Kosina @ 2007-10-29  9:15 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Alan Cox, Matthew Wilcox, Trond Myklebust, Linus Torvalds,
	linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

On Sun, 28 Oct 2007, J. Bruce Fields wrote:

> But, OK, if we can identify unshared current->files at the time we put a 
> task to sleep, then a slight modification of our current algorithm might 
> be sufficient to detect any deadlock that involves purely posix file 
> locks and processes.  And we can tell people that avoiding deadlock is 
> their problem as soon as any task with a shared current->files is 
> involved.  (Ditto, I assume, if nfsd/lockd acquires a lock.)

Don't forget that comparing file_lock->fl_owner (i.e. current->files) is 
not the only way how lock ownership could be computed (there could be 
specific file_lock->fl_lmops->fl_compare_owner() and all of them should 
be teached this new semantics, right?).

-- 
Jiri Kosina

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH, RESEND] locks: fix possible infinite loop in posix deadlock detection
  2007-10-28 17:31       ` [PATCH] locks: fix possible infinite loop in posix deadlock detection J. Bruce Fields
  2007-10-28 17:43         ` [RFC, PATCH] locks: remove " J. Bruce Fields
@ 2007-10-30 15:20         ` J. Bruce Fields
  2007-10-30 15:35           ` Alan Cox
  1 sibling, 1 reply; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-30 15:20 UTC (permalink / raw)
  To: Linus Torvalds, stable
  Cc: linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

From: J. Bruce Fields <bfields@citi.umich.edu>

It's currently possible to send posix_locks_deadlock() into an infinite
loop (under the BKL).

For now, fix this just by bailing out after a few iterations.  We may
want to fix this in a way that better clarifies the semantics of
deadlock detection.  But that will take more time, and this minimal fix
is probably adequate for any realistic scenario, and is simple enough to
be appropriate for applying to stable kernels now.

Thanks to George Davis for reporting the problem.

Cc: "George G. Davis" <gdavis@mvista.com>
Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
---
 fs/locks.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

I didn't see objections to this quick fix (just to the followup that
attempts to rip out posix deadlock detection entirely), so I'm
resending with just comment modifications.

I haven't given up on a more comprehensive solution, but I think we
really need to apply some fix now.

--b.

diff --git a/fs/locks.c b/fs/locks.c
index 0127a28..8b8388e 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -696,17 +696,28 @@ EXPORT_SYMBOL(posix_test_lock);
  * Note: the above assumption may not be true when handling lock requests
  * from a broken NFS client. But broken NFS clients have a lot more to
  * worry about than proper deadlock detection anyway... --okir
+ *
+ * However, the failure of this assumption (also possible in the case of
+ * multiple tasks sharing the same open file table) also means there's no
+ * guarantee that the loop below will terminate.  As a hack, we give up
+ * after a few iterations.
  */
+
+#define MAX_DEADLK_ITERATIONS 10
+
 static int posix_locks_deadlock(struct file_lock *caller_fl,
 				struct file_lock *block_fl)
 {
 	struct file_lock *fl;
+	int i = 0;
 
 next_task:
 	if (posix_same_owner(caller_fl, block_fl))
 		return 1;
 	list_for_each_entry(fl, &blocked_list, fl_link) {
 		if (posix_same_owner(fl, block_fl)) {
+			if (i++ > MAX_DEADLK_ITERATIONS)
+				return 0;
 			fl = fl->fl_next;
 			block_fl = fl;
 			goto next_task;
-- 
1.5.3.4.208.gc990


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-29  9:15                             ` Jiri Kosina
@ 2007-10-30 15:35                               ` J. Bruce Fields
  0 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-30 15:35 UTC (permalink / raw)
  To: Jiri Kosina
  Cc: Alan Cox, Matthew Wilcox, Trond Myklebust, Linus Torvalds,
	linux-kernel, George G. Davis, Andrew Morton, linux-fsdevel

On Mon, Oct 29, 2007 at 10:15:19AM +0100, Jiri Kosina wrote:
> On Sun, 28 Oct 2007, J. Bruce Fields wrote:
> 
> > But, OK, if we can identify unshared current->files at the time we put a 
> > task to sleep, then a slight modification of our current algorithm might 
> > be sufficient to detect any deadlock that involves purely posix file 
> > locks and processes.  And we can tell people that avoiding deadlock is 
> > their problem as soon as any task with a shared current->files is 
> > involved.  (Ditto, I assume, if nfsd/lockd acquires a lock.)
> 
> Don't forget that comparing file_lock->fl_owner (i.e. current->files) is 
> not the only way how lock ownership could be computed (there could be 
> specific file_lock->fl_lmops->fl_compare_owner() and all of them should 
> be teached this new semantics, right?).

Right.  So, for example, this would handle both cases as described
above.

We're turning off deadlock detection in the problematic cases just by
neglecting to add such waiting lockers to the blocked_list (which is
what we search to look for lock cycles).

(It'd be nice if we didn't have to search that list at all.  There
should be some way to set up the data structures so we can follow the
lock dependencies without having to scan through all the blocked locks
at each step.  But I haven't quite figured out how to do that yet.  And
perhaps it's not important to optimize cases where there are lots of
blocks.)

--b.

diff --git a/fs/locks.c b/fs/locks.c
index 8b8388e..5446305 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -512,6 +512,8 @@ static void locks_delete_block(struct file_lock *waiter)
 	unlock_kernel();
 }
 
+static int posix_owner_shared(struct file_lock *caller_fl);
+
 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
  * the order they blocked. The documentation doesn't require this but
@@ -523,7 +525,7 @@ static void locks_insert_block(struct file_lock *blocker,
 	BUG_ON(!list_empty(&waiter->fl_block));
 	list_add_tail(&waiter->fl_block, &blocker->fl_block);
 	waiter->fl_next = blocker;
-	if (IS_POSIX(blocker))
+	if (IS_POSIX(blocker) && !posix_owner_shared(waiter))
 		list_add(&waiter->fl_link, &blocked_list);
 }
 
@@ -683,46 +685,79 @@ posix_test_lock(struct file *filp, struct file_lock *fl)
 
 EXPORT_SYMBOL(posix_test_lock);
 
-/* This function tests for deadlock condition before putting a process to
- * sleep. The detection scheme is no longer recursive. Recursive was neat,
- * but dangerous - we risked stack corruption if the lock data was bad, or
- * if the recursion was too deep for any other reason.
- *
- * We rely on the fact that a task can only be on one lock's wait queue
- * at a time. When we find blocked_task on a wait queue we can re-search
- * with blocked_task equal to that queue's owner, until either blocked_task
- * isn't found, or blocked_task is found on a queue owned by my_task.
- *
- * Note: the above assumption may not be true when handling lock requests
- * from a broken NFS client. But broken NFS clients have a lot more to
- * worry about than proper deadlock detection anyway... --okir
- *
- * However, the failure of this assumption (also possible in the case of
- * multiple tasks sharing the same open file table) also means there's no
- * guarantee that the loop below will terminate.  As a hack, we give up
- * after a few iterations.
+/*
+ * Deadlock detection: 
+ *
+ * We attempt to detect deadlocks that are due purely to posix file
+ * locks.
+ *
+ * We assume that a task can be waiting for at most one lock at a time.
+ * So for any acquired lock, the process holding that lock may be
+ * waiting on at most one other lock.  That lock in turns may be held by
+ * someone waiting for at most one other lock.  Given a requested lock
+ * caller_fl which is about to wait for a conflicting lock block_fl, we
+ * follow this chain of waiters to ensure we are not about to create a
+ * cycle.
+ *
+ * Since we do this before we ever put a process to sleep on a lock, we
+ * are ensured that there is never a cycle; that is what guarantees that
+ * the while() loop in posix_locks_deadlock() eventually completes.
+ *
+ * Note: the above assumption may not be true when handling lock
+ * requests from a broken NFS client. It may also fail in the presence
+ * of tasks (such as posix threads) sharing the same open file table.
+ *
+ * We don't necessarily care about returning EDEALK correctly in such
+ * cases, but we do need to avoid cycles in the lock dependency graph in
+ * order to ensure the loop in posix_locks_deadlock eventually
+ * terminates.  To that end, we enforce the assumption above by refusing
+ * to return EDEADLK or add to the list of blocked locks in any case
+ * where a lock owner might be able to block on more than one lock.
  */
 
-#define MAX_DEADLK_ITERATIONS 10
+static int posix_owner_shared(struct file_lock *caller_fl)
+{
+	/*
+	 * The caller is a lock manager (lockd/nfsd), and won't
+	 * necessarily guarantee that a single lock owner won't block on
+	 * two locks at once:
+	 */
+	if (caller_fl->fl_lmops && caller_fl->fl_lmops->fl_compare_owner)
+		return 1;
+	/*
+	 * Multiple tasks share current->files, also allowing the same
+	 * "owner" to block on two locks at once:
+	 */
+	if (current->files == NULL || atomic_read(&current->files->count) > 1)
+		return 1;
+	/*
+	 * The lock is not on behalf of a file manager, and no other
+	 * tasks share this file owner (and, as long as this task is
+	 * stuck waiting for a lock, that's not going to change):
+	 */
+	return 0;
+}
+
+/* Find a lock that the owner of the given block_fl is blocking on. */
+static struct file_lock * what_owner_is_waiting_for(struct file_lock *block_fl)
+{
+	struct file_lock *fl;
+
+	list_for_each_entry(fl, &blocked_list, fl_link)
+		if (posix_same_owner(fl, block_fl))
+			return fl->fl_next;
+	return NULL;
+}
 
 static int posix_locks_deadlock(struct file_lock *caller_fl,
 				struct file_lock *block_fl)
 {
-	struct file_lock *fl;
-	int i = 0;
+	if (posix_owner_shared(caller_fl))
+		return 0;
 
-next_task:
-	if (posix_same_owner(caller_fl, block_fl))
-		return 1;
-	list_for_each_entry(fl, &blocked_list, fl_link) {
-		if (posix_same_owner(fl, block_fl)) {
-			if (i++ > MAX_DEADLK_ITERATIONS)
-				return 0;
-			fl = fl->fl_next;
-			block_fl = fl;
-			goto next_task;
-		}
-	}
+	while ((block_fl = what_owner_is_waiting_for(block_fl)))
+		if (posix_same_owner(caller_fl, block_fl))
+			return 1;
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH, RESEND] locks: fix possible infinite loop in posix deadlock detection
  2007-10-30 15:20         ` [PATCH, RESEND] locks: fix possible infinite loop in " J. Bruce Fields
@ 2007-10-30 15:35           ` Alan Cox
  0 siblings, 0 replies; 28+ messages in thread
From: Alan Cox @ 2007-10-30 15:35 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Linus Torvalds, stable, linux-kernel, George G. Davis,
	Andrew Morton, linux-fsdevel

On Tue, 30 Oct 2007 11:20:02 -0400
"J. Bruce Fields" <bfields@fieldses.org> wrote:

> From: J. Bruce Fields <bfields@citi.umich.edu>
> 
> It's currently possible to send posix_locks_deadlock() into an infinite
> loop (under the BKL).
> 
> For now, fix this just by bailing out after a few iterations.  We may
> want to fix this in a way that better clarifies the semantics of
> deadlock detection.  But that will take more time, and this minimal fix
> is probably adequate for any realistic scenario, and is simple enough to
> be appropriate for applying to stable kernels now.
> 
> Thanks to George Davis for reporting the problem.
> 
> Cc: "George G. Davis" <gdavis@mvista.com>
> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>

Acked-by: Alan Cox <alan@redhat.com>

Its a good fix for now and I doubt any real world user has that complex a
locking pattern to break.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [RFC, PATCH] locks: remove posix deadlock detection
  2007-10-29  8:06           ` Alan Cox
@ 2007-10-30 15:51             ` J. Bruce Fields
  0 siblings, 0 replies; 28+ messages in thread
From: J. Bruce Fields @ 2007-10-30 15:51 UTC (permalink / raw)
  To: Alan Cox
  Cc: Linus Torvalds, linux-kernel, George G. Davis, Andrew Morton,
	linux-fsdevel

On Mon, Oct 29, 2007 at 08:06:04AM +0000, Alan Cox wrote:
> On Sun, 28 Oct 2007 13:43:21 -0400
> "J. Bruce Fields" <bfields@fieldses.org> wrote:
> 
> > From: J. Bruce Fields <bfields@citi.umich.edu>
> > 
> > We currently attempt to return -EDEALK to blocking fcntl() file locking
> > requests that would create a cycle in the graph of tasks waiting on
> > locks.
> > 
> > This is inefficient: in the general case it requires us determining
> > whether we're adding a cycle to an arbitrary directed acyclic graph.
> > And this calculation has to be performed while holding a lock (currently
> > the BKL) that prevents that graph from changing.
> > 
> > It has historically been a source of bugs; most recently it was noticed
> > that it could loop indefinitely while holding the BKL.
> > 
> > It seems unlikely to be useful to applications:
> > 	- The difficulty of implementation has kept standards from
> > 	  requiring it.  (E.g. SUSv3 : "Since implementation of full
> > 	  deadlock detection is not always feasible, the [EDEADLK] error
> > 	  was made optional.")  So portable applications may not be able to
> > 	  depend on it.
> > 	- It only detects deadlocks that involve nothing but local posix
> > 	  file locks; deadlocks involving network filesystems or other kinds
> > 	  of locks or resources are missed.
> > 
> > It therefore seems best to remove deadlock detection.
> > 
> > Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
> 
> 
> NAK. This is an ABI change

OK, fair enough.

> and one that was rejected before when this was last discussed in
> detail.

That previous discussion (http://marc.info/?t=108652857400003&r=1&w=2)
read the spec wrong and--now that I reread it--failed to address the
original bug report.  In fact it looks to me like the actual bug
reported:

	http://bugzilla.kernel.org/show_bug.cgi?id=2829

was probably a minor variant of the one which George Davis stumbled on
again here.  That's a little depressing.

> Moving it out of BKL makes a ton of sense,

That would at least reduce the impact of bugs here, yeah.  It would be
great to figure out how to start getting locks.c and lockd out from
under the BKL, but I don't personally have the time now.  (And I believe
there's been a failed attempt or two so it's not completely easy.)

> even adding a "don't check" flag makes a lot of sense.

That's an idea I'll keep in mind, though I'm not convinced it'll help
much (or that applications would actually use it).

> The failure case for removing this feature is obscure and hard to debug
> application hangs for the afflicted programs - not nice for users at all.

Yeah.  I'd still be curious for any data about applications that
actually rely on posix deadlock detection, though.

--b.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2007-10-30 15:51 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20071017185157.GC3785@mvista.com>
     [not found] ` <20071018185759.GU3785@mvista.com>
     [not found]   ` <20071026170750.GC13033@fieldses.org>
     [not found]     ` <20071026224707.GO13033@fieldses.org>
2007-10-28 17:31       ` [PATCH] locks: fix possible infinite loop in posix deadlock detection J. Bruce Fields
2007-10-28 17:43         ` [RFC, PATCH] locks: remove " J. Bruce Fields
2007-10-28 18:27           ` Matthew Wilcox
2007-10-28 18:40             ` Alan Cox
2007-10-28 20:11               ` Matthew Wilcox
2007-10-28 21:38                 ` Alan Cox
2007-10-28 21:45                   ` Jiri Kosina
2007-10-28 23:38                   ` Matthew Wilcox
2007-10-28 23:44                     ` Alan Cox
2007-10-28 21:50                 ` Trond Myklebust
2007-10-28 22:41                   ` Matthew Wilcox
2007-10-28 22:48                     ` Alan Cox
2007-10-28 22:55                       ` Matthew Wilcox
2007-10-28 23:38                         ` Alan Cox
2007-10-29  2:29                           ` J. Bruce Fields
2007-10-29  8:08                             ` Alan Cox
2007-10-29  9:15                             ` Jiri Kosina
2007-10-30 15:35                               ` J. Bruce Fields
2007-10-28 22:55                     ` Jiri Kosina
2007-10-28 23:31                       ` Matthew Wilcox
2007-10-29  9:11                         ` Jiri Kosina
2007-10-29  2:10                     ` J. Bruce Fields
2007-10-29  3:26                     ` Trond Myklebust
2007-10-29  1:13               ` J. Bruce Fields
2007-10-29  8:06           ` Alan Cox
2007-10-30 15:51             ` J. Bruce Fields
2007-10-30 15:20         ` [PATCH, RESEND] locks: fix possible infinite loop in " J. Bruce Fields
2007-10-30 15:35           ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).