[RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
@ 2008-12-07  2:14 Brice Goglin
  2008-12-07  2:50 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Brice Goglin @ 2008-12-07  2:14 UTC (permalink / raw)
  To: Andrew Morton, Christoph Lameter; +Cc: LKML

Hello,

I have been seeing some deadlocks that seem to be related to do_pages_stat()
page-faulting while holding the mmap_sem. Part of my sys_move_pages() rework
has been applied to 2.6.28-rc. So do_pages_stat() now gets page addresses
from user-space (and puts the result back to user-space) while holding the
mmap_sem for read. If there's a page-fault there, the page-fault handler
grabs the mmap_sem for read again. But if another thread took it for write
in the meantime (for instance in mprotect), it deadlocks since rwsem readers
are blocked if a writer is already waiting.

Reading the archives, I see some similar deadlocks a couple years ago but
I can't find the final answer/fix. From what I understand, the mmap_sem
fairness could not be changed easily. So I am not sure whether accessing
user-space while holding mmap_sem for read is still valid/recommended today.
But the behavior of do_pages_stat() changed between 2.6.27 and 2.6.28-rc
because of my patch, and this deadlock seems to be happening for real.
So I would like to fix this small regression in 2.6.28. The patch below seems
to make my do_pages_stat() deadlock disappear here.

Brice

[PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat

Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
gets the page address from user-space and puts the corresponding status
back while holding the mmap_sem for read. There is no need to hold
mmap_sem there while some page-faults may occur.

This patch adds a temporary address and status buffer so as to only hold
mmap_sem while working on these kernel buffers. This is implemented by
extracting do_pages_stat_array() out of do_pages_stat().

Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>

diff --git a/mm/migrate.c b/mm/migrate.c
index 1e0d6b2..4350101 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -987,25 +987,18 @@ out:
 /*
  * Determine the nodes of an array of pages and store it in an array of status.
  */
-static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
-			 const void __user * __user *pages,
-			 int __user *status)
+static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
+				const void * __user *pages, int *status)
 {
 	unsigned long i;
-	int err;

 	down_read(&mm->mmap_sem);

-	for (i = 0; i < nr_pages; i++) {
-		const void __user *p;
-		unsigned long addr;
+	for (i = 0; i < nr_pages; i++, pages++, status++) {
+		unsigned long addr = (unsigned long)(*pages);
 		struct vm_area_struct *vma;
 		struct page *page;
-
-		err = -EFAULT;
-		if (get_user(p, pages+i))
-			goto out;
-		addr = (unsigned long) p;
+		int err;

 		vma = find_vma(mm, addr);
 		if (!vma)
@@ -1024,12 +1017,59 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,

 		err = page_to_nid(page);
 set_status:
-		put_user(err, status+i);
+		*status = err;
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+/*
+ * Determine the nodes of a user array of pages and store it in
+ * a user array of status.
+ */
+static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
+			 const void __user * __user *pages,
+			 int __user *status)
+{
+	const void * __user *chunk_pages;
+	int *chunk_status;
+	unsigned long i,chunk_nr;
+	int err;
+
+	err = -ENOMEM;
+	chunk_pages = (const void * __user *)__get_free_page(GFP_KERNEL);
+	if (!chunk_pages)
+		goto out;
+	chunk_status = (int *)__get_free_page(GFP_KERNEL);
+	if (!chunk_status)
+		goto out_with_chunk_pages;
+
+	chunk_nr = PAGE_SIZE/max(sizeof(*chunk_pages), sizeof(*chunk_status));
+	for(i = 0; i < nr_pages; i += chunk_nr, pages += chunk_nr, status += chunk_nr) {
+		if (chunk_nr + i > nr_pages)
+			chunk_nr = nr_pages - i;
+
+		err = copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages));
+		if (err) {
+			err = -EFAULT;
+			goto out_with_chunk_status;
+		}
+
+		do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+
+		err = copy_to_user(status, chunk_status, chunk_nr * sizeof(*chunk_status));
+		if (err) {
+			err = -EFAULT;
+			goto out_with_chunk_status;
+		}
 	}
 	err = 0;

+out_with_chunk_status:
+	free_page((unsigned long)chunk_status);
+out_with_chunk_pages:
+	free_page((unsigned long)chunk_pages);
 out:
-	up_read(&mm->mmap_sem);
 	return err;
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
  2008-12-07  2:14 [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat? Brice Goglin
@ 2008-12-07  2:50 ` Andrew Morton
  2008-12-07 14:21   ` Brice Goglin
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2008-12-07  2:50 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Christoph Lameter, LKML

On Sun, 07 Dec 2008 03:14:17 +0100 Brice Goglin <Brice.Goglin@inria.fr> wrote:

> Hello,
> 
> I have been seeing some deadlocks that seem to be related to do_pages_stat()
> page-faulting while holding the mmap_sem. Part of my sys_move_pages() rework
> has been applied to 2.6.28-rc. So do_pages_stat() now gets page addresses
> from user-space (and puts the result back to user-space) while holding the
> mmap_sem for read. If there's a page-fault there, the page-fault handler
> grabs the mmap_sem for read again. But if another thread took it for write
> in the meantime (for instance in mprotect), it deadlocks since rwsem readers
> are blocked if a writer is already waiting.
> 
> Reading the archives, I see some similar deadlocks a couple years ago but
> I can't find the final answer/fix. From what I understand, the mmap_sem
> fairness could not be changed easily. So I am not sure whether accessing
> user-space while holding mmap_sem for read is still valid/recommended today.
> But the behavior of do_pages_stat() changed between 2.6.27 and 2.6.28-rc
> because of my patch, and this deadlock seems to be happening for real.

Yes, that's still a bug.

Was lockdep able to tell you about this in any way?

> So I would like to fix this small regression in 2.6.28.

s/small/fairly large/:)

> The patch below seems
> to make my do_pages_stat() deadlock disappear here.
> 
> Brice
> 
> 
> 
> [PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat
>    
> Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
> gets the page address from user-space and puts the corresponding status
> back while holding the mmap_sem for read. There is no need to hold
> mmap_sem there while some page-faults may occur.
> 
> This patch adds a temporary address and status buffer so as to only hold
> mmap_sem while working on these kernel buffers. This is implemented by
> extracting do_pages_stat_array() out of do_pages_stat().
> 
> Signed-off-by: Brice Goglin <Brice.Goglin@inria.fr>
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 1e0d6b2..4350101 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -987,25 +987,18 @@ out:
>  /*
>   * Determine the nodes of an array of pages and store it in an array of status.
>   */
> -static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
> -			 const void __user * __user *pages,
> -			 int __user *status)
> +static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
> +				const void * __user *pages, int *status)
>  {
>  	unsigned long i;
> -	int err;
>  
>  	down_read(&mm->mmap_sem);
>  
> -	for (i = 0; i < nr_pages; i++) {
> -		const void __user *p;
> -		unsigned long addr;
> +	for (i = 0; i < nr_pages; i++, pages++, status++) {
> +		unsigned long addr = (unsigned long)(*pages);

Directly dereferencing a user pointer is very bad.  Fortunately, it's
just that the above __user annotation is now wrong.

>  		struct vm_area_struct *vma;
>  		struct page *page;
> -
> -		err = -EFAULT;
> -		if (get_user(p, pages+i))
> -			goto out;
> -		addr = (unsigned long) p;
> +		int err;
>  
>  		vma = find_vma(mm, addr);
>  		if (!vma)
> @@ -1024,12 +1017,59 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
>  
>  		err = page_to_nid(page);
>  set_status:
> -		put_user(err, status+i);
> +		*status = err;
> +	}
> +
> +	up_read(&mm->mmap_sem);
> +}
> +
> +/*
> + * Determine the nodes of a user array of pages and store it in
> + * a user array of status.
> + */
> +static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
> +			 const void __user * __user *pages,
> +			 int __user *status)
> +{
> +	const void * __user *chunk_pages;

This is not a userspace pointer.

> +	int *chunk_status;
> +	unsigned long i,chunk_nr;
> +	int err;
> +
> +	err = -ENOMEM;
> +	chunk_pages = (const void * __user *)__get_free_page(GFP_KERNEL);
> +	if (!chunk_pages)
> +		goto out;
> +	chunk_status = (int *)__get_free_page(GFP_KERNEL);
> +	if (!chunk_status)
> +		goto out_with_chunk_pages;

Given that this code use to perform acceptably (presumably) doing one
page at a time, I suspect that you could have retained that behaviour,
avoiding the complexity of those two arrays.  Would an additional
down_read()/up_read() per page have been unacceptably costly?

The arrays could have been allocated on the stack, I expect.  16 slots
is enough?

> +	chunk_nr = PAGE_SIZE/max(sizeof(*chunk_pages), sizeof(*chunk_status));
> +	for(i = 0; i < nr_pages; i += chunk_nr, pages += chunk_nr, status += chunk_nr) {

Please try to make this more checkpatch-friendly.  Moving the
alteration of `pages' and `status' to the end of the loop would fix
that, and would result in clearer (IMO) code.

And simply using pages[chunk_nr] everywhere would clean stuff up (IMO).

> +		if (chunk_nr + i > nr_pages)
> +			chunk_nr = nr_pages - i;
> +
> +		err = copy_from_user(chunk_pages, pages, chunk_nr * sizeof(*chunk_pages));
> +		if (err) {
> +			err = -EFAULT;
> +			goto out_with_chunk_status;
> +		}
> +
> +		do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
> +
> +		err = copy_to_user(status, chunk_status, chunk_nr * sizeof(*chunk_status));
> +		if (err) {
> +			err = -EFAULT;
> +			goto out_with_chunk_status;
> +		}
>  	}
>  	err = 0;
>  
> +out_with_chunk_status:
> +	free_page((unsigned long)chunk_status);
> +out_with_chunk_pages:
> +	free_page((unsigned long)chunk_pages);
>  out:
> -	up_read(&mm->mmap_sem);
>  	return err;
>  }

It's a small semantic change, isn't it?  We release the semaphore in
the middle of the operation, thus presenting possibly
non-internally-consistent results to userspace.  Why does this not matter?

Given the number of __user errors this patch added, I'd recommend that
v2 be checked with sparse, please.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
  2008-12-07  2:50 ` Andrew Morton
@ 2008-12-07 14:21   ` Brice Goglin
  2008-12-09 14:19     ` Christoph Lameter
  2008-12-09 16:35     ` Peter Zijlstra
  0 siblings, 2 replies; 7+ messages in thread
From: Brice Goglin @ 2008-12-07 14:21 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Christoph Lameter, LKML

Andrew Morton wrote:
> Was lockdep able to tell you about this in any way?
>   

With CONFIG_PROVE_LOCKING (assuming that it's enough), it doesn't detect
the problem for real. It just says "possible recursive locking detected"
between do_page_fault and sys_move_pages. I actually understood the
problem after hitting sysrq-t and always getting the following backtraces:
* thread 1
[<ffffffff80430cb8>] __down_write_nested+0x9c/0xb4
[<ffffffff8028c2c4>] sys_mprotect+0xcc/0x230
* thread 2
[<ffffffff80430d73>] __down_read+0x9c/0xb4
[<ffffffff80223f5c>] do_page_fault+0x551/0x9d9
[...]
[<ffffffff8043111a>] error_exit+0x0/0x70
[<ffffffff802f5bec>] cap_task_movememory+0x0/0x3
[<ffffffff8032b99d>] __put_user_4+0x1d/0x30
[<ffffffff802a091f>] sys_move_pages+0x453/0x4c0

> Directly dereferencing a user pointer is very bad.  Fortunately, it's
> just that the above __user annotation is now wrong.
>   

Sorry, I totally messed up my annotations. I fixed up the annotations
and applied your other changes. The new patch (below) makes sparse and
checkpatch.pl happy.

> Given that this code use to perform acceptably (presumably) doing one
> page at a time, I suspect that you could have retained that behaviour,
> avoiding the complexity of those two arrays.  Would an additional
> down_read()/up_read() per page have been unacceptably costly?
>
> The arrays could have been allocated on the stack, I expect.  16 slots
> is enough?
>   

I thought down/up_read would be more expensive but it does not seem to
be that bad:
* original code: do_pages_stat stats pages at 30-45GB/s (depending on
buffer size) on a quad-quad-core 1.9GHz opteron
* if moving down/up inside the main loop in do_pages_stat() so as to
down/up once per page, we get to 20-30GB/s
* if working on __get_free_page-based arrays (my first patch), we reach
35GB/s for large buffers (32, 128MB, or so), but it is very slow for
small buffers (about 500MB/s for 32kB).
* if allocating 16-slots arrays on the stack, we get back to 30-45GB/s
(new patch below)
So 16-slots on the stack looks like a good code-complexity/performance
compromise to me. do_pages_stat() performance doesn't look critical
anyway, as long as it's not very slow.

> It's a small semantic change, isn't it?  We release the semaphore in
> the middle of the operation, thus presenting possibly
> non-internally-consistent results to userspace.  Why does this not matter?
>   

Unless I am mistaken, both do_pages_stat() and do_pages_move() only take
the mmap_sem for read, they can happen concurrently. I think you never
had any guarantee such as migrating in one thread being seen as "atomic"
from another thread stat'ing the same buffer.

By the way, in 2.6.29, sys_move_pages() will release the semaphore in
the middle of page migration as well. My rework patch removed the need
to vmalloc a huge array, get_user everything in there, and then migrate
all pages at once. The new code will get_user a page-size array, migrate
it, and switch to the next slots (releasing the semaphore in-between).

If we look at do_pages_stat() "atomicity" versus another operation that
takes the mmap_sem for write, I don't see any actual problem. Either
do_pages_stat() doesn't care much about the operation occuring while it
released the mmap_sem (mprotect or so), or user-threads should be
synchronized instead of stat'ing some VMAs that are removed/reduced in
parallel (mmap/munmap/...).

Brice

[PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat

Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
gets the page address from user-space and puts the corresponding status
back while holding the mmap_sem for read. There is no need to hold
mmap_sem there while some page-faults may occur.

This patch adds a temporary address and status buffer so as to only hold
mmap_sem while working on these kernel buffers. This is implemented by
extracting do_pages_stat_array() out of do_pages_stat().

diff --git a/mm/migrate.c b/mm/migrate.c
index ded190d..bc2c773 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1018,25 +1018,18 @@ out:
 /*
  * Determine the nodes of an array of pages and store it in an array of status.
  */
-static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
-			 const void __user * __user *pages,
-			 int __user *status)
+static void do_pages_stat_array(struct mm_struct *mm, unsigned long nr_pages,
+				const void __user **pages, int *status)
 {
 	unsigned long i;
-	int err;

 	down_read(&mm->mmap_sem);

 	for (i = 0; i < nr_pages; i++) {
-		const void __user *p;
-		unsigned long addr;
+		unsigned long addr = (unsigned long)(*pages);
 		struct vm_area_struct *vma;
 		struct page *page;
-
-		err = -EFAULT;
-		if (get_user(p, pages+i))
-			goto out;
-		addr = (unsigned long) p;
+		int err;

 		vma = find_vma(mm, addr);
 		if (!vma)
@@ -1055,12 +1048,52 @@ static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,

 		err = page_to_nid(page);
 set_status:
-		put_user(err, status+i);
+		*status = err;
+
+		pages++;
+		status++;
+	}
+
+	up_read(&mm->mmap_sem);
+}
+
+/*
+ * Determine the nodes of a user array of pages and store it in
+ * a user array of status.
+ */
+static int do_pages_stat(struct mm_struct *mm, unsigned long nr_pages,
+			 const void __user * __user *pages,
+			 int __user *status)
+{
+#define DO_PAGES_STAT_CHUNK_NR 16
+	const void __user *chunk_pages[DO_PAGES_STAT_CHUNK_NR];
+	int chunk_status[DO_PAGES_STAT_CHUNK_NR];
+	unsigned long i, chunk_nr = DO_PAGES_STAT_CHUNK_NR;
+	int err;
+
+	for (i = 0; i < nr_pages; i += chunk_nr) {
+		if (chunk_nr + i > nr_pages)
+			chunk_nr = nr_pages - i;
+
+		err = copy_from_user(chunk_pages, &pages[i],
+				     chunk_nr * sizeof(*chunk_pages));
+		if (err) {
+			err = -EFAULT;
+			goto out;
+		}
+
+		do_pages_stat_array(mm, chunk_nr, chunk_pages, chunk_status);
+
+		err = copy_to_user(&status[i], chunk_status,
+				   chunk_nr * sizeof(*chunk_status));
+		if (err) {
+			err = -EFAULT;
+			goto out;
+		}
 	}
 	err = 0;

 out:
-	up_read(&mm->mmap_sem);
 	return err;
 }

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
  2008-12-07 14:21   ` Brice Goglin
@ 2008-12-09 14:19     ` Christoph Lameter
  2008-12-09 16:35     ` Peter Zijlstra
  1 sibling, 0 replies; 7+ messages in thread
From: Christoph Lameter @ 2008-12-09 14:19 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Andrew Morton, LKML

I guess the simplest solution would be to move the taking of mmap_sem into
the loop. Mean taking mmap_sem for every page that we determine the status of.

Signed-off-by: Christoph Lameter <clameter@sgi.com>

Index: linux-2.6/mm/migrate.c
===================================================================
--- linux-2.6.orig/mm/migrate.c	2008-12-09 08:07:08.796603952 -0600
+++ linux-2.6/mm/migrate.c	2008-12-09 08:08:36.400116263 -0600
@@ -994,8 +994,6 @@
 	unsigned long i;
 	int err;

-	down_read(&mm->mmap_sem);
-
 	for (i = 0; i < nr_pages; i++) {
 		const void __user *p;
 		unsigned long addr;
@@ -1007,12 +1005,17 @@
 			goto out;
 		addr = (unsigned long) p;

+		down_read(&mm->mmap_sem);
+
 		vma = find_vma(mm, addr);
-		if (!vma)
+		if (!vma) {
+			up_read(&mm->mmap_sem);
 			goto set_status;
-
+		}
 		page = follow_page(vma, addr, 0);

+		up_read(&mm->mmap_sem);
+
 		err = PTR_ERR(page);
 		if (IS_ERR(page))
 			goto set_status;
@@ -1029,7 +1032,6 @@
 	err = 0;

 out:
-	up_read(&mm->mmap_sem);
 	return err;
 }


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
  2008-12-07 14:21   ` Brice Goglin
  2008-12-09 14:19     ` Christoph Lameter
@ 2008-12-09 16:35     ` Peter Zijlstra
  2008-12-09 16:47       ` Brice Goglin
  1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2008-12-09 16:35 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Andrew Morton, Christoph Lameter, LKML

On Sun, 2008-12-07 at 15:21 +0100, Brice Goglin wrote:
> Andrew Morton wrote:
> > Was lockdep able to tell you about this in any way?
> >   
> 
> With CONFIG_PROVE_LOCKING (assuming that it's enough), it doesn't detect
> the problem for real. It just says "possible recursive locking detected"
> between do_page_fault and sys_move_pages.

That is real - how much more real do you need a description of a
recursive deadlock to be?

>  I actually understood the
> problem after hitting sysrq-t and always getting the following backtraces:
> * thread 1
> [<ffffffff80430cb8>] __down_write_nested+0x9c/0xb4
> [<ffffffff8028c2c4>] sys_mprotect+0xcc/0x230
> * thread 2
> [<ffffffff80430d73>] __down_read+0x9c/0xb4
> [<ffffffff80223f5c>] do_page_fault+0x551/0x9d9
> [...]
> [<ffffffff8043111a>] error_exit+0x0/0x70
> [<ffffffff802f5bec>] cap_task_movememory+0x0/0x3
> [<ffffffff8032b99d>] __put_user_4+0x1d/0x30
> [<ffffffff802a091f>] sys_move_pages+0x453/0x4c0


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
  2008-12-09 16:35     ` Peter Zijlstra
@ 2008-12-09 16:47       ` Brice Goglin
  2008-12-09 17:46         ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Brice Goglin @ 2008-12-09 16:47 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Andrew Morton, Christoph Lameter, LKML

Peter Zijlstra wrote:
> On Sun, 2008-12-07 at 15:21 +0100, Brice Goglin wrote:
>   
>> Andrew Morton wrote:
>>     
>>> Was lockdep able to tell you about this in any way?
>>>   
>>>       
>> With CONFIG_PROVE_LOCKING (assuming that it's enough), it doesn't detect
>> the problem for real. It just says "possible recursive locking detected"
>> between do_page_fault and sys_move_pages.
>>     
>
> That is real - how much more real do you need a description of a
> recursive deadlock to be?
>   

Well, it's a recursive down_read. It could be ok if we had the guarantee
that nobody else would be doing down_write in the middle. lockdep only
complained about this recursive down_read when there was a down_write
actually causing the deadlock, but it didn't say anything about this
down_write in the log.

It would be great if lockdep could say "recursive read-lock is
deadlocking because this other guy (with its backtrace) took for write
in the middle". I needed sysrq-t to get this info.

Brice

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat?
  2008-12-09 16:47       ` Brice Goglin
@ 2008-12-09 17:46         ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2008-12-09 17:46 UTC (permalink / raw)
  To: Brice Goglin; +Cc: Andrew Morton, Christoph Lameter, LKML

On Tue, 2008-12-09 at 17:47 +0100, Brice Goglin wrote:
> Peter Zijlstra wrote:
> > On Sun, 2008-12-07 at 15:21 +0100, Brice Goglin wrote:
> >   
> >> Andrew Morton wrote:
> >>     
> >>> Was lockdep able to tell you about this in any way?
> >>>   
> >>>       
> >> With CONFIG_PROVE_LOCKING (assuming that it's enough), it doesn't detect
> >> the problem for real. It just says "possible recursive locking detected"
> >> between do_page_fault and sys_move_pages.
> >>     
> >
> > That is real - how much more real do you need a description of a
> > recursive deadlock to be?
> >   
> 
> Well, it's a recursive down_read. It could be ok if we had the guarantee
> that nobody else would be doing down_write in the middle. lockdep only
> complained about this recursive down_read when there was a down_write
> actually causing the deadlock, but it didn't say anything about this
> down_write in the log.
> 
> It would be great if lockdep could say "recursive read-lock is
> deadlocking because this other guy (with its backtrace) took for write
> in the middle". I needed sysrq-t to get this info.

rwsem does not support recursive read-locks, so irrespective of write
side locks, its a bug.


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2008-12-09 17:46 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-12-07  2:14 [RFC/PATCH] No get_user/put_user while holding mmap_sem in do_pages_stat? Brice Goglin
2008-12-07  2:50 ` Andrew Morton
2008-12-07 14:21   ` Brice Goglin
2008-12-09 14:19     ` Christoph Lameter
2008-12-09 16:35     ` Peter Zijlstra
2008-12-09 16:47       ` Brice Goglin
2008-12-09 17:46         ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox