linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] acquire mmap semaphore in pagemap_read.
@ 2009-03-12 10:33 Martin Schwidefsky
  2009-03-12 11:45 ` Alexey Dobriyan
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Schwidefsky @ 2009-03-12 10:33 UTC (permalink / raw)
  To: linux-kernel, linux-mm; +Cc: Matt Mackall, Gerald Schaefer, akpm

From: Martin Schwidefsky <schwidefsky@de.ibm.com>

The walk_page_range function may only be called while holding the mmap
semaphore. Otherwise a concurrent munmap could free a page table that
is read by the generic page table walker.

Cc: Matt Mackall <mpm@selenic.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
---

 fs/proc/task_mmu.c |    2 ++
 1 file changed, 2 insertions(+)

diff -urpN linux-2.6/fs/proc/task_mmu.c linux-2.6-patched/fs/proc/task_mmu.c
--- linux-2.6/fs/proc/task_mmu.c	2009-03-12 11:32:51.000000000 +0100
+++ linux-2.6-patched/fs/proc/task_mmu.c	2009-03-12 11:33:16.000000000 +0100
@@ -716,7 +716,9 @@ static ssize_t pagemap_read(struct file 
 	 * user buffer is tracked in "pm", and the walk
 	 * will stop when we hit the end of the buffer.
 	 */
+	down_read(&mm->mmap_sem);
 	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
+	up_read(&mm->mmap_sem);
 	if (ret == PM_END_OF_BUFFER)
 		ret = 0;
 	/* don't need mmap_sem for these, but this looks cleaner */

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 10:33 [PATCH] acquire mmap semaphore in pagemap_read Martin Schwidefsky
@ 2009-03-12 11:45 ` Alexey Dobriyan
  2009-03-12 11:54   ` Martin Schwidefsky
  0 siblings, 1 reply; 10+ messages in thread
From: Alexey Dobriyan @ 2009-03-12 11:45 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: linux-kernel, linux-mm, Matt Mackall, Gerald Schaefer, akpm

On Thu, Mar 12, 2009 at 11:33:08AM +0100, Martin Schwidefsky wrote:
> --- linux-2.6/fs/proc/task_mmu.c
> +++ linux-2.6-patched/fs/proc/task_mmu.c
> @@ -716,7 +716,9 @@ static ssize_t pagemap_read(struct file 
>  	 * user buffer is tracked in "pm", and the walk
>  	 * will stop when we hit the end of the buffer.
>  	 */
> +	down_read(&mm->mmap_sem);
>  	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
> +	up_read(&mm->mmap_sem);

This will introduce "put_user under mmap_sem" which is deadlockable.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 11:45 ` Alexey Dobriyan
@ 2009-03-12 11:54   ` Martin Schwidefsky
  2009-03-12 15:23     ` Matt Mackall
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Schwidefsky @ 2009-03-12 11:54 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: linux-kernel, linux-mm, Matt Mackall, Gerald Schaefer, akpm

On Thu, 12 Mar 2009 14:45:33 +0300
Alexey Dobriyan <adobriyan@gmail.com> wrote:

> On Thu, Mar 12, 2009 at 11:33:08AM +0100, Martin Schwidefsky wrote:
> > --- linux-2.6/fs/proc/task_mmu.c
> > +++ linux-2.6-patched/fs/proc/task_mmu.c
> > @@ -716,7 +716,9 @@ static ssize_t pagemap_read(struct file 
> >  	 * user buffer is tracked in "pm", and the walk
> >  	 * will stop when we hit the end of the buffer.
> >  	 */
> > +	down_read(&mm->mmap_sem);
> >  	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
> > +	up_read(&mm->mmap_sem);
> 
> This will introduce "put_user under mmap_sem" which is deadlockable.

Hmm, interesting. In this case the pagemap interface is fundamentally broken.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 11:54   ` Martin Schwidefsky
@ 2009-03-12 15:23     ` Matt Mackall
  2009-03-12 15:27       ` Martin Schwidefsky
  2009-03-12 15:54       ` Martin Schwidefsky
  0 siblings, 2 replies; 10+ messages in thread
From: Matt Mackall @ 2009-03-12 15:23 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Alexey Dobriyan, linux-kernel, linux-mm, Gerald Schaefer, akpm

On Thu, 2009-03-12 at 12:54 +0100, Martin Schwidefsky wrote:
> On Thu, 12 Mar 2009 14:45:33 +0300
> Alexey Dobriyan <adobriyan@gmail.com> wrote:
> 
> > On Thu, Mar 12, 2009 at 11:33:08AM +0100, Martin Schwidefsky wrote:
> > > --- linux-2.6/fs/proc/task_mmu.c
> > > +++ linux-2.6-patched/fs/proc/task_mmu.c
> > > @@ -716,7 +716,9 @@ static ssize_t pagemap_read(struct file 
> > >  	 * user buffer is tracked in "pm", and the walk
> > >  	 * will stop when we hit the end of the buffer.
> > >  	 */
> > > +	down_read(&mm->mmap_sem);
> > >  	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
> > > +	up_read(&mm->mmap_sem);
> > 
> > This will introduce "put_user under mmap_sem" which is deadlockable.
> 
> Hmm, interesting. In this case the pagemap interface is fundamentally broken.

Well it means we may have to reintroduce the very annoying double
buffering from various earlier implementations. But let's leave this
discussion until after we've figured out what to do about the walker
code.

-- 
http://selenic.com : development and support for Mercurial and Linux


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 15:23     ` Matt Mackall
@ 2009-03-12 15:27       ` Martin Schwidefsky
  2009-03-12 15:41         ` Brice Goglin
  2009-03-12 15:54       ` Martin Schwidefsky
  1 sibling, 1 reply; 10+ messages in thread
From: Martin Schwidefsky @ 2009-03-12 15:27 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Alexey Dobriyan, linux-kernel, linux-mm, Gerald Schaefer, akpm

On Thu, 12 Mar 2009 10:23:34 -0500
Matt Mackall <mpm@selenic.com> wrote:

> On Thu, 2009-03-12 at 12:54 +0100, Martin Schwidefsky wrote:
> > On Thu, 12 Mar 2009 14:45:33 +0300
> > Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > 
> > > On Thu, Mar 12, 2009 at 11:33:08AM +0100, Martin Schwidefsky wrote:
> > > > --- linux-2.6/fs/proc/task_mmu.c
> > > > +++ linux-2.6-patched/fs/proc/task_mmu.c
> > > > @@ -716,7 +716,9 @@ static ssize_t pagemap_read(struct file 
> > > >  	 * user buffer is tracked in "pm", and the walk
> > > >  	 * will stop when we hit the end of the buffer.
> > > >  	 */
> > > > +	down_read(&mm->mmap_sem);
> > > >  	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
> > > > +	up_read(&mm->mmap_sem);
> > > 
> > > This will introduce "put_user under mmap_sem" which is deadlockable.
> > 
> > Hmm, interesting. In this case the pagemap interface is fundamentally broken.
> 
> Well it means we may have to reintroduce the very annoying double
> buffering from various earlier implementations. But let's leave this
> discussion until after we've figured out what to do about the walker
> code.

Which would be really ugly. I still have not grasped why this will
introduce a deadlock though. The worst the put_user can do is to cause
a page fault, no? I do not see where the fault handler acquires the
mmap_sem as writer. It takes the mmap_sem as reader and two readers
should be fine.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 15:27       ` Martin Schwidefsky
@ 2009-03-12 15:41         ` Brice Goglin
  2009-03-12 15:46           ` Martin Schwidefsky
  0 siblings, 1 reply; 10+ messages in thread
From: Brice Goglin @ 2009-03-12 15:41 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Matt Mackall, Alexey Dobriyan, linux-kernel, linux-mm,
	Gerald Schaefer, akpm

Martin Schwidefsky wrote:
> On Thu, 12 Mar 2009 10:23:34 -0500
> Matt Mackall <mpm@selenic.com> wrote:
> 
>> On Thu, 2009-03-12 at 12:54 +0100, Martin Schwidefsky wrote:
>>> On Thu, 12 Mar 2009 14:45:33 +0300
>>> Alexey Dobriyan <adobriyan@gmail.com> wrote:
>>>
>>>> On Thu, Mar 12, 2009 at 11:33:08AM +0100, Martin Schwidefsky wrote:
>>>>> --- linux-2.6/fs/proc/task_mmu.c
>>>>> +++ linux-2.6-patched/fs/proc/task_mmu.c
>>>>> @@ -716,7 +716,9 @@ static ssize_t pagemap_read(struct file 
>>>>>  	 * user buffer is tracked in "pm", and the walk
>>>>>  	 * will stop when we hit the end of the buffer.
>>>>>  	 */
>>>>> +	down_read(&mm->mmap_sem);
>>>>>  	ret = walk_page_range(start_vaddr, end_vaddr, &pagemap_walk);
>>>>> +	up_read(&mm->mmap_sem);
>>>> This will introduce "put_user under mmap_sem" which is deadlockable.
>>> Hmm, interesting. In this case the pagemap interface is fundamentally broken.
>> Well it means we may have to reintroduce the very annoying double
>> buffering from various earlier implementations. But let's leave this
>> discussion until after we've figured out what to do about the walker
>> code.
> 
> Which would be really ugly. I still have not grasped why this will
> introduce a deadlock though. The worst the put_user can do is to cause
> a page fault, no? I do not see where the fault handler acquires the
> mmap_sem as writer. It takes the mmap_sem as reader and two readers
> should be fine.

Somebody else can acquire for write in the meantime, for instance
another thread doing mprotect. This writer is blocked by the first
reader, and the second reader is blocked by the writer. So both
tasks are blocked.

Brice

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 15:41         ` Brice Goglin
@ 2009-03-12 15:46           ` Martin Schwidefsky
  0 siblings, 0 replies; 10+ messages in thread
From: Martin Schwidefsky @ 2009-03-12 15:46 UTC (permalink / raw)
  To: Brice Goglin
  Cc: Matt Mackall, Alexey Dobriyan, linux-kernel, linux-mm,
	Gerald Schaefer, akpm

On Thu, 12 Mar 2009 16:41:31 +0100
Brice Goglin <Brice.Goglin@ens-lyon.org> wrote:

> > Which would be really ugly. I still have not grasped why this will
> > introduce a deadlock though. The worst the put_user can do is to cause
> > a page fault, no? I do not see where the fault handler acquires the
> > mmap_sem as writer. It takes the mmap_sem as reader and two readers
> > should be fine.
> 
> Somebody else can acquire for write in the meantime, for instance
> another thread doing mprotect. This writer is blocked by the first
> reader, and the second reader is blocked by the writer. So both
> tasks are blocked.

I see, fair r/w locks. So nested down_read is a no-no.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 15:23     ` Matt Mackall
  2009-03-12 15:27       ` Martin Schwidefsky
@ 2009-03-12 15:54       ` Martin Schwidefsky
  2009-03-17 12:04         ` Martin Schwidefsky
  1 sibling, 1 reply; 10+ messages in thread
From: Martin Schwidefsky @ 2009-03-12 15:54 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Alexey Dobriyan, linux-kernel, linux-mm, Gerald Schaefer, akpm

On Thu, 12 Mar 2009 10:23:34 -0500
Matt Mackall <mpm@selenic.com> wrote:

> Well it means we may have to reintroduce the very annoying double
> buffering from various earlier implementations. But let's leave this
> discussion until after we've figured out what to do about the walker
> code.

About the walker code. I've realized that there is another way to fix
this. The TASK_SIZE definition is currently used for two things: 1) as
a maximum mappable address, 2) the size of the address space for a
process. And there lies a problem: while a process is using a reduced
page table 1) and 2) differ. If I make TASK_SIZE give you the current
size of the address space then it is not possible to mmap an object
beyond 4TB and the page table upgrade never happens. If I make
TASK_SIZE return the maximum mappable address the page table walker
breaks. The solution could be to introduce MAX_TASK_SIZE and use that
in the mmap code to find out what can be mapped.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-12 15:54       ` Martin Schwidefsky
@ 2009-03-17 12:04         ` Martin Schwidefsky
  2009-03-17 16:21           ` Matt Mackall
  0 siblings, 1 reply; 10+ messages in thread
From: Martin Schwidefsky @ 2009-03-17 12:04 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Matt Mackall, Alexey Dobriyan, linux-kernel, linux-mm,
	Gerald Schaefer, akpm

On Thu, 12 Mar 2009 16:54:51 +0100
Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:

> On Thu, 12 Mar 2009 10:23:34 -0500
> Matt Mackall <mpm@selenic.com> wrote:
> 
> > Well it means we may have to reintroduce the very annoying double
> > buffering from various earlier implementations. But let's leave this
> > discussion until after we've figured out what to do about the walker
> > code.
> 
> About the walker code. I've realized that there is another way to fix
> this. The TASK_SIZE definition is currently used for two things: 1) as
> a maximum mappable address, 2) the size of the address space for a
> process. And there lies a problem: while a process is using a reduced
> page table 1) and 2) differ. If I make TASK_SIZE give you the current
> size of the address space then it is not possible to mmap an object
> beyond 4TB and the page table upgrade never happens. If I make
> TASK_SIZE return the maximum mappable address the page table walker
> breaks. The solution could be to introduce MAX_TASK_SIZE and use that
> in the mmap code to find out what can be mapped.
 
I got around the TASK_SIZE checks with arch code only. In total I'll
need two fixes, one that makes TASK_SIZE to reflect the size of the
current address space and another one to get the page table upgrades
working again. I'll push these patches via git390 as no common code
changes are required.

Which leaves the mmap_sem issue as the only problem left.

-- 
blue skies,
   Martin.

"Reality continues to ruin my life." - Calvin.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] acquire mmap semaphore in pagemap_read.
  2009-03-17 12:04         ` Martin Schwidefsky
@ 2009-03-17 16:21           ` Matt Mackall
  0 siblings, 0 replies; 10+ messages in thread
From: Matt Mackall @ 2009-03-17 16:21 UTC (permalink / raw)
  To: Martin Schwidefsky
  Cc: Alexey Dobriyan, linux-kernel, linux-mm, Gerald Schaefer, akpm

On Tue, 2009-03-17 at 13:04 +0100, Martin Schwidefsky wrote:
> On Thu, 12 Mar 2009 16:54:51 +0100
> Martin Schwidefsky <schwidefsky@de.ibm.com> wrote:
> 
> > On Thu, 12 Mar 2009 10:23:34 -0500
> > Matt Mackall <mpm@selenic.com> wrote:
> > 
> > > Well it means we may have to reintroduce the very annoying double
> > > buffering from various earlier implementations. But let's leave this
> > > discussion until after we've figured out what to do about the walker
> > > code.
> > 
> > About the walker code. I've realized that there is another way to fix
> > this. The TASK_SIZE definition is currently used for two things: 1) as
> > a maximum mappable address, 2) the size of the address space for a
> > process. And there lies a problem: while a process is using a reduced
> > page table 1) and 2) differ. If I make TASK_SIZE give you the current
> > size of the address space then it is not possible to mmap an object
> > beyond 4TB and the page table upgrade never happens. If I make
> > TASK_SIZE return the maximum mappable address the page table walker
> > breaks. The solution could be to introduce MAX_TASK_SIZE and use that
> > in the mmap code to find out what can be mapped.
>  
> I got around the TASK_SIZE checks with arch code only. In total I'll
> need two fixes, one that makes TASK_SIZE to reflect the size of the
> current address space and another one to get the page table upgrades
> working again. I'll push these patches via git390 as no common code
> changes are required.

Thanks, Martin.

> Which leaves the mmap_sem issue as the only problem left.

Yeah, this one needs more thought. I'd really rather not go back to
double-buffering here as it was much more complicated, not to mention
slow.

-- 
http://selenic.com : development and support for Mercurial and Linux


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2009-03-17 16:22 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-12 10:33 [PATCH] acquire mmap semaphore in pagemap_read Martin Schwidefsky
2009-03-12 11:45 ` Alexey Dobriyan
2009-03-12 11:54   ` Martin Schwidefsky
2009-03-12 15:23     ` Matt Mackall
2009-03-12 15:27       ` Martin Schwidefsky
2009-03-12 15:41         ` Brice Goglin
2009-03-12 15:46           ` Martin Schwidefsky
2009-03-12 15:54       ` Martin Schwidefsky
2009-03-17 12:04         ` Martin Schwidefsky
2009-03-17 16:21           ` Matt Mackall

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).