* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
@ 2010-04-28 7:32 Dima Zavin
2010-04-28 7:35 ` Dima Zavin
` (2 more replies)
0 siblings, 3 replies; 36+ messages in thread
From: Dima Zavin @ 2010-04-28 7:32 UTC (permalink / raw)
To: linux-arm-kernel
We can't be holding the mmap_sem while calling flush_cache_user_range
because the flush can fault. If we fault on a user address, the
page fault handler will try to take mmap_sem again. Since both places
acquire the read lock, most of the time it succeeds. However, if another
thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
between the call to flush_cache_user_range and the fault, the down_read
in do_page_fault will deadlock.
Also, since we really can't be holding the mmap_sem while calling
flush_cache_user_range AND vma is actually unused by the flush itself,
get rid of vma as an argument.
Signed-off-by: Dima Zavin <dima@android.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Arve Hjønnevåg <arve@android.com>
---
arch/arm/include/asm/cacheflush.h | 2 +-
arch/arm/kernel/traps.c | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 0d08d41..b68a2b4 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -336,7 +336,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
* Harvard caches are synchronised for the user space address range.
* This is used for the ARM private sys_cacheflush system call.
*/
-#define flush_cache_user_range(vma,start,end) \
+#define flush_cache_user_range(start,end) \
__cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
/*
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 1621e53..2455fd3 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -453,7 +453,9 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
if (end > vma->vm_end)
end = vma->vm_end;
- flush_cache_user_range(vma, start, end);
+ up_read(&mm->mmap_sem);
+ flush_cache_user_range(start, end);
+ return;
}
up_read(&mm->mmap_sem);
}
--
1.6.6
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-04-28 7:32 [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held Dima Zavin
@ 2010-04-28 7:35 ` Dima Zavin
2010-04-29 13:00 ` Russell King - ARM Linux
2010-05-06 15:08 ` [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held Catalin Marinas
2 siblings, 0 replies; 36+ messages in thread
From: Dima Zavin @ 2010-04-28 7:35 UTC (permalink / raw)
To: linux-arm-kernel
This patch is probably not as efficient as it could be, since the flush
will end up faulting in pages the user never touched. I think we
should print a warning about that and skip over the unmapped
pages; I just wasn't sure how to do that correctly.
--Dima
2010/4/28 Dima Zavin <dima@android.com>:
> We can't be holding the mmap_sem while calling flush_cache_user_range
> because the flush can fault. If we fault on a user address, the
> page fault handler will try to take mmap_sem again. Since both places
> acquire the read lock, most of the time it succeeds. However, if another
> thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
> between the call to flush_cache_user_range and the fault, the down_read
> in do_page_fault will deadlock.
>
> Also, since we really can't be holding the mmap_sem while calling
> flush_cache_user_range AND vma is actually unused by the flush itself,
> get rid of vma as an argument.
>
> Signed-off-by: Dima Zavin <dima@android.com>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Arve Hjønnevåg <arve@android.com>
> ---
>  arch/arm/include/asm/cacheflush.h |    2 +-
>  arch/arm/kernel/traps.c           |    4 +++-
>  2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
> index 0d08d41..b68a2b4 100644
> --- a/arch/arm/include/asm/cacheflush.h
> +++ b/arch/arm/include/asm/cacheflush.h
> @@ -336,7 +336,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
>  * Harvard caches are synchronised for the user space address range.
>  * This is used for the ARM private sys_cacheflush system call.
>  */
> -#define flush_cache_user_range(vma,start,end) \
> +#define flush_cache_user_range(start,end) \
>        __cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
>
>  /*
> diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
> index 1621e53..2455fd3 100644
> --- a/arch/arm/kernel/traps.c
> +++ b/arch/arm/kernel/traps.c
> @@ -453,7 +453,9 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
>                if (end > vma->vm_end)
>                        end = vma->vm_end;
>
> -               flush_cache_user_range(vma, start, end);
> +               up_read(&mm->mmap_sem);
> +               flush_cache_user_range(start, end);
> +               return;
>        }
>        up_read(&mm->mmap_sem);
>  }
> --
> 1.6.6
>
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-04-28 7:32 [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held Dima Zavin
2010-04-28 7:35 ` Dima Zavin
@ 2010-04-29 13:00 ` Russell King - ARM Linux
2010-04-29 18:16 ` Jamie Lokier
2010-05-06 15:08 ` [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held Catalin Marinas
2 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2010-04-29 13:00 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Apr 28, 2010 at 12:32:11AM -0700, Dima Zavin wrote:
> We can't be holding the mmap_sem while calling flush_cache_user_range
> because the flush can fault. If we fault on a user address, the
> page fault handler will try to take mmap_sem again. Since both places
> acquire the read lock, most of the time it succeeds. However, if another
> thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
> between the call to flush_cache_user_range and the fault, the down_read
> in do_page_fault will deadlock.
That's a non-issue. If do_cache_op is holding a read lock, _nothing_
_else_ can be holding that lock in write mode. So, holding the lock in
read mode ensures that when faults occur, the fault handler can be sure
that its read lock will succeed.
Other threads trying to get a write lock will be prevented from doing
so until the entire cache flush operation has completed, which ensures
that the mapping can not be altered via munmap/mmap - and that is
completely desirable behaviour.
Please explain why you think this leads to a deadlock.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-04-29 13:00 ` Russell King - ARM Linux
@ 2010-04-29 18:16 ` Jamie Lokier
2010-04-29 18:24 ` Russell King - ARM Linux
0 siblings, 1 reply; 36+ messages in thread
From: Jamie Lokier @ 2010-04-29 18:16 UTC (permalink / raw)
To: linux-arm-kernel
Russell King - ARM Linux wrote:
> On Wed, Apr 28, 2010 at 12:32:11AM -0700, Dima Zavin wrote:
> > We can't be holding the mmap_sem while calling flush_cache_user_range
> > because the flush can fault. If we fault on a user address, the
> > page fault handler will try to take mmap_sem again. Since both places
> > acquire the read lock, most of the time it succeeds. However, if another
> > thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
> > between the call to flush_cache_user_range and the fault, the down_read
> > in do_page_fault will deadlock.
>
> That's a non-issue. If do_cache_op is holding a read lock, _nothing_
> _else_ can be holding that lock in write mode. So, holding the lock in
> read mode ensures that when faults occur, the fault handler can be sure
> that its read lock will succeed.
read-write locks will block a reader when there is an already blocked
writer. Otherwise the writer can be permanently starved due to new
readers always arriving so the reader count doesn't reach zero.
In other words, readers queue behind writers if there's a waiting writer.
-- Jamie
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-04-29 18:16 ` Jamie Lokier
@ 2010-04-29 18:24 ` Russell King - ARM Linux
2010-04-29 19:23 ` Dima Zavin
2010-05-06 15:00 ` [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld Catalin Marinas
0 siblings, 2 replies; 36+ messages in thread
From: Russell King - ARM Linux @ 2010-04-29 18:24 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Apr 29, 2010 at 07:16:36PM +0100, Jamie Lokier wrote:
> Russell King - ARM Linux wrote:
> > On Wed, Apr 28, 2010 at 12:32:11AM -0700, Dima Zavin wrote:
> > > We can't be holding the mmap_sem while calling flush_cache_user_range
> > > because the flush can fault. If we fault on a user address, the
> > > page fault handler will try to take mmap_sem again. Since both places
> > > acquire the read lock, most of the time it succeeds. However, if another
> > > thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
> > > between the call to flush_cache_user_range and the fault, the down_read
> > > in do_page_fault will deadlock.
> >
> > That's a non-issue. If do_cache_op is holding a read lock, _nothing_
> > _else_ can be holding that lock in write mode. So, holding the lock in
> > read mode ensures that when faults occur, the fault handler can be sure
> > that its read lock will succeed.
>
> read-write locks will block a reader when there is an already blocked
> writer. Otherwise the writer can be permanently starved due to new
> readers always arriving so the reader count doesn't reach zero.
Hmm, true, and rather unfortunate.
As I've already said, we cannot do this cache maintenance outside of
the lock.
The point of this code is to first validate that the region we're working
on is valid. As soon as we drop the lock, we lose the guarantee that
the region is valid while we operate on it - indeed, the region could be
unmapped and remapped by a different thread.
I think the only reasonable solution is to also walk the page tables
and do the cache handling on a per-page basis, which will make this
interface quite a bit slower - but that's the price we pay for
correctness.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-04-29 18:24 ` Russell King - ARM Linux
@ 2010-04-29 19:23 ` Dima Zavin
2010-05-04 4:07 ` Dima Zavin
2010-05-06 15:00 ` [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld Catalin Marinas
1 sibling, 1 reply; 36+ messages in thread
From: Dima Zavin @ 2010-04-29 19:23 UTC (permalink / raw)
To: linux-arm-kernel
Jamie/Russell,
Thanks for the feedback.
>> read-write locks will block a reader when there is an already blocked
>> writer. Otherwise the writer can be permanently starved due to new
>> readers always arriving so the reader count doesn't reach zero.
That's precisely the problem.
> Hmm, true, and rather unfortunate.
>
> As I've already said, we cannot do this cache maintenance outside of
> the lock.
>
> The point of this code is to first validate that the region we're working
> on is valid. As soon as we drop the lock, we lose the guarantee that
> the region is valid while we operate on it - indeed, the region could be
> unmapped and remapped by a different thread.
So what if it was remapped? The worst case scenario in this case is
that we needlessly flush a region of memory, but it's not "wrong". It
can't be any worse than just doing a full cache flush. If another
thread unmapped the region, then we should (and will) segfault the
flushing thread, which is the correct behavior IMHO.
--Dima
> I think the only reasonable solution is to also walk the page tables
> and do the cache handling on a per-page basis, which will make this
> interface quite a bit slower - but that's the price we pay for
> correctness.
>
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-04-29 19:23 ` Dima Zavin
@ 2010-05-04 4:07 ` Dima Zavin
2010-05-04 7:40 ` Russell King - ARM Linux
0 siblings, 1 reply; 36+ messages in thread
From: Dima Zavin @ 2010-05-04 4:07 UTC (permalink / raw)
To: linux-arm-kernel
>> The point of this code is to first validate that the region we're working
>> on is valid. As soon as we drop the lock, we lose the guarantee that
>> the region is valid while we operate on it - indeed, the region could be
>> unmapped and remapped by a different thread.
>
> So what if it was remapped? The worst case scenario in this case is
> that we needlessly flush a region of memory, but it's not "wrong". It
> can't be any worse than just doing a full cache flush. If another
> thread unmapped the region, then we should (and will) segfault the
> flushing thread, which is the correct behavior IMHO.
ping.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-05-04 4:07 ` Dima Zavin
@ 2010-05-04 7:40 ` Russell King - ARM Linux
0 siblings, 0 replies; 36+ messages in thread
From: Russell King - ARM Linux @ 2010-05-04 7:40 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, May 03, 2010 at 09:07:03PM -0700, Dima Zavin wrote:
> >> The point of this code is to first validate that the region we're working
> >> on is valid. As soon as we drop the lock, we lose the guarantee that
> >> the region is valid while we operate on it - indeed, the region could be
> >> unmapped and remapped by a different thread.
> >
> > So what if it was remapped? The worst case scenario in this case is
> > that we needlessly flush a region of memory, but it's not "wrong". It
> > can't be any worse than just doing a full cache flush. If another
> > thread unmapped the region, then we should (and will) segfault the
> > flushing thread, which is the correct behavior IMHO.
>
> ping.
I still do not think this is a good idea for the reasons I've already
mentioned.
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld
2010-04-29 18:24 ` Russell King - ARM Linux
2010-04-29 19:23 ` Dima Zavin
@ 2010-05-06 15:00 ` Catalin Marinas
2010-05-06 16:01 ` Jamie Lokier
1 sibling, 1 reply; 36+ messages in thread
From: Catalin Marinas @ 2010-05-06 15:00 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2010-04-29 at 19:24 +0100, Russell King - ARM Linux wrote:
> On Thu, Apr 29, 2010 at 07:16:36PM +0100, Jamie Lokier wrote:
> > Russell King - ARM Linux wrote:
> > > On Wed, Apr 28, 2010 at 12:32:11AM -0700, Dima Zavin wrote:
> > > > We can't be holding the mmap_sem while calling flush_cache_user_range
> > > > because the flush can fault. If we fault on a user address, the
> > > > page fault handler will try to take mmap_sem again. Since both places
> > > > acquire the read lock, most of the time it succeeds. However, if another
> > > > thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
> > > > between the call to flush_cache_user_range and the fault, the down_read
> > > > in do_page_fault will deadlock.
> > >
> > > That's a non-issue. If do_cache_op is holding a read lock, _nothing_
> > > _else_ can be holding that lock in write mode. So, holding the lock in
> > > read mode ensures that when faults occur, the fault handler can be sure
> > > that its read lock will succeed.
> >
> > read-write locks will block a reader when there is an already blocked
> > writer. Otherwise the writer can be permanently starved due to new
> > readers always arriving so the reader count doesn't reach zero.
>
> Hmm, true, and rather unfortunate.
>
> As I've already said, we cannot do this cache maintenance outside of
> the lock.
>
> The point of this code is to first validate that the region we're working
> on is valid. As soon as we drop the lock, we lose the guarantee that
> the region is valid while we operate on it - indeed, the region could be
> unmapped and remapped by a different thread.
The flush_cache_user_range operation cannot actually damage the data. If
the application is so badly written that one of its threads remaps a
page range while another thread writes to it and flushes the caches,
then it deserves the memory corruption.
Personally, I would go even further and remove the find_vma() call (of
course with an access_ok() call to make sure the address isn't a kernel
one). I actually did some tests but the performance improvement was too
significant to be worth arguing the case on the list. But the app I was
using was a simple test where the vma tree was small. Things may be
different for a fully featured Java VM for example.
--
Catalin
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held
2010-04-28 7:32 [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held Dima Zavin
2010-04-28 7:35 ` Dima Zavin
2010-04-29 13:00 ` Russell King - ARM Linux
@ 2010-05-06 15:08 ` Catalin Marinas
2 siblings, 0 replies; 36+ messages in thread
From: Catalin Marinas @ 2010-05-06 15:08 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, 2010-04-28 at 08:32 +0100, Dima Zavin wrote:
> We can't be holding the mmap_sem while calling flush_cache_user_range
> because the flush can fault. If we fault on a user address, the
> page fault handler will try to take mmap_sem again. Since both places
> acquire the read lock, most of the time it succeeds. However, if another
> thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
> between the call to flush_cache_user_range and the fault, the down_read
> in do_page_fault will deadlock.
>
> Also, since we really can't be holding the mmap_sem while calling
> flush_cache_user_range AND vma is actually unused by the flush itself,
> get rid of vma as an argument.
>
> Signed-off-by: Dima Zavin <dima@android.com>
> Cc: Russell King <rmk+kernel@arm.linux.org.uk>
> Cc: Arve Hjønnevåg <arve@android.com>
FWIW (and since I added the USER() macros to the *_coherent_user_range()
functions - commit 32cfb1b1), my view is that this is a safe patch (of
course, Russell has to agree with this as well):
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
--
Catalin
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld
2010-05-06 15:00 ` [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld Catalin Marinas
@ 2010-05-06 16:01 ` Jamie Lokier
2010-05-06 16:07 ` Jamie Lokier
2010-05-06 16:21 ` Catalin Marinas
0 siblings, 2 replies; 36+ messages in thread
From: Jamie Lokier @ 2010-05-06 16:01 UTC (permalink / raw)
To: linux-arm-kernel
Catalin Marinas wrote:
> The flush_cache_user_range operation cannot actually damage the data. If
> the application is so badly written that one of its threads remaps a
> page range while another thread writes to it and flushes the caches,
> then it deserves the memory corruption.
It may deserve corruption, but doing corruption silently is cruel.
Moreover, calling mprotect(PROT_READ) in one thread while another
thread is writing to the same regions is a valid, and used, garbage
collector dirty-tracking technique. (Page faults provide the info,
and the fault handler uses PROT_WRITE to let the faulting thread
continue on each tracked page.)
Is it possible to percolate EFAULT to the right places when the cache
flush faults?
> Personally, I would go even further and remove the find_vma() call (of
> course with an access_ok() call to make sure the address isn't a kernel
> one). I actually did some tests but the performance improvement was too
> significant to be worth arguing the case on the list. But the app I was
> using was a simple test where the vma tree was small. Things may be
> different for a fully featured Java VM for example.
Seems a reasonable thing to do, and a fully-featured use-lots-of-VMAs
app sounds like the sort of app which wants to flush part of caches
quickly.
-- Jamie
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld
2010-05-06 16:01 ` Jamie Lokier
@ 2010-05-06 16:07 ` Jamie Lokier
2010-05-06 16:24 ` Catalin Marinas
2010-05-06 16:21 ` Catalin Marinas
1 sibling, 1 reply; 36+ messages in thread
From: Jamie Lokier @ 2010-05-06 16:07 UTC (permalink / raw)
To: linux-arm-kernel
Jamie Lokier wrote:
> Catalin Marinas wrote:
> > The flush_cache_user_range operation cannot actually damage the data. If
> > the application is so badly written that one of its threads remaps a
> > page range while another thread writes to it and flushes the caches,
> > then it deserves the memory corruption.
>
> It may deserve corruption, but doing corruption silently is cruel.
>
> Moreover, calling mprotect(PROT_READ) in one thread while another
> thread is writing to the same regions is a valid, and used, garbage
> collector dirty-tracking technique. (Page faults provide the info,
> and the fault handler uses PROT_WRITE to let the faulting thread
> continue on each tracked page.)
>
> Is it possible to percolate EFAULT to the right places when the cache
> flush faults?
Scratch that idea. How about just doing a full mm
(address-independent) cache flush if a fault occurs?
Performance doesn't matter (it's not a normal situation), and it's
better than corruption (no week-long debugging session surprise).
-- Jamie
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld
2010-05-06 16:01 ` Jamie Lokier
2010-05-06 16:07 ` Jamie Lokier
@ 2010-05-06 16:21 ` Catalin Marinas
1 sibling, 0 replies; 36+ messages in thread
From: Catalin Marinas @ 2010-05-06 16:21 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2010-05-06 at 17:01 +0100, Jamie Lokier wrote:
> Catalin Marinas wrote:
> > The flush_cache_user_range operation cannot actually damage the data. If
> > the application is so badly written that one of its threads remaps a
> > page range while another thread writes to it and flushes the caches,
> > then it deserves the memory corruption.
>
> It may deserve corruption, but doing corruption silently is cruel.
>
> Moreover, calling mprotect(PROT_READ) in one thread while another
> thread is writing to the same regions is a valid, and used, garbage
> collector dirty-tracking technique. (Page faults provide the info,
> and the fault handler uses PROT_WRITE to let the faulting thread
> continue on each tracked page.)
For this kind of situation, mprotect() could make the page not
accessible for a brief period of time (but with the mm semaphore held).
A coherent_user_range call would fault and do_page_fault() handles it
correctly by restarting the cache line operation once the PTE is
reinstated. So I don't think the app would miss any cache flushing
operation.
> Is it possible to percolate EFAULT to the right places when the cache
> flush faults?
If the cache flush faults because the vma is no longer there (though a
few cycles before there was one), the kernel calls __do_kernel_fault()
which tries to fix up the exception by calling the 9001 label in
cache-v6/v7.S. This label just tries the following page ignoring the
cache flushing operation.
That's a place where we can return -EFAULT from
v6/v7_coherent_user_range and change do_cache_op() to make use of this
value.
But that's a special case when one thread of an application unmaps an
address range while another thread writes to it.
> > Personally, I would go even further and remove the find_vma() call (of
> > course with an access_ok() call to make sure the address isn't a kernel
> > one). I actually did some tests but the performance improvement was too
BTW, I meant "was not too" above.
> > significant to be worth arguing the case on the list. But the app I was
> > using was a simple test where the vma tree was small. Things may be
> > different for a fully featured Java VM for example.
>
> Seems a reasonable thing to do, and a fully-featured use-lots-of-VMAs
> app sounds like the sort of app which wants to flush part of caches
> quickly.
--
Catalin
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld
2010-05-06 16:07 ` Jamie Lokier
@ 2010-05-06 16:24 ` Catalin Marinas
0 siblings, 0 replies; 36+ messages in thread
From: Catalin Marinas @ 2010-05-06 16:24 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, 2010-05-06 at 17:07 +0100, Jamie Lokier wrote:
> Jamie Lokier wrote:
> > Catalin Marinas wrote:
> > > The flush_cache_user_range operation cannot actually damage the data. If
> > > the application is so badly written that one of its threads remaps a
> > > page range while another thread writes to it and flushes the caches,
> > > then it deserves the memory corruption.
> >
> > It may deserve corruption, but doing corruption silently is cruel.
> >
> > Moreover, calling mprotect(PROT_READ) in one thread while another
> > thread is writing to the same regions is a valid, and used, garbage
> > collector dirty-tracking technique. (Page faults provide the info,
> > and the fault handler uses PROT_WRITE to let the faulting thread
> > continue on each tracked page.)
> >
> > Is it possible to percolate EFAULT to the right places when the cache
> > flush faults?
>
> Scratch that idea. How about just doing a full mm
> (address-independent) cache flush if a fault occurs?
>
> Performance doesn't matter (it's not a normal situation), and it's
> better than corruption (no week-long debugging session surprise).
I already replied on this specific case. I don't think we get any
corruption with the current implementation (and coherent_user_range
called outside the mm semaphore).
What if we had user-accessible cache flushing operations (well, a subset
of them)? We wouldn't have been able to take any semaphore and I'm sure
JIT people would have made use of them.
--
Catalin
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
@ 2011-11-07 17:33 Catalin Marinas
2011-11-16 21:23 ` Olof Johansson
0 siblings, 1 reply; 36+ messages in thread
From: Catalin Marinas @ 2011-11-07 17:33 UTC (permalink / raw)
To: linux-arm-kernel
From: Dima Zavin <dima@android.com>
We can't be holding the mmap_sem while calling flush_cache_user_range
because the flush can fault. If we fault on a user address, the
page fault handler will try to take mmap_sem again. Since both places
acquire the read lock, most of the time it succeeds. However, if another
thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
between the call to flush_cache_user_range and the fault, the down_read
in do_page_fault will deadlock.
Also, since we really can't be holding the mmap_sem while calling
flush_cache_user_range AND vma is actually unused by the flush itself,
get rid of vma as an argument.
Signed-off-by: Dima Zavin <dima@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
---
Russell, this patch has been around for a while but still no decision
made, so I'm reposting it. It solves a real bug in the ARM kernels and
I'd like it to be considered for upstream.
If you think there is some cache flushing race that this semaphore is
protecting against, note that ARMv8 allows (some) user-space cache maintenance
and that's no different from running the cache flush outside the
mmap_sem.
Thanks.
arch/arm/include/asm/cacheflush.h | 2 +-
arch/arm/kernel/traps.c | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index d5d8d5c..1252a26 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -249,7 +249,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
* Harvard caches are synchronised for the user space address range.
* This is used for the ARM private sys_cacheflush system call.
*/
-#define flush_cache_user_range(vma,start,end) \
+#define flush_cache_user_range(start,end) \
__cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
/*
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index e8fd69e..d9b59d0 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -466,7 +466,9 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
if (end > vma->vm_end)
end = vma->vm_end;
- flush_cache_user_range(vma, start, end);
+ up_read(&mm->mmap_sem);
+ flush_cache_user_range(start, end);
+ return;
}
up_read(&mm->mmap_sem);
}
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-07 17:33 [PATCH] ARM: " Catalin Marinas
@ 2011-11-16 21:23 ` Olof Johansson
2011-11-16 23:50 ` Russell King - ARM Linux
0 siblings, 1 reply; 36+ messages in thread
From: Olof Johansson @ 2011-11-16 21:23 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Nov 7, 2011 at 9:33 AM, Catalin Marinas <catalin.marinas@arm.com> wrote:
> From: Dima Zavin <dima@android.com>
>
> We can't be holding the mmap_sem while calling flush_cache_user_range
> because the flush can fault. If we fault on a user address, the
> page fault handler will try to take mmap_sem again. Since both places
> acquire the read lock, most of the time it succeeds. However, if another
> thread tries to acquire the write lock on the mmap_sem (e.g. mmap) in
> between the call to flush_cache_user_range and the fault, the down_read
> in do_page_fault will deadlock.
>
> Also, since we really can't be holding the mmap_sem while calling
> flush_cache_user_range AND vma is actually unused by the flush itself,
> get rid of vma as an argument.
I've spent some time recently debugging lockups exactly because of this issue.
The case I have here is that one Chrome thread will call cacheflush
(v8 jit, most likely -- reason is unimportant), while another happens
to call brk(). The down_write() in sys_brk races with the first
down_read in do_cache_op and the one in do_page_fault.
The stacks of the race are:
[ 1715.420401] exe D c0441894 0 2950 2849 0x00000000
[ 1715.426896] [<c0441894>] (__schedule+0x768/0x83c) from [<c0441d08>] (schedule+0x80/0x84)
[ 1715.435081] [<c0441d08>] (schedule+0x80/0x84) from [<c0443a40>] (__down_read+0xc8/0xe0)
[ 1715.443161] [<c0443a40>] (__down_read+0xc8/0xe0) from [<c04430e8>] (down_read+0x34/0x3c)
[ 1715.451360] [<c04430e8>] (down_read+0x34/0x3c) from [<c00565cc>] (do_page_fault+0xd0/0x2e4)
[ 1715.459816] [<c00565cc>] (do_page_fault+0xd0/0x2e4) from [<c0046174>] (do_DataAbort+0x44/0xa8)
[ 1715.468503] [<c0046174>] (do_DataAbort+0x44/0xa8) from [<c004b96c>] (__dabt_svc+0x4c/0x60)
[ 1715.476805] Exception stack(0xedbf5ea8 to 0xedbf5ef0)
[ 1715.481902] 5ea0: 55308000 55400000 00000020 0000001f 00000001 e62fae44
[ 1715.490138] 5ec0: 55308080 55400fef edbf4000 e62fae00 00000000 edbf5fa4 5530a000 edbf5ef0
[ 1715.498359] 5ee0: c005009c c0058f18 80000013 ffffffff
[ 1715.503486] [<c004b96c>] (__dabt_svc+0x4c/0x60) from [<c0058f18>] (v7_coherent_kern_range+0x1c/0x7c)
[ 1715.512689] [<c0058f18>] (v7_coherent_kern_range+0x1c/0x7c) from [<c005009c>] (arm_syscall+0x190/0x288)
[ 1715.522173] [<c005009c>] (arm_syscall+0x190/0x288) from [<c004be00>] (ret_fast_syscall+0x0/0x30)
[ 1715.531009] exe D c0441894 0 2952 2849 0x00000000
[ 1715.537502] [<c0441894>] (__schedule+0x768/0x83c) from [<c0441d08>] (schedule+0x80/0x84)
[ 1715.545668] [<c0441d08>] (schedule+0x80/0x84) from [<c0443b1c>] (__down_write_nested+0xc4/0xdc)
[ 1715.554464] [<c0443b1c>] (__down_write_nested+0xc4/0xdc) from [<c0443b48>] (__down_write+0x14/0x18)
[ 1715.563609] [<c0443b48>] (__down_write+0x14/0x18) from [<c04430ac>] (down_write+0x34/0x3c)
[ 1715.571964] [<c04430ac>] (down_write+0x34/0x3c) from [<c01132f8>] (sys_brk+0x38/0x120)
[ 1715.579990] [<c01132f8>] (sys_brk+0x38/0x120) from [<c004be00>] (ret_fast_syscall+0x0/0x30)
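Put another way, the interleaving behind those two stacks is (schematic
timeline only, not the actual call sites):

```
Thread 2950 (sys_cacheflush)          Thread 2952 (sys_brk)
----------------------------          ---------------------
down_read(&mm->mmap_sem)  [granted]
                                      down_write(&mm->mmap_sem)
                                        [queued behind the reader]
flush_cache_user_range()
  -> data abort on user address
  -> do_page_fault()
     -> down_read(&mm->mmap_sem)
        [blocks behind the queued
         writer: deadlock]
```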
> Signed-off-by: Dima Zavin <dima@android.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Olof Johansson <olof@lixom.net>
> Russell, this patch has been around for a while but still no decision
> made, so I'm reposting it. It solves a real bug in the ARM kernels and
> I'd like to be considered for upstream.
Agreed. Russell, please consider picking this up -- the bug is very
real and it sounds like the objection is vague.
An alternative solution could be to iterate over the flushed range and
only flush available pages, but that would add quite a bit of overhead
to this code path.
If needed, I should be able to provide a standalone testcase to
trigger it, but I can reproduce at will just running Chrome as well.
-Olof
^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-16 21:23 ` Olof Johansson
@ 2011-11-16 23:50 ` Russell King - ARM Linux
2011-11-17 0:16 ` Olof Johansson
2011-11-17 10:22 ` Catalin Marinas
0 siblings, 2 replies; 36+ messages in thread
From: Russell King - ARM Linux @ 2011-11-16 23:50 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Nov 16, 2011 at 01:23:02PM -0800, Olof Johansson wrote:
> Agreed. Russell, please consider picking this up -- the bug is very
> real and it sounds like the objection is vague.
No, it isn't. It's creating an unsafe situation. If we're going to do
this, we might as well give up on architecture correctness because we're
throwing out locking correctness.
1. We look up the VMA.
2. We pass the VMA to the cache operation.
3. The cache operation dereferences the VMA to obtain the VMA flags.
If something else happens to modify the VMA (eg, a concurrent mremap or
munmap) then without locking, we could end up dereferencing something
that is no longer the VMA we thought we had.
Plus, consider that the VMA list is modifiable while the mmap_sem is not
held. Do we really want find_vma() to be preempted after loading a
vm_next pointer, another thread to come in and modify the VMA list
(possibly deleting the next vma), that memory to be reallocated to
something else, and then to resume and fall off the VMA list, possibly
oopsing because what we accessed was no longer a valid vm_next pointer?
While we can say "an application should not do that" that's not a reason
to ignore this problem and just delete all the locking in the hope that
we'll always have well behaving applications.
So, it's *totally* unsafe to delete the locking here.
The requirement is that the mmap_sem must be held while you expect to
hold any reference to a vm_area_struct. There are no exceptions to that.
One solution to this is to use pagefault_disable()...pagefault_enable()
around the cache operation, so that faults just cause us to skip to the
next page. The down-side to that is that pagefault_disable() makes us
non-preemptible while performing the cache operation. However, we can
work around that by performing the cache handling in appropriately sized
chunks.
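In code, that suggestion might look something like this (rough
kernel-style sketch; the chunk size and function name are illustrative,
not actual kernel code):

```c
/* Hypothetical chunk size: small enough to bound the non-preemptible
 * window, large enough to amortise the enable/disable overhead. */
#define FLUSH_CHUNK_SIZE	SZ_16K

static void coherent_user_range_chunked(unsigned long start, unsigned long end)
{
	unsigned long addr = start & PAGE_MASK;

	while (addr < end) {
		unsigned long next = min(addr + FLUSH_CHUNK_SIZE, end);

		/* With page faults disabled, a fault is resolved via the
		 * exception-table fixup (skipping the page) rather than
		 * re-taking mmap_sem in do_page_fault(). */
		pagefault_disable();
		__cpuc_coherent_user_range(addr, next);
		pagefault_enable();	/* preemption point between chunks */

		addr = next;
	}
}
```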
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-16 23:50 ` Russell King - ARM Linux
@ 2011-11-17 0:16 ` Olof Johansson
2011-11-17 0:20 ` Olof Johansson
` (2 more replies)
2011-11-17 10:22 ` Catalin Marinas
1 sibling, 3 replies; 36+ messages in thread
From: Olof Johansson @ 2011-11-17 0:16 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Wed, Nov 16, 2011 at 3:50 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Nov 16, 2011 at 01:23:02PM -0800, Olof Johansson wrote:
>> Agreed. Russell, please consider picking this up -- the bug is very
>> real and it sounds like the objection is vague.
>
> No, it isn't. It's creating an unsafe situation. If we're going to do
> this, we might as well give up on architecture correctness because we're
> throwing out locking correctness.
>
> 1. We look up the VMA.
> 2. We pass the VMA to the cache operation.
> 3. The cache operation dereferences the VMA to obtain the VMA flags.
The current implementation doesn't use the flush_cache_user_range
per-cpu function, it uses the coherent_user_page:
from arch/arm/include/asm/cacheflush.h:
#define flush_cache_user_range(vma,start,end) \
__cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
Which doesn't take flags. So the vma isn't used at all at the flushing
step (which is also why this patch removes the flags being passed in).
> If something else happens to modify the VMA (eg, a concurrent mremap or
> munmap) then without locking, we could end up dereferencing something
> that is no longer the VMA we thought we had.
>
> Plus, consider that the VMA list is modifiable while the mmap_sem is not
> held - do we really want to see the effects of having find_vma() being
> preempted having loaded a vm_next pointer, another thread coming in,
> modifying the VMA list (possibly deleting the next vma), that memory being
> reallocated to something else, and then we resume and fall off the VMA
> list and possibly OOPS because we've accessed what we thought was the
> next vm_next pointer but it wasn't?
>
> While we can say "an application should not do that" that's not a reason
> to ignore this problem and just delete all the locking in the hope that
> we'll always have well behaving applications.
>
> So, it's *totally* unsafe to delete the locking here.
>
> The requirement is that the mmap_sem must be held while you expect to
> hold any reference to a vm_area_struct. There are no exceptions to that.
Yep, I would agree if we actually needed the vma for the flags, but we
don't with the current implementation.
We also don't continue iterating over the vmas, we only flush for the
first one in the range that we find. That is possibly a bug.
-Olof
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 0:16 ` Olof Johansson
@ 2011-11-17 0:20 ` Olof Johansson
2011-11-17 10:26 ` Catalin Marinas
2011-11-17 10:45 ` Russell King - ARM Linux
2 siblings, 0 replies; 36+ messages in thread
From: Olof Johansson @ 2011-11-17 0:20 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Nov 16, 2011 at 4:16 PM, Olof Johansson <olof@lixom.net> wrote:
> Hi,
>
> On Wed, Nov 16, 2011 at 3:50 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Wed, Nov 16, 2011 at 01:23:02PM -0800, Olof Johansson wrote:
>>> Agreed. Russell, please consider picking this up -- the bug is very
>>> real and it sounds like the objection is vague.
>>
>> No, it isn't. It's creating an unsafe situation. If we're going to do
>> this, we might as well give up on architecture correctness because we're
>> throwing out locking correctness.
>>
>> 1. We look up the VMA.
>> 2. We pass the VMA to the cache operation.
>> 3. The cache operation dereferences the VMA to obtain the VMA flags.
>
> The current implementation doesn't use the flush_cache_user_range
> per-cpu function, it uses the coherent_user_page:
>
> from arch/arm/include/asm/cacheflush.h:
>
> #define flush_cache_user_range(vma,start,end) \
>    __cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
>
> Which doesn't take flags. So the vma isn't used at all at the flushing
> step (which is also why this patch removes the flags being passed in).
I wonder if I am getting dyslexic.
s/the coherent_user_page/coherent_user_range/ above, and s/removes the
flags/removes the vma from/.
-Olof
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-16 23:50 ` Russell King - ARM Linux
2011-11-17 0:16 ` Olof Johansson
@ 2011-11-17 10:22 ` Catalin Marinas
2011-11-17 10:42 ` Russell King - ARM Linux
1 sibling, 1 reply; 36+ messages in thread
From: Catalin Marinas @ 2011-11-17 10:22 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Nov 16, 2011 at 11:50:24PM +0000, Russell King - ARM Linux wrote:
> On Wed, Nov 16, 2011 at 01:23:02PM -0800, Olof Johansson wrote:
> > Agreed. Russell, please consider picking this up -- the bug is very
> > real and it sounds like the objection is vague.
>
> No, it isn't. It's creating an unsafe situation. If we're going to do
> this, we might as well give up on architecture correctness because we're
> throwing out locking correctness.
>
> 1. We look up the VMA.
> 2. We pass the VMA to the cache operation.
> 3. The cache operation dereferences the VMA to obtain the VMA flags.
Not anymore - if you look at the patch, flush_cache_user_range() no
longer gets the vma. We just use the vma to adjust the start/end and
that's done with the mmap_sem down. I won't comment on the rest of your
reply since it's based on the assumption that we pass vma to
flush_cache_user_range().
BTW, we could even go a step further and remove the vma checks entirely,
just use access_ok(), since __cpuc_coherent_user_range() can handle
unmapped ranges properly (though it may introduce some latency if a
user app passes a 3G range, but we can change the fixup code to abort the
operation when it gets a fault that can't be fixed up).
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index d5d8d5c..1252a26 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -249,7 +249,7 @@ extern void flush_cache_page(struct vm_area_struct *vma, unsigned long user_addr
* Harvard caches are synchronised for the user space address range.
* This is used for the ARM private sys_cacheflush system call.
*/
-#define flush_cache_user_range(vma,start,end) \
+#define flush_cache_user_range(start,end) \
__cpuc_coherent_user_range((start) & PAGE_MASK, PAGE_ALIGN(end))
/*
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 99a5727..b215e39 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -474,23 +474,12 @@ static int bad_syscall(int n, struct pt_regs *regs)
static inline void
do_cache_op(unsigned long start, unsigned long end, int flags)
{
- struct mm_struct *mm = current->active_mm;
- struct vm_area_struct *vma;
-
if (end < start || flags)
return;
+ if (!access_ok(VERIFY_WRITE, start, end - start))
+ return;
- down_read(&mm->mmap_sem);
- vma = find_vma(mm, start);
- if (vma && vma->vm_start < end) {
- if (start < vma->vm_start)
- start = vma->vm_start;
- if (end > vma->vm_end)
- end = vma->vm_end;
-
- flush_cache_user_range(vma, start, end);
- }
- up_read(&mm->mmap_sem);
+ flush_cache_user_range(start, end);
}
/*
@@ -816,6 +805,6 @@ void __init early_trap_init(void)
memcpy((void *)(vectors + KERN_RESTART_CODE - CONFIG_VECTORS_BASE),
syscall_restart_code, sizeof(syscall_restart_code));
- flush_icache_range(vectors, vectors + PAGE_SIZE);
+ flush_icache_range(CONFIG_VECTORS_BASE, CONFIG_VECTORS_BASE+PAGE_SIZE);
modify_domain(DOMAIN_USER, DOMAIN_CLIENT);
}
--
Catalin
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 0:16 ` Olof Johansson
2011-11-17 0:20 ` Olof Johansson
@ 2011-11-17 10:26 ` Catalin Marinas
2011-11-17 10:49 ` Russell King - ARM Linux
2011-11-17 10:45 ` Russell King - ARM Linux
2 siblings, 1 reply; 36+ messages in thread
From: Catalin Marinas @ 2011-11-17 10:26 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 17, 2011 at 12:16:17AM +0000, Olof Johansson wrote:
> On Wed, Nov 16, 2011 at 3:50 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Wed, Nov 16, 2011 at 01:23:02PM -0800, Olof Johansson wrote:
> >> Agreed. Russell, please consider picking this up -- the bug is very
> >> real and it sounds like the objection is vague.
> >
> > No, it isn't. It's creating an unsafe situation. If we're going to do
> > this, we might as well give up on architecture correctness because we're
> > throwing out locking correctness.
> >
> > 1. We look up the VMA.
> > 2. We pass the VMA to the cache operation.
> > 3. The cache operation dereferences the VMA to obtain the VMA flags.
...
> We also don't continue iterating over the vmas, we only flush for the
> first one in the range that we find. That is possibly a bug.
I don't remember the details, but a couple of years ago someone working
on Java on ARM pointed out that a cache operation spanning two vmas only
flushes a single one. I think the JIT has since been modified, as I
haven't heard back, but I would be more in favour of just dropping the
vma checks altogether.
--
Catalin
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 10:22 ` Catalin Marinas
@ 2011-11-17 10:42 ` Russell King - ARM Linux
2011-11-17 10:59 ` Catalin Marinas
0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2011-11-17 10:42 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 17, 2011 at 10:22:05AM +0000, Catalin Marinas wrote:
> BTW, we could even go a step further and remove the vma checks entirely,
> just use access_ok() since __cpuc_coherent_user_range() can handle
> unmapped ranges properly (though it may introduce some latency if some
> user app passes a 3G range but we can change the fixup code to abort the
> operation when it gets a fault that can't be fixed up).
So, do you think that it is acceptable to pass in from userspace the
arguments '0', '~0', '0' and have the kernel spin over the entire 4G
space, including IO space, on any of the supported architectures?
Note that pre-ARMv6 CPUs will spin over that range in 32-byte steps
whether or not there's a page present.
Note that this can starve other threads in the system of CPU time. That's
a great local DoS attack, possible from any privilege level.
That's why the VMA checks were added 9 years ago.
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 0:16 ` Olof Johansson
2011-11-17 0:20 ` Olof Johansson
2011-11-17 10:26 ` Catalin Marinas
@ 2011-11-17 10:45 ` Russell King - ARM Linux
2011-11-20 17:54 ` Olof Johansson
2 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2011-11-17 10:45 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Nov 16, 2011 at 04:16:17PM -0800, Olof Johansson wrote:
> Yep, I would agree if we actually needed the vma for the flags, but we
> don't with the current implementation.
That's just one reason. I mentioned more than one reason why the locking
was necessary in my mail but you seem to have ignored that. The second
reason is a far more serious problem.
> We also don't continue iterating over the vmas, we only flush for the
> first one in the range that we find. That is possibly a bug.
No, that's intentional. This function is not supposed to be used to
invalidate across several mappings. It's there to deal with JIT code,
which will be written into some allocated memory. It is extremely
unlikely that JIT code will be written into two consecutive, independently
created mappings and a single cache operation requested over both.
What if the two independent mappings are located at wildly different
addresses?
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 10:26 ` Catalin Marinas
@ 2011-11-17 10:49 ` Russell King - ARM Linux
0 siblings, 0 replies; 36+ messages in thread
From: Russell King - ARM Linux @ 2011-11-17 10:49 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 17, 2011 at 10:26:23AM +0000, Catalin Marinas wrote:
> I don't remember the details but couple of years ago someone working on
> Java in ARM pointed out that a cache operation spanning two vmas only
> flushes a single one. I think the JIT was modified since as I haven't
> heard back but I would be more in favour of just dropping the vma
> checks altogether.
To put this bluntly, if you still think that after my previous reply
to you, then you shouldn't be touching the kernel, because you don't
understand the effects of the changes you're proposing - and that's
already making allowance for the fact that you should _already_ have
thought about the security aspects of what you were suggesting _before_
posting such a buggered suggestion.
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 10:42 ` Russell King - ARM Linux
@ 2011-11-17 10:59 ` Catalin Marinas
2011-11-17 11:03 ` Russell King - ARM Linux
0 siblings, 1 reply; 36+ messages in thread
From: Catalin Marinas @ 2011-11-17 10:59 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 17, 2011 at 10:42:46AM +0000, Russell King - ARM Linux wrote:
> On Thu, Nov 17, 2011 at 10:22:05AM +0000, Catalin Marinas wrote:
> > BTW, we could even go a step further and remove the vma checks entirely,
> > just use access_ok() since __cpuc_coherent_user_range() can handle
> > unmapped ranges properly (though it may introduce some latency if some
> > user app passes a 3G range but we can change the fixup code to abort the
> > operation when it gets a fault that can't be fixed up).
>
> So, do you think that it is acceptable to be able to pass into this from
> userspace the arguments '0', '~0', '0' and have the kernel spin over the
> entire 4G space, including IO space on any of the supported architectures.
We have access_ok() to check for user only space.
> Note that pre-ARMv6 CPUs will spin over that range in 32-byte steps
> whether or not there's a page present.
You are right, the pre-v6 hardware would not trigger a fault. So if we
don't want another #ifdef in this function, we just keep the vma checks
for all architecture versions.
But I consider that the original patch is still valid.
--
Catalin
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 10:59 ` Catalin Marinas
@ 2011-11-17 11:03 ` Russell King - ARM Linux
2011-11-17 11:25 ` Catalin Marinas
0 siblings, 1 reply; 36+ messages in thread
From: Russell King - ARM Linux @ 2011-11-17 11:03 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 17, 2011 at 10:59:36AM +0000, Catalin Marinas wrote:
> On Thu, Nov 17, 2011 at 10:42:46AM +0000, Russell King - ARM Linux wrote:
> > On Thu, Nov 17, 2011 at 10:22:05AM +0000, Catalin Marinas wrote:
> > > BTW, we could even go a step further and remove the vma checks entirely,
> > > just use access_ok() since __cpuc_coherent_user_range() can handle
> > > unmapped ranges properly (though it may introduce some latency if some
> > > user app passes a 3G range but we can change the fixup code to abort the
> > > operation when it gets a fault that can't be fixed up).
> >
> > So, do you think that it is acceptable to be able to pass into this from
> > userspace the arguments '0', '~0', '0' and have the kernel spin over the
> > entire 4G space, including IO space on any of the supported architectures.
>
> We have access_ok() to check for user only space.
>
> > Note that pre-ARMv6 CPUs will spin over that range in 32-byte steps
> > whether or not there's a page present.
>
> You are right, the pre-v6 hardware would not trigger a fault. So if we
> don't want another #ifdef in this function, we just keep the vma checks
> for all architecture versions.
>
> But I consider that the original patch is still valid.
You think it's safe to walk the vma list with no locks held?
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 11:03 ` Russell King - ARM Linux
@ 2011-11-17 11:25 ` Catalin Marinas
2012-04-09 5:58 ` Dirk Behme
0 siblings, 1 reply; 36+ messages in thread
From: Catalin Marinas @ 2011-11-17 11:25 UTC (permalink / raw)
To: linux-arm-kernel
On Thu, Nov 17, 2011 at 11:03:39AM +0000, Russell King - ARM Linux wrote:
> On Thu, Nov 17, 2011 at 10:59:36AM +0000, Catalin Marinas wrote:
> > On Thu, Nov 17, 2011 at 10:42:46AM +0000, Russell King - ARM Linux wrote:
> > > On Thu, Nov 17, 2011 at 10:22:05AM +0000, Catalin Marinas wrote:
> > > > BTW, we could even go a step further and remove the vma checks entirely,
> > > > just use access_ok() since __cpuc_coherent_user_range() can handle
> > > > unmapped ranges properly (though it may introduce some latency if some
> > > > user app passes a 3G range but we can change the fixup code to abort the
> > > > operation when it gets a fault that can't be fixed up).
> > >
> > > So, do you think that it is acceptable to be able to pass into this from
> > > userspace the arguments '0', '~0', '0' and have the kernel spin over the
> > > entire 4G space, including IO space on any of the supported architectures.
> >
> > We have access_ok() to check for user only space.
> >
> > > Note that pre-ARMv6 CPUs will spin over that range in 32-byte steps
> > > whether or not there's a page present.
> >
> > You are right, the pre-v6 hardware would not trigger a fault. So if we
> > don't want another #ifdef in this function, we just keep the vma checks
> > for all architecture versions.
> >
> > But I consider that the original patch is still valid.
>
> You think it's safe to walk the vma list with no locks held?
No, I don't, but the patch only releases the semaphore when calling the
cache flushing function. The vmas are walked with the semaphore held.
Please read the patch I posted properly.
--
Catalin
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 10:45 ` Russell King - ARM Linux
@ 2011-11-20 17:54 ` Olof Johansson
0 siblings, 0 replies; 36+ messages in thread
From: Olof Johansson @ 2011-11-20 17:54 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Thu, Nov 17, 2011 at 2:45 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Wed, Nov 16, 2011 at 04:16:17PM -0800, Olof Johansson wrote:
>> Yep, I would agree if we actually needed the vma for the flags, but we
>> don't with the current implementation.
>
> That's just one reason. I mentioned more than one reason why the locking
> was necessary in my mail but you seem to have ignored that. The second
> reason is a far more serious problem.
Just to make sure we're talking about the same thing here: by 'the
second reason', do you mean the fact that we will fault if the cache
flush is done on an address that has since (for example) been unmapped?
Since the flush ops are in the exception tables, the page fault code
should be handling this properly and deliver the fault up to the
process, shouldn't it? The one remaining piece for this to work is to
make sure that the exception table annotations are in place in all
cache flush implementations, and it seems like they are only there in
v6 and v7. I'll be happy to add them to the others.
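For reference, those annotations follow the standard kernel
exception-table pattern, roughly like this (schematic only; the labels,
constants and resume target are illustrative, not the actual
cache-v6/v7 assembly):

```
1:	mcr	p15, 0, r0, c7, c11, 1	@ clean D line to PoU (may fault)
	...
	.section .fixup, "ax"
9001:	add	r0, r0, #4096		@ fixup: skip the faulting page
	b	2f			@ resume the flush loop
	.previous
	.section __ex_table, "a"
	.long	1b, 9001b		@ fault at 1b -> branch to 9001b
	.previous
```

The fault handler searches __ex_table for the faulting instruction's
address and, if found, redirects execution to the fixup instead of
delivering an oops.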
Address space changes done such that they race with the cache flush
ops will cause problems for the application. However, this is not
unlike the architectures where cache flushes can be done entirely in
userspace, for those the same code will cause problems as well. We
just let the application developers deal with it there, and they don't
seem to complain.
>> We also don't continue iterating over the vmas, we only flush for the
>> first one in the range that we find. That is possibly a bug.
>
> No, that's intentional. This function is not supposed to be used to
> invalidate across several mappings. It's there to deal with JIT code,
> which will be written into some allocated memory. It is extremely
> unlikely that JIT code will be written into two consecutive, independently
> created mappings and a single cache operation requested over both.
>
> What if the two independent mappings are located at wildly different
> addresses?
I was more concerned with the case where a VMA has split because of
page protection changes, for example. But if no application developers
are complaining about this then I don't see a reason for us to argue
over it -- let's leave those aspects of the code alone at this time --
having the sanity check of the VMA and the range of it is a good idea
overall.
-Olof
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2011-11-17 11:25 ` Catalin Marinas
@ 2012-04-09 5:58 ` Dirk Behme
2012-04-09 14:24 ` Olof Johansson
0 siblings, 1 reply; 36+ messages in thread
From: Dirk Behme @ 2012-04-09 5:58 UTC (permalink / raw)
To: linux-arm-kernel
On 17.11.2011 12:25, Catalin Marinas wrote:
> On Thu, Nov 17, 2011 at 11:03:39AM +0000, Russell King - ARM Linux wrote:
>> On Thu, Nov 17, 2011 at 10:59:36AM +0000, Catalin Marinas wrote:
>>> On Thu, Nov 17, 2011 at 10:42:46AM +0000, Russell King - ARM Linux wrote:
>>>> On Thu, Nov 17, 2011 at 10:22:05AM +0000, Catalin Marinas wrote:
>>>>> BTW, we could even go a step further and remove the vma checks entirely,
>>>>> just use access_ok() since __cpuc_coherent_user_range() can handle
>>>>> unmapped ranges properly (though it may introduce some latency if some
>>>>> user app passes a 3G range but we can change the fixup code to abort the
>>>>> operation when it gets a fault that can't be fixed up).
>>>>
>>>> So, do you think that it is acceptable to be able to pass into this from
>>>> userspace the arguments '0', '~0', '0' and have the kernel spin over the
>>>> entire 4G space, including IO space on any of the supported architectures.
>>>
>>> We have access_ok() to check for user only space.
>>>
>>>> Note that pre-ARMv6 CPUs will spin over that range in 32-byte steps
>>>> whether or not there's a page present.
>>>
>>> You are right, the pre-v6 hardware would not trigger a fault. So if we
>>> don't want another #ifdef in this function, we just keep the vma checks
>>> for all architecture versions.
>>>
>>> But I consider that the original patch is still valid.
>>
>> You think it's safe to walk the vma list with no locks held?
>
> No, I don't, but the patch only releases the semaphore when calling the
> cache flushing function. The vmas are walked with the semaphore held.
>
> Please read the patch I posted properly.
In another thread
http://lists.arm.linux.org.uk/lurker/message/20120406.033509.ca9fe8cf.en.html
it seems that we have an additional user for whom this patch fixed an issue.
So would it be possible to discuss this patch again?
Many thanks
Dirk
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2012-04-09 5:58 ` Dirk Behme
@ 2012-04-09 14:24 ` Olof Johansson
2012-04-10 17:17 ` Will Deacon
2012-04-18 8:40 ` Catalin Marinas
0 siblings, 2 replies; 36+ messages in thread
From: Olof Johansson @ 2012-04-09 14:24 UTC (permalink / raw)
To: linux-arm-kernel
Hi,
On Sun, Apr 8, 2012 at 10:58 PM, Dirk Behme <dirk.behme@googlemail.com> wrote:
> On 17.11.2011 12:25, Catalin Marinas wrote:
>>
>> On Thu, Nov 17, 2011 at 11:03:39AM +0000, Russell King - ARM Linux wrote:
>>>
>>> On Thu, Nov 17, 2011 at 10:59:36AM +0000, Catalin Marinas wrote:
>>>>
>>>> On Thu, Nov 17, 2011 at 10:42:46AM +0000, Russell King - ARM Linux
>>>> wrote:
>>>>>
>>>>> On Thu, Nov 17, 2011 at 10:22:05AM +0000, Catalin Marinas wrote:
>>>>>>
>>>>>> BTW, we could even go a step further and remove the vma checks
>>>>>> entirely,
>>>>>> just use access_ok() since __cpuc_coherent_user_range() can handle
>>>>>> unmapped ranges properly (though it may introduce some latency if some
>>>>>> user app passes a 3G range but we can change the fixup code to abort
>>>>>> the
>>>>>> operation when it gets a fault that can't be fixed up).
>>>>>
>>>>>
>>>>> So, do you think that it is acceptable to be able to pass into this
>>>>> from
>>>>> userspace the arguments '0', '~0', '0' and have the kernel spin over
>>>>> the
>>>>> entire 4G space, including IO space on any of the supported
>>>>> architectures.
>>>>
>>>>
>>>> We have access_ok() to check for user only space.
>>>>
>>>>> Note that pre-ARMv6 CPUs will spin over that range in 32-byte steps
>>>>> whether or not there's a page present.
>>>>
>>>>
>>>> You are right, the pre-v6 hardware would not trigger a fault. So if we
>>>> don't want another #ifdef in this function, we just keep the vma checks
>>>> for all architecture versions.
>>>>
>>>> But I consider that the original patch is still valid.
>>>
>>>
>>> You think it's safe to walk the vma list with no locks held?
>>
>>
>> No, I don't, but the patch only releases the semaphore when calling the
>> cache flushing function. The vmas are walked with the semaphore held.
>>
>> Please read the patch I posted properly.
>
>
> In an other thread
>
> http://lists.arm.linux.org.uk/lurker/message/20120406.033509.ca9fe8cf.en.html
>
> it seems that we have an additional user where this patch fixed an issue.
>
> So would it be possible to talk about this patch, again?
This patch had plenty of talk already. :-) What it needs is for
someone to go in and annotate (and thus partially rewrite) the pre-v6
cacheflush loops with exception fixups, so that the mmap_sem can be
safely dropped.
As mentioned above, v6 and v7 are already annotated -- so a bad
passed-in pointer will just cause a fault, fixup and signal to the
application like other architectures where the flush can be done from
userspace, or any other bad pointer passed into a syscall.
I started looking at it, realized it needed more than just trivial
changes and haven't had a chance to revisit it since. It's currently
not near the top of my todo list so if someone else has time to do it,
please go ahead.
-Olof
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2012-04-09 14:24 ` Olof Johansson
@ 2012-04-10 17:17 ` Will Deacon
2012-04-18 15:05 ` Will Deacon
2012-04-18 8:40 ` Catalin Marinas
1 sibling, 1 reply; 36+ messages in thread
From: Will Deacon @ 2012-04-10 17:17 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Apr 09, 2012 at 03:24:29PM +0100, Olof Johansson wrote:
> On Sun, Apr 8, 2012 at 10:58 PM, Dirk Behme <dirk.behme@googlemail.com> wrote:
> >
> > So would it be possible to talk about this patch, again?
>
> This patch had plenty of talk already. :-) What it needs is for
> someone to go in and annotate (and thus partially rewrite) the pre-v6
> cacheflush loops with exception fixups, so that the mmap_sem can be
> safely dropped.
>
> As mentioned above, v6 and v7 are already annotated -- so a bad
> passed-in pointer will just cause a fault, fixup and signal to the
> application like other architectures where the flush can be done from
> userspace, or any other bad pointer passed into a syscall.
Are you sure about this? It looks to me like the v6 implementation just
skips over faulting pages and the v7 implementation gives up after the first
faulting page and just returns 0 (despite the comment suggesting otherwise).
If we want to send a signal, we need to rework do_cache_op slightly and fix
up all of the low-level cache accessors. We also need to decide whether we
continue past a faulting page or give up at that point. Alternatively, we
could return -EFAULT from the syscall instead of sending a signal.
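That last option might look roughly like this (hypothetical sketch,
assuming the low-level flush routine is reworked to report an
unresolvable fault back to its caller rather than silently skipping):

```c
static int do_cache_op(unsigned long start, unsigned long end, int flags)
{
	if (end < start || flags)
		return -EINVAL;
	if (!access_ok(VERIFY_WRITE, start, end - start))
		return -EFAULT;

	/* assumed to return non-zero if the fixup code hit a fault
	 * that could not be resolved */
	if (flush_cache_user_range(start, end))
		return -EFAULT;
	return 0;
}
```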
Will
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2012-04-09 14:24 ` Olof Johansson
2012-04-10 17:17 ` Will Deacon
@ 2012-04-18 8:40 ` Catalin Marinas
1 sibling, 0 replies; 36+ messages in thread
From: Catalin Marinas @ 2012-04-18 8:40 UTC (permalink / raw)
To: linux-arm-kernel
On Mon, Apr 09, 2012 at 03:24:29PM +0100, Olof Johansson wrote:
> On Sun, Apr 8, 2012 at 10:58 PM, Dirk Behme <dirk.behme@googlemail.com> wrote:
> > In an other thread
> >
> > http://lists.arm.linux.org.uk/lurker/message/20120406.033509.ca9fe8cf.en.html
> >
> > it seems that we have an additional user where this patch fixed an issue.
> >
> > So would it be possible to talk about this patch, again?
>
> This patch had plenty of talk already. :-) What it needs is for
> someone to go in and annotate (and thus partially rewrite) the pre-v6
> cacheflush loops with exception fixups, so that the mmap_sem can be
> safely dropped.
There is no need to annotate the pre-v6 cacheflush loops, as the cache
maintenance ops on v5 and earlier do not generate translation or page
faults (VIVT caches that do not do page table walks).
> As mentioned above, v6 and v7 are already annotated -- so a bad
> passed-in pointer will just cause a fault, fixup and signal to the
> application like other architectures where the flush can be done from
> userspace, or any other bad pointer passed into a syscall.
A bad pointer is already ignored by do_cache_op() since find_vma()
cannot find a proper address. There is a scenario for multi-threaded
apps where one thread unmaps a range while another tries to flush. For
consistency with the other bad pointer case (i.e. ignoring it), the
cache flushing fixup code simply skips the faulting page without sending
signals to the application.
From my perspective, this patch is still valid (it keeps the find_vma()
call under the mmap_sem lock but calls the cache flushing outside this
critical region).
--
Catalin
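[Editorial note: the locking order Catalin describes — look up and clamp the VMA under the mmap_sem read lock, then drop the lock before flushing so a fault taken during the flush can safely re-acquire it — can be sketched as a small userspace C model. Everything here is illustrative: the lock, find_vma(), the flush, and the address range are mocked, not the kernel's code.]

```c
#include <assert.h>

/* Mock mmap_sem state and a flag recording the would-be deadlock
 * (flushing while the read lock is still held). */
static int sem_held;
static int flushed_while_held;

static void down_read(void) { sem_held = 1; }
static void up_read(void)   { sem_held = 0; }

struct vma { unsigned long vm_start, vm_end; };

/* Pretend lookup: a single VMA covering [0x1000, 0x5000).
 * Like the kernel's find_vma(), returns the VMA with vm_end > addr. */
static struct vma *find_vma(unsigned long addr)
{
    static struct vma v = { 0x1000, 0x5000 };
    return addr < v.vm_end ? &v : 0;
}

static void flush_cache_user_range(unsigned long s, unsigned long e)
{
    (void)s; (void)e;
    if (sem_held)
        flushed_while_held = 1;  /* the bug the patch removes */
}

static int do_cache_op(unsigned long start, unsigned long end)
{
    struct vma *vma;

    if (end < start)
        return -22;                  /* -EINVAL */

    down_read();
    vma = find_vma(start);
    if (vma && start >= vma->vm_start) {
        if (end > vma->vm_end)
            end = vma->vm_end;       /* clamp while the lock is held */
        up_read();                   /* drop the lock first... */
        flush_cache_user_range(start, end);  /* ...then flush */
        return 0;
    }
    up_read();
    return -22;
}
```

With this ordering, a fault raised by the flush finds the lock free, so the page fault handler's own down_read() cannot deadlock against a queued writer.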
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2012-04-10 17:17 ` Will Deacon
@ 2012-04-18 15:05 ` Will Deacon
2012-04-18 15:27 ` Russell King - ARM Linux
0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2012-04-18 15:05 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Apr 10, 2012 at 06:17:20PM +0100, Will Deacon wrote:
> On Mon, Apr 09, 2012 at 03:24:29PM +0100, Olof Johansson wrote:
> > This patch had plenty of talk already. :-) What it needs is for
> > someone to go in and annotate (and thus partially rewrite) the pre-v6
> > cacheflush loops with exception fixups, so that the mmap_sem can be
> > safely dropped.
> >
> > As mentioned above, v6 and v7 are already annotated -- so a bad
> > passed-in pointer will just cause a fault, fixup and signal to the
> > application like other architectures where the flush can be done from
> > userspace, or any other bad pointer passed into a syscall.
>
> Are you sure about this? It looks to me like the v6 implementation just
> skips over faulting pages and the v7 implementation gives up after the first
> faulting page and just returns 0 (despite the comment suggesting otherwise).
>
> If we want to send a signal, we need to rework do_cache_op slightly and fix
> up all of the low-level cache accessors. We also need to decide whether we
> continue past a faulting page or give up at that point. Alternatively, we
> could return -EFAULT from the syscall instead of sending a signal.
... and here's a patch on top of the original which lets us return errors to
userspace from the cacheflush syscall.
Will
From adbb915ab1ef37d7c66f6d48761d481fe6361f32 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon@arm.com>
Date: Wed, 18 Apr 2012 10:50:15 +0100
Subject: [PATCH] ARM: cacheflush: return error to userspace when flushing syscall fails
The cacheflush syscall can fail for two reasons:
(1) The arguments are invalid (nonsensical address range or no VMA)
(2) The region generates a translation fault on a VIPT or PIPT cache
This patch allows do_cache_op to return an error code to userspace in
the case of the above. The various coherent_user_range implementations
are modified to return 0 in the case of VIVT caches or -EFAULT in the
case of an abort on v6/v7 cores.
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
arch/arm/include/asm/cacheflush.h | 4 ++--
arch/arm/kernel/traps.c | 11 +++++------
arch/arm/mm/cache-v3.S | 1 +
arch/arm/mm/cache-v4.S | 1 +
arch/arm/mm/cache-v4wb.S | 6 +++---
arch/arm/mm/cache-v4wt.S | 1 +
arch/arm/mm/cache-v6.S | 10 ++++------
arch/arm/mm/cache-v7.S | 10 ++++------
arch/arm/mm/proc-arm1020.S | 1 +
arch/arm/mm/proc-arm1020e.S | 1 +
arch/arm/mm/proc-arm1022.S | 1 +
arch/arm/mm/proc-arm1026.S | 1 +
arch/arm/mm/proc-arm920.S | 1 +
arch/arm/mm/proc-arm922.S | 1 +
arch/arm/mm/proc-arm925.S | 1 +
arch/arm/mm/proc-arm926.S | 1 +
arch/arm/mm/proc-arm940.S | 6 +++---
arch/arm/mm/proc-arm946.S | 1 +
arch/arm/mm/proc-feroceon.S | 1 +
arch/arm/mm/proc-mohawk.S | 1 +
20 files changed, 35 insertions(+), 26 deletions(-)
diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index 1252a26..004c1bc 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -101,7 +101,7 @@ struct cpu_cache_fns {
void (*flush_user_range)(unsigned long, unsigned long, unsigned int);
void (*coherent_kern_range)(unsigned long, unsigned long);
- void (*coherent_user_range)(unsigned long, unsigned long);
+ int (*coherent_user_range)(unsigned long, unsigned long);
void (*flush_kern_dcache_area)(void *, size_t);
void (*dma_map_area)(const void *, size_t, int);
@@ -142,7 +142,7 @@ extern void __cpuc_flush_kern_all(void);
extern void __cpuc_flush_user_all(void);
extern void __cpuc_flush_user_range(unsigned long, unsigned long, unsigned int);
extern void __cpuc_coherent_kern_range(unsigned long, unsigned long);
-extern void __cpuc_coherent_user_range(unsigned long, unsigned long);
+extern int __cpuc_coherent_user_range(unsigned long, unsigned long);
extern void __cpuc_flush_dcache_area(void *, size_t);
/*
diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index 63d402f..3647170 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -479,14 +479,14 @@ static int bad_syscall(int n, struct pt_regs *regs)
return regs->ARM_r0;
}
-static inline void
+static inline int
do_cache_op(unsigned long start, unsigned long end, int flags)
{
struct mm_struct *mm = current->active_mm;
struct vm_area_struct *vma;
if (end < start || flags)
- return;
+ return -EINVAL;
down_read(&mm->mmap_sem);
vma = find_vma(mm, start);
@@ -497,10 +497,10 @@ do_cache_op(unsigned long start, unsigned long end, int flags)
end = vma->vm_end;
up_read(&mm->mmap_sem);
- flush_cache_user_range(start, end);
- return;
+ return flush_cache_user_range(start, end);
}
up_read(&mm->mmap_sem);
+ return -EINVAL;
}
/*
@@ -546,8 +546,7 @@ asmlinkage int arm_syscall(int no, struct pt_regs *regs)
* the specified region).
*/
case NR(cacheflush):
- do_cache_op(regs->ARM_r0, regs->ARM_r1, regs->ARM_r2);
- return 0;
+ return do_cache_op(regs->ARM_r0, regs->ARM_r1, regs->ARM_r2);
case NR(usr26):
if (!(elf_hwcap & HWCAP_26BIT))
diff --git a/arch/arm/mm/cache-v3.S b/arch/arm/mm/cache-v3.S
index c2301f2..52e35f3 100644
--- a/arch/arm/mm/cache-v3.S
+++ b/arch/arm/mm/cache-v3.S
@@ -78,6 +78,7 @@ ENTRY(v3_coherent_kern_range)
* - end - virtual end address
*/
ENTRY(v3_coherent_user_range)
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/cache-v4.S b/arch/arm/mm/cache-v4.S
index fd9bb7a..022135d 100644
--- a/arch/arm/mm/cache-v4.S
+++ b/arch/arm/mm/cache-v4.S
@@ -88,6 +88,7 @@ ENTRY(v4_coherent_kern_range)
* - end - virtual end address
*/
ENTRY(v4_coherent_user_range)
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/cache-v4wb.S b/arch/arm/mm/cache-v4wb.S
index 4f2c141..8f1eeae 100644
--- a/arch/arm/mm/cache-v4wb.S
+++ b/arch/arm/mm/cache-v4wb.S
@@ -167,9 +167,9 @@ ENTRY(v4wb_coherent_user_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
- mov ip, #0
- mcr p15, 0, ip, c7, c5, 0 @ invalidate I cache
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
+ mov r0, #0
+ mcr p15, 0, r0, c7, c5, 0 @ invalidate I cache
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
diff --git a/arch/arm/mm/cache-v4wt.S b/arch/arm/mm/cache-v4wt.S
index 4d7b467..b34a5f9 100644
--- a/arch/arm/mm/cache-v4wt.S
+++ b/arch/arm/mm/cache-v4wt.S
@@ -125,6 +125,7 @@ ENTRY(v4wt_coherent_user_range)
add r0, r0, #CACHE_DLINESIZE
cmp r0, r1
blo 1b
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/cache-v6.S b/arch/arm/mm/cache-v6.S
index 74c2e5a..4b10760 100644
--- a/arch/arm/mm/cache-v6.S
+++ b/arch/arm/mm/cache-v6.S
@@ -12,6 +12,7 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <asm/assembler.h>
+#include <asm/errno.h>
#include <asm/unwind.h>
#include "proc-macros.S"
@@ -135,7 +136,6 @@ ENTRY(v6_coherent_user_range)
1:
USER( mcr p15, 0, r0, c7, c10, 1 ) @ clean D line
add r0, r0, #CACHE_LINE_SIZE
-2:
cmp r0, r1
blo 1b
#endif
@@ -154,13 +154,11 @@ ENTRY(v6_coherent_user_range)
/*
* Fault handling for the cache operation above. If the virtual address in r0
- * isn't mapped, just try the next page.
+ * isn't mapped, fail with -EFAULT.
*/
9001:
- mov r0, r0, lsr #12
- mov r0, r0, lsl #12
- add r0, r0, #4096
- b 2b
+ mov r0, #-EFAULT
+ mov pc, lr
UNWIND(.fnend )
ENDPROC(v6_coherent_user_range)
ENDPROC(v6_coherent_kern_range)
diff --git a/arch/arm/mm/cache-v7.S b/arch/arm/mm/cache-v7.S
index a655d3d..39e3fb3 100644
--- a/arch/arm/mm/cache-v7.S
+++ b/arch/arm/mm/cache-v7.S
@@ -13,6 +13,7 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <asm/assembler.h>
+#include <asm/errno.h>
#include <asm/unwind.h>
#include "proc-macros.S"
@@ -198,7 +199,6 @@ ENTRY(v7_coherent_user_range)
add r12, r12, r2
cmp r12, r1
blo 2b
-3:
mov r0, #0
ALT_SMP(mcr p15, 0, r0, c7, c1, 6) @ invalidate BTB Inner Shareable
ALT_UP(mcr p15, 0, r0, c7, c5, 6) @ invalidate BTB
@@ -208,13 +208,11 @@ ENTRY(v7_coherent_user_range)
/*
* Fault handling for the cache operation above. If the virtual address in r0
- * isn't mapped, just try the next page.
+ * isn't mapped, fail with -EFAULT.
*/
9001:
- mov r12, r12, lsr #12
- mov r12, r12, lsl #12
- add r12, r12, #4096
- b 3b
+ mov r0, #-EFAULT
+ mov pc, lr
UNWIND(.fnend )
ENDPROC(v7_coherent_kern_range)
ENDPROC(v7_coherent_user_range)
diff --git a/arch/arm/mm/proc-arm1020.S b/arch/arm/mm/proc-arm1020.S
index 2349513..0650bb8 100644
--- a/arch/arm/mm/proc-arm1020.S
+++ b/arch/arm/mm/proc-arm1020.S
@@ -241,6 +241,7 @@ ENTRY(arm1020_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, ip, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm1020e.S b/arch/arm/mm/proc-arm1020e.S
index c244b06..4188478 100644
--- a/arch/arm/mm/proc-arm1020e.S
+++ b/arch/arm/mm/proc-arm1020e.S
@@ -235,6 +235,7 @@ ENTRY(arm1020e_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, ip, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm1022.S b/arch/arm/mm/proc-arm1022.S
index 38fe22e..33c6882 100644
--- a/arch/arm/mm/proc-arm1022.S
+++ b/arch/arm/mm/proc-arm1022.S
@@ -224,6 +224,7 @@ ENTRY(arm1022_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, ip, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm1026.S b/arch/arm/mm/proc-arm1026.S
index 3eb9c3c..fbc1d5f 100644
--- a/arch/arm/mm/proc-arm1026.S
+++ b/arch/arm/mm/proc-arm1026.S
@@ -218,6 +218,7 @@ ENTRY(arm1026_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, ip, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm920.S b/arch/arm/mm/proc-arm920.S
index cb941ae..1a8c138 100644
--- a/arch/arm/mm/proc-arm920.S
+++ b/arch/arm/mm/proc-arm920.S
@@ -210,6 +210,7 @@ ENTRY(arm920_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm922.S b/arch/arm/mm/proc-arm922.S
index 4ec0e07..4c44d7e 100644
--- a/arch/arm/mm/proc-arm922.S
+++ b/arch/arm/mm/proc-arm922.S
@@ -212,6 +212,7 @@ ENTRY(arm922_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm925.S b/arch/arm/mm/proc-arm925.S
index 9dccd9a..ec5b118 100644
--- a/arch/arm/mm/proc-arm925.S
+++ b/arch/arm/mm/proc-arm925.S
@@ -258,6 +258,7 @@ ENTRY(arm925_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm926.S b/arch/arm/mm/proc-arm926.S
index 820259b..c31e62c 100644
--- a/arch/arm/mm/proc-arm926.S
+++ b/arch/arm/mm/proc-arm926.S
@@ -221,6 +221,7 @@ ENTRY(arm926_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm940.S b/arch/arm/mm/proc-arm940.S
index 9fdc0a1..a613a7d 100644
--- a/arch/arm/mm/proc-arm940.S
+++ b/arch/arm/mm/proc-arm940.S
@@ -160,7 +160,7 @@ ENTRY(arm940_coherent_user_range)
* - size - region size
*/
ENTRY(arm940_flush_kern_dcache_area)
- mov ip, #0
+ mov r0, #0
mov r1, #(CACHE_DSEGMENTS - 1) << 4 @ 4 segments
1: orr r3, r1, #(CACHE_DENTRIES - 1) << 26 @ 64 entries
2: mcr p15, 0, r3, c7, c14, 2 @ clean/flush D index
@@ -168,8 +168,8 @@ ENTRY(arm940_flush_kern_dcache_area)
bcs 2b @ entries 63 to 0
subs r1, r1, #1 << 4
bcs 1b @ segments 7 to 0
- mcr p15, 0, ip, c7, c5, 0 @ invalidate I cache
- mcr p15, 0, ip, c7, c10, 4 @ drain WB
+ mcr p15, 0, r0, c7, c5, 0 @ invalidate I cache
+ mcr p15, 0, r0, c7, c10, 4 @ drain WB
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-arm946.S b/arch/arm/mm/proc-arm946.S
index f684cfe..9f4f299 100644
--- a/arch/arm/mm/proc-arm946.S
+++ b/arch/arm/mm/proc-arm946.S
@@ -190,6 +190,7 @@ ENTRY(arm946_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-feroceon.S b/arch/arm/mm/proc-feroceon.S
index ba3c500..23a8e4c 100644
--- a/arch/arm/mm/proc-feroceon.S
+++ b/arch/arm/mm/proc-feroceon.S
@@ -232,6 +232,7 @@ ENTRY(feroceon_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
diff --git a/arch/arm/mm/proc-mohawk.S b/arch/arm/mm/proc-mohawk.S
index cdfedc5..b047546 100644
--- a/arch/arm/mm/proc-mohawk.S
+++ b/arch/arm/mm/proc-mohawk.S
@@ -193,6 +193,7 @@ ENTRY(mohawk_coherent_user_range)
cmp r0, r1
blo 1b
mcr p15, 0, r0, c7, c10, 4 @ drain WB
+ mov r0, #0
mov pc, lr
/*
--
1.7.4.1
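[Editorial note: the behavioural change the patch makes to the v6/v7 fixup handlers — give up and return -EFAULT on the first faulting page instead of silently skipping it — can be illustrated with a userspace C model. The page map, cache-line size, and helper names below are made up for illustration; only the control flow mirrors the patch.]

```c
#include <assert.h>

#define PAGE_SIZE 4096UL
#define LINE_SIZE 32UL

/* Hypothetical page map: in a 3-page region, pages 0 and 2 are mapped
 * and page 1 faults. */
static int page_mapped(unsigned long addr)
{
    return (addr / PAGE_SIZE) != 1;
}

/* Old v6 fixup behaviour: on a fault, round up to the next page
 * boundary and keep going; the operation always "succeeds". */
static int coherent_user_range_old(unsigned long start, unsigned long end)
{
    unsigned long addr = start;
    while (addr < end) {
        if (!page_mapped(addr)) {
            addr = (addr & ~(PAGE_SIZE - 1)) + PAGE_SIZE;  /* skip page */
            continue;
        }
        addr += LINE_SIZE;  /* one cache line "cleaned" */
    }
    return 0;
}

/* New behaviour from the patch above: stop at the first unhandled
 * fault and report it to the caller. */
static int coherent_user_range_new(unsigned long start, unsigned long end)
{
    unsigned long addr;
    for (addr = start; addr < end; addr += LINE_SIZE)
        if (!page_mapped(addr))
            return -14;  /* -EFAULT */
    return 0;
}
```

The old variant hides the fault entirely; the new one propagates it up through do_cache_op() so the cacheflush syscall can return an error to userspace.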
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2012-04-18 15:05 ` Will Deacon
@ 2012-04-18 15:27 ` Russell King - ARM Linux
2012-04-18 16:27 ` Will Deacon
2012-04-18 17:15 ` Catalin Marinas
0 siblings, 2 replies; 36+ messages in thread
From: Russell King - ARM Linux @ 2012-04-18 15:27 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Apr 18, 2012 at 04:05:08PM +0100, Will Deacon wrote:
> ... and here's a patch on top of the original which lets us return errors to
> userspace from the cacheflush syscall.
Is this worth it? No one's checking for errors as it stands.
> From adbb915ab1ef37d7c66f6d48761d481fe6361f32 Mon Sep 17 00:00:00 2001
> From: Will Deacon <will.deacon@arm.com>
> Date: Wed, 18 Apr 2012 10:50:15 +0100
> Subject: [PATCH] ARM: cacheflush: return error to userspace when flushing syscall fails
>
> The cacheflush syscall can fail for two reasons:
>
> (1) The arguments are invalid (nonsensical address range or no VMA)
That's a good reason...
> (2) The region generates a translation fault on a VIPT or PIPT cache
But I don't think this is, for two reasons:
1. Remember that the purpose of this operation is to allow userspace to
write code into pages and then execute that code - so all that we're
concerned about in this call is to ensure that the I/D caches for the
memory region are synchronized.
2. We'll fault at the cache operation on v6 and v7 because the page was
not present, and that will make the system try and bring the page back
into the memory space. Only if that fails (eg, because we're running
low on memory) will we fail to replace the page, and use the exception
handling.
I think the original intention for (2) was that the page would merely get
skipped and we'd move onto the next page without trying to bring the
previous page back into the memory space - this won't work because of the
way the page fault handler works. If we want that behaviour, we need to
use pagefault_disable()..pagefault_enable() around the cache flushing to
ensure that the page fault handler does not try to replace the missing
page, but instead immediately calls the fixup function.
Now, the fact is that 99% of userspace will not be checking the return
value of this syscall (they can't, because the return value has never been
defined.) They can't even start to check the return value, because they
may fail on older kernels which don't. So all round, this is a bad idea.
Just treat this the same way that the 'prefetch' instruction does - ignore
errors, and continue. If you're going to be calling this function on a
region which you haven't just written code to (and therefore should exist)
then you're doing something wrong in userspace.
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2012-04-18 15:27 ` Russell King - ARM Linux
@ 2012-04-18 16:27 ` Will Deacon
2012-04-18 17:15 ` Catalin Marinas
1 sibling, 0 replies; 36+ messages in thread
From: Will Deacon @ 2012-04-18 16:27 UTC (permalink / raw)
To: linux-arm-kernel
Hi Russell,
On Wed, Apr 18, 2012 at 04:27:26PM +0100, Russell King - ARM Linux wrote:
> On Wed, Apr 18, 2012 at 04:05:08PM +0100, Will Deacon wrote:
> > ... and here's a patch on top of the original which lets us return errors to
> > userspace from the cacheflush syscall.
>
> Is this worth it? No one's checking for errors as it stands.
Indeed, but that's because we always return 0 at the moment. I spoke to some
of the JIT guys here and they seemed to think it would be useful for them.
> > The cacheflush syscall can fail for two reasons:
> >
> > (1) The arguments are invalid (nonsensical address range or no VMA)
>
> That's a good reason...
>
> > (2) The region generates a translation fault on a VIPT or PIPT cache
>
> But I don't think this is, for two reasons:
>
> 1. Remember that the purpose of this operation is to allow userspace to
> write code into pages and then execute that code - so all that we're
> concerned about in this call is to ensure that the I/D caches for the
> memory region are synchronized.
>
> 2. We'll fault at the cache operation on v6 and v7 because the page was
> not present, and that will make the system try and bring the page back
> into the memory space. Only if that fails (eg, because we're running
> low on memory) will we fail to replace the page, and use the exception
> handling.
Understood, although the case I was actually thinking of is when you flush a
region that has the wrong protection bits (i.e. PROT_WRITE isn't set). In
this case it probably means userspace is doing something wrong and it would
be nice to indicate that with an -EFAULT.
> I think the original intention for (2) was that the page would merely get
> skipped and we'd move onto the next page without trying to bring the
> previous page back into the memory space - this won't work because of the
> way the page fault handler works. If we want that behaviour, we need to
> use pagefault_disable()..pagefault_enable() around the cache flushing to
> ensure that the page fault handler does not try to replace the missing
> page, but instead immediately calls the fixup function.
I think the cleanest way to tidy this up is to give up as soon as we take a
fault that wasn't handled by the page fault handler. I don't see the value in
continuing with the remaining pages, even if we decide not to report it to
userspace.
> Now, the fact is that 99% of userspace will not be checking the return
> value of this syscall (they can't, because the return value has never been
> defined.) They can't even start to check the return value, because they
> may fail on older kernels which don't. So all round, this is a bad idea.
It looks to me like we've always returned 0 for this syscall, so we could
define it retrospectively if we need to.
> Just treat this the same way that the 'prefetch' instruction does - ignore
> errors, and continue. If you're going to be calling this function on a
> region which you haven't just written code to (and therefore should exist)
> then you're doing something wrong in userspace.
Agreed, but it would be nice to shout at userspace when they're misbehaving.
Will
* [PATCH] ARM: Do not call flush_cache_user_range with mmap_sem held
2012-04-18 15:27 ` Russell King - ARM Linux
2012-04-18 16:27 ` Will Deacon
@ 2012-04-18 17:15 ` Catalin Marinas
1 sibling, 0 replies; 36+ messages in thread
From: Catalin Marinas @ 2012-04-18 17:15 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Apr 18, 2012 at 04:27:26PM +0100, Russell King - ARM Linux wrote:
> On Wed, Apr 18, 2012 at 04:05:08PM +0100, Will Deacon wrote:
> > The cacheflush syscall can fail for two reasons:
> >
> > (1) The arguments are invalid (nonsensical address range or no VMA)
>
> That's a good reason...
>
> > (2) The region generates a translation fault on a VIPT or PIPT cache
>
> But I don't think this is, for two reasons:
>
> 1. Remember that the purpose of this operation is to allow userspace to
> write code into pages and then execute that code - so all that we're
> concerned about in this call is to ensure that the I/D caches for the
> memory region are synchronized.
>
> 2. We'll fault at the cache operation on v6 and v7 because the page was
> not present, and that will make the system try and bring the page back
> into the memory space.
With the current mainline kernel, if a fault happens during the cache
flushing operation we will deadlock since we try to re-acquire the
mmap_sem in do_page_fault() (that's what the original patch tries to
fix).
> Only if that fails (eg, because we're running
> low on memory) will we fail to replace the page, and use the exception
> handling.
>
> I think the original intention for (2) was that the page would merely get
> skipped and we'd move onto the next page without trying to bring the
> previous page back into the memory space - this won't work because of the
> way the page fault handler works. If we want that behaviour, we need to
> use pagefault_disable()..pagefault_enable() around the cache flushing to
> ensure that the page fault handler does not try to replace the missing
> page, but instead immediately calls the fixup function.
That's not always desirable as we may fault on an old pte and the
mkyoung path does not necessarily flush the caches. So processing the
page fault is still required in some cases.
--
Catalin
end of thread, other threads:[~2012-04-18 17:15 UTC | newest]
Thread overview: 36+ messages
2010-04-28 7:32 [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held Dima Zavin
2010-04-28 7:35 ` Dima Zavin
2010-04-29 13:00 ` Russell King - ARM Linux
2010-04-29 18:16 ` Jamie Lokier
2010-04-29 18:24 ` Russell King - ARM Linux
2010-04-29 19:23 ` Dima Zavin
2010-05-04 4:07 ` Dima Zavin
2010-05-04 7:40 ` Russell King - ARM Linux
2010-05-06 15:00 ` [PATCH] [ARM] Do not call flush_cache_user_range with mmap_semheld Catalin Marinas
2010-05-06 16:01 ` Jamie Lokier
2010-05-06 16:07 ` Jamie Lokier
2010-05-06 16:24 ` Catalin Marinas
2010-05-06 16:21 ` Catalin Marinas
2010-05-06 15:08 ` [PATCH] [ARM] Do not call flush_cache_user_range with mmap_sem held Catalin Marinas
-- strict thread matches above, loose matches on Subject: below --
2011-11-07 17:33 [PATCH] ARM: " Catalin Marinas
2011-11-16 21:23 ` Olof Johansson
2011-11-16 23:50 ` Russell King - ARM Linux
2011-11-17 0:16 ` Olof Johansson
2011-11-17 0:20 ` Olof Johansson
2011-11-17 10:26 ` Catalin Marinas
2011-11-17 10:49 ` Russell King - ARM Linux
2011-11-17 10:45 ` Russell King - ARM Linux
2011-11-20 17:54 ` Olof Johansson
2011-11-17 10:22 ` Catalin Marinas
2011-11-17 10:42 ` Russell King - ARM Linux
2011-11-17 10:59 ` Catalin Marinas
2011-11-17 11:03 ` Russell King - ARM Linux
2011-11-17 11:25 ` Catalin Marinas
2012-04-09 5:58 ` Dirk Behme
2012-04-09 14:24 ` Olof Johansson
2012-04-10 17:17 ` Will Deacon
2012-04-18 15:05 ` Will Deacon
2012-04-18 15:27 ` Russell King - ARM Linux
2012-04-18 16:27 ` Will Deacon
2012-04-18 17:15 ` Catalin Marinas
2012-04-18 8:40 ` Catalin Marinas