* mm: get_user_pages_fast()
@ 2013-11-05 22:43 Michael McTernan
2013-11-06 10:19 ` Will Deacon
2013-11-06 10:54 ` Russell King - ARM Linux
0 siblings, 2 replies; 7+ messages in thread
From: Michael McTernan @ 2013-11-05 22:43 UTC (permalink / raw)
To: linux-arm-kernel
Hi all,
I think there's a problem on ARM with futex calls to FUTEX_WAIT and
similar for addresses that have been mapped to userspace from /dev/mem
or a device which sets VM_IO or VM_PFNMAP. This can break
pthread_mutex_lock() if placed on such mapped memory and attributed as
process-shared.
The cause looks to be that ARM doesn't yet implement
get_user_pages_fast(), which is called in get_futex_key() for a
non-process private futex.
On ARM the weak wrapper function in mm/util.c is used as a compatibility
fallback. This fallback takes the mm->mmap_sem lock, and calls through
to get_user_pages(). Unfortunately get_user_pages() then returns
-EFAULT for pages marked with VM_IO | VM_PFNMAP set. Memory which has
been mapped to userspace through /dev/mem is correctly flagged in this
way.
get_user_pages_fast() on other architectures, as implemented in
arch/.../gup.c doesn't appear have the same checks for VM_IO or
VM_PFNMAP, so futex functionality can work.
Implementing get_user_pages_fast() to the same spec on ARM should fix
this. The weak get_user_pages_fast() may then also need fixing or
eventually removing since its behaviour isn't the same as
get_user_pages_fast() in all cases and may be causing bugs where it us
used. Bugs in locking are not very nice.
I've seen a couple of patches go by that provide get_user_pages_fast()
on ARM, but none yet accepted. I therefore would like to highlight this
particular case for context, and also to perhaps generate some movement
on this issue, which I also reported here:
https://bugzilla.kernel.org/show_bug.cgi?id=64321
For reference, here are some previous patches and discussion on adding
this function:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-April/162115.html
https://lkml.org/lkml/2013/10/3/529
http://patches.linaro.org/21130/
Kind Regards,
Mike
^ permalink raw reply [flat|nested] 7+ messages in thread
* mm: get_user_pages_fast()
2013-11-05 22:43 mm: get_user_pages_fast() Michael McTernan
@ 2013-11-06 10:19 ` Will Deacon
2013-11-06 10:54 ` Russell King - ARM Linux
1 sibling, 0 replies; 7+ messages in thread
From: Will Deacon @ 2013-11-06 10:19 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Nov 05, 2013 at 10:43:15PM +0000, Michael McTernan wrote:
> Hi all,
Hi Mike,
> I think there's a problem on ARM with futex calls to FUTEX_WAIT and
> similar for addresses that have been mapped to userspace from /dev/mem
> or a device which sets VM_IO or VM_PFNMAP. This can break
> pthread_mutex_lock() if placed on such mapped memory and attributed as
> process-shared.
>
> The cause looks to be that ARM doesn't yet implement
> get_user_pages_fast(), which is called in get_futex_key() for a
> non-process private futex.
[...]
> I've seen a couple of patches go by that provide get_user_pages_fast()
> on ARM, but none yet accepted. I therefore would like to highlight this
> particular case for context, and also to perhaps generate some movement
> on this issue, which I also reported here:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=64321
The latest patches from Steve are here:
http://lists.infradead.org/pipermail/linux-arm-kernel/2013-October/205951.html
but we need some feedback from the MM guys regarding the atomics in patch 2
(since we don't require IPIs for TLB invalidation). I'd also like to see
some numbers comparing the two schemes (since we could add a dummy IPI to
our huge TLB flushing code if it's cheaper).
Will
^ permalink raw reply [flat|nested] 7+ messages in thread
* mm: get_user_pages_fast()
2013-11-05 22:43 mm: get_user_pages_fast() Michael McTernan
2013-11-06 10:19 ` Will Deacon
@ 2013-11-06 10:54 ` Russell King - ARM Linux
2013-11-06 11:07 ` Will Deacon
2013-11-06 19:38 ` Michael McTernan
1 sibling, 2 replies; 7+ messages in thread
From: Russell King - ARM Linux @ 2013-11-06 10:54 UTC (permalink / raw)
To: linux-arm-kernel
On Tue, Nov 05, 2013 at 10:43:15PM +0000, Michael McTernan wrote:
> I think there's a problem on ARM with futex calls to FUTEX_WAIT and
> similar for addresses that have been mapped to userspace from /dev/mem
> or a device which sets VM_IO or VM_PFNMAP. This can break
> pthread_mutex_lock() if placed on such mapped memory and attributed as
> process-shared.
Firstly, please stop this madness. Placing futexes in IO memory is not
supported, period. You can't map random bits of /dev/mem and expect
this stuff to work.
(a) you're generally not allowed to map kernel memory via /dev/mem, so
that rules out all kernel managed memory.
(b) you are allowed to map memory outside of that, but you get it as
strongly ordered memory.
(b) means that the load/store exclusives, which userspace mutexes will
use on ARMv6+, will not work correctly. (they're not supported to
strongly ordered memory by the architecture.)
In other words, don't put mutexes in memory you've remapped from /dev/mem.
In fact, as an application, you should not be mapping /dev/mem at all.
^ permalink raw reply [flat|nested] 7+ messages in thread
* mm: get_user_pages_fast()
2013-11-06 10:54 ` Russell King - ARM Linux
@ 2013-11-06 11:07 ` Will Deacon
2013-11-06 19:38 ` Michael McTernan
1 sibling, 0 replies; 7+ messages in thread
From: Will Deacon @ 2013-11-06 11:07 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Nov 06, 2013 at 10:54:00AM +0000, Russell King - ARM Linux wrote:
> On Tue, Nov 05, 2013 at 10:43:15PM +0000, Michael McTernan wrote:
> > I think there's a problem on ARM with futex calls to FUTEX_WAIT and
> > similar for addresses that have been mapped to userspace from /dev/mem
> > or a device which sets VM_IO or VM_PFNMAP. This can break
> > pthread_mutex_lock() if placed on such mapped memory and attributed as
> > process-shared.
>
> Firstly, please stop this madness. Placing futexes in IO memory is not
> supported, period. You can't map random bits of /dev/mem and expect
> this stuff to work.
Yikes, I didn't notice that when I read the original post!
> (a) you're generally not allowed to map kernel memory via /dev/mem, so
> that rules out all kernel managed memory.
> (b) you are allowed to map memory outside of that, but you get it as
> strongly ordered memory.
>
> (b) means that the load/store exclusives, which userspace mutexes will
> use on ARMv6+, will not work correctly. (they're not supported to
> strongly ordered memory by the architecture.)
>
> In other words, don't put mutexes in memory you've remapped from /dev/mem.
> In fact, as an application, you should not be mapping /dev/mem at all.
Completely agreed.
As an aside, fastgup is still something we want on ARM (it is required to
support futexes on transparent huge pages, which is what I originally
assumed this was about), so my previous comments about Steve's patches still
stand.
Will
^ permalink raw reply [flat|nested] 7+ messages in thread
* mm: get_user_pages_fast()
2013-11-06 10:54 ` Russell King - ARM Linux
2013-11-06 11:07 ` Will Deacon
@ 2013-11-06 19:38 ` Michael McTernan
2013-11-06 19:57 ` Russell King - ARM Linux
1 sibling, 1 reply; 7+ messages in thread
From: Michael McTernan @ 2013-11-06 19:38 UTC (permalink / raw)
To: linux-arm-kernel
On 11/06/2013 10:54 AM, Russell King - ARM Linux wrote:
> Firstly, please stop this madness.
In the context of an embedded system, where memory is shared between
CPUs, DSP, FPGAs and some peripherals, and where the SCU is maintaining
coherency on selected areas, this is a feature of the platform, not
madness.
> Placing futexes in IO memory is not supported, period.
Except the _PRIVATE futex calls do work fine in IO memory since they
don't walk the process page tables to determine the 'key' from the
address. These a limited to only sync between threads in the same
processes though.
> You can't map random bits of /dev/mem and expect
> this stuff to work.
Nothing I am mapping is random.
> (b) you are allowed to map memory outside of that, but you get it as
> strongly ordered memory.
>
> (b) means that the load/store exclusives, which userspace mutexes will
> use on ARMv6+, will not work correctly.
> (they're not supported to strongly ordered memory by the architecture.)
I don't see this stated in the ARM Architecture Reference Manual.
Instead I see it saying that it is implementation dependent whether
LDREX and STREX work on strongly ordered memory (section A3.4.5).
> In other words, don't put mutexes in memory you've remapped from /dev/mem.
> In fact, as an application, you should not be mapping /dev/mem at all.
It's certainly more standard to map the mutexes() out of something like
file backed mmap(), but that isn't as elegant for my application.
To give a bit of background, what we have is some memory at a know
address which controls a peripheral. Part of the peripheral addresses
are buffers and registers, but some is set aside for use by software.
Being able to map this whole section into userspace makes for really
fast fast-paths where the driver is basically outside the kernel. Also
keeping driver state variables in the peripheral RAM means things can be
restarted if the Linux processes crash or die, or even if the whole of
Linux were to be restarted for some reason. So it's more than
convenient to have the mutexes work there, though as an easier goal I'll
probably just get the futex calls to work for my applications.
Hopefully this doesn't sound as mad as first assumed & thanks for your time.
Kind Regards,
Mike
^ permalink raw reply [flat|nested] 7+ messages in thread
* mm: get_user_pages_fast()
2013-11-06 19:38 ` Michael McTernan
@ 2013-11-06 19:57 ` Russell King - ARM Linux
2013-11-06 22:31 ` Michael McTernan
0 siblings, 1 reply; 7+ messages in thread
From: Russell King - ARM Linux @ 2013-11-06 19:57 UTC (permalink / raw)
To: linux-arm-kernel
On Wed, Nov 06, 2013 at 07:38:33PM +0000, Michael McTernan wrote:
> On 11/06/2013 10:54 AM, Russell King - ARM Linux wrote:
>> Firstly, please stop this madness.
>
> In the context of an embedded system, where memory is shared between
> CPUs, DSP, FPGAs and some peripherals, and where the SCU is maintaining
> coherency on selected areas, this is a feature of the platform, not
> madness.
Please expose the memory in a saner way rather than re-using /dev/mem.
/dev/mem is all well and good, but it has side effects with the way it
maps memory. Having your own driver gives you control over that. It's
also better practice because it avoids a whole load of security issues.
> > Placing futexes in IO memory is not supported, period.
>
> Except the _PRIVATE futex calls do work fine in IO memory since they
> don't walk the process page tables to determine the 'key' from the
> address. These a limited to only sync between threads in the same
> processes though.
Sorry, I should have said mutexes. The point here is that userspace
pthreads can ldrex/strex on that memory before issuing any futex calls.
> > You can't map random bits of /dev/mem and expect
> > this stuff to work.
>
> Nothing I am mapping is random.
As far as I'm concerned, it's random areas of system memory - random in
the sense that it's not well defined what it is and what attributes it
should have in the page tables for things like memory type etc. I
somehow doubt that you've thought about this at all, you just decided
that opening /dev/mem is a solution to your problem.
>> (b) means that the load/store exclusives, which userspace mutexes will
>> use on ARMv6+, will not work correctly.
>> (they're not supported to strongly ordered memory by the architecture.)
>
> I don't see this stated in the ARM Architecture Reference Manual.
> Instead I see it saying that it is implementation dependent whether
> LDREX and STREX work on strongly ordered memory (section A3.4.5).
Here's the latest wording:
It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
performed to a memory region with the Device or Strongly-ordered memory
attribute. Unless the implementation documentation explicitly states
that LDREX and STREX operations to a memory region with the Device or
Strongly-ordered attribute are permitted, the effect of such operations
is UNPREDICTABLE.
In other words, it is only supported where it is explicitly documented
that it works on these memory types; otherwise it must be assumed that
they do not work...
However, that concern goes away if you provide your own device driver
in Linux to provide _just_ a mechanism to map this memory. Plus, you
can restrict it to just being able to map this specific memory without
exposing the entire system RAM to anything which happens to exploit
your application.
>> In other words, don't put mutexes in memory you've remapped from /dev/mem.
>> In fact, as an application, you should not be mapping /dev/mem at all.
>
> It's certainly more standard to map the mutexes() out of something like
> file backed mmap(), but that isn't as elegant for my application.
>
> To give a bit of background, what we have is some memory at a know
> address which controls a peripheral. Part of the peripheral addresses
> are buffers and registers, but some is set aside for use by software.
> Being able to map this whole section into userspace makes for really
> fast fast-paths where the driver is basically outside the kernel. Also
> keeping driver state variables in the peripheral RAM means things can be
> restarted if the Linux processes crash or die, or even if the whole of
> Linux were to be restarted for some reason. So it's more than
> convenient to have the mutexes work there, though as an easier goal I'll
> probably just get the futex calls to work for my applications.
So, what happens if Linux crashes having taken a pthread mutex in this
region, and your application then comes along and tries to regain that
lock?
Do you always re-initialise the mutexes (you technically have to if you
read the man page for pthread_mutex_lock() which tells you about a whole
raft of undefined behaviours concerning the pthread mutex implementation.)
^ permalink raw reply [flat|nested] 7+ messages in thread
* mm: get_user_pages_fast()
2013-11-06 19:57 ` Russell King - ARM Linux
@ 2013-11-06 22:31 ` Michael McTernan
0 siblings, 0 replies; 7+ messages in thread
From: Michael McTernan @ 2013-11-06 22:31 UTC (permalink / raw)
To: linux-arm-kernel
On 11/06/2013 07:57 PM, Russell King - ARM Linux wrote:
> Please expose the memory in a saner way rather than re-using /dev/mem.
> /dev/mem is all well and good, but it has side effects with the way it
> maps memory. Having your own driver gives you control over that. It's
> also better practice because it avoids a whole load of security issues.
Sure - that's where I'm heading now. The security isn't an issue for
this application, though I accept it's good practice.
>>> Placing futexes in IO memory is not supported, period.
>>
>> Except the _PRIVATE futex calls do work fine in IO memory since they
>> don't walk the process page tables to determine the 'key' from the
>> address.
...
> Sorry, I should have said mutexes. The point here is that userspace
> pthreads can ldrex/strex on that memory before issuing any futex calls.
Mutexes do access the memory, yes.
There is still inconsistency in futex behaviour, with _PRIVATE calls
working for IO memory, non-private not. This isn't ARM specific though.
>>> You can't map random bits of /dev/mem and expect
>>> this stuff to work.
>>
>> Nothing I am mapping is random.
>
> As far as I'm concerned, it's random areas of system memory - random in
> the sense that it's not well defined what it is and what attributes it
> should have in the page tables for things like memory type etc. I
> somehow doubt that you've thought about this at all, you just decided
> that opening /dev/mem is a solution to your problem.
I didn't want to bore the list with gory details, but I didn't earlier
state that I added a /dev/shmem and defined memory regions and types for
the areas, with the required attributes for sharability and caching - so
it isn't really random, though I see how it sounds like that from my
original post. /dev/shmem is a close cousin of /dev/mem, and has picked
up the VM_PFNMAP and VM_IO flags which cause me inconvenience.
>>> (b) means that the load/store exclusives, which userspace mutexes will
>>> use on ARMv6+, will not work correctly.
>>> (they're not supported to strongly ordered memory by the architecture.)
>>
>> I don't see this stated in the ARM Architecture Reference Manual.
>> Instead I see it saying that it is implementation dependent whether
>> LDREX and STREX work on strongly ordered memory (section A3.4.5).
>
> Here's the latest wording:
>
> It is IMPLEMENTATION DEFINED whether LDREX and STREX operations can be
> performed to a memory region with the Device or Strongly-ordered memory
> attribute. Unless the implementation documentation explicitly states
> that LDREX and STREX operations to a memory region with the Device or
> Strongly-ordered attribute are permitted, the effect of such operations
> is UNPREDICTABLE.
>
> In other words, it is only supported where it is explicitly documented
> that it works on these memory types; otherwise it must be assumed that
> they do not work...
What you've quoted is not the same as your first statement that "they're
not supported to strongly ordered memory by the architecture." It's
implementation defined, which means it is a possibility.
> So, what happens if Linux crashes having taken a pthread mutex in this
> region, and your application then comes along and tries to regain that
> lock?
Well there's a whole restart procedure needed to pick up state from the
memory area and get back to sensible operation. Re-initialising
pthread_mutexes and some other counters and indices is all part of that
- for a simple enough FIFO type interface it's not that difficult to
cover potentially corrupt state variables, though getting good testing
coverage is hard. Fuzzing the restart state helps a lot, though the key
is to keep the state space as small as possible.
Kind Regards,
Mike
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2013-11-06 22:31 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-11-05 22:43 mm: get_user_pages_fast() Michael McTernan
2013-11-06 10:19 ` Will Deacon
2013-11-06 10:54 ` Russell King - ARM Linux
2013-11-06 11:07 ` Will Deacon
2013-11-06 19:38 ` Michael McTernan
2013-11-06 19:57 ` Russell King - ARM Linux
2013-11-06 22:31 ` Michael McTernan
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox