public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* Question about PRIVATE_FUTEX
@ 2009-03-27  2:12 Minchan Kim
  2009-03-27  4:32 ` Minchan Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Minchan Kim @ 2009-03-27  2:12 UTC (permalink / raw)
  To: Peter Zijlstra, Eric Dumazet; +Cc: lkml

Hi, Peter and Eric.

I am not expert about futex.
I am sorry if this is dumb question.

If we use private futex, get_futex_key don't call get_user_pages_fast
which pins page at page table.
Then, get_futex_value_locked calls __cpy_from_user_inatomic with
pagefault_disable.

Who make sure the user page is mapped at app's page table ?

-- 
Kinds regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27  2:12 Question about PRIVATE_FUTEX Minchan Kim
@ 2009-03-27  4:32 ` Minchan Kim
  2009-03-27  4:56 ` Eric Dumazet
  2009-03-27  8:49 ` Peter Zijlstra
  2 siblings, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2009-03-27  4:32 UTC (permalink / raw)
  To: Peter Zijlstra, Eric Dumazet; +Cc: lkml

Hmm, It seems even shared futex, too.

After calling get_user_pages_fast, get_futex_key calls unlock, put_page, too.
Then futex_wait  calls get_futex_value_locked.
Wouldn't kernel reclaim the page between get_fuex_key and
get_futex_value_locked ?

How do we make sure this race condition ?


On Fri, Mar 27, 2009 at 11:12 AM, Minchan Kim <minchan.kim@gmail.com> wrote:
> Hi, Peter and Eric.
>
> I am not expert about futex.
> I am sorry if this is dumb question.
>
> If we use private futex, get_futex_key don't call get_user_pages_fast
> which pins page at page table.
> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> pagefault_disable.
>
> Who make sure the user page is mapped at app's page table ?
>
> --
> Kinds regards,
> Minchan Kim
>



-- 
Kinds regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27  2:12 Question about PRIVATE_FUTEX Minchan Kim
  2009-03-27  4:32 ` Minchan Kim
@ 2009-03-27  4:56 ` Eric Dumazet
  2009-03-27  5:20   ` Minchan Kim
  2009-03-27  8:49 ` Peter Zijlstra
  2 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2009-03-27  4:56 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Peter Zijlstra, lkml

Minchan Kim a écrit :
> Hi, Peter and Eric.
> 
> I am not expert about futex.
> I am sorry if this is dumb question.
> 
> If we use private futex, get_futex_key don't call get_user_pages_fast
> which pins page at page table.
> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> pagefault_disable.
> 
> Who make sure the user page is mapped at app's page table ?
> 

Nothing makes sure user page is mapped, as we dont have to (for private futexes
at least, since the 'key' is a combination of the futex virtual address (not
depending on the underlying physical page) and the task mm (sort of a static
offset per task)

If no page is mapped, a normal error should be returned to user, since
access to futex location will trigger a fault.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27  4:56 ` Eric Dumazet
@ 2009-03-27  5:20   ` Minchan Kim
  2009-03-27  5:50     ` Eric Dumazet
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2009-03-27  5:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Peter Zijlstra, lkml

Thanks for kind explanation.

On Fri, Mar 27, 2009 at 1:56 PM, Eric Dumazet <dada1@cosmosbay.com> wrote:
> Minchan Kim a écrit :
>> Hi, Peter and Eric.
>>
>> I am not expert about futex.
>> I am sorry if this is dumb question.
>>
>> If we use private futex, get_futex_key don't call get_user_pages_fast
>> which pins page at page table.
>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>> pagefault_disable.
>>
>> Who make sure the user page is mapped at app's page table ?
>>
>
> Nothing makes sure user page is mapped, as we dont have to (for private futexes
> at least, since the 'key' is a combination of the futex virtual address (not
> depending on the underlying physical page) and the task mm (sort of a static
> offset per task)
> If no page is mapped, a normal error should be returned to user, since
> access to futex location will trigger a fault.
>

I mean as follows.
It seems even shared futex case.

After calling get_user_pages_fast, get_futex_key calls unlock_page and
put_page, too.  Then futex_wait  calls get_futex_value_locked.

Generally, current page->count is one and nolocked.
I think kernel reclaimer can reclaim the page.

Wouldn't kernel reclaim the page between get_fuex_key and
get_futex_value_locked ?
If kernel reclaimed the page, __copy_from_user_inatomic can happens
page fault although pagefault_disable is on.

How do we make sure this race condition ?
Do I miss something ?

-- 
Kinds regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27  5:20   ` Minchan Kim
@ 2009-03-27  5:50     ` Eric Dumazet
  2009-03-27  6:20       ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Eric Dumazet @ 2009-03-27  5:50 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Peter Zijlstra, lkml

Minchan Kim a écrit :
> Thanks for kind explanation.
> 
> On Fri, Mar 27, 2009 at 1:56 PM, Eric Dumazet <dada1@cosmosbay.com> wrote:
>> Minchan Kim a écrit :
>>> Hi, Peter and Eric.
>>>
>>> I am not expert about futex.
>>> I am sorry if this is dumb question.
>>>
>>> If we use private futex, get_futex_key don't call get_user_pages_fast
>>> which pins page at page table.
>>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>>> pagefault_disable.
>>>
>>> Who make sure the user page is mapped at app's page table ?
>>>
>> Nothing makes sure user page is mapped, as we dont have to (for private futexes
>> at least, since the 'key' is a combination of the futex virtual address (not
>> depending on the underlying physical page) and the task mm (sort of a static
>> offset per task)
>> If no page is mapped, a normal error should be returned to user, since
>> access to futex location will trigger a fault.
>>
> 
> I mean as follows.
> It seems even shared futex case.
> 
> After calling get_user_pages_fast, get_futex_key calls unlock_page and
> put_page, too.  Then futex_wait  calls get_futex_value_locked.
> 
> Generally, current page->count is one and nolocked.
> I think kernel reclaimer can reclaim the page.
> 
> Wouldn't kernel reclaim the page between get_fuex_key and
> get_futex_value_locked ?
> If kernel reclaimed the page, __copy_from_user_inatomic can happens
> page fault although pagefault_disable is on.
> 
> How do we make sure this race condition ?
> Do I miss something ?
> 

Hmmm, so your question is not about PRIVATE futexes, but shared ones.

I guess if page is no more present, its not a problem since
get_futex_value_locked() returns an error. We then take a slow
path, calling get_user() and retrying whole futex logic.

However, comment at line 1213 is misleading I guess, since
we dont hold mmap semaphore anymore ?

         * for shared futexes, we hold the mmap semaphore, so the mapping
         * cannot have changed since we looked it up in get_futex_key.
         */
        ret = get_futex_value_locked(&uval, uaddr);

So if page was un-mapped by another thread, and re-mapped to another physical
page, then this thread might sleep on 'kernel futex' not anymore reachable...

User error, as it is not supposed to happen in a sane program, undefined
result...




^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27  5:50     ` Eric Dumazet
@ 2009-03-27  6:20       ` Minchan Kim
  0 siblings, 0 replies; 11+ messages in thread
From: Minchan Kim @ 2009-03-27  6:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Peter Zijlstra, lkml

On Fri, Mar 27, 2009 at 2:50 PM, Eric Dumazet <dada1@cosmosbay.com> wrote:
> Minchan Kim a écrit :
>> Thanks for kind explanation.
>>
>> On Fri, Mar 27, 2009 at 1:56 PM, Eric Dumazet <dada1@cosmosbay.com> wrote:
>>> Minchan Kim a écrit :
>>>> Hi, Peter and Eric.
>>>>
>>>> I am not expert about futex.
>>>> I am sorry if this is dumb question.
>>>>
>>>> If we use private futex, get_futex_key don't call get_user_pages_fast
>>>> which pins page at page table.
>>>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>>>> pagefault_disable.
>>>>
>>>> Who make sure the user page is mapped at app's page table ?
>>>>
>>> Nothing makes sure user page is mapped, as we dont have to (for private futexes
>>> at least, since the 'key' is a combination of the futex virtual address (not
>>> depending on the underlying physical page) and the task mm (sort of a static
>>> offset per task)
>>> If no page is mapped, a normal error should be returned to user, since
>>> access to futex location will trigger a fault.
>>>
>>
>> I mean as follows.
>> It seems even shared futex case.
>>
>> After calling get_user_pages_fast, get_futex_key calls unlock_page and
>> put_page, too.  Then futex_wait  calls get_futex_value_locked.
>>
>> Generally, current page->count is one and nolocked.
>> I think kernel reclaimer can reclaim the page.
>>
>> Wouldn't kernel reclaim the page between get_fuex_key and
>> get_futex_value_locked ?
>> If kernel reclaimed the page, __copy_from_user_inatomic can happens
>> page fault although pagefault_disable is on.
>>
>> How do we make sure this race condition ?
>> Do I miss something ?
>>
>
> Hmmm, so your question is not about PRIVATE futexes, but shared ones.
>
> I guess if page is no more present, its not a problem since
> get_futex_value_locked() returns an error. We then take a slow
> path, calling get_user() and retrying whole futex logic.

Indeed.
I misunderstood about __copy_from_user_inatomic.
It never sleep.

> However, comment at line 1213 is misleading I guess, since
> we dont hold mmap semaphore anymore ?
>
>         * for shared futexes, we hold the mmap semaphore, so the mapping
>         * cannot have changed since we looked it up in get_futex_key.
>         */
>        ret = get_futex_value_locked(&uval, uaddr);
>
> So if page was un-mapped by another thread, and re-mapped to another physical
> page, then this thread might sleep on 'kernel futex' not anymore reachable...
>
> User error, as it is not supposed to happen in a sane program, undefined
> result...

Yes. How about removing confusing comments ?

Thanks for great explanation :)

-- 
Kinds regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27  2:12 Question about PRIVATE_FUTEX Minchan Kim
  2009-03-27  4:32 ` Minchan Kim
  2009-03-27  4:56 ` Eric Dumazet
@ 2009-03-27  8:49 ` Peter Zijlstra
  2009-03-27 10:56   ` Minchan Kim
  2 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2009-03-27  8:49 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Eric Dumazet, lkml, Darren Hart

On Fri, 2009-03-27 at 11:12 +0900, Minchan Kim wrote:
> Hi, Peter and Eric.
> 
> I am not expert about futex.
> I am sorry if this is dumb question.
> 
> If we use private futex, get_futex_key don't call get_user_pages_fast
> which pins page at page table.

But also drops that page ref at the end of get_futex_key(). The whole
and only purpose of using get_user_pages_fast() is to get at the mapping
data without having to obtain the mmap_sem.

> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> pagefault_disable.
> 
> Who make sure the user page is mapped at app's page table ?

Nobody, all uses of get_futex_value_locked() have to deal with it
returning -EFAULT.

Most of this is legacy btw, from when futex ops were done under the
mmap_sem. Back then we couldn't fault because that would cause mmap_sem
recursion. Howver, now that we don't hold mmap_sem anymore we could use
a faulting user access like get_user().

Darren has been working on patches to clean that up, some of those are
already merged in the -tip tree.

HTH

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27  8:49 ` Peter Zijlstra
@ 2009-03-27 10:56   ` Minchan Kim
  2009-03-27 11:14     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2009-03-27 10:56 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Eric Dumazet, lkml, Darren Hart

Hi, Perter.
Thanks for joining this thread.

My concern is page reclaimer can reclaim the user page which have
futex between get_fuex_key and get_futex_value_locked.

On Fri, Mar 27, 2009 at 5:49 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, 2009-03-27 at 11:12 +0900, Minchan Kim wrote:
>> Hi, Peter and Eric.
>>
>> I am not expert about futex.
>> I am sorry if this is dumb question.
>>
>> If we use private futex, get_futex_key don't call get_user_pages_fast
>> which pins page at page table.
>
> But also drops that page ref at the end of get_futex_key(). The whole
> and only purpose of using get_user_pages_fast() is to get at the mapping
> data without having to obtain the mmap_sem.

Thanks.
I understand. It's not only private futex.

>
>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>> pagefault_disable.
>>
>> Who make sure the user page is mapped at app's page table ?
>
> Nobody, all uses of get_futex_value_locked() have to deal with it
> returning -EFAULT.

Does It mean that __copy_from_user_inatomic in get_futex_value_locked
would be failed rather than sleep?
In fact, I don't make sure _copy_from_user_inatomic function's meaning.
As far as I understand, It never sleep. It just can be failed in case
of user page isn't mapped. Is right ?
Otherwise, it can be scheduled with pagefault_disable which increments
preempt_count. It is a atomic bug.
If my assume is right, it can be failed rather than sleep.
At this case, other architecture implements __copy_from_user_inatomic
with __copy_from_user which can be scheduled. It also can be bug.

Hmm, Now I am confusing.

> Most of this is legacy btw, from when futex ops were done under the
> mmap_sem. Back then we couldn't fault because that would cause mmap_sem
> recursion. Howver, now that we don't hold mmap_sem anymore we could use
> a faulting user access like get_user().
> Darren has been working on patches to clean that up, some of those are
> already merged in the -tip tree.

Thanks for good information.
It will be very desirable way to enhance kernel performance.

> HTH
>



-- 
Kinds regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27 10:56   ` Minchan Kim
@ 2009-03-27 11:14     ` Peter Zijlstra
  2009-03-27 11:37       ` Minchan Kim
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2009-03-27 11:14 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Eric Dumazet, lkml, Darren Hart

On Fri, 2009-03-27 at 19:56 +0900, Minchan Kim wrote:

> >> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
> >> pagefault_disable.
> >>
> >> Who make sure the user page is mapped at app's page table ?
> >
> > Nobody, all uses of get_futex_value_locked() have to deal with it
> > returning -EFAULT.
> 
> Does It mean that __copy_from_user_inatomic in get_futex_value_locked
> would be failed rather than sleep?

Correct.

> In fact, I don't make sure _copy_from_user_inatomic function's meaning.
> As far as I understand, It never sleep. It just can be failed in case
> of user page isn't mapped. Is right ?

Correct.

> Otherwise, it can be scheduled with pagefault_disable which increments
> preempt_count. It is a atomic bug.
> If my assume is right, it can be failed rather than sleep.
> At this case, other architecture implements __copy_from_user_inatomic
> with __copy_from_user which can be scheduled. It also can be bug.
> 
> Hmm, Now I am confusing.

Confused I guess ;-)

The trick is in the in_atomic() check in the pagefault handler and the
fixup section of the copy routines.

#define __copy_user(to, from, size)                                     \
do {                                                                    \
        int __d0, __d1, __d2;                                           \
        __asm__ __volatile__(                                           \
                "       cmp  $7,%0\n"                                   \
                "       jbe  1f\n"                                      \
                "       movl %1,%0\n"                                   \
                "       negl %0\n"                                      \
                "       andl $7,%0\n"                                   \
                "       subl %0,%3\n"                                   \
                "4:     rep; movsb\n"                                   \
                "       movl %3,%0\n"                                   \
                "       shrl $2,%0\n"                                   \
                "       andl $3,%3\n"                                   \
                "       .align 2,0x90\n"                                \
                "0:     rep; movsl\n"                                   \
                "       movl %3,%0\n"                                   \
                "1:     rep; movsb\n"                                   \
                "2:\n"                                                  \
                ".section .fixup,\"ax\"\n"                              \
                "5:     addl %3,%0\n"                                   \
                "       jmp 2b\n"                                       \
                "3:     lea 0(%3,%0,4),%0\n"                            \
                "       jmp 2b\n"                                       \
                ".previous\n"                                           \
                ".section __ex_table,\"a\"\n"                           \
                "       .align 4\n"                                     \
                "       .long 4b,5b\n"                                  \
                "       .long 0b,3b\n"                                  \
                "       .long 1b,2b\n"                                  \
                ".previous"                                             \
                : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2)   \
                : "3"(size), "0"(size), "1"(to), "2"(from)              \
                : "memory");                                            \
} while (0)

see that __ex_table section, it tells the fault handler where to
continue in case of an atomic fault.

> > Most of this is legacy btw, from when futex ops were done under the
> > mmap_sem. Back then we couldn't fault because that would cause mmap_sem
> > recursion. Howver, now that we don't hold mmap_sem anymore we could use
> > a faulting user access like get_user().
> > Darren has been working on patches to clean that up, some of those are
> > already merged in the -tip tree.
> 
> Thanks for good information.
> It will be very desirable way to enhance kernel performance.

I doubt it'll make a measurable difference, if you need to fault
performance sucks anyway. If you don't, the current code is just as
fast.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27 11:14     ` Peter Zijlstra
@ 2009-03-27 11:37       ` Minchan Kim
  2009-03-27 15:43         ` Darren Hart
  0 siblings, 1 reply; 11+ messages in thread
From: Minchan Kim @ 2009-03-27 11:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Eric Dumazet, lkml, Darren Hart

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 5651 bytes --]

On Fri, Mar 27, 2009 at 8:14 PM, Peter Zijlstra <peterz@infradead.org> wrote:> On Fri, 2009-03-27 at 19:56 +0900, Minchan Kim wrote:>>> >> Then, get_futex_value_locked calls __cpy_from_user_inatomic with>> >> pagefault_disable.>> >>>> >> Who make sure the user page is mapped at app's page table ?>> >>> > Nobody, all uses of get_futex_value_locked() have to deal with it>> > returning -EFAULT.>>>> Does It mean that __copy_from_user_inatomic in get_futex_value_locked>> would be failed rather than sleep?>> Correct.>>> In fact, I don't make sure _copy_from_user_inatomic function's meaning.>> As far as I understand, It never sleep. It just can be failed in case>> of user page isn't mapped. Is right ?>> Correct.>>> Otherwise, it can be scheduled with pagefault_disable which increments>> preempt_count. It is a atomic bug.>> If my assume is right, it can be failed rather than sleep.>> At this case, other architecture implements __copy_from_user_inatomic>> with __copy_from_user which can be scheduled. It also can be bug.>>>> Hmm, Now I am confusing.>> Confused I guess ;-)> The trick is in the in_atomic() check in the pagefault handler and the> fixup section of the copy routines.
Whew~,  There was good hidden trick.I will dive into this assembly.I always thanks for your kindness. :)
> #define __copy_user(to, from, size)                                     \> do {                                                                    \>        int __d0, __d1, __d2;                                           \>        __asm__ __volatile__(                                           \>                "       cmp  $7,%0\n"                                   \>                "       jbe  1f\n"                                      \>                "       movl %1,%0\n"                                   \>                "       negl %0\n"                                      \>                "       andl $7,%0\n"                                   \>                "       subl %0,%3\n"                                   \>                "4:     rep; movsb\n"                                   \>                "       movl %3,%0\n"                                   \>                "       shrl $2,%0\n"                                   \>                "       andl $3,%3\n"                                   \>                "       .align 2,0x90\n"                                \>                "0:     rep; movsl\n"                                   \>                "       movl %3,%0\n"                                   \>                "1:     rep; movsb\n"                                   \>                "2:\n"                                                  \>                ".section .fixup,\"ax\"\n"                              \>                "5:     addl %3,%0\n"                                   \>                "       jmp 2b\n"                                       \>                "3:     lea 0(%3,%0,4),%0\n"                            \>                "       jmp 2b\n"                                       \>                ".previous\n"                                           \>                ".section __ex_table,\"a\"\n"                           \>                "       .align 4\n"                                     \>                "       .long 4b,5b\n"                                  \>                "       .long 0b,3b\n"                                  \>                "       .long 1b,2b\n"                                  \>                ".previous"                                             \>                : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2)   \>                : "3"(size), "0"(size), "1"(to), "2"(from)              \>                : "memory");                                            \> } while (0)>> see that __ex_table section, it tells the fault handler where to> continue in case of an atomic fault.>>> > Most of this is legacy btw, from when futex ops were done under the>> > mmap_sem. Back then we couldn't fault because that would cause mmap_sem>> > recursion. Howver, now that we don't hold mmap_sem anymore we could use>> > a faulting user access like get_user().>> > Darren has been working on patches to clean that up, some of those are>> > already merged in the -tip tree.>>>> Thanks for good information.>> It will be very desirable way to enhance kernel performance.>> I doubt it'll make a measurable difference, if you need to fault> performance sucks anyway. If you don't, the current code is just as> fast.>


-- Kinds regards,Minchan Kimÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question about PRIVATE_FUTEX
  2009-03-27 11:37       ` Minchan Kim
@ 2009-03-27 15:43         ` Darren Hart
  0 siblings, 0 replies; 11+ messages in thread
From: Darren Hart @ 2009-03-27 15:43 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Peter Zijlstra, Eric Dumazet, lkml

Minchan Kim wrote:
> On Fri, Mar 27, 2009 at 8:14 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Fri, 2009-03-27 at 19:56 +0900, Minchan Kim wrote:
>>
>>>>> Then, get_futex_value_locked calls __cpy_from_user_inatomic with
>>>>> pagefault_disable.
>>>>>
>>>>> Who make sure the user page is mapped at app's page table ?
>>>> Nobody, all uses of get_futex_value_locked() have to deal with it
>>>> returning -EFAULT.
>>> Does It mean that __copy_from_user_inatomic in get_futex_value_locked
>>> would be failed rather than sleep?
>> Correct.
>>
>>> In fact, I don't make sure _copy_from_user_inatomic function's meaning.
>>> As far as I understand, It never sleep. It just can be failed in case
>>> of user page isn't mapped. Is right ?
>> Correct.
>>
>>> Otherwise, it can be scheduled with pagefault_disable which increments
>>> preempt_count. It is a atomic bug.
>>> If my assume is right, it can be failed rather than sleep.
>>> At this case, other architecture implements __copy_from_user_inatomic
>>> with __copy_from_user which can be scheduled. It also can be bug.
>>>
>>> Hmm, Now I am confusing.
>> Confused I guess ;-)
>> The trick is in the in_atomic() check in the pagefault handler and the
>> fixup section of the copy routines.
> 
> Whew~,  There was good hidden trick.
> I will dive into this assembly.
> I always thanks for your kindness. :)
> 
>> #define __copy_user(to, from, size)                                     \
>> do {                                                                    \
>>        int __d0, __d1, __d2;                                           \
>>        __asm__ __volatile__(                                           \
>>                "       cmp  $7,%0\n"                                   \
>>                "       jbe  1f\n"                                      \
>>                "       movl %1,%0\n"                                   \
>>                "       negl %0\n"                                      \
>>                "       andl $7,%0\n"                                   \
>>                "       subl %0,%3\n"                                   \
>>                "4:     rep; movsb\n"                                   \
>>                "       movl %3,%0\n"                                   \
>>                "       shrl $2,%0\n"                                   \
>>                "       andl $3,%3\n"                                   \
>>                "       .align 2,0x90\n"                                \
>>                "0:     rep; movsl\n"                                   \
>>                "       movl %3,%0\n"                                   \
>>                "1:     rep; movsb\n"                                   \
>>                "2:\n"                                                  \
>>                ".section .fixup,\"ax\"\n"                              \
>>                "5:     addl %3,%0\n"                                   \
>>                "       jmp 2b\n"                                       \
>>                "3:     lea 0(%3,%0,4),%0\n"                            \
>>                "       jmp 2b\n"                                       \
>>                ".previous\n"                                           \
>>                ".section __ex_table,\"a\"\n"                           \
>>                "       .align 4\n"                                     \
>>                "       .long 4b,5b\n"                                  \
>>                "       .long 0b,3b\n"                                  \
>>                "       .long 1b,2b\n"                                  \
>>                ".previous"                                             \
>>                : "=&c"(size), "=&D" (__d0), "=&S" (__d1), "=r"(__d2)   \
>>                : "3"(size), "0"(size), "1"(to), "2"(from)              \
>>                : "memory");                                            \
>> } while (0)
>>
>> see that __ex_table section, it tells the fault handler where to
>> continue in case of an atomic fault.
>>
>>>> Most of this is legacy btw, from when futex ops were done under the
>>>> mmap_sem. Back then we couldn't fault because that would cause mmap_sem
>>>> recursion. Howver, now that we don't hold mmap_sem anymore we could use
>>>> a faulting user access like get_user().
>>>> Darren has been working on patches to clean that up, some of those are
>>>> already merged in the -tip tree.

I'm a little late to the party I guess.  Minchan, a lot of the fault 
logic has been cleaned up in the tip tree, core/futexes branch.  The 
removes a lot of the legacy complication from the faulting paths. 
However, the get_futex_key code remains the same if I remember correctly.

>>> Thanks for good information.
>>> It will be very desirable way to enhance kernel performance.
>> I doubt it'll make a measurable difference, if you need to fault
>> performance sucks anyway. If you don't, the current code is just as
>> fast.
>>

Agreed.  If you are suffering performance hits from excessive paging, 
consider locking your memory.


-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2009-03-27 15:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-03-27  2:12 Question about PRIVATE_FUTEX Minchan Kim
2009-03-27  4:32 ` Minchan Kim
2009-03-27  4:56 ` Eric Dumazet
2009-03-27  5:20   ` Minchan Kim
2009-03-27  5:50     ` Eric Dumazet
2009-03-27  6:20       ` Minchan Kim
2009-03-27  8:49 ` Peter Zijlstra
2009-03-27 10:56   ` Minchan Kim
2009-03-27 11:14     ` Peter Zijlstra
2009-03-27 11:37       ` Minchan Kim
2009-03-27 15:43         ` Darren Hart

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox