public inbox for linux-kernel@vger.kernel.org
* futex() on vdso makes process unkillable
@ 2010-01-24  0:04 Mark Seaborn
  2010-01-25  3:37 ` KOSAKI Motohiro
  0 siblings, 1 reply; 9+ messages in thread
From: Mark Seaborn @ 2010-01-24  0:04 UTC (permalink / raw)
  To: linux-kernel

I was experimenting with futexes and was a little surprised to
discover that futex() works on read-only pages.  This creates quite a
high bandwidth side channel that allows two processes to communicate
if, for example, they share a library.  (Mind you, this is not much
different from file locks, which also work on read-only file
descriptors.)

I also found a couple of differences between 2.6.24 (from Ubuntu
hardy) and 2.6.31 (from Ubuntu karmic).  The first is a definite bug
in 2.6.31:


1) On 2.6.31 i686, using futex() on the vdso causes the process to get
stuck, consuming CPU in an unkillable state.  Both FUTEX_WAIT and
FUTEX_WAKE cause the problem.  The problem doesn't occur on 2.6.24.
(BTW, I was testing to see whether futex() on the vdso allows any two
processes to communicate.  This appears not to be the case on 2.6.24.)

A test program is below.


2) Suppose a file is mapped into two processes with MAP_PRIVATE.  Can
the resulting mappings be used to communicate via futex()?  i.e. Does
futex() consider the mappings to be the same?

On 2.6.24, the futex wakeup is not transferred; pages must be mapped
with MAP_SHARED for futex to work.  On 2.6.31, the futex wakeup *is*
transferred; futex works with either MAP_SHARED or MAP_PRIVATE.

2.6.24's behaviour seems more correct, because the mappings are
logically different, even if the underlying memory pages are the same
before copy-on-write is triggered.  Is 2.6.31's behaviour a
regression, or is the kernel's behaviour here supposed to be
undefined?
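
A rough way to reproduce the MAP_PRIVATE question is sketched below.  It
is not part of the original test: the helper names (map_private,
private_map_wake_count), the temporary file path and the timeouts are all
made up for illustration.  A child process waits on its own MAP_PRIVATE
mapping of a file while the parent calls FUTEX_WAKE through a separate
MAP_PRIVATE mapping of the same file.

```c
#include <linux/futex.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Map the first 4096 bytes of fd read-only with MAP_PRIVATE. */
static int *map_private(int fd)
{
  void *p = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);
  return p == MAP_FAILED ? NULL : p;
}

/* Returns 1 if the wakeup crossed the mappings, 0 if not, -1 on setup failure. */
int private_map_wake_count(void)
{
  char path[] = "/tmp/futex-private-XXXXXX";  /* hypothetical scratch file */
  int fd = mkstemp(path);
  if (fd < 0)
    return -1;
  unlink(path);
  if (ftruncate(fd, 4096) < 0) {
    close(fd);
    return -1;
  }

  pid_t pid = fork();
  if (pid < 0) {
    close(fd);
    return -1;
  }
  if (pid == 0) {
    /* Child: wait on its own private mapping for at most 2 seconds. */
    int *word = map_private(fd);
    struct timespec timeout = { 2, 0 };
    if (word)
      syscall(__NR_futex, word, FUTEX_WAIT, *word, &timeout);
    _exit(0);
  }

  /* Parent: wake through a separate private mapping, retrying for about
     one second so the child has time to block. */
  int *word = map_private(fd);
  long woken = 0;
  for (int i = 0; word && i < 100 && woken <= 0; i++) {
    struct timespec pause = { 0, 10 * 1000 * 1000 };
    nanosleep(&pause, NULL);
    woken = syscall(__NR_futex, word, FUTEX_WAKE, 1);
  }
  waitpid(pid, NULL, 0);
  if (word)
    munmap(word, 4096);
  close(fd);
  return woken > 0 ? 1 : 0;
}
```

A return value of 1 would correspond to the 2.6.31 behaviour described
above, and 0 to the 2.6.24 behaviour.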

Cheers,
Mark


/* Test futex() on the vdso, which the kernel maps on process startup. */

#include <stdio.h>
#include <stdlib.h>

#include <elf.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>


#if __WORDSIZE == 32
#  define Elf(name) Elf32_##name
#elif __WORDSIZE == 64
#  define Elf(name) Elf64_##name
#endif

void *find_vdso(char **argv)
{
  /* Find auxv. */
  char **p = argv;
  /* Skip past argv. */
  while(*p)
    p++;
  p++;
  /* Skip past env. */
  while(*p)
    p++;
  p++;
  Elf(auxv_t) *auxv = (void *) p;
  for(; auxv->a_type; auxv++)
    if(auxv->a_type == AT_SYSINFO_EHDR)
      return (void *) auxv->a_un.a_val;
  fprintf(stderr, "vdso not found\n");
  exit(1);
}

int main(int argc, char **argv)
{
  int *vdso = find_vdso(argv);
  fprintf(stderr, "vdso found at %p\n", vdso);
  if(syscall(__NR_futex, vdso, FUTEX_WAKE, 1) < 0)
    perror("futex/WAKE");
  if(syscall(__NR_futex, vdso, FUTEX_WAIT, *vdso, NULL) < 0)
    perror("futex/WAIT");
  return 0;
}
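
As a side note, walking past argv and envp by hand is one way to reach
auxv.  On glibc 2.16 and later, getauxval() returns the same
AT_SYSINFO_EHDR value directly.  The sketch below assumes such a glibc;
the names find_vdso_auxval and looks_like_elf are made up here and are
not part of the test program above.

```c
#include <elf.h>
#include <string.h>
#include <sys/auxv.h>

/* Returns the vdso base address, or NULL if the kernel did not supply one. */
void *find_vdso_auxval(void)
{
  return (void *) getauxval(AT_SYSINFO_EHDR);
}

/* Returns 1 if addr starts with the ELF magic bytes, as a mapped vdso should. */
int looks_like_elf(const void *addr)
{
  return addr != NULL && memcmp(addr, ELFMAG, SELFMAG) == 0;
}
```

The pointer from find_vdso_auxval() could be passed to the futex calls in
main() in place of find_vdso(argv).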


* Re: futex() on vdso makes process unkillable
  2010-01-24  0:04 futex() on vdso makes process unkillable Mark Seaborn
@ 2010-01-25  3:37 ` KOSAKI Motohiro
  2010-01-25  7:27   ` KOSAKI Motohiro
  0 siblings, 1 reply; 9+ messages in thread
From: KOSAKI Motohiro @ 2010-01-25  3:37 UTC (permalink / raw)
  To: Mark Seaborn
  Cc: kosaki.motohiro, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Darren Hart

CC to futex folks.

> I was experimenting with futexes and was a little surprised to
> discover that futex() works on read-only pages.  This creates quite a
> high bandwidth side channel that allows two processes to communicate
> if, for example, they share a library.  (Mind you, this is not much
> different from file locks, which also work on read-only file
> descriptors.)
> 
> I also found a couple of differences between 2.6.24 (from Ubuntu
> hardy) and 2.6.31 (from Ubuntu karmic).  The first is a definite bug
> in 2.6.31:
> 
> 
> 1) On 2.6.31 i686, using futex() on the vdso causes the process to get
> stuck, consuming CPU in an unkillable state.  Both FUTEX_WAIT and
> FUTEX_WAKE cause the problem.  The problem doesn't occur on 2.6.24.
> (BTW, I was testing to see whether futex() on the vdso allows any two
> processes to communicate.  This appears not to be the case on 2.6.24.)
> 
> A test program is below.
> 
> 
> 2) Suppose a file is mapped into two processes with MAP_PRIVATE.  Can
> the resulting mappings be used to communicate via futex()?  i.e. Does
> futex() consider the mappings to be the same?
> 
> On 2.6.24, the futex wakeup is not transferred; pages must be mapped
> with MAP_SHARED for futex to work.  On 2.6.31, the futex wakeup *is*
> transferred; futex works with either MAP_SHARED or MAP_PRIVATE.
> 
> 2.6.24's behaviour seems more correct, because the mappings are
> logically different, even if the underlying memory pages are the same
> before copy-on-write is triggered.  Is 2.6.31's behaviour a
> regression, or is the kernel's behaviour here supposed to be
> undefined?
> 
> Cheers,
> Mark
> 
> 
> /* Test futex() on the vdso, which the kernel maps on process startup. */
> 
> #include <stdio.h>
> #include <stdlib.h>
> 
> #include <elf.h>
> #include <linux/futex.h>
> #include <sys/syscall.h>
> #include <unistd.h>
> 
> 
> #if __WORDSIZE == 32
> #  define Elf(name) Elf32_##name
> #elif __WORDSIZE == 64
> #  define Elf(name) Elf64_##name
> #endif
> 
> void *find_vdso(char **argv)
> {
>   /* Find auxv. */
>   char **p = argv;
>   /* Skip past argv. */
>   while(*p)
>     p++;
>   p++;
>   /* Skip past env. */
>   while(*p)
>     p++;
>   p++;
>   Elf(auxv_t) *auxv = (void *) p;
>   for(; auxv->a_type; auxv++)
>     if(auxv->a_type == AT_SYSINFO_EHDR)
>       return (void *) auxv->a_un.a_val;
>   fprintf(stderr, "vdso not found\n");
>   exit(1);
> }
> 
> int main(int argc, char **argv)
> {
>   int *vdso = find_vdso(argv);
>   fprintf(stderr, "vdso found at %p\n", vdso);
>   if(syscall(__NR_futex, vdso, FUTEX_WAKE, 1) < 0)
>     perror("futex/WAKE");
>   if(syscall(__NR_futex, vdso, FUTEX_WAIT, *vdso, NULL) < 0)
>     perror("futex/WAIT");
>   return 0;
> }

Running this test under the function tracer gives the following output.

           a.out-11459 [000] 242281.165505: get_user_pages_fast <-get_futex_key
           a.out-11459 [000] 242281.165505: gup_pud_range <-get_user_pages_fast
           a.out-11459 [000] 242281.165506: gup_pte_range <-gup_pud_range
           a.out-11459 [000] 242281.165506: __might_sleep <-get_futex_key
           a.out-11459 [000] 242281.165507: unlock_page <-get_futex_key
           a.out-11459 [000] 242281.165507: page_waitqueue <-unlock_page
           a.out-11459 [000] 242281.165508: __wake_up_bit <-unlock_page
           a.out-11459 [000] 242281.165508: put_page <-get_futex_key
           a.out-11459 [000] 242281.165508: get_user_pages_fast <-get_futex_key
           a.out-11459 [000] 242281.165509: gup_pud_range <-get_user_pages_fast
           a.out-11459 [000] 242281.165509: gup_pte_range <-gup_pud_range
           a.out-11459 [000] 242281.165510: __might_sleep <-get_futex_key
           a.out-11459 [000] 242281.165511: unlock_page <-get_futex_key
           a.out-11459 [000] 242281.165511: page_waitqueue <-unlock_page
           a.out-11459 [000] 242281.165512: __wake_up_bit <-unlock_page
           a.out-11459 [000] 242281.165512: put_page <-get_futex_key
           a.out-11459 [000] 242281.165513: get_user_pages_fast <-get_futex_key
           a.out-11459 [000] 242281.165513: gup_pud_range <-get_user_pages_fast
           a.out-11459 [000] 242281.165514: gup_pte_range <-gup_pud_range
           a.out-11459 [000] 242281.165515: __might_sleep <-get_futex_key
           a.out-11459 [000] 242281.165515: unlock_page <-get_futex_key
           a.out-11459 [000] 242281.165516: page_waitqueue <-unlock_page
           a.out-11459 [000] 242281.165516: __wake_up_bit <-unlock_page
           a.out-11459 [000] 242281.165517: put_page <-get_futex_key

This means the following code in get_futex_key() loops forever:


	again:
	        err = get_user_pages_fast(address, 1, 1, &page);
	        if (err < 0)
	                return err;

	        page = compound_head(page);
	        lock_page(page);
	        if (!page->mapping) {
	                unlock_page(page);
	                put_page(page);
	                goto again;
	        }
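
To make the failure mode concrete, here is a small userspace model of
that retry loop.  It is only a sketch: struct fake_page and
lookup_with_retry_cap are hypothetical names, and the retry cap is an
addition so that the model terminates.  The retry presumably exists for
pages whose mapping is only transiently NULL; if the mapping never
becomes non-NULL, nothing in the loop can end it.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct page: only the field the loop tests. */
struct fake_page {
  void *mapping;
};

/*
 * Model of the retry loop.  Returns the number of iterations needed to
 * see a non-NULL mapping, or -1 after max_tries attempts.  The kernel
 * loop quoted above has no such cap.
 */
int lookup_with_retry_cap(const struct fake_page *page, int max_tries)
{
  for (int tries = 1; tries <= max_tries; tries++) {
    /* Stands in for get_user_pages_fast() + lock_page(). */
    if (page->mapping != NULL)
      return tries;
    /* Stands in for unlock_page() + put_page() + "goto again". */
  }
  return -1;
}
```

With a page whose mapping stays NULL the model only stops because of the
cap, which matches the repeating pattern in the trace.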






* Re: futex() on vdso makes process unkillable
  2010-01-25  3:37 ` KOSAKI Motohiro
@ 2010-01-25  7:27   ` KOSAKI Motohiro
  2010-01-25  9:26     ` Peter Zijlstra
  0 siblings, 1 reply; 9+ messages in thread
From: KOSAKI Motohiro @ 2010-01-25  7:27 UTC (permalink / raw)
  To: Mark Seaborn
  Cc: kosaki.motohiro, linux-kernel, Peter Zijlstra, Ingo Molnar,
	Thomas Gleixner, Darren Hart

Hi

> CC to futex folks.
> 
> > I was experimenting with futexes and was a little surprised to
> > discover that futex() works on read-only pages.  This creates quite a
> > high bandwidth side channel that allows two processes to communicate
> > if, for example, they share a library.  (Mind you, this is not much
> > different from file locks, which also work on read-only file
> > descriptors.)
> > 
> > I also found a couple of differences between 2.6.24 (from Ubuntu
> > hardy) and 2.6.31 (from Ubuntu karmic).  The first is a definite bug
> > in 2.6.31:
> > 
> > 1) On 2.6.31 i686, using futex() on the vdso causes the process to get
> > stuck, consuming CPU in an unkillable state.  Both FUTEX_WAIT and
> > FUTEX_WAKE cause the problem.  The problem doesn't occur on 2.6.24.
> > (BTW, I was testing to see whether futex() on the vdso allows any two
> > processes to communicate.  This appears not to be the case on 2.6.24.)
> > 
> > A test program is below.
> > 
> > 
> > 2) Suppose a file is mapped into two processes with MAP_PRIVATE.  Can
> > the resulting mappings be used to communicate via futex()?  i.e. Does
> > futex() consider the mappings to be the same?
> > 
> > On 2.6.24, the futex wakeup is not transferred; pages must be mapped
> > with MAP_SHARED for futex to work.  On 2.6.31, the futex wakeup *is*
> > transferred; futex works with either MAP_SHARED or MAP_PRIVATE.
> > 
> > 2.6.24's behaviour seems more correct, because the mappings are
> > logically different, even if the underlying memory pages are the same
> > before copy-on-write is triggered.  Is 2.6.31's behaviour a
> > regression, or is the kernel's behaviour here supposed to be
> > undefined?

Futex should work on both file and anon mappings. However, I personally think
the vdso is neither file nor anon; it is a special mapping, and nobody has
defined futex semantics for special mappings. (So yes, undefined.)

Personally, I think EINVAL or EFAULT would be the best result for futexing on
the vdso, just as for futexing against a kernel address. But I guess other
people may think differently.

I'd like to hear the futex folks' opinion.


Thanks.




* Re: futex() on vdso makes process unkillable
  2010-01-25  7:27   ` KOSAKI Motohiro
@ 2010-01-25  9:26     ` Peter Zijlstra
  2010-01-25 17:37       ` Darren Hart
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2010-01-25  9:26 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Mark Seaborn, linux-kernel, Ingo Molnar, Thomas Gleixner,
	Darren Hart, hugh.dickins

On Mon, 2010-01-25 at 16:27 +0900, KOSAKI Motohiro wrote:
> Hi
> 
> > CC to futex folks.
> > 
> > > I was experimenting with futexes and was a little surprised to
> > > discover that futex() works on read-only pages.  This creates quite a
> > > high bandwidth side channel that allows two processes to communicate
> > > if, for example, they share a library.  (Mind you, this is not much
> > > different from file locks, which also work on read-only file
> > > descriptors.)
> > > 
> > > I also found a couple of differences between 2.6.24 (from Ubuntu
> > > hardy) and 2.6.31 (from Ubuntu karmic).  The first is a definite bug
> > > in 2.6.31:
> > > 
> > > 1) On 2.6.31 i686, using futex() on the vdso causes the process to get
> > > stuck, consuming CPU in an unkillable state.  Both FUTEX_WAIT and
> > > FUTEX_WAKE cause the problem.  The problem doesn't occur on 2.6.24.
> > > (BTW, I was testing to see whether futex() on the vdso allows any two
> > > processes to communicate.  This appears not to be the case on 2.6.24.)
> > > 
> > > A test program is below.
> > > 
> > > 
> > > 2) Suppose a file is mapped into two processes with MAP_PRIVATE.  Can
> > > the resulting mappings be used to communicate via futex()?  i.e. Does
> > > futex() consider the mappings to be the same?
> > > 
> > > On 2.6.24, the futex wakeup is not transferred; pages must be mapped
> > > with MAP_SHARED for futex to work.  On 2.6.31, the futex wakeup *is*
> > > transferred; futex works with either MAP_SHARED or MAP_PRIVATE.
> > > 
> > > 2.6.24's behaviour seems more correct, because the mappings are
> > > logically different, even if the underlying memory pages are the same
> > > before copy-on-write is triggered.  Is 2.6.31's behaviour a
> > > regression, or is the kernel's behaviour here supposed to be
> > > undefined?
> 
> Futex should work both file anon anon. however I personally think 
> vdso is not file nor anon. it is special mappings. nobody defined
> futex spec on special mappings. (yes, undefined).
> 
> Personally, I think EINVAL or EFAULT are best result of vdso futexing, like as
> futexing againt kernel address. but I guess another person have another thinking.
> 
> I'd like to hear futex folks's opinion.

Well, my opinion is we should remove the vdso, it's ugly as hell :-)

But I think it would make most sense to extend its definition in the
direction of it being a file (for all intents and purposes it's a special
DSO -- which unfortunately isn't present in any filesystem).

[ For all intents and purposes processes can already communicate through
futexes on the libc space, so being able to do so through the vdso
really doesn't add anything ]

So the problem is that the VDSO pages do not have a page->mapping
because they lack the actual filesystem part of files, so even if (with
the recent zero-page patch from Kosaki-san) you make private COWs of the
VDSO, you'll get stuck in that loop.

So the prettiest solution is to simply place the vdso in an actual
filesystem and slowly migrate towards letting userspace map it as a
regular DSO -- /sys/lib{32,64}/libkernel.so like.

[ that has the bonus of getting rid of install_special_mapping() ]

The ugly solution is special casing the vdso in get_futex_key().
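
The kernel-side special case is not shown in this thread.  As a loose
userspace analogue only, a test program could check whether an address
falls inside the mapping that /proc/self/maps labels "[vdso]" before
calling futex() on it.  The function name addr_in_vdso is made up for
this sketch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if addr lies in the [vdso] mapping, 0 if not, -1 on error. */
int addr_in_vdso(const void *addr)
{
  FILE *maps = fopen("/proc/self/maps", "r");
  if (!maps)
    return -1;
  char line[512];
  int found = 0;
  while (!found && fgets(line, sizeof line, maps)) {
    unsigned long start, end;
    /* Each line begins with "start-end" in hex; the name comes last. */
    if (strstr(line, "[vdso]") &&
        sscanf(line, "%lx-%lx", &start, &end) == 2)
      found = (uintptr_t) addr >= start && (uintptr_t) addr < end;
  }
  fclose(maps);
  return found;
}
```

This says nothing about how get_futex_key() would recognise the vdso; it
only shows that the mapping is identifiable from userspace.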



* Re: futex() on vdso makes process unkillable
  2010-01-25  9:26     ` Peter Zijlstra
@ 2010-01-25 17:37       ` Darren Hart
  2010-01-26  2:41         ` KOSAKI Motohiro
  0 siblings, 1 reply; 9+ messages in thread
From: Darren Hart @ 2010-01-25 17:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: KOSAKI Motohiro, Mark Seaborn, linux-kernel, Ingo Molnar,
	Thomas Gleixner, hugh.dickins

Peter Zijlstra wrote:
> On Mon, 2010-01-25 at 16:27 +0900, KOSAKI Motohiro wrote:
<snip>
>> Futex should work both file anon anon. however I personally think 
>> vdso is not file nor anon. it is special mappings. nobody defined
>> futex spec on special mappings. (yes, undefined).
>>
>> Personally, I think EINVAL or EFAULT are best result of vdso futexing, like as
>> futexing againt kernel address. but I guess another person have another thinking.
>>
>> I'd like to hear futex folks's opinion.
> 
> Well, my opinion is we should remove the vdso, its ugly as hell :-)
> 
> But I think it would make most sense to extend its definition in the
> direction of it being a file (for all intents and purposes its a special
> DSO -- which unfortunately isn't present in any filesystem).
> 
> [ For all intents and purposes processes can already communicate through
> futexes on the libc space, so being able to do so through the vsdo
> really doesn't add anything ]
> 
> So the problem is that the VDSO pages do not have a page->mapping
> because they lack the actual filesystem part of files, so even if (with
> the recent zero-page patch from Kosaki-san) you make private COWs of the
> VDSO, you'll get stuck in that loop.
> 
> So the prettiest solution is to simply place the vdso in an actual
> filesystem and slowly migrate towards letting userspace map it as a
> regular DSO -- /sys/lib{32,64}/libkernel.so like.
> 
> [ that has the bonus of getting rid of install_special_mapping() ]
> 
> The ugly solution is special casing the vdso in get_futex_key().

I like the creating-a-real-file solution. However, for now (and for 
stable), I think Kosaki's suggestion of EINVAL or EFAULT is a good 
stop-gap. EINVAL might play the best with existing glibc implementations.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team


* Re: futex() on vdso makes process unkillable
  2010-01-25 17:37       ` Darren Hart
@ 2010-01-26  2:41         ` KOSAKI Motohiro
  2010-01-26  7:52           ` Peter Zijlstra
                             ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: KOSAKI Motohiro @ 2010-01-26  2:41 UTC (permalink / raw)
  To: Darren Hart
  Cc: kosaki.motohiro, Peter Zijlstra, Mark Seaborn, linux-kernel,
	Ingo Molnar, Thomas Gleixner, hugh.dickins

> Peter Zijlstra wrote:
> > On Mon, 2010-01-25 at 16:27 +0900, KOSAKI Motohiro wrote:
> <snip>
> >> Futex should work both file anon anon. however I personally think 
> >> vdso is not file nor anon. it is special mappings. nobody defined
> >> futex spec on special mappings. (yes, undefined).
> >>
> >> Personally, I think EINVAL or EFAULT are best result of vdso futexing, like as
> >> futexing againt kernel address. but I guess another person have another thinking.
> >>
> >> I'd like to hear futex folks's opinion.
> > 
> > Well, my opinion is we should remove the vdso, its ugly as hell :-)
> > 
> > But I think it would make most sense to extend its definition in the
> > direction of it being a file (for all intents and purposes its a special
> > DSO -- which unfortunately isn't present in any filesystem).
> > 
> > [ For all intents and purposes processes can already communicate through
> > futexes on the libc space, so being able to do so through the vsdo
> > really doesn't add anything ]
> > 
> > So the problem is that the VDSO pages do not have a page->mapping
> > because they lack the actual filesystem part of files, so even if (with
> > the recent zero-page patch from Kosaki-san) you make private COWs of the
> > VDSO, you'll get stuck in that loop.
> > 
> > So the prettiest solution is to simply place the vdso in an actual
> > filesystem and slowly migrate towards letting userspace map it as a
> > regular DSO -- /sys/lib{32,64}/libkernel.so like.
> > 
> > [ that has the bonus of getting rid of install_special_mapping() ]
> > 
> > The ugly solution is special casing the vdso in get_futex_key().
> 
> I like the creating-a-real-file solution. However, for now (and for 
> stable), I think Kosaki's suggestion of EINVAL or EFAULT is a good 
> stop-gap. EINVAL might play the best with existing glibc implementations.

May I confirm what you mean?

If we can accept EFAULT, we don't need any change; my previous futex patch
already does this, because 1) the VDSO is always mapped read-only, and 2) a
write-mode get_user_pages_fast() against a read-only pte/vma returns EFAULT.

The current Linus and stable trees don't exhibit Mark's original problem;
instead, they just return EFAULT. (I'm sorry, my previous mail was unclear:
the test result I posted was from v2.6.31.)

If you can't accept EFAULT, we need to add vdso-specific logic to get_futex_key().
Is that your intention?





* Re: futex() on vdso makes process unkillable
  2010-01-26  2:41         ` KOSAKI Motohiro
@ 2010-01-26  7:52           ` Peter Zijlstra
  2010-01-26  8:33           ` Thomas Gleixner
  2010-01-26 14:21           ` Darren Hart
  2 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2010-01-26  7:52 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Darren Hart, Mark Seaborn, linux-kernel, Ingo Molnar,
	Thomas Gleixner, hugh.dickins

On Tue, 2010-01-26 at 11:41 +0900, KOSAKI Motohiro wrote:
> > Peter Zijlstra wrote:
> > > On Mon, 2010-01-25 at 16:27 +0900, KOSAKI Motohiro wrote:
> > <snip>
> > >> Futex should work both file anon anon. however I personally think 
> > >> vdso is not file nor anon. it is special mappings. nobody defined
> > >> futex spec on special mappings. (yes, undefined).
> > >>
> > >> Personally, I think EINVAL or EFAULT are best result of vdso futexing, like as
> > >> futexing againt kernel address. but I guess another person have another thinking.
> > >>
> > >> I'd like to hear futex folks's opinion.
> > > 
> > > Well, my opinion is we should remove the vdso, its ugly as hell :-)
> > > 
> > > But I think it would make most sense to extend its definition in the
> > > direction of it being a file (for all intents and purposes its a special
> > > DSO -- which unfortunately isn't present in any filesystem).
> > > 
> > > [ For all intents and purposes processes can already communicate through
> > > futexes on the libc space, so being able to do so through the vsdo
> > > really doesn't add anything ]
> > > 
> > > So the problem is that the VDSO pages do not have a page->mapping
> > > because they lack the actual filesystem part of files, so even if (with
> > > the recent zero-page patch from Kosaki-san) you make private COWs of the
> > > VDSO, you'll get stuck in that loop.
> > > 
> > > So the prettiest solution is to simply place the vdso in an actual
> > > filesystem and slowly migrate towards letting userspace map it as a
> > > regular DSO -- /sys/lib{32,64}/libkernel.so like.
> > > 
> > > [ that has the bonus of getting rid of install_special_mapping() ]
> > > 
> > > The ugly solution is special casing the vdso in get_futex_key().
> > 
> > I like the creating-a-real-file solution. However, for now (and for 
> > stable), I think Kosaki's suggestion of EINVAL or EFAULT is a good 
> > stop-gap. EINVAL might play the best with existing glibc implementations.
> 
> May I confirm your mention?
> 
> If we can accept EFAULT, we don't need any change. my previous futex patch
> already did. because 1) VDSO is alwasys read-only mapped 2) write mode
> get_user_pages_fast() against read-only pte/vma return EFAULT.
> 
> Current linus and stable tree don't cause Mark's original problem. instead, just
> return EFAULT. (Well, I'm sorry. my previous mail was unclear. I wrote v2.6.31 test
> result)
> 
> If you can't accept EFAULT, we need to add vdso specific logic into get_futex_key().
> Is this your intention?

Oh, right you are, I mixed up the force and write arguments. Yes, I think
we're good.




* Re: futex() on vdso makes process unkillable
  2010-01-26  2:41         ` KOSAKI Motohiro
  2010-01-26  7:52           ` Peter Zijlstra
@ 2010-01-26  8:33           ` Thomas Gleixner
  2010-01-26 14:21           ` Darren Hart
  2 siblings, 0 replies; 9+ messages in thread
From: Thomas Gleixner @ 2010-01-26  8:33 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Darren Hart, Peter Zijlstra, Mark Seaborn, linux-kernel,
	Ingo Molnar, hugh.dickins

On Tue, 26 Jan 2010, KOSAKI Motohiro wrote:
> > Darren Hart wrote:
> > I like the creating-a-real-file solution. However, for now (and for 
> > stable), I think Kosaki's suggestion of EINVAL or EFAULT is a good 
> > stop-gap. EINVAL might play the best with existing glibc implementations.
> 
> May I confirm your mention?
> 
> If we can accept EFAULT, we don't need any change. my previous futex patch
> already did. because 1) VDSO is alwasys read-only mapped 2) write mode
> get_user_pages_fast() against read-only pte/vma return EFAULT.
> 
> Current linus and stable tree don't cause Mark's original problem. instead, just
> return EFAULT. (Well, I'm sorry. my previous mail was unclear. I wrote v2.6.31 test
> result)
> 
> If you can't accept EFAULT, we need to add vdso specific logic into get_futex_key().

EFAULT is perfectly fine. No need for any special tricks.

Thanks,

	tglx



* Re: futex() on vdso makes process unkillable
  2010-01-26  2:41         ` KOSAKI Motohiro
  2010-01-26  7:52           ` Peter Zijlstra
  2010-01-26  8:33           ` Thomas Gleixner
@ 2010-01-26 14:21           ` Darren Hart
  2 siblings, 0 replies; 9+ messages in thread
From: Darren Hart @ 2010-01-26 14:21 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Peter Zijlstra, Mark Seaborn, linux-kernel, Ingo Molnar,
	Thomas Gleixner, hugh.dickins

KOSAKI Motohiro wrote:
>> Peter Zijlstra wrote:
>>> On Mon, 2010-01-25 at 16:27 +0900, KOSAKI Motohiro wrote:
>> <snip>
>>>> Futex should work both file anon anon. however I personally think 
>>>> vdso is not file nor anon. it is special mappings. nobody defined
>>>> futex spec on special mappings. (yes, undefined).
>>>>
>>>> Personally, I think EINVAL or EFAULT are best result of vdso futexing, like as
>>>> futexing againt kernel address. but I guess another person have another thinking.
>>>>
>>>> I'd like to hear futex folks's opinion.
>>> Well, my opinion is we should remove the vdso, its ugly as hell :-)
>>>
>>> But I think it would make most sense to extend its definition in the
>>> direction of it being a file (for all intents and purposes its a special
>>> DSO -- which unfortunately isn't present in any filesystem).
>>>
>>> [ For all intents and purposes processes can already communicate through
>>> futexes on the libc space, so being able to do so through the vsdo
>>> really doesn't add anything ]
>>>
>>> So the problem is that the VDSO pages do not have a page->mapping
>>> because they lack the actual filesystem part of files, so even if (with
>>> the recent zero-page patch from Kosaki-san) you make private COWs of the
>>> VDSO, you'll get stuck in that loop.
>>>
>>> So the prettiest solution is to simply place the vdso in an actual
>>> filesystem and slowly migrate towards letting userspace map it as a
>>> regular DSO -- /sys/lib{32,64}/libkernel.so like.
>>>
>>> [ that has the bonus of getting rid of install_special_mapping() ]
>>>
>>> The ugly solution is special casing the vdso in get_futex_key().
>> I like the creating-a-real-file solution. However, for now (and for 
>> stable), I think Kosaki's suggestion of EINVAL or EFAULT is a good 
>> stop-gap. EINVAL might play the best with existing glibc implementations.
> 
> May I confirm your mention?
> 
> If we can accept EFAULT, we don't need any change. my previous futex patch
> already did. because 1) VDSO is alwasys read-only mapped 2) write mode
> get_user_pages_fast() against read-only pte/vma return EFAULT.
> 
> Current linus and stable tree don't cause Mark's original problem. instead, just
> return EFAULT. (Well, I'm sorry. my previous mail was unclear. I wrote v2.6.31 test
> result)
> 
> If you can't accept EFAULT, we need to add vdso specific logic into get_futex_key().
> Is this your intention?

That was my intention, but after looking at the glibc source, I don't 
see any reason for EINVAL over EFAULT. I apparently mis-remembered 
something there. EFAULT is fine.

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team
