All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jakub Jelinek <jakub@redhat.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
	mingo@elte.hu, Andrew Morton <akpm@osdl.org>,
	linux-kernel@vger.kernel.org, rusty@rustcorp.com.au, ahu@ds9a.nl,
	Ulrich Drepper <drepper@redhat.com>,
	Roland McGrath <roland@redhat.com>,
	Scott Snyder <snyder@fnal.gov>
Subject: Re: Futex queue_me/get_user ordering
Date: Thu, 17 Mar 2005 10:55:39 -0500	[thread overview]
Message-ID: <20050317155539.GF853@devserv.devel.redhat.com> (raw)
In-Reply-To: <20050317152031.GB16743@mail.shareable.org>

On Thu, Mar 17, 2005 at 03:20:31PM +0000, Jamie Lokier wrote:
> If you change futex_wait to be "atomic", and then have userspace locks
> which _depend_ on that atomicity, it becomes impossible to wait on
> multiple of those locks, or make poll-driven state machines which can
> wait on those locks.

The futex man pages that have been around for years (certainly since mid 2002)
certainly don't document FUTEX_WAIT as token passing operation, but as atomic
operation:

Say http://www.icewalkers.com/Linux/ManPages/futex-2.html

FUTEX_WAIT
This operation atomically verifies that  the  futex
address  still contains the value given, and sleeps
awaiting FUTEX_WAKE on this futex address.  If  the
timeout argument is non-NULL, its contents describe
the maximum duration of the wait, which is infinite
otherwise.   For futex(4), this call is executed if
decrementing the count gave a negative value (indi
cating  contention),  and  will sleep until another
process   releases  the  futex  and  executes   the
FUTEX_WAKE operation.

RETURN VALUE
FUTEX_WAIT
Returns 0 if the process was woken by a  FUTEX_WAKE
call. In case of timeout, ETIMEDOUT is returned. If
the futex was not equal to the expected value,  the
operation  returns  EWOULDBLOCK.  Signals (or other
spurious wakeups) cause FUTEX_WAIT to return EINTR.

so there very well might be programs other than glibc that
depend on this behaviour.  Given that in most cases the race
is not hit every day (after all, we have been living with it for
several years), they probably wouldn't know there is a problem
like that.

> You can do userspace threading and simulate most blocking system calls
> by making them non-blocking and using poll).

Sure, but then you need to write your own locking as well and
can just use the token passing property of futexes there.

> It's not a _huge_ loss, but considering it's only Glibc which is
> demanding this and futexes have another property, token-passing, which
> Glibc could be using instead - why not use it?

Because that requires requeue being done with the cv lock held, which
means an extra context switch.

> > @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
> >  	inc_preempt_count();
> >  	ret = __copy_from_user_inatomic(dest, from, sizeof(int));
> >  	dec_preempt_count();
> > -	preempt_check_resched();
> >  
> >  	return ret ? -EFAULT : 0;
> >  }
> 
> inc_preempt_count() and dec_preempt_count() aren't needed, as
> preemption is disabled by the queue spinlocks.  So
> get_futex_value_locked isn't needed any more: with the spinlocks held,
> __get_user will do.

They aren't needed if CONFIG_PREEMPT.  But with !CONFIG_PREEMPT, they
are IMHO still needed, as spin_lock/spin_unlock call preempt_{disable,enable},
which is a nop if !CONFIG_PREEMPT.
__get_user can't be used though, it should be __get_user_inatomic
(or __copy_from_user_inatomic if the former doesn't exist).

> > [numerous instances of...]
> > +	preempt_check_resched();
> 
> Not required.  The spin unlocks will do this.

True, preempt_check_resched() is a nop if !CONFIG_PREEMPT and for
CONFIG_PREEMPT spin_unlock will handle it.  Will remove them from the
patch.

> > But with the recent changes to futex.c I think kernel can ensure
> > atomicity for free.
> 
> I agree it would probably not slow the kernel, but I would _strongly_
> prefer that Glibc were fixed to use the token-passing property, if
> Glibc is the driving intention behind this patch - instead of this
> becoming a semantic that application-level users of futex (like
> database and IPC libraries) come to depend on and which can't be
> decomposed into a multiple-waiting form.
> 
> (I admit that the kernel code does look nicer with
> get_futex_value_locked gone, though).
> 
> By the way, do you know of Scott Snyder's recent work on fixing Glibc
> in this way?  He bumped into one of Glibc's currently broken corner
> cases, fixed it (according to the algorithm I gave in November), and
> reported that it works fine with the fix.

I certainly haven't seen his patch.

	Jakub

  reply	other threads:[~2005-03-17 15:56 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20041113164048.2f31a8dd.akpm@osdl.org>
2004-11-14  9:00 ` Futex queue_me/get_user ordering (was: 2.6.10-rc1-mm5 [u]) Emergency Services Jamie Lokier
2004-11-14  9:09   ` Andrew Morton
2004-11-14  9:23     ` Jamie Lokier
2004-11-14  9:50       ` bert hubert
2004-11-15 14:12         ` Jamie Lokier
2004-11-16  8:30           ` Futex queue_me/get_user ordering Hidetoshi Seto
2004-11-16 14:58             ` Jamie Lokier
2004-11-18  1:29               ` Hidetoshi Seto
2004-11-15  0:58       ` Hidetoshi Seto
2004-11-15  2:01         ` Jamie Lokier
2004-11-15  3:06           ` Hidetoshi Seto
2004-11-15 13:22             ` Jamie Lokier
2004-11-17  8:47               ` Jakub Jelinek
2004-11-18  2:10                 ` Hidetoshi Seto
2004-11-18  7:20                 ` Jamie Lokier
2004-11-18 19:47                   ` Jakub Jelinek
2005-03-17 10:26                     ` Jakub Jelinek
2005-03-17 15:20                       ` Jamie Lokier
2005-03-17 15:55                         ` Jakub Jelinek [this message]
2005-03-18 17:00                           ` Ingo Molnar
2005-03-21  2:55                             ` Jamie Lokier
2005-03-18 16:53                         ` Jakub Jelinek
2004-11-26 17:06                 ` Jamie Lokier
2004-11-28 17:36                   ` Joe Seigh
2004-11-29 11:24                   ` Jakub Jelinek
2004-11-29 21:50                     ` Jamie Lokier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20050317155539.GF853@devserv.devel.redhat.com \
    --to=jakub@redhat.com \
    --cc=ahu@ds9a.nl \
    --cc=akpm@osdl.org \
    --cc=drepper@redhat.com \
    --cc=jamie@shareable.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=roland@redhat.com \
    --cc=rusty@rustcorp.com.au \
    --cc=seto.hidetoshi@jp.fujitsu.com \
    --cc=snyder@fnal.gov \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.