From: Jamie Lokier <jamie@shareable.org>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>,
mingo@elte.hu, Andrew Morton <akpm@osdl.org>,
linux-kernel@vger.kernel.org, rusty@rustcorp.com.au, ahu@ds9a.nl,
Ulrich Drepper <drepper@redhat.com>,
Roland McGrath <roland@redhat.com>,
Scott Snyder <snyder@fnal.gov>
Subject: Re: Futex queue_me/get_user ordering
Date: Thu, 17 Mar 2005 15:20:31 +0000
Message-ID: <20050317152031.GB16743@mail.shareable.org>
In-Reply-To: <20050317102619.GA23494@devserv.devel.redhat.com>
Jakub Jelinek wrote:
> http://www.ussg.iu.edu/hypermail/linux/kernel/0411.2/0953.html
>
> Your argument in November was that you don't want to slow down the
> kernel and that userland must be able to cope with the
> non-atomicity of futex syscall.
Those were two of them.
But my other main concern is conceptual.
Right now, a futex_wait call is roughly equivalent to
add_wait_queue, which is quite versatile.
It means anything you can do with one futex, you can extend to
multiple futexes (e.g. waiting on more than one lock), and you can do it
asynchronously (e.g. futex_wait can be implemented in userspace as
futex_fd[1] + poll[2], and therefore things like poll-driven state machines
where one of the state machines wants to wait on a lock are possible).
[1] Ulrich was mistaken in his paper to say futex_fd needs to check a word
to be useful; userspace is supposed to check the word after futex_fd
and before polling or waiting on it. This is more useful because it
extends to multiple futexes.
[2] Actually it can't right now, because of a flaw in futex_fd's poll
function, but that could be fixed. The _principle_ is sound.
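Footnote [1] describes the same discipline as add_wait_queue: register interest first, then re-check the word, and only then poll or sleep. Because registration comes before the check, the pattern extends naturally to several words at once. The following is a minimal sketch of that ordering as a single-threaded toy model, with the thread interleavings written out by hand. struct futex_word, struct waiter and the model_* / all_unchanged functions are hypothetical names, not kernel code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model (hypothetical names, not kernel code). */
#define MAX_WAITERS 8

struct waiter {
    bool woken;                          /* set when any registered word is woken */
};

struct futex_word {
    int value;                           /* the shared lock/futex word */
    struct waiter *queue[MAX_WAITERS];   /* registered waiters */
    size_t count;
};

/* Step 1 (like futex_fd / add_wait_queue): register interest in the word.
   The model silently ignores registrations beyond MAX_WAITERS. */
static void model_register(struct futex_word *f, struct waiter *w)
{
    if (f->count < MAX_WAITERS)
        f->queue[f->count++] = w;
}

/* Waker side: store a new value, then wake every registered waiter.
   Returns how many waiters were woken. */
static int model_store_and_wake(struct futex_word *f, int new_value)
{
    int n = 0;
    f->value = new_value;
    for (size_t i = 0; i < f->count; i++, n++)
        f->queue[i]->woken = true;
    f->count = 0;
    return n;
}

/* Step 2: after registering on every word, re-check each one.
   true means nothing has changed yet, so it is safe to poll/sleep
   (step 3); any later store-and-wake will set waiter->woken. */
static bool all_unchanged(struct futex_word *const *words,
                          const int *expected, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (words[i]->value != expected[i])
            return false;
    return true;
}
```

If the order is reversed (check, then register), a wake that arrives in between finds nobody queued and returns 0, and the waiter then sleeps with nothing left to wake it. That lost wakeup is exactly what the register-first ordering prevents.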
If you change futex_wait to be "atomic", and then have userspace locks
which _depend_ on that atomicity, it becomes impossible to wait on
several of those locks at once, or to make poll-driven state machines
which can wait on those locks.
There are applications and libraries which use futex not just for
threading but for things like database locks in files.
You can do userspace threading and simulate most blocking system calls
by making them non-blocking and using poll.
(I'm not saying anything against NPTL by this, by the way - NPTL is a
very good general purpose library - but there are occasions when an
application wants to do its own equivalent of simulated blocking
system calls for one reason or another, my favourite being research
into inter-thread JIT-optimisation in an environment like valgrind.)
Right now, in principle, futex_wait is among the system calls which
can be simulated by making it non-blocking (= futex_fd) and using poll()[2].
Which means programs using futex themselves can be subject to interesting
thread optimisations by code which knows nothing about the program
(similar to valgrind).
If you change futex_wait to be "atomic", then it would be _impossible_
to take some random 3rd-party library which uses futex_wait, and
convert its blocking system calls to use poll-driven state machines
instead.
I think taking that away would be a great conceptual loss.
It's not a _huge_ loss, but considering it's only Glibc which is
demanding this and futexes have another property, token-passing, which
Glibc could be using instead - why not use it?
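The following is a minimal sketch of one reading of the token-passing property. It assumes the property means that every wakeup counted in FUTEX_WAKE's return value is received by exactly one queued waiter, and that a waiter which finds it does not need its wakeup can pass it on with another wake of 1. It is a single-threaded toy model with hypothetical names (struct wq, wq_enqueue, wq_wake, wq_forward_token), not kernel or Glibc code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy, single-threaded model (hypothetical names, not kernel or Glibc code).
   Waiters are small integer ids 0..QLEN-1, queued in FIFO order. */
#define QLEN 8

struct wq {
    int ids[QLEN];          /* ring buffer of queued waiter ids */
    size_t head, len;
    bool got_token[QLEN];   /* got_token[id]: waiter id holds a wakeup */
};

/* Like entering futex_wait: queue waiter `id`; ignored if full. */
static void wq_enqueue(struct wq *q, int id)
{
    if (q->len < QLEN) {
        q->ids[(q->head + q->len) % QLEN] = id;
        q->len++;
    }
}

/* Like FUTEX_WAKE: hand one token each to up to n queued waiters
   (oldest first) and return how many were woken. */
static int wq_wake(struct wq *q, int n)
{
    int woken = 0;
    while (woken < n && q->len > 0) {
        int id = q->ids[q->head];
        q->head = (q->head + 1) % QLEN;
        q->len--;
        q->got_token[id] = true;
        woken++;
    }
    return woken;
}

/* A woken waiter that does not need its wakeup gives the token up and
   passes it on with a wake of 1, so the wakeup is moved, not lost.
   Returns how many waiters the forwarded wake reached (0 or 1). */
static int wq_forward_token(struct wq *q, int id)
{
    q->got_token[id] = false;
    return wq_wake(q, 1);
}
```

In this model a wakeup is never silently dropped. After a forward, exactly one waiter still holds the token that the original wake handed out.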
That said, let's look at your patch.
> It would simplify requeue implementation (getting rid of the nqueued
> field),
The change to FUTEX_REQUEUE2 is an improvement :)
nqueued is an abomination, like the rest of FUTEX_REQUEUE2 :)
> @@ -265,7 +264,6 @@ static inline int get_futex_value_locked
> inc_preempt_count();
> ret = __copy_from_user_inatomic(dest, from, sizeof(int));
> dec_preempt_count();
> - preempt_check_resched();
>
> return ret ? -EFAULT : 0;
> }
inc_preempt_count() and dec_preempt_count() aren't needed, as
preemption is disabled by the queue spinlocks. So
get_futex_value_locked isn't needed any more: with the spinlocks held,
__get_user will do.
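From userspace, the value read there is compared with the value the caller passed to FUTEX_WAIT. The following is a minimal sketch of that calling pattern. It assumes Linux and the raw SYS_futex system call; glibc has no futex() wrapper, so futex() and wait_if_equal() below are hypothetical helper names. If the word no longer holds the passed value, FUTEX_WAIT returns EAGAIN (the same errno as EWOULDBLOCK on Linux) instead of sleeping. FUTEX_FD has since been removed from mainline kernels, so only plain FUTEX_WAIT is used.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>  /* FUTEX_WAIT, FUTEX_WAKE */
#include <stdint.h>
#include <sys/syscall.h>  /* SYS_futex */
#include <time.h>         /* struct timespec */
#include <unistd.h>       /* syscall() */

/* Hypothetical helper: glibc provides no futex() wrapper. */
static long futex(uint32_t *uaddr, int op, uint32_t val,
                  const struct timespec *timeout)
{
    return syscall(SYS_futex, uaddr, op, val, timeout, NULL, 0);
}

/* Sleep on *word only if it still holds `expected` (the value userspace
   read earlier). Returns 0 when woken, otherwise -errno: -EAGAIN if the
   word had already changed, -ETIMEDOUT if the relative timeout expired. */
static int wait_if_equal(uint32_t *word, uint32_t expected,
                         const struct timespec *timeout)
{
    if (futex(word, FUTEX_WAIT, expected, timeout) == -1)
        return -errno;
    return 0;
}
```

Whether the kernel queues the waiter and performs that comparison atomically is exactly the question under discussion here. The single-threaded calls above behave the same under either answer.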
> [numerous instances of...]
> + preempt_check_resched();
Not required. The spin unlocks will do this.
> But with the recent changes to futex.c I think kernel can ensure
> atomicity for free.
I agree it would probably not slow the kernel, but if Glibc is what is
driving this patch, I would _strongly_ prefer that Glibc were fixed to
use the token-passing property - instead of this becoming a semantic
that application-level users of futex (like database and IPC
libraries) come to depend on, and which can't be decomposed into a
multiple-waiting form.
(I admit that the kernel code does look nicer with
get_futex_value_locked gone, though).
By the way, do you know of Scott Snyder's recent work on fixing Glibc
in this way? He bumped into one of Glibc's currently broken corner
cases, fixed it (according to the algorithm I gave in November), and
reported that it works fine with the fix.
-- Jamie