From: Jamie Lokier <jamie@shareable.org>
To: Changli Gao <xiaosuo@gmail.com>
Cc: David Howells <dhowells@redhat.com>,
Yong Zhang <yong.zhang@windriver.com>,
Xiaotian Feng <xtfeng@gmail.com>, Ingo Molnar <mingo@elte.hu>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Andrew Morton <akpm@linux-foundation.org>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Davide Libenzi <davidel@xmailserver.org>,
Roland Dreier <rolandd@cisco.com>,
Stefan Richter <stefanr@s5r6.in-berlin.de>,
Peter Zijlstra <a.p.zijlstra@chello.nl>,
"David S. Miller" <davem@davemloft.net>,
Eric Dumazet <dada1@cosmosbay.com>,
Christoph Lameter <cl@linux.com>,
Andreas Herrmann <andreas.herrmann3@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Takashi Iwai <tiwai@suse.de>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] sched: implement the exclusive wait queue as a LIFO queue
Date: Wed, 28 Apr 2010 16:25:02 +0100
Message-ID: <20100428152502.GA25569@shareable.org>
In-Reply-To: <y2r412e6f7f1004280642n49b8d6f2vcd08774531cb59da@mail.gmail.com>

Changli Gao wrote:
> On Wed, Apr 28, 2010 at 9:21 PM, Jamie Lokier <jamie@shareable.org> wrote:
> > Changli Gao wrote:
> >>
> >> fs/eventpoll.c: 1443.
> >> wait.flags |= WQ_FLAG_EXCLUSIVE;
> >> __add_wait_queue(&ep->wq, &wait);
> >
> > The same thing about assumptions applies here. The userspace process
> > may be waiting for an epoll condition to get access to a resource,
> > rather than being a worker thread interchangeable with others.
>
> Oh, the lines above are the current ones. So the assumption applies
> and works here.

No, because WQ_FLAG_EXCLUSIVE doesn't have your LIFO semantics at the moment.

Your patch changes the behaviour of epoll, though I don't know if it
matters. Perhaps all programs which have multiple tasks waiting on
the same epoll fd are "interchangeable worker thread" types anyway :-)

> > For example, userspace might be using a pipe as a signal-safe lock, or
> > signal-safe multi-token semaphore, and epoll to wait for that pipe.
> >
> > WQ_FLAG_EXCLUSIVE means there is no point waking all tasks, to avoid a
> > pointless thundering herd. It doesn't mean unfairness is ok.
>
> The users should not make any assumption about the waking up sequence,
> neither LIFO nor FIFO.

Correct, but they should be able to assume non-starvation (eventual
progress) for all waiters.

It's one of those subtle things, possibly a unixy thing: Non-RT tasks
should always make progress when the competition is just other non-RT
tasks, even if the progress is slow.

Starvation can spread out beyond the starved process, to cause
priority inversions in other tasks that are waiting on a resource
locked by the starved process. Among other things, that can cause
higher priority tasks, and RT priority tasks, to block permanently.
Very unpleasant.

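To make the starvation concern concrete, here is a toy userspace model
(not kernel code; toy_queue, count_starved and NWAITERS are names made
up for this sketch). It assumes each woken waiter finishes its work and
re-queues itself before the next event arrives, which is the case where
strict LIFO keeps waking the same task.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NWAITERS 4

/* Toy wait queue: ids[0] is the head, which the next wakeup picks. */
struct toy_queue {
	int ids[NWAITERS];
	size_t len;
};

/* Remove and return the waiter at the head of the queue. */
static int pop_head(struct toy_queue *q)
{
	int id;

	assert(q->len > 0);
	id = q->ids[0];
	memmove(&q->ids[0], &q->ids[1], (q->len - 1) * sizeof(q->ids[0]));
	q->len--;
	return id;
}

/* LIFO add: the newest waiter becomes the next one to be woken. */
static void push_head(struct toy_queue *q, int id)
{
	assert(q->len < NWAITERS);
	memmove(&q->ids[1], &q->ids[0], q->len * sizeof(q->ids[0]));
	q->ids[0] = id;
	q->len++;
}

/* FIFO add: the newest waiter goes behind everyone already waiting. */
static void push_tail(struct toy_queue *q, int id)
{
	assert(q->len < NWAITERS);
	q->ids[q->len++] = id;
}

/*
 * Deliver `events` wakeups, one at a time. Each woken waiter re-queues
 * itself before the next event. Returns how many waiters were never woken.
 */
static size_t count_starved(int lifo, int events)
{
	struct toy_queue q = { .len = 0 };
	int wakeups[NWAITERS] = { 0 };
	size_t starved = 0;
	int i;

	for (i = 0; i < NWAITERS; i++)
		push_tail(&q, i);

	for (i = 0; i < events; i++) {
		int id = pop_head(&q);

		wakeups[id]++;
		if (lifo)
			push_head(&q, id);
		else
			push_tail(&q, id);
	}

	for (i = 0; i < NWAITERS; i++)
		if (wakeups[i] == 0)
			starved++;
	return starved;
}
```

In this model count_starved(1, n) leaves every waiter but one unwoken
however large n is, while count_starved(0, n) wakes all of them once
n >= NWAITERS. Real workloads are not this regular, so take it as an
illustration of the worst case only.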
> > The LIFO idea _might_ make sense for interchangeable worker-thread
> > situations - including userspace. It would make sense for pipe
> > waiters, socket waiters (especially accept), etc.
>
> Yea, and my following patches are for socket waiters.

Unix socketpairs are occasionally used in the above ways too.

I'm not against your patch, but I worry that starvation is a new
semantic, and it may have a significant effect on something - either
in the kernel, or in userspace, which is harder to check.

> > Do you have any measurements showing the LIFO mode performing
> > better than FIFO, and by how much?
>
> I didn't do any test yet. But some work done by LSE project years ago
> showed that it is better.
>
> http://lse.sourceforge.net/io/aionotes.txt
>
> " Also in view of
> better cache utilization the wake queue mechanism is LIFO by default.
> (A new exclusive LIFO wakeup option has been introduced for this purpose)"

I suspect it's possible to combine LIFO-ish and FIFO-ish queuing to
prevent starvation while getting some of the locality benefit.

Something like add-LIFO and increment a small counter in the next wait
entry, but never add in front of an entry whose counter has reached
MAX_LIFO_WAITERS? :-)

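A rough userspace sketch of one possible reading of that idea follows.
It is not a kernel patch: hybrid_queue, hybrid_add and the `bypassed`
field are invented for illustration, and max_lifo plays the role of
MAX_LIFO_WAITERS. A new waiter is placed in front of the first entry
that has been bypassed fewer than max_lifo times, and at the tail if
there is no such entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HQ_NWAITERS 4

struct hybrid_entry {
	int id;
	unsigned int bypassed;	/* times a waiter was added directly in front */
};

/* e[0] is the head, which the next wakeup picks. */
struct hybrid_queue {
	struct hybrid_entry e[HQ_NWAITERS];
	size_t len;
};

/* Add LIFO-style, but never in front of an entry bypassed max_lifo times. */
static void hybrid_add(struct hybrid_queue *q, int id, unsigned int max_lifo)
{
	size_t pos = 0;

	assert(q->len < HQ_NWAITERS);
	/* Skip entries whose bypass counter is already saturated. */
	while (pos < q->len && q->e[pos].bypassed >= max_lifo)
		pos++;
	/* Charge the entry we are about to step in front of, if any. */
	if (pos < q->len)
		q->e[pos].bypassed++;
	memmove(&q->e[pos + 1], &q->e[pos],
		(q->len - pos) * sizeof(q->e[0]));
	q->e[pos].id = id;
	q->e[pos].bypassed = 0;
	q->len++;
}

/* Remove and return the id of the waiter at the head. */
static int hybrid_pop(struct hybrid_queue *q)
{
	int id;

	assert(q->len > 0);
	id = q->e[0].id;
	memmove(&q->e[0], &q->e[1], (q->len - 1) * sizeof(q->e[0]));
	q->len--;
	return id;
}

/*
 * Deliver `events` wakeups; each woken waiter re-queues itself before
 * the next event. Returns how many waiters were never woken.
 */
static size_t hybrid_count_starved(unsigned int max_lifo, int events)
{
	struct hybrid_queue q = { .len = 0 };
	int wakeups[HQ_NWAITERS] = { 0 };
	size_t starved = 0;
	int i;

	for (i = 0; i < HQ_NWAITERS; i++)
		hybrid_add(&q, i, max_lifo);

	for (i = 0; i < events; i++) {
		int id = hybrid_pop(&q);

		wakeups[id]++;
		hybrid_add(&q, id, max_lifo);
	}

	for (i = 0; i < HQ_NWAITERS; i++)
		if (wakeups[i] == 0)
			starved++;
	return starved;
}
```

With max_lifo set to 0 this degenerates to plain FIFO, and with a very
large max_lifo it behaves like strict LIFO. A small value in between
still wakes the most recent waiter most of the time while letting the
older entries reach the head. Whether the locality benefit survives in
practice would need measuring.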
-- Jamie