linux-fsdevel.vger.kernel.org archive mirror
From: Jamie Lokier <jamie@shareable.org>
To: Changli Gao <xiaosuo@gmail.com>
Cc: David Howells <dhowells@redhat.com>,
	Yong Zhang <yong.zhang@windriver.com>,
	Xiaotian Feng <xtfeng@gmail.com>, Ingo Molnar <mingo@elte.hu>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Eric W. Biederman" <ebiederm@xmission.com>,
	Davide Libenzi <davidel@xmailserver.org>,
	Roland Dreier <rolandd@cisco.com>,
	Stefan Richter <stefanr@s5r6.in-berlin.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	"David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <dada1@cosmosbay.com>,
	Christoph Lameter <cl@linux.com>,
	Andreas Herrmann <andreas.herrmann3@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Takashi Iwai <tiwai@suse.de>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] sched: implement the exclusive wait queue as a LIFO queue
Date: Wed, 28 Apr 2010 16:25:02 +0100
Message-ID: <20100428152502.GA25569@shareable.org>
In-Reply-To: <y2r412e6f7f1004280642n49b8d6f2vcd08774531cb59da@mail.gmail.com>

Changli Gao wrote:
> On Wed, Apr 28, 2010 at 9:21 PM, Jamie Lokier <jamie@shareable.org> wrote:
> > Changli Gao wrote:
> >>
> >> fs/eventpoll.c: 1443.
> >>                 wait.flags |= WQ_FLAG_EXCLUSIVE;
> >>                 __add_wait_queue(&ep->wq, &wait);
> >
> > The same thing about assumptions applies here.  The userspace process
> > may be waiting for an epoll condition to get access to a resource,
> > rather than being a worker thread interchangeable with others.
> 
> Oh, the lines above are the current ones. So the assumption applies
> and works here.

No, because WQ_FLAG_EXCLUSIVE doesn't have your LIFO semantic at the moment.

Your patch changes the behaviour of epoll, though I don't know if it
matters.  Perhaps all programs which have multiple tasks waiting on
the same epoll fd are "interchangeable worker thread" types anyway :-)

> > For example, userspace might be using a pipe as a signal-safe lock, or
> > signal-safe multi-token semaphore, and epoll to wait for that pipe.
> >
> > WQ_FLAG_EXCLUSIVE means there is no point waking all tasks, to avoid a
> > pointless thundering herd.  It doesn't mean unfairness is ok.
> 
> The users should not make any assumption about the waking up sequence,
> neither LIFO nor FIFO.

Correct, but they should be able to assume non-starvation (eventual
progress) for all waiters.

It's one of those subtle things, possibly a unixy thing: Non-RT tasks
should always make progress when the competition is just other non-RT
tasks, even if the progress is slow.

Starvation can spread out beyond the starved process, to cause
priority inversions in other tasks that are waiting on a resource
locked by the starved process.  Among other things, that can cause
higher priority tasks, and RT priority tasks, to block permanently.
Very unpleasant.

> > The LIFO idea _might_ make sense for interchangeable worker-thread
> > situations - including userspace.  It would make sense for pipe
> > waiters, socket waiters (especially accept), etc.
> 
> Yea, and my following patches are for socket waiters.

Unix socketpairs are occasionally used in the above ways too.

I'm not against your patch, but I worry that starvation is a new
semantic, and it may have a significant effect on something - either
in the kernel, or in userspace, which is harder to check.

> > Do you have any measurements showing the LIFO mode performing
> > better than FIFO, and by how much?
> 
> I didn't do any test yet. But some work done by LSE project years ago
> showed that it is better.
> 
> http://lse.sourceforge.net/io/aionotes.txt
> 
> " Also in view of
> better cache utilization the wake queue mechanism is LIFO by default.
> (A new exclusive LIFO wakeup option has been introduced for this purpose)"

I suspect it's possible to combine LIFO-ish and FIFO-ish queuing to
prevent starvation while getting some of the locality benefit.
Something like add-LIFO and increment a small counter in the next wait
entry, but never add in front of an entry whose counter has reached
MAX_LIFO_WAITERS? :-)
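
To make that concrete, here is a rough userspace sketch of one possible
reading of the idea. It is not the kernel wait-queue API: struct waiter,
struct wait_queue, the "bypassed" field, both function names and the
value chosen for MAX_LIFO_WAITERS are all made up for illustration. New
waiters go to the front, the entry they jump ahead of has its counter
bumped, and an entry whose counter has hit the limit is never jumped
again.

```c
#include <assert.h>
#include <stddef.h>

/* Name taken from the text above; the value 2 is an arbitrary assumption. */
#define MAX_LIFO_WAITERS 2

/* Hypothetical userspace stand-in for a wait queue entry. */
struct waiter {
    int id;
    unsigned bypassed;      /* newer waiters queued directly in front of us */
    struct waiter *next;
};

struct wait_queue {
    struct waiter *head;    /* the head is the next waiter to be woken */
};

/*
 * Queue w as close to the front as allowed: walk past every entry whose
 * counter is saturated, then insert in front of the first entry that can
 * still be bypassed (or at the tail if there is none).
 */
static void enqueue_bounded_lifo(struct wait_queue *q, struct waiter *w)
{
    struct waiter **link = &q->head;

    assert(q != NULL && w != NULL);
    w->bypassed = 0;

    /* Never add in front of an entry that has reached the limit. */
    while (*link && (*link)->bypassed >= MAX_LIFO_WAITERS)
        link = &(*link)->next;

    if (*link)
        (*link)->bypassed++;    /* the entry we are jumping ahead of */

    w->next = *link;
    *link = w;
}

/* Wake-one: remove and return the head, or NULL if the queue is empty. */
static struct waiter *dequeue_head(struct wait_queue *q)
{
    struct waiter *w = q->head;

    if (w) {
        q->head = w->next;
        w->next = NULL;
    }
    return w;
}
```

With the limit set to 2 and a steady "one waiter arrives, one is woken"
pattern, an old waiter sitting behind the newcomers is bypassed twice;
the third newcomer is queued behind it and the old waiter is woken next.
Plain LIFO would never wake it under that pattern. I have not tried to
work out the worst-case bound for deeper queues.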

-- Jamie
