From: Eric Dumazet <dada1@cosmosbay.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davi Arnaut <davi@haxent.com.br>,
Andrew Morton <akpm@linux-foundation.org>,
Davide Libenzi <davidel@xmailserver.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rfc: threaded epoll_wait thundering herd
Date: Sat, 05 May 2007 07:47:18 +0200
Message-ID: <463C1A66.3080806@cosmosbay.com>
In-Reply-To: <alpine.LFD.0.98.0705042131040.3819@woody.linux-foundation.org>
Linus Torvalds wrote:
>
> On Sat, 5 May 2007, Eric Dumazet wrote:
>> But... what happens if the thread that was chosen exits from the loop in
>> ep_poll() with res = -EINTR (because of signal_pending(current))
>
> Not a problem.
>
> What happens is that an exclusive wake-up stops on the first entry in the
> wait-queue that it actually *wakes*up*, but if some task has just marked
> itself as being TASK_UNINTERRUPTIBLE, but is still on the run-queue, it
> will just be marked TASK_RUNNING and that in itself isn't enough to cause
> the "exclusive" test to trigger.
>
> The code in sched.c is subtle, but worth understanding if you care about
> these things. You should look at:
>
> - try_to_wake_up() - this is the default wakeup function (and the one
> that should work correctly - I'm not going to guarantee that any of the
> other specialty-wakeup-functions do so)
>
> The return value is the important thing. Returning non-zero is
> "success", and implies that we actually activated it.
>
> See the "goto out_running" case for the case where the process was
> still actually on the run-queues, and we just ended up setting
> "p->state = TASK_RUNNING" - we still return 0, and the "exclusive"
> logic will not trigger.
>
> - __wake_up_common: this is the thing that _calls_ the above, and which
> cares about the return value above. It does
>
> if (curr->func(curr, mode, sync, key) &&
> (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
>
>
> ie it only decrements (and triggers) the nr_exclusive thing when the
> wakeup-function returned non-zero (and when the waitqueue entry was
> marked exclusive, of course).
>
> So what does all this subtlety *mean*?
>
> Walk through it. It means that it is safe to do the
>
> if (signal_pending())
> return -EINTR;
>
> kind of thing, because *when* you do this, you obviously are always on the
> run-queue (otherwise the process wouldn't be running, and couldn't be
> doing the test). So if there is somebody else waking you up right then and
> there, they'll never count your wakeup as an exclusive one, and they will
> wake up at least one other real exclusive waiter.
>
> (IOW, you get a very very small probability of a very very small
> "thundering herd" - obviously it won't be "thundering" any more, it will
> be more of a "whispering herdlet").
>
> The Linux kernel sleep/wakeup thing is really quite nifty and smart. And
> very few people realize just *how* nifty and elegant (and efficient) it
> is. Hopefully a few more people appreciate its beauty and subtlety now ;)
>
Thank you Linus for these detailed explanations.
I think I was frightened not by the wakeup logic, but by the possibility in
SMP that a signal could be delivered to the thread just after it has been
selected.
Looking again at ep_poll(), I see:
        set_current_state(TASK_INTERRUPTIBLE);
[*]     if (!list_empty(&ep->rdllist) || !jtimeout)
                break;
        if (signal_pending(current)) {
                res = -EINTR;
                break;
        }
So the signal_pending() test is not reached if an event is already present in
the ready list: that event is delivered even if a signal is pending. I missed
this bit earlier...