From: Eric Dumazet <dada1@cosmosbay.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davi Arnaut <davi@haxent.com.br>,
	Andrew Morton <akpm@linux-foundation.org>,
	Davide Libenzi <davidel@xmailserver.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rfc: threaded epoll_wait thundering herd
Date: Sat, 05 May 2007 07:47:18 +0200
Message-ID: <463C1A66.3080806@cosmosbay.com>
In-Reply-To: <alpine.LFD.0.98.0705042131040.3819@woody.linux-foundation.org>

Linus Torvalds wrote:
> 
> On Sat, 5 May 2007, Eric Dumazet wrote:
>> But... what happens if the thread that was chosen exits from the loop in
>> ep_poll() with res = -EINTR (because of signal_pending(current))
> 
> Not a problem.
> 
> What happens is that an exclusive wake-up stops on the first entry in the 
> wait-queue that it actually *wakes*up*, but if some task has just marked 
> itself as being TASK_UNINTERRUPTIBLE, but is still on the run-queue, it 
> will just be marked TASK_RUNNING and that in itself isn't enough to cause 
> the "exclusive" test to trigger.
> 
> The code in sched.c is subtle, but worth understanding if you care about 
> these things. You should look at:
> 
>  - try_to_wake_up() - this is the default wakeup function (and the one 
>    that should work correctly - I'm not going to guarantee that any of the 
>    other specialty-wakeup-functions do so)
> 
>    The return value is the important thing. Returning non-zero is 
>    "success", and implies that we actually activated it.
> 
>    See the "goto out_running" case for the case where the process was 
>    still actually on the run-queues, and we just ended up setting 
>    "p->state = TASK_RUNNING" - we still return 0, and the "exclusive" 
>    logic will not trigger.
> 
>  - __wake_up_common: this is the thing that _calls_ the above, and which 
>    cares about the return value above. It does
> 
> 	if (curr->func(curr, mode, sync, key) &&
> 		(flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
> 
> 
>    ie it only decrements (and triggers) the nr_exclusive thing when the 
>    wakeup-function returned non-zero (and when the waitqueue entry was 
>    marked exclusive, of course).
> 
> So what does all this subtlety *mean*?
> 
> Walk through it. It means that it is safe to do the
> 
> 	if (signal_pending())
> 		return -EINTR;
> 
> kind of thing, because *when* you do this, you obviously are always on the 
> run-queue (otherwise the process wouldn't be running, and couldn't be 
> doing the test). So if there is somebody else waking you up right then and 
> there, they'll never count your wakeup as an exclusive one, and they will 
> wake up at least one other real exclusive waiter.
> 
> (IOW, you get a very very small probability of a very very small 
> "thundering herd" - obviously it won't be "thundering" any more, it will 
> be more of a "whispering herdlet").
> 
> The Linux kernel sleep/wakeup thing is really quite nifty and smart. And 
> very few people realize just *how* nifty and elegant (and efficient) it 
> is. Hopefully a few more people appreciate its beauty and subtlety now ;)
> 

Thank you Linus for these detailed explanations.
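
To check my reading of the exclusive wake-up rule, here is a minimal
user-space sketch of it. It is a model, not the kernel code: struct
model_task, struct model_waiter, model_try_to_wake_up() and
model_wake_up_common() are hypothetical names, and the only behaviour
modelled is the return-value contract described above (0 for a task that
is still on the run-queue, non-zero for a task that was really activated).

```c
#include <stdbool.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

/* Hypothetical stand-in for a task: only what the model needs. */
struct model_task {
	bool on_runqueue;	/* e.g. running, about to test signal_pending() */
	bool woken;		/* set once a wake-up has touched this task */
};

/* Hypothetical stand-in for a wait-queue entry. */
struct model_waiter {
	unsigned int flags;
	struct model_task *task;
};

/*
 * Models the contract of the default wake-up function: return non-zero
 * only if the task was actually activated. A task that is still on the
 * run-queue is merely marked runnable again and 0 is returned.
 */
static int model_try_to_wake_up(struct model_task *t)
{
	t->woken = true;
	if (t->on_runqueue)
		return 0;	/* the "out_running" case */
	t->on_runqueue = true;
	return 1;
}

/*
 * Walks the queue in order and stops once nr_exclusive exclusive waiters
 * have really been activated. Returns how many entries were visited.
 */
static size_t model_wake_up_common(struct model_waiter *q, size_t n,
				   int nr_exclusive)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct model_waiter *curr = &q[i];

		if (model_try_to_wake_up(curr->task) &&
		    (curr->flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
			return i + 1;
	}
	return n;
}
```

With three exclusive waiters where the first one is still on the
run-queue, a wake-up with nr_exclusive = 1 passes over the first entry
without counting it, activates the second, and never reaches the third.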

I think what frightened me was not the wakeup logic itself, but the
possibility, on SMP, that a signal could be delivered to the thread just
after it has been selected.

Looking again at ep_poll(), I see:

	set_current_state(TASK_INTERRUPTIBLE);
[*]	if (!list_empty(&ep->rdllist) || !jtimeout)
		break;
	if (signal_pending(current)) {
		res = -EINTR;
		break;
	}

So the signal_pending() test is not reached if an event is present in the
ready list: the event is delivered even if a signal is pending. I missed
this bit earlier...
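
The ordering of the two checks can be sketched in user space as follows.
This is only a model of the excerpt above, not the real ep_poll():
model_ep_poll_check() is a hypothetical helper, and the ready list, the
timeout and the pending signal are reduced to plain parameters.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Models one pass over the two checks quoted from ep_poll().
 * Returns 1 if the wait loop would break, 0 if it would keep sleeping.
 * *res is only written in the signal case, as in the excerpt.
 */
static int model_ep_poll_check(bool rdllist_empty, long jtimeout,
			       bool sig_pending, int *res)
{
	/* Ready events (or a zero timeout) win: leave without touching res. */
	if (!rdllist_empty || !jtimeout)
		return 1;

	/* Only reached when nothing is ready and the timeout is non-zero. */
	if (sig_pending) {
		*res = -EINTR;
		return 1;
	}
	return 0;
}
```

With a non-empty ready list and a pending signal, the loop breaks with res
left untouched, so the caller goes on to deliver the events; -EINTR is set
only when the ready list is empty.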



