All of lore.kernel.org
 help / color / mirror / Atom feed
From: Eric Dumazet <dada1@cosmosbay.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Davi Arnaut <davi@haxent.com.br>,
	Andrew Morton <akpm@linux-foundation.org>,
	Davide Libenzi <davidel@xmailserver.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH] rfc: threaded epoll_wait thundering herd
Date: Sat, 05 May 2007 07:47:18 +0200	[thread overview]
Message-ID: <463C1A66.3080806@cosmosbay.com> (raw)
In-Reply-To: <alpine.LFD.0.98.0705042131040.3819@woody.linux-foundation.org>

Linus Torvalds a écrit :
> 
> On Sat, 5 May 2007, Eric Dumazet wrote:
>> But... what happens if the thread that was chosen exits from the loop in
>> ep_poll() with res = -EINTR (because of signal_pending(current))
> 
> Not a problem.
> 
> What happens is that an exclusive wake-up stops on the first entry in the 
> wait-queue that it actually *wakes*up*, but if some task has just marked 
> itself as being TASK_UNINTERRUPTIBLE, but is still on the run-queue, it 
> will just be marked TASK_RUNNING and that in itself isn't enough to cause 
> the "exclusive" test to trigger.
> 
> The code in sched.c is subtle, but worth understanding if you care about 
> these things. You should look at:
> 
>  - try_to_wake_up() - this is the default wakeup function (and the one 
>    that should work correctly - I'm not going to guarantee that any of the 
>    other specialty-wakeup-functions do so)
> 
>    The return value is the important thing. Returning non-zero is 
>    "success", and implies that we actually activated it.
> 
>    See the "goto out_running" case for the case where the process was 
>    still actually on the run-queues, and we just ended up setting 
>    "p->state = TASK_RUNNING" - we still return 0, and the "exclusive" 
>    logic will not trigger.
> 
>  - __wake_up_common: this is the thing that _calls_ the above, and which 
>    cares about the return value above. It does
> 
> 	if (curr->func(curr, mode, sync, key) &&
> 		(flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
> 
> 
>    ie it only decrements (and triggers) the nr_exclusive thing when the 
>    wakeup-function returned non-zero (and when the waitqueue entry was 
>    marked exclusive, of course).
> 
> So what does all this subtlety *mean*?
> 
> Walk through it. It means that it is safe to do the
> 
> 	if (signal_pending())
> 		return -EINTR;
> 
> kind of thing, because *when* you do this, you obviously are always on the 
> run-queue (otherwise the process wouldn't be running, and couldn't be 
> doing the test). So if there is somebody else waking you up right then and 
> there, they'll never count your wakeup as an exclusive one, and they will 
> wake up at least one other real exclusive waiter.
> 
> (IOW, you get a very very small probability of a very very small 
> "thundering herd" - obviously it won't be "thundering" any more, it will 
> be more of a "whispering herdlet").
> 
> The Linux kernel sleep/wakeup thing is really quite nifty and smart. And 
> very few people realize just *how* nifty and elegant (and efficient) it 
> is. Hopefully a few more people appreciate its beauty and subtlety now ;)
> 

Thank you Linus for these detailed explanations.

I think I was frightened not by the wakeup logic, but by the possibility in 
SMP that a signal could be delivered to the thread just after it has been 
selected.

Looking again at ep_poll(), I see  :

			set_current_state(TASK_INTERRUPTIBLE);
[*]                     if (!list_empty(&ep->rdllist) || !jtimeout)
                                 break;
                         if (signal_pending(current)) {
                                 res = -EINTR;
                                 break;
                         }

So the test against signal_pending() is not done if an event is present in 
ready list : It should be delivered even if a signal is pending. I missed this 
bit ealier...




  reply	other threads:[~2007-05-05  5:47 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20070504225730.490334000@haxent.com.br>
2007-05-04 23:37 ` [PATCH] rfc: threaded epoll_wait thundering herd Davi Arnaut
2007-05-05  4:15   ` Eric Dumazet
2007-05-05  4:44     ` Linus Torvalds
2007-05-05  5:47       ` Eric Dumazet [this message]
2007-05-05 19:00   ` Davide Libenzi
2007-05-05 21:42     ` Davi Arnaut
2007-05-07 21:00       ` Ulrich Drepper
2007-05-07 21:34         ` Davi Arnaut
2007-05-07 22:19           ` Ulrich Drepper
2007-05-07 22:35             ` Davide Libenzi
2007-05-08  2:49               ` Ulrich Drepper
2007-05-08  3:56                 ` Kyle Moffett
2007-05-08  4:35                 ` Linus Torvalds
2007-05-08  6:30                 ` Davide Libenzi
2007-05-07 23:15             ` Davi Arnaut
2007-05-08  2:32               ` Ulrich Drepper
2007-05-08  3:24                 ` Davi Arnaut
2007-05-07 22:47         ` Davide Libenzi
2007-05-07 15:46     ` Chase Venters
2007-05-07 17:18       ` Davide Libenzi
2007-05-07 18:17         ` Chase Venters

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=463C1A66.3080806@cosmosbay.com \
    --to=dada1@cosmosbay.com \
    --cc=akpm@linux-foundation.org \
    --cc=davi@haxent.com.br \
    --cc=davidel@xmailserver.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.