From: Li Yu <raise.sail@gmail.com>
To: eric.dumazet@gmail.com
Cc: linux-kernel@vger.kernel.org, davidel@xmailserver.org
Subject: Re: The thundering herd like problem when multi epolls on one fd
Date: Sun, 15 Jan 2012 23:41:37 +0800
Message-ID: <4F12F3B1.7060709@gmail.com>



2012/1/14 Eric Dumazet <eric.dumazet@gmail.com>:
> On Saturday, 14 January 2012 at 19:13 +0800, Li Yu wrote:
>> Hi,
>>
>>       My buddy reported a thundering herd problem when using epoll
>> on TCP listen sockets. He described their usage as follows:
>>
>>       1. sk = new tcp_listen_socket();
>>       2. create many child processes or threads.
>>       3. in the newly created processes (threads), use the epoll API
>> on the listen sk to provide HTTP service.
>>
>>       This usage pattern means there are multiple wait queue entries for
>> one listening socket, and the wakeup is not exclusive, so we get a
>> thundering-herd-like problem. I have also heard that many popular
>> applications can use this pattern, including at least nginx, lighttpd
>> and haproxy.
>
> It is not very scalable. But we really lack a fanout mechanism to allow
> better parallelism on accept(); it's not a poll() vs select() vs epoll()
> problem per se, but a generic problem.
>

I am interested in this issue. My rough idea is that such a mechanism
could use XPS or RPS/RSS information to decide which tasks on the target
processor to wake up.
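
For reference, the usage pattern from the original report can be sketched
roughly as below. This is only an illustration under my own assumptions:
the helper names (make_listen_socket, worker_epoll_create, worker_wait,
worker_try_accept) are hypothetical and not taken from any of the
applications mentioned. Each worker owns a separate epoll instance that
watches the same non-blocking listen socket, so one incoming connection
can make every worker's epoll_wait() return, and all but one of them then
get EAGAIN from accept().

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: non-blocking TCP listen socket on 127.0.0.1:port
 * (port 0 lets the kernel pick one). Returns the fd, or -1 on error. */
int make_listen_socket(unsigned short port)
{
    struct sockaddr_in addr;
    int flags;
    int sk = socket(AF_INET, SOCK_STREAM, 0);

    if (sk < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    flags = fcntl(sk, F_GETFL, 0);
    if (flags < 0 || fcntl(sk, F_SETFL, flags | O_NONBLOCK) < 0 ||
        bind(sk, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(sk, 128) < 0) {
        close(sk);
        return -1;
    }
    return sk;
}

/* Step 3 of the report: each worker creates its own epoll instance and
 * registers the shared listen socket in it. Returns the epoll fd or -1. */
int worker_epoll_create(int listen_sk)
{
    struct epoll_event ev;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = listen_sk;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_sk, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}

/* Returns 1 if the listen socket was reported readable, 0 on timeout,
 * -1 on error. */
int worker_wait(int epfd, int timeout_ms)
{
    struct epoll_event ev;

    return epoll_wait(epfd, &ev, 1, timeout_ms);
}

/* Returns the connection fd, -2 if the worker was woken but another
 * worker already took the connection, -1 on any other error. */
int worker_try_accept(int listen_sk)
{
    int conn = accept(listen_sk, NULL, NULL);

    if (conn < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return -2;
    return conn;
}
```

In a real server, make_listen_socket() would run once before fork() or
pthread_create(), and each child would call worker_epoll_create() and
then loop over worker_wait() and worker_try_accept().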

>> So should we change this wakeup behavior to exclusive too?
>>
>
> Certainly not.
>
>>       Below is a simple patch (tested, and it works) that does this for
>> epoll(). Of course, we should also fix the select() and poll() syscalls
>> if this approach is right.
>>
>>       Thanks.
>>
>> Yu
>>
>> diff --git a/fs/eventpoll.c b/fs/eventpoll.c
>> index 828e750..a3d6ab4 100644
>> --- a/fs/eventpoll.c
>> +++ b/fs/eventpoll.c
>> @@ -898,7 +899,7 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
>>                 init_waitqueue_func_entry(&pwq->wait, ep_poll_callback);
>>                 pwq->whead = whead;
>>                 pwq->base = epi;
>> -               add_wait_queue(whead, &pwq->wait);
>> +               add_wait_queue_exclusive(whead, &pwq->wait);
>>                 list_add_tail(&pwq->llink, &epi->pwqlist);
>>                 epi->nwait++;
>>         } else {
>> --
>
>
> What happens if the awakened thread does not consume the event, and prefers
> to exit?

In my view, if that happens, it should be considered a bug in the application.

>
> If several threads are doing select()/poll()/epoll() on a shared fd,
> they _all_ must be notified the fd is ready, as manpages claim.
>
> Doing otherwise would require the prior consent of the user, using a
> special flag for example, and documentation.
>

Indeed, thanks!
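
To illustrate what such an opt-in looks like from user space: Linux 4.5
and newer accept an EPOLLEXCLUSIVE flag at EPOLL_CTL_ADD time. The sketch
below only shows how a worker would request it. The helper name is
hypothetical, and this is not the patch quoted above, which changed the
behavior for every epoll user without asking.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create a per-worker epoll instance and register
 * listen_sk with an explicit request for exclusive wakeups.
 * Assumes kernel and libc headers that define EPOLLEXCLUSIVE.
 * Returns the epoll fd, or -1 on error. */
int worker_epoll_create_exclusive(int listen_sk)
{
    struct epoll_event ev;
    int epfd = epoll_create1(0);

    if (epfd < 0)
        return -1;
    memset(&ev, 0, sizeof(ev));
    /* The flag is only valid with EPOLL_CTL_ADD, not EPOLL_CTL_MOD. */
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;
    ev.data.fd = listen_sk;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, listen_sk, &ev) < 0) {
        close(epfd);
        return -1;
    }
    return epfd;
}
```

Even with this flag, epoll_ctl(2) only promises that "one or more" of the
waiting epoll instances receive the event, so a worker should still use a
non-blocking listen socket and tolerate EAGAIN from accept().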

Yu
