From: Andy Lutomirski <luto@mit.edu>
To: bert hubert <bert.hubert@netherlabs.nl>
Cc: linux-kernel@vger.kernel.org, davidel@xmailserver.org
Subject: Re: epoll clarification sought: multithreaded epoll_wait for UDP sockets?
Date: Thu, 04 Mar 2010 08:12:09 -0500
Message-ID: <4B8FB1A9.8030407@mit.edu>
In-Reply-To: <20100303212942.GB24601@xs.powerdns.com>
bert hubert wrote:
> Dear kernel people, dear Davide,
>
> I am currently debugging performance issues in the PowerDNS Recursor, and it
> turns out I have been using epoll_wait() sub-optimally. And I need your help
> to improve this. I'm more than happy to update the epoll_wait() manpage to
> reflect your advice.
>
> Essentially, what I would like to have is a way to distribute incoming UDP DNS
> queries to various threads automatically. Right now, there is one
> fd that multiple threads wait on, using epoll() or select() and subsequently
> recvfrom(). Crucially, each thread has its own epoll fd set (which is
> wrong).
>
> The hope is that each thread hogs a single CPU, and that UDP DNS queries
> coming in arrive at a single thread that is currently in epoll_wait(), i.e.
> not doing other things.
>
> As indicated by the manpage of epoll however, my setup means that threads
> get woken up unnecessarily when a new packet comes in. This results in lots
> of recvfrom() calls returning EAGAIN (basically on most of the other
> threads).
>
> (this can be observed in
> http://svn.powerdns.com/snapshots/rc2/pdns-recursor-3.2-rc2.tar.bz2 )
>
> The alternative appears to be to create a single epoll set, and have all
> threads call epoll_wait on that same set.
>
> The epoll() manpage however is silent on what this will do exactly, although
> several LKML posts indicate that this might cause 'thundering herd'
> problems.
>
> My question is: what is your recommendation for achieving the scenario
> outlined above? In other words, what is the 'best current practice' on
> modern Linux kernels to get each packet to arrive at a single thread?
>
> Epoll offers 'edge triggered' behaviour, would this make sense? Would it be
> smart to call epoll_wait with only a single event to be returned to prevent
> starvation? Might it be useful to dup() the single fd, once for each thread?
> I also tried SO_REUSEADDR, so I could bind() multiple times to the same IP
> address & port, but this does not distribute incoming queries.
EPOLLET sounds like the right approach. But if you don't want to drain
the entire buffer in one thread and you use EPOLLET, you'll have to
manually wake another thread. You could use eventfd for that, or maybe
add a syscall to inject a new edge on a file descriptor. Good luck.
(FWIW, a couple of days ago I got one machine handling over 9 Gbit/s of
incoming UDP packets on a single core without drops. It's a hefty
machine, and I was using jumbo frames, but still.)
--Andy
Thread overview:
2010-03-03 21:29 epoll clarification sought: multithreaded epoll_wait for UDP sockets? bert hubert
2010-03-04 13:12 ` Andy Lutomirski [this message]
2010-03-04 15:40 ` Davide Libenzi