From: Eric Dumazet <eric.dumazet@gmail.com>
To: Alban Crequy <alban.crequy@collabora.co.uk>
Cc: "David S. Miller" <davem@davemloft.net>,
Stephen Hemminger <shemminger@vyatta.com>,
Cyrill Gorcunov <gorcunov@openvz.org>,
Alexey Dobriyan <adobriyan@gmail.com>,
netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
Pauli Nieminen <pauli.nieminen@collabora.co.uk>,
Rainer Weikusat <rweikusat@mssgmbh.com>,
Davide Libenzi <davidel@xmailserver.org>
Subject: Re: [PATCH 0/1] RFC: poll/select performance on datagram sockets
Date: Sat, 30 Oct 2010 14:53:37 +0200
Message-ID: <1288443217.2680.962.camel@edumazet-laptop>
In-Reply-To: <20101030123403.5e01540d@chocolatine.cbg.collabora.co.uk>

On Saturday, 30 October 2010 at 12:34 +0100, Alban Crequy wrote:
> On Fri, 29 Oct 2010 21:27:11 +0200,
> Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> > On Friday, 29 October 2010 at 19:18 +0100, Alban Crequy wrote:
> > > Hi,
> > >
> > > When a process calls poll() or select(), the kernel calls (struct
> > > file_operations)->poll on every file descriptor and returns a mask
> > > of events which are ready. If the process is only interested in
> > > POLLIN events, the mask is still computed for POLLOUT and it can be
> > > expensive. For example, on Unix datagram sockets, a process running
> > > poll() with POLLIN will wake up when the remote end calls read().
> > > This is a performance regression introduced when fixing another bug
> > > by 3c73419c09a5ef73d56472dbfdade9e311496e9b and
> > > ec0d215f9420564fc8286dcf93d2d068bb53a07e.
> > >
> > > The attached program illustrates the problem. It compares the
> > > performance of sending/receiving data on a Unix datagram socket and
> > > select(). When the datagram sockets are not connected, the
> > > performance problem is not triggered, but when they are connected
> > > it becomes a lot slower. On my computer, I get the following times:
> > >
> > > Connected datagram sockets: >4 seconds
> > > Non-connected datagram sockets: <1 second
> > >
> > > The patch attached in the next email fixes the performance problem:
> > > it becomes <1 second for both cases. I am not suggesting the patch
> > > for inclusion; I would like to change the prototype of (struct
> > > file_operations)->poll instead of adding ->poll2. But there are a
> > > lot of poll functions to change (grep tells me 337 functions).
> > >
> > > Any opinions?
> >
> > My opinion would be to use epoll() for this kind of workload.
>
> I found a problem with epoll() with the following program. When there
> are several datagram sockets connected to the same server and the
> receiving queue is full, epoll(EPOLLOUT) wakes up only the emitter
> whose skb was removed from the queue, and not all the emitters. This is
> because sock_wfree() runs sk->sk_write_space() only for one emitter.
>
I don't think this is the reason.

sock_wfree() is really fine here, since it deals with a single socket
(the one that sent the message).

The problem is peer_wait, which epoll doesn't seem to be plugged into.

The bug is in unix_dgram_poll().

It calls sock_poll_wait(..., &unix_sk(other)->peer_wait, ...) only if
the socket is 'writable'. It's a clear bug.

Could you try this patch, please?

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 0ebc777..315716c 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2092,7 +2092,7 @@ static unsigned int unix_dgram_poll(struct file *file, struct socket *sock,
 
 	/* writable? */
 	writable = unix_writable(sk);
-	if (writable) {
+	if (1 /*writable*/) {
 		other = unix_peer_get(sk);
 		if (other) {
 			if (unix_peer(other) != sk) {