From: Li Yu <raise.sail@gmail.com>
To: Changli Gao <xiaosuo@gmail.com>
Cc: Linux Netdev List <netdev@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
davidel@xmailserver.org
Subject: Re: [RFC] Introduce to batch variants of accept() and epoll_ctl() syscall
Date: Fri, 15 Jun 2012 13:37:42 +0800
Message-ID: <4FDACA26.70004@gmail.com>
In-Reply-To: <CABa6K_H3NrvvZ3Bh7JqsR6h33BSqYPBenUDG5Yt1U=2VvP700g@mail.gmail.com>
On 2012-06-15 12:29, Changli Gao wrote:
> On Fri, Jun 15, 2012 at 12:13 PM, Li Yu<raise.sail@gmail.com> wrote:
>> Hi,
>>
>> We have run into a performance problem in a large-scale computer
>> cluster that needs to handle a large number of concurrent incoming
>> TCP connection requests.
>>
>> top shows that the kernel consumes most of the CPU. The test is
>> simple, just an accept() -> epoll_ctl(ADD) loop, and the ratio of CPU
>> utilization sys% to si% is about 2:5.
>>
>> I also asked some experienced webserver/proxy developers on my team
>> for suggestions. It seems that many userland programs already call
>> accept() multiple times after being woken up by epoll_wait(), and the
>> common next step is to add each fd that accept() returns to the epoll
>> interface with the epoll_ctl() syscall.
>>
>> Therefore, I think we had better introduce batch variants of the
>> accept() and epoll_ctl() syscalls, just like sendmmsg() or recvmmsg().
>>
>> For accept(), we may need a new syscall. It might look like this:
>>
>> struct accept_result {
>>         int fd;
>>         struct sockaddr addr;
>>         socklen_t addr_len;
>> };
>>
>> int maccept4(int fd, int flags, int nr_accept_result,
>>              struct accept_result *results);
>>
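For reference, the intended semantics of maccept4() can be approximated in userland today by looping over accept4(). The sketch below is only an emulation, not the proposed syscall: it still pays one syscall per connection, and the name maccept4_emulated, the return convention (number of connections accepted), and the use of struct sockaddr_storage instead of struct sockaddr (so that IPv6 peers fit) are all assumptions made for illustration. It expects a non-blocking listening socket.

```c
#define _GNU_SOURCE             /* for accept4() */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>

/* Layout follows the proposal; sockaddr_storage is an assumption. */
struct accept_result {
        int fd;
        struct sockaddr_storage addr;
        socklen_t addr_len;
};

/*
 * Hypothetical userland emulation of the proposed maccept4().
 * Accepts up to nr connections from a non-blocking listening socket.
 * Returns the number accepted (0 if the queue is empty), or -1 if the
 * very first accept4() fails for another reason (errno is set).
 */
static int maccept4_emulated(int fd, int flags, int nr,
                             struct accept_result *results)
{
        int n = 0;

        assert(nr > 0 && results != NULL);
        while (n < nr) {
                struct accept_result *r = &results[n];

                r->addr_len = sizeof(r->addr);
                r->fd = accept4(fd, (struct sockaddr *)&r->addr,
                                &r->addr_len, flags);
                if (r->fd < 0) {
                        if (errno == EINTR)
                                continue;       /* interrupted, retry */
                        if (errno == EAGAIN || errno == EWOULDBLOCK)
                                break;          /* accept queue drained */
                        return n > 0 ? n : -1;  /* report partial progress */
                }
                n++;
        }
        return n;
}
```

A real batch syscall would presumably fill the same array in a single kernel entry, which is where any saving would come from.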
>> For epoll_ctl(), there are two ways to extend it. I prefer extending
>> the current interface to introducing a new syscall. We could introduce
>> a new flag, EPOLL_CTL_BATCH. If userland calls epoll_ctl() with this
>> flag set, the meaning of the last two arguments of epoll_ctl() changes,
>> e.g.:
>>
>> struct batch_epoll_event batch_events[] = {
>>         {
>>                 .fd = a_newsock_fd,
>>                 .epoll_event = { ... },
>>         },
>>         ...
>> };
>>
>> ret = epoll_ctl(fd, EPOLL_CTL_ADD|EPOLL_CTL_BATCH, nr_batch_events,
>>                 batch_events);
>>
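The batched epoll_ctl() call quoted above can likewise be emulated with today's interface, one epoll_ctl() per entry. This is a rough sketch under stated assumptions: EPOLL_CTL_BATCH does not exist in any kernel, the field types of struct batch_epoll_event are inferred from the initializer in the proposal, and the function name and return convention (count of entries applied, similar in spirit to sendmmsg()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Field names follow the proposal; the types are an assumption. */
struct batch_epoll_event {
        int fd;
        struct epoll_event epoll_event;
};

/*
 * Hypothetical userland emulation of epoll_ctl() with EPOLL_CTL_BATCH.
 * Applies op (e.g. EPOLL_CTL_ADD) to each entry in order and stops at
 * the first failure. Returns the number of entries applied; if that is
 * less than nr, errno describes the failure of entry [return value].
 */
static int epoll_ctl_batch_emulated(int epfd, int op, int nr,
                                    struct batch_epoll_event *events)
{
        int i;

        assert(nr == 0 || events != NULL);
        for (i = 0; i < nr; i++) {
                if (epoll_ctl(epfd, op, events[i].fd,
                              &events[i].epoll_event) < 0)
                        break;
        }
        return i;
}
```

How a kernel implementation should report a failure in the middle of a batch is an open design question; stopping at the first error is only one possible choice.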
>
> I think it is a good idea. Would you please implement a prototype and
> give some numbers? This kind of data may help sell the idea.
> Thanks.
>
Of course. I do not think implementing them will be hard work :)
Hmm, I really do not know whether it is necessary to introduce a new
syscall here. An alternative is to add a new socket option to handle
such batch requests, so that applications can also detect whether the
kernel has this extended capability with a simple getsockopt() call.
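As a rough illustration of that detection idea, an application could probe for the option and fall back to plain accept() when the kernel rejects it. No such socket option exists; SO_BATCH_ACCEPT_HYPOTHETICAL and its value are placeholders invented for this sketch, and probe_sockopt is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* HYPOTHETICAL option number, for illustration only. */
#define SO_BATCH_ACCEPT_HYPOTHETICAL 0x7fff

/*
 * Probe whether the kernel knows a given socket option.
 * Returns 1 if getsockopt() succeeds, 0 if the kernel reports the
 * option as unknown or unsupported, and -1 on any other error.
 */
static int probe_sockopt(int fd, int level, int optname)
{
        int val = 0;
        socklen_t len = sizeof(val);

        assert(fd >= 0);
        if (getsockopt(fd, level, optname, &val, &len) == 0)
                return 1;
        return (errno == ENOPROTOOPT || errno == EOPNOTSUPP) ? 0 : -1;
}
```

An application would call probe_sockopt(fd, SOL_SOCKET, SO_BATCH_ACCEPT_HYPOTHETICAL) once at startup and choose the batch or the one-at-a-time path accordingly.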
Anyway, I am going to try writing a prototype first.
Thanks
Yu
Thread overview: 8+ messages
2012-06-15 4:13 [RFC] Introduce to batch variants of accept() and epoll_ctl() syscall Li Yu
2012-06-15 4:29 ` Changli Gao
2012-06-15 5:37 ` Li Yu [this message]
2012-06-15 8:51 ` Eric Dumazet
2012-06-18 23:27 ` Andi Kleen
2012-07-06 9:38 ` Li Yu
2012-07-09 3:36 ` Li Yu
2012-06-15 8:35 ` David Laight