From: Hou Tao <houtao1@huawei.com>
To: <linux-fsdevel@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>, <viro@zeniv.linux.org.uk>,
<jbaron@akamai.com>, <oleg@redhat.com>, <dave@stgolabs.net>,
<koct9i@gmail.com>
Subject: [RFC][PATCH 0/8] epoll: remove epmutex from ep_free() and eventpoll_release_file() for non-nested case
Date: Sat, 28 Oct 2017 20:58:19 +0800
Message-ID: <1509195507-29037-1-git-send-email-houtao1@huawei.com>
Hi,
We are optimizing the requests per second (RPS) of an nginx HTTP server, and
we found that acquiring epmutex in eventpoll_release_file() becomes a
bottleneck in the one-request-per-connection scenario. The details of the
scenario are as follows:
* HTTP server (nginx):
  * ARM64 with 64 cores
  * 64 worker processes, each bound to a specific CPU
  * keepalive_requests = 1 in nginx.conf: nginx closes the
    connection fd after a reply is sent
* HTTP benchmark tool (wrk):
  * x86-64 with 48 cores
  * 16 threads, 64 connections per thread
Before the patch set, the RPS measured by wrk is ~220K; after applying
it, the RPS is ~240K. We also measured the overhead of
eventpoll_release_file() and its children with perf: 29% before and
2% after.
The following sections explain the purposes of epmutex and how it is
replaced by locks with a smaller granularity.
epmutex serves four purposes:
(1) serialize ep_loop_check() and ep_free()/eventpoll_release_file()
    (a) ensure the validity of ep when clearing visited_list
        The acquisition of epmutex in ep_free() prevents ep from being
        freed while ep_loop_check() may still access it.
        It's fixed in patch 2: when freeing ep, remove it from
        visited_list. When there is no nested epoll, ep will never be
        added to visited_list, so we check that condition first. If ep
        has already been added to visited_list, we need to wait for
        epmutex to be released.
(2) serialize reverse_path_check() and ep_free()/eventpoll_release_file()
    (a) ensure the validity of file in tfile_check_list
        epi->ffd.file is added to tfile_check_list under ep->mtx, but is
        accessed without ep->mtx. The acquisition of epmutex in
        eventpoll_release_file() prevents the file from being freed.
        It's fixed in patch 3: when releasing a file, remove it from
        tfile_check_list. If it has already been added, we need to
        wait for epmutex to be released.
    (b) ensure the validity of epi->ep and epi->ep->file
        Currently epmutex prevents ep and its associated file from being
        freed, which is why it's OK to access epi->ep inside an RCU
        read-side critical section.
        The change is done in patch 4: ep is freed via RCU, so it's OK
        to access epi->ep->file inside an RCU read-side critical section.
        The file itself is already freed via RCU, so it's also OK to
        access its fields.
(3) serialize concurrent invocations of epoll_ctl(EPOLL_CTL_ADD)
    for the nested-epoll-fd case
    (a) protect tfile_check_list and visited_list
        Nothing needs to be done here.
(4) serialize ep_free() and eventpoll_release_file()
    (a) protect file->f_ep_links
        eventpoll_release_file() reads the list through file->f_ep_links
        and modifies it through epi->fllink, while ep_free() modifies it
        through epi->fllink.
        It's fixed in patch 5: iterate file->f_ep_links with RCU and
        list_first_or_null_rcu() instead of relying on epmutex.
    (b) ensure the validity of epi->ep
        When eventpoll_release_file() gets epi from file->f_ep_links,
        epi->ep must still be valid.
        It's fixed in patches 4 and 6: add a reference counter to
        eventpoll and free eventpoll via RCU.
    (c) protect the removal of epi
        Both ep_free() and eventpoll_release_file() may try to remove
        the same epi; once one of them has removed it, the other must
        not remove it again.
        It's fixed in patch 7: eventpoll_release_file() checks whether
        ep_free() has already removed the epi before invoking
        ep_remove().
    (d) ensure the validity of epi->ffd.file
        When ep_free() invokes ep_remove(), epi->ffd.file must still be
        valid.
        Nothing needs to be done: while ep_free() is invoking ep_remove()
        and accessing epi->ffd.file, a concurrent release of that file
        blocks on ep->mtx, so it's OK to access the file in ep_remove().
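Items (1)(a) and (2)(a) share one idea: the global lock is only needed when
the object was actually linked onto a shared list, which happens only for
nested epoll. Below is a minimal userspace sketch of that check-then-lock
pattern. It is a model, not the code from patch 2 or patch 3: a pthread mutex
stands in for epmutex, a small intrusive list stands in for visited_list, and
every name in it (model_epmutex, struct model_ep, model_ep_free() and so on)
is hypothetical. It also assumes that nothing can link an ep once its teardown
has started, which is what makes the unlocked test safe.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Minimal intrusive doubly linked list (hypothetical helper). */
struct node {
	struct node *prev, *next;
};

static void node_add(struct node *n, struct node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

static void node_del_init(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

/* Hypothetical stand-ins: model_epmutex for epmutex and
 * model_visited_list for visited_list. */
static pthread_mutex_t model_epmutex = PTHREAD_MUTEX_INITIALIZER;
static struct node model_visited_list = {
	&model_visited_list, &model_visited_list
};
static int model_slow_path_hits; /* times the global mutex was needed */

struct model_ep {
	struct node visited_link;
	atomic_bool on_visited; /* only changed while holding model_epmutex */
};

static void model_ep_init(struct model_ep *ep)
{
	ep->visited_link.prev = ep->visited_link.next = &ep->visited_link;
	atomic_init(&ep->on_visited, false);
}

/* Nested case only: a loop check links ep while holding the global mutex. */
static void model_loop_check_visit(struct model_ep *ep)
{
	pthread_mutex_lock(&model_epmutex);
	node_add(&ep->visited_link, &model_visited_list);
	atomic_store(&ep->on_visited, true);
	pthread_mutex_unlock(&model_epmutex);
}

/* Teardown path: test first, and take the global mutex only if ep was
 * linked. Assumes nothing can link ep once teardown has started. */
static void model_ep_free(struct model_ep *ep)
{
	if (atomic_load(&ep->on_visited)) {
		/* Nested case: wait for the mutex holder, then unlink ep so
		 * the shared list never points at freed memory. */
		pthread_mutex_lock(&model_epmutex);
		if (atomic_load(&ep->on_visited)) {
			node_del_init(&ep->visited_link);
			atomic_store(&ep->on_visited, false);
		}
		model_slow_path_hits++;
		pthread_mutex_unlock(&model_epmutex);
	}
	/* The non-nested case reaches here without touching the mutex. */
	assert(!atomic_load(&ep->on_visited));
	/* The caller may now release ep's memory. */
}
```

In this model only an ep that a nested loop check actually linked pays for
the global mutex; the same shape applies to a file on tfile_check_list in
item (2)(a).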
Patch 1 simply removes epmutex from ep_free() and eventpoll_release_file(),
and patch 8 enlarges the region protected by ep->mtx so that it also covers
the iteration of ep->rbr.
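Patches 7 and 8 fit together: ep_free() walks all items while holding
ep->mtx, and eventpoll_release_file() checks whether an item is already gone
before removing it. The sketch below models that interplay in userspace,
assuming that eventpoll_release_file() performs its check and removal with
ep->mtx held. A fixed array stands in for the red-black tree ep->rbr, and all
names (struct model_epi, model_remove_locked() and so on) are hypothetical,
not taken from the patches. It does not model the RCU and reference-count
lifetime guarantees of patches 4 and 6; it simply assumes ep is still valid.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_ITEMS 8

struct model_ep;

/* Hypothetical stand-in for struct epitem. */
struct model_epi {
	struct model_ep *ep;
	bool removed; /* protected by ep->mtx */
};

/* Hypothetical stand-in for struct eventpoll; items[] replaces ep->rbr.
 * A removed item stays in items[] with removed set, whereas the real
 * code unlinks it from the tree. */
struct model_ep {
	pthread_mutex_t mtx;
	struct model_epi *items[MODEL_MAX_ITEMS];
	size_t nr_items;
	size_t nr_removals; /* counts real removals, exposing any double removal */
};

static void model_ep_init(struct model_ep *ep)
{
	pthread_mutex_init(&ep->mtx, NULL);
	ep->nr_items = 0;
	ep->nr_removals = 0;
}

static void model_ep_insert(struct model_ep *ep, struct model_epi *epi)
{
	pthread_mutex_lock(&ep->mtx);
	assert(ep->nr_items < MODEL_MAX_ITEMS);
	epi->ep = ep;
	epi->removed = false;
	ep->items[ep->nr_items++] = epi;
	pthread_mutex_unlock(&ep->mtx);
}

/* Caller must hold ep->mtx. Returns true only for the caller that
 * actually removed epi (the check described for patch 7). */
static bool model_remove_locked(struct model_ep *ep, struct model_epi *epi)
{
	assert(epi->ep == ep);
	if (epi->removed)
		return false; /* the other side got here first */
	epi->removed = true;
	ep->nr_removals++;
	return true;
}

/* ep_free() side: ep->mtx is held across the whole walk (the patch 8
 * idea), so the item set cannot change underneath the iteration. */
static void model_ep_free(struct model_ep *ep)
{
	pthread_mutex_lock(&ep->mtx);
	for (size_t i = 0; i < ep->nr_items; i++)
		model_remove_locked(ep, ep->items[i]);
	ep->nr_items = 0;
	pthread_mutex_unlock(&ep->mtx);
}

/* eventpoll_release_file() side: recheck under ep->mtx before removing.
 * Assumes ep is still valid here (patches 4 and 6 are not modeled). */
static bool model_release_file(struct model_epi *epi)
{
	struct model_ep *ep = epi->ep;
	bool did_remove;

	pthread_mutex_lock(&ep->mtx);
	did_remove = model_remove_locked(ep, epi);
	pthread_mutex_unlock(&ep->mtx);
	return did_remove;
}
```

Whichever side reaches an item first removes it, and the other side sees the
removed flag and backs off, so nr_removals never exceeds the number of items.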
The patch set has passed the epoll-related test cases in LTP, and we are
planning to run some torture and performance tests for the nested-epoll
case.
Comments and questions are welcome.
Regards,
Tao
---
Hou Tao (8):
epoll: remove epmutex from ep_free() & eventpoll_release_file()
epoll: remove ep from visited_list when freeing ep
epoll: remove file from tfile_check_list when releasing file
epoll: free eventpoll by rcu to provide existence guarantee
epoll: iterate epi in file->f_ep_links by using list_first_or_null_rcu
epoll: ensure the validity of ep when removing epi in
eventpoll_release_file()
epoll: prevent the double-free of epi in eventpoll_release_file()
epoll: protect the iteration of ep->rbr by ep->mtx in ep_free()
fs/eventpoll.c | 102 +++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 88 insertions(+), 14 deletions(-)
--
2.7.5