From: Soheil Hassas Yeganeh <soheil@google.com>
To: Paolo Abeni <pabeni@redhat.com>
Cc: linux-fsdevel@vger.kernel.org, Al Viro <viro@zeniv.linux.org.uk>,
Davidlohr Bueso <dave@stgolabs.net>,
Jason Baron <jbaron@akamai.com>, Roman Penyaev <rpenyaev@suse.de>,
netdev@vger.kernel.org, Carlos Maiolino <cmaiolino@redhat.com>
Subject: Re: [REPOST PATCH] epoll: use refcount to reduce ep_mutex contention
Date: Tue, 22 Nov 2022 12:03:06 -0500 [thread overview]
Message-ID: <CACSApvaMCeKLn88uNAWOxrzPWC9Rr2BZLa3--6TQuY6toYZdOg@mail.gmail.com> (raw)
In-Reply-To: <819762b6eb549f74d0ebbb6663f042ae9b6cd86d.camel@redhat.com>
On Tue, Nov 22, 2022 at 11:59 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hello,
>
> Thank you for the prompt feedback!
>
> On Tue, 2022-11-22 at 11:18 -0500, Soheil Hassas Yeganeh wrote:
> > On Tue, Nov 22, 2022 at 10:43 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > We are observing huge contention on the epmutex during an http
> > > connection/rate test:
> > >
> > > 83.17% 0.25% nginx [kernel.kallsyms] [k] entry_SYSCALL_64_after_hwframe
> > > [...]
> > > |--66.96%--__fput
> > > |--60.04%--eventpoll_release_file
> > > |--58.41%--__mutex_lock.isra.6
> > > |--56.56%--osq_lock
> > >
> > > The application is multi-threaded, creates a new epoll entry for
> > > each incoming connection, and does not delete it before the
> > > connection shutdown - that is, before the connection's fd close().
> > >
> > > Many different threads compete frequently for the epmutex lock,
> > > affecting the overall performance.
> > >
> > > To reduce the contention this patch introduces explicit reference counting
> > > for the eventpoll struct. Each registered event acquires a reference,
> > > and references are released at ep_remove() time. ep_free() doesn't touch
> > > anymore the event RB tree, it just unregisters the existing callbacks
> > > and drops a reference to the ep struct. The struct itself is freed when
> > > the reference count reaches 0. The reference count updates are protected
> > > by the mtx mutex so no additional atomic operations are needed.
> > >
> > > Since ep_free() can't compete anymore with eventpoll_release_file()
> > > for epitems removal, we can drop the epmutex usage at disposal time.
> > >
> > > With the patched kernel, in the same connection/rate scenario, the mutex
> > > operations disappear from the perf report, and the measured connections/rate
> > > grows by ~60%.
> >
> > I locally tried this patch and I can reproduce the results. Thank you
> > for the nice optimization!
> >
> > > Tested-by: Xiumei Mu <xmu@redhat.com>
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > This is just a repost reaching out for more recipents,
> > > as suggested by Carlos.
> > >
> > > Previous post at:
> > >
> > > https://lore.kernel.org/linux-fsdevel/20221122102726.4jremle54zpcapia@andromeda/T/#m6f98d4ccbe0a385d10c04fd4018e782b793944e6
> > > ---
> > > fs/eventpoll.c | 113 ++++++++++++++++++++++++++++---------------------
> > > 1 file changed, 64 insertions(+), 49 deletions(-)
> > >
> > > diff --git a/fs/eventpoll.c b/fs/eventpoll.c
> > > index 52954d4637b5..6e415287aeb8 100644
> > > --- a/fs/eventpoll.c
> > > +++ b/fs/eventpoll.c
> > > @@ -226,6 +226,12 @@ struct eventpoll {
> > > /* tracks wakeup nests for lockdep validation */
> > > u8 nests;
> > > #endif
> > > +
> > > + /*
> > > + * protected by mtx, used to avoid races between ep_free() and
> > > + * ep_eventpoll_release()
> > > + */
> > > + unsigned int refcount;
> >
> > nitpick: Given that napi_id and nest are both macro protected, you
> > might want to pull it right after min_wait_ts.
>
> Just to be on the same page: the above is just for an aesthetic reason,
> right? Is there some functional aspect I don't see?
Yes, a nitpick completely for aesthetics. It's also slightly easier
to think about the size, alignment and padding of the structure that
way. Please feel free to ignore.
> [...]
>
> > > @@ -2165,10 +2174,16 @@ int do_epoll_ctl(int epfd, int op, int fd, struct epoll_event *epds,
> > > error = -EEXIST;
> > > break;
> > > case EPOLL_CTL_DEL:
> > > - if (epi)
> > > - error = ep_remove(ep, epi);
> > > - else
> > > + if (epi) {
> > > + /*
> > > + * The eventpoll itself is still alive: the refcount
> > > + * can't go to zero here.
> > > + */
> > > + WARN_ON_ONCE(ep_remove(ep, epi));
> >
> > There are similar examples of calling ep_remove() without checking the
> > return value in ep_insert().
>
> Yes, the error paths in ep_insert(). I added a comment referring to all
> of them, trying to explain that ep_dispose() is not needed there.
>
> > I believe we should add a similar comment there, and maybe a
> > WARN_ON_ONCE. I'm not sure, but it might be worth adding a new helper
> > given this repeated pattern?
>
> I like the idea of such helper. I'll use it in the next iteration, if
> there is a reasonable agreement on this patch.
>
> Whould 'ep_remove_safe()' fit as the helper's name?
'ep_remove_safe()' sounds great to me.
Thanks!
Soheil
> Thanks,
>
> Paolo
>
next prev parent reply other threads:[~2022-11-22 17:03 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <JC496tgWfxduEmoxmrAOgiDTlftsRu1Lm9JOtXBEqnBKJVo0nLMyqk9ho4p08Q2PFZF6_F3j25b346oqwqSVOw==@protonmail.internalid>
2022-11-22 15:42 ` [REPOST PATCH] epoll: use refcount to reduce ep_mutex contention Paolo Abeni
2022-11-22 16:18 ` Soheil Hassas Yeganeh
2022-11-22 16:59 ` Paolo Abeni
2022-11-22 17:03 ` Soheil Hassas Yeganeh [this message]
2022-11-23 13:27 ` Carlos Maiolino
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CACSApvaMCeKLn88uNAWOxrzPWC9Rr2BZLa3--6TQuY6toYZdOg@mail.gmail.com \
--to=soheil@google.com \
--cc=cmaiolino@redhat.com \
--cc=dave@stgolabs.net \
--cc=jbaron@akamai.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=rpenyaev@suse.de \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).