netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: Jacob Keller <jacob.e.keller@intel.com>, linux-fsdevel@vger.kernel.org
Cc: Soheil Hassas Yeganeh <soheil@google.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Davidlohr Bueso <dave@stgolabs.net>,
	Jason Baron <jbaron@akamai.com>,
	netdev@vger.kernel.org, Carlos Maiolino <cmaiolino@redhat.com>,
	Eric Biggers <ebiggers@kernel.org>
Subject: Re: [PATCH v3] epoll: use refcount to reduce ep_mutex contention
Date: Tue, 29 Nov 2022 10:05:46 +0100	[thread overview]
Message-ID: <e9767b9d708db2593805e8507d3ca43532dad59e.camel@redhat.com> (raw)
In-Reply-To: <477f3642-608f-f710-9eed-6312a6e3f2d8@intel.com>

Hello,

On Mon, 2022-11-28 at 16:06 -0800, Jacob Keller wrote:
> > @@ -217,6 +218,12 @@ struct eventpoll {
> >   	u64 gen;
> >   	struct hlist_head refs;
> >   
> > +	/*
> > +	 * usage count, protected by mtx, used together with epitem->dying to
> > +	 * orchestrate the disposal of this struct
> > +	 */
> > +	unsigned int refcount;
> > +
> 
> Why not use a kref (or at least struct refcount?) those provide some 
> guarantees like guaranteeing atomic operations and saturation when the 
> refcount value would overflow.

Thank you for the feedback!

I thought about that options and ultimately opted otherwise because we
can avoid the additional atomic operations required by kref/refcount_t.
The reference count is always touched under the ep->mtx mutex.
Reasonably this does not introduce performance regressions even in the
stranger corner case.

The above was explicitly noted in the previous revisions commit
message, but I removed it by mistake while updating the message for v3.

I can switch to kref if there is agreement WRT such performance trade-
off. Another option would be adding a couple of additional checks for
wrap-arounds in ep_put() and ep_get() - so that we get similar safety
guarantees but no additional atomic operations.

> >   #ifdef CONFIG_NET_RX_BUSY_POLL
> >   	/* used to track busy poll napi_id */
> >   	unsigned int napi_id;
> > @@ -240,9 +247,7 @@ struct ep_pqueue {
> >   /* Maximum number of epoll watched descriptors, per user */
> >   static long max_user_watches __read_mostly;
> >   
> > -/*
> > - * This mutex is used to serialize ep_free() and eventpoll_release_file().
> > - */
> > +/* Used for cycles detection */
> >   static DEFINE_MUTEX(epmutex);
> >   
> >   static u64 loop_check_gen = 0;
> > @@ -555,8 +560,7 @@ static void ep_remove_wait_queue(struct eppoll_entry *pwq)
> >   
> >   /*
> >    * This function unregisters poll callbacks from the associated file
> > - * descriptor.  Must be called with "mtx" held (or "epmutex" if called from
> > - * ep_free).
> > + * descriptor.  Must be called with "mtx" held.
> >    */
> >   static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
> >   {
> > @@ -679,11 +683,38 @@ static void epi_rcu_free(struct rcu_head *head)
> >   	kmem_cache_free(epi_cache, epi);
> >   }
> >   
> > +static void ep_get(struct eventpoll *ep)
> > +{
> > +	ep->refcount++;
> > +}
> This would become something like "kref_get(&ep->kref)" or maybe even 
> something like "kref_get_unless_zero" or some other form depending on 
> exactly how you acquire a pointer to an eventpoll structure.

No need for kref_get_unless_zero here, in all ep_get() call-sites we
know that at least onother reference is alive and can't go away
concurrently.

> > +
> > +/*
> > + * Returns true if the event poll can be disposed
> > + */
> > +static bool ep_put(struct eventpoll *ep)
> > +{
> > +	if (--ep->refcount)
> > +		return false;
> > +
> > +	WARN_ON_ONCE(!RB_EMPTY_ROOT(&ep->rbr.rb_root));
> > +	return true;
> > +}
> 
> This could become kref_put(&ep->kref, ep_dispose).

I think it would be necessary releasing the ep->mtx mutex before
invoking ep_dispose()...

> > +
> > +static void ep_dispose(struct eventpoll *ep)
> > +{
> > +	mutex_destroy(&ep->mtx);
> > +	free_uid(ep->user);
> > +	wakeup_source_unregister(ep->ws);
> > +	kfree(ep);
> > +}
> This would takea  kref pointer, use container_of to get to the eventpoll 
> structure, and then perform necessary cleanup once all references drop.
> 
> The exact specific steps here and whether it would still be safe to call 
> mutex_destroy is a bit unclear since you typically would only call 
> mutex_destroy when its absolutely sure that no one has locked the mutex.

... due to the above. The current patch code ensures that ep_dispose()
is called only after the ep->mtx mutex is released.


> See Documentation/core-api/kref.rst for a better overview of the API and 
> how to use it safely. I suspect that with just kref you could also 
> safely avoid the "dying" flag as well, but I am not 100% sure.

I *think* we will still need the 'dying' flag, otherwise ep_free()
can't tell if the traversed epitems entries still held a reference to
struct eventpoll - eventpoll_release_file() and ep_free() could
potentially try to release the same reference twice and kref could
detect that race only for the last reference.

Thanks!

Paolo


  reply	other threads:[~2022-11-29  9:06 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-11-28 18:00 [PATCH v3] epoll: use refcount to reduce ep_mutex contention Paolo Abeni
2022-11-29  0:06 ` Jacob Keller
2022-11-29  9:05   ` Paolo Abeni [this message]
2022-11-29 18:25     ` Jacob Keller
2022-12-13  6:01 ` Eric Biggers
2022-12-13 18:21   ` Paolo Abeni
2022-12-13 18:59     ` Eric Biggers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e9767b9d708db2593805e8507d3ca43532dad59e.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=cmaiolino@redhat.com \
    --cc=dave@stgolabs.net \
    --cc=ebiggers@kernel.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jbaron@akamai.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=soheil@google.com \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).