Re: [PATCH] generic dynamic per cpu refcounting

From: Kent Overstreet <koverstreet@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>,
	srivatsa.bhat@linux.vnet.ibm.com, rusty@rustcorp.com.au,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] generic dynamic per cpu refcounting
Date: Mon, 28 Jan 2013 10:49:33 -0800
Message-ID: <20130128184933.GC26407@google.com>
In-Reply-To: <20130128182737.GC22465@mtj.dyndns.org>

On Mon, Jan 28, 2013 at 10:27:37AM -0800, Tejun Heo wrote:
> Hello, guys.
> 
> On Mon, Jan 28, 2013 at 10:15:28AM -0800, Kent Overstreet wrote:
> > > 	percpu_ref_kill();
> > > 	put_and_destroy();
> > > 
> > > And this can race with another holder which drops the last reference:
> > > its put_and_destroy() can see PCPU_REF_DYING and return false.
> > > 
> > > Or I misunderstood the code/interface?
> > 
> > Nope, nailed it :) That should _definitely_ be in the documentation.
> 
> Can we just combine kill initiation and base ref put and make that the
> responsibility of the owner?  Extra features on basic constructs may
> seem good for certain use cases but tend to bring more confusion than
> good in the long run.  If a user needs to synchronize among multiple
> killers, let the user deal with the issue.

Don't follow...

Something I forgot to mention in my last mail: the caller will often
need its own synchronize_rcu()/call_rcu(). percpu_ref_kill() corresponds
to the point where you make the object unavailable (e.g. deleting it from
the RCU-protected hash table in aio), and you need a synchronize_rcu()
before you drop your initial ref.

So leaving that to the caller lets it merge the two synchronize_rcu()
calls into one.
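To make the "two grace periods versus one" point concrete, here is a minimal userspace sketch. Everything in it is hypothetical: stub_synchronize_rcu() only counts calls, struct obj uses a plain int in place of the percpu ref, and kill_and_wait()/kill_nowait() stand in for a percpu_ref_kill() that does or does not wait internally. It is not the patch's code, just the ordering being discussed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for synchronize_rcu(): it only counts grace periods. */
static int grace_periods;
static void stub_synchronize_rcu(void) { grace_periods++; }

struct obj {
	bool hashed;   /* still reachable through the RCU-protected lookup table */
	bool dying;    /* the ref has been killed */
	int  refs;     /* plain count standing in for the percpu ref */
	bool freed;
};

static void obj_put(struct obj *o)
{
	if (--o->refs == 0)
		o->freed = true;
}

/* Kill that waits for a grace period internally. */
static void kill_and_wait(struct obj *o)
{
	o->dying = true;
	stub_synchronize_rcu();
}

/* Kill that leaves the grace period to the caller. */
static void kill_nowait(struct obj *o)
{
	o->dying = true;
}

/* Kill waits on its own, and the caller still waits before its final put. */
static void teardown_separate(struct obj *o)
{
	o->hashed = false;        /* e.g. delete from the aio hash table */
	kill_and_wait(o);
	stub_synchronize_rcu();   /* caller's own wait before dropping the initial ref */
	obj_put(o);               /* drop the initial reference */
}

/* The caller owns the grace period: one wait covers unhash and kill. */
static void teardown_merged(struct obj *o)
{
	o->hashed = false;
	kill_nowait(o);
	stub_synchronize_rcu();
	obj_put(o);
}
```

With one initial reference each, teardown_separate() costs two simulated grace periods and teardown_merged() costs one; both end with the object freed.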

> 
> > Actually - I think it'd be better to have the default percpu_ref_kill()
> > do the second synchronize_rcu(), and have an unsafe version that skips
> > it.
> 
> Note that synchronize_rcu/sched() can be very slow and cause problems
> in paths which are frequently traveled and visible to userland.  It's
> fine for things like module destruction but can be a problem even
> during device destruction - blkcg had synchronize_rcu() in
> request_queue destruction which led to huge latencies during boot
> because SCSI wants to create and then destroy request_queues for all
> possible LUNs on certain configurations.  So, if you put
> synchronize_rcu/sched() in percpu_ref_kill(), that better not be used
> from e.g. close(2).

Yeah. It'd be really nice if it was doable without synchronize_rcu(),
but it'd definitely make get/put heavier.

Though, re. close() - considering we only need a synchronize_rcu() if
the ref was in percpu mode, I wonder if that would be a dealbreaker. I
have no clue myself.
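A minimal sketch of that idea, assuming (as in the dynamic design under discussion) that a ref starts out as a single shared counter and only switches to per-CPU counters once it gets hot, and that the per-CPU fast paths run inside RCU read-side sections. struct dyn_ref, dyn_ref_kill() and stub_synchronize_rcu() are hypothetical names for illustration only.

```c
#include <stdbool.h>

/* Hypothetical stand-in for synchronize_rcu(): it only counts grace periods. */
static int grace_periods;
static void stub_synchronize_rcu(void) { grace_periods++; }

struct dyn_ref {
	bool percpu;   /* switched to per-CPU counters because the ref got hot */
	bool dying;
};

static void dyn_ref_kill(struct dyn_ref *r)
{
	bool was_percpu = r->percpu;

	r->dying = true;
	r->percpu = false;           /* later get/put use the shared atomic count */
	if (was_percpu)
		stub_synchronize_rcu();  /* wait out fast paths that may still touch per-CPU counters */
}
```

A ref that never left shared-counter mode is killed without any grace period, so a close(2) path would only pay for synchronize_rcu() on refs that were hot enough to go percpu.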

Getting rid of synchronize_rcu() would basically require turning get and
put into cmpxchg() loops, even in the percpu fast path. Percpu mode would
still get rid of the shared cacheline contention, though; we'd just be
adding another branch that can safely be marked unlikely(). My current
version already has one of those, so that's two branches instead of one
in the fast path.
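One way the cmpxchg()-loop fast path could look, as a userspace C11 model rather than kernel code. Assumptions, all hypothetical: NR_CPUS slots in an array stand in for real per-CPU counters and the caller passes its CPU index explicitly; bit 0 of each slot is a DYING flag and counts are stored in steps of 2 so arithmetic never touches the flag; after kill, puts fall back to a shared counter that starts with a large bias so it cannot hit zero while slots are still being folded in. The bias trick is an illustrative choice, not something proposed in this thread, and ref_kill() must be called exactly once.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS   4              /* hypothetical fixed CPU count for the model */
#define REF_DYING 1ULL           /* bit 0 of a slot: ref has been killed */
#define REF_ONE   2ULL           /* one reference; steps of 2 keep bit 0 free */
#define REF_BIAS  (1ULL << 62)   /* keeps the shared count nonzero while folding */

struct pcpu_ref_model {
	_Atomic uint64_t pcpu[NR_CPUS];  /* stand-in for per-CPU counters */
	_Atomic uint64_t shared;         /* used once the ref is dying */
};

static void ref_init(struct pcpu_ref_model *r)
{
	for (int i = 0; i < NR_CPUS; i++)
		atomic_init(&r->pcpu[i], 0);
	atomic_init(&r->shared, REF_BIAS);
	atomic_fetch_add(&r->pcpu[0], REF_ONE);  /* the initial reference */
}

/* Fast path get: the DYING check and the increment happen in one cmpxchg. */
static bool ref_tryget(struct pcpu_ref_model *r, int cpu)
{
	uint64_t old = atomic_load(&r->pcpu[cpu]);

	do {
		if (old & REF_DYING)     /* the extra, normally-unlikely branch */
			return false;
	} while (!atomic_compare_exchange_weak(&r->pcpu[cpu], &old, old + REF_ONE));
	return true;
}

/* Returns true if this put dropped the last reference. */
static bool ref_put(struct pcpu_ref_model *r, int cpu)
{
	uint64_t old = atomic_load(&r->pcpu[cpu]);

	do {
		if (old & REF_DYING)     /* slot frozen: use the shared count */
			return atomic_fetch_sub(&r->shared, REF_ONE) == REF_ONE;
	} while (!atomic_compare_exchange_weak(&r->pcpu[cpu], &old, old - REF_ONE));
	return false;                /* the initial ref keeps a live count above zero */
}

/* Freeze each slot, fold its count into shared, then drop the bias.
 * Returns true if no references remained. Call exactly once. */
static bool ref_kill(struct pcpu_ref_model *r)
{
	for (int i = 0; i < NR_CPUS; i++) {
		/* Once DYING is set the slot never changes again, so old is final. */
		uint64_t old = atomic_fetch_or(&r->pcpu[i], REF_DYING);
		atomic_fetch_add(&r->shared, old & ~REF_DYING);
	}
	return atomic_fetch_sub(&r->shared, REF_BIAS) == REF_BIAS;
}
```

All arithmetic is unsigned and wraps modulo 2^64, so a put on a different CPU than the matching get can leave one slot "negative" while the folded sum stays exact. The only change to the non-dying fast path is the cmpxchg loop plus one branch, which is the cost being weighed above.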

I suppose I should give it a shot.

As long as I'm going down that route, I could probably make the bare
non-percpu ref 8 bytes instead of 16, too...
