All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kent Overstreet <koverstreet@google.com>
To: Oleg Nesterov <oleg@redhat.com>
Cc: tj@kernel.org, srivatsa.bhat@linux.vnet.ibm.com,
	rusty@rustcorp.com.au, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] generic dynamic per cpu refcounting
Date: Mon, 28 Jan 2013 10:07:35 -0800	[thread overview]
Message-ID: <20130128180735.GY26407@google.com> (raw)
In-Reply-To: <20130125180941.GA16896@redhat.com>

On Fri, Jan 25, 2013 at 07:09:41PM +0100, Oleg Nesterov wrote:
> (add lkml)
> 
> On 01/24, Kent Overstreet wrote:
> >
> > This has already been on lkml and is in Andrew's tree, Tejun just asked
> > me to send it out again:
> 
> I'll try to read this code later, just a couple of questions after a quick
> glance. Sorry this was already discussed...

No worries, it wasn't that widely circulated.

> > +struct percpu_ref {
> > +	atomic64_t		count;
> > +	unsigned long		pcpu_count;
> > +};
> 
> The code looks a bit tricky mostly because you pack state/pointer/jiffies
> into ->pcpu_count. The same for ->count.

Yes, it is.

> I assume that you have a good reason to shrink the sizeof(percpu_ref), but
> I am curious: who is the user of this thing?

Right now - just the aio code, but the idea was to make it as close to a
drop in replacement for atomic_t + atomic_get()/atomic_dec_and_test() as
possible.

> > + * percpu_ref_get - increment a dynamic percpu refcount
> > + *
> > + * Increments @ref and possibly converts it to percpu counters. Must be called
> > + * with rcu_read_lock() held, and may potentially drop/reacquire rcu_read_lock()
> > + * to allocate percpu counters - if sleeping/allocation isn't safe for some
> > + * other reason (e.g. a spinlock), see percpu_ref_get_noalloc().
> 
> And this looks strange. It must be called under rcu_read_lock(), but
> ->rcu_read_lock_nesting must be == 1. Otherwise rcu_read_unlock() in
> percpu_ref_alloc() won't work.
> 
> Again, I think you have a reason, but could you explain? IOW, why we
> can't make it might_sleep() instead? The fast path can do rcu_read_lock()
> itself.

It's stupid and contorted because I didn't have any better ideas when I
first wrote it and haven't fixed it yet.

> > +static inline void percpu_ref_get_noalloc(struct percpu_ref *ref)
> > +{
> > +	__percpu_ref_get(ref, false);
> > +}
> 
> and this could be percpu_ref_get_atomic().
> 
> Once again, I am not arguing, just can't understand.

Same deal, I'm going to get rid of the two different versions.

> > +void __percpu_ref_get(struct percpu_ref *ref, bool alloc)
> > +{
> > +	unsigned long pcpu_count;
> > +	uint64_t v;
> > +
> > +	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> > +
> > +	if (REF_STATUS(pcpu_count) == PCPU_REF_PTR) {
> > +		/* for rcu - we're not using rcu_dereference() */
> > +		smp_read_barrier_depends();
> > +		__this_cpu_inc(*((unsigned __percpu *) pcpu_count));
> 
> The comment looks confusing a bit... smp_read_barrier_depends() is not
> for rcu, we obviously need it to access (unsigned __percpu *) pcpu_count.
> But yes, since we didn't use rcu_dereference() we have to add it by hand.

Yeah - originally I was using rcu_dereference(), but sparse hated
combining __percpu and __rcu and I couldn't get it to stop complaining.

> 
> > +int percpu_ref_kill(struct percpu_ref *ref)
> > +{
> > ...
> > +	if (status == PCPU_REF_PTR) {
> > +		unsigned count = 0, cpu;
> > +
> > +		synchronize_rcu();
> > +
> > +		for_each_possible_cpu(cpu)
> > +			count += *per_cpu_ptr((unsigned __percpu *) pcpu_count, cpu);
> > +
> > +		pr_debug("global %lli pcpu %i",
> > +			 atomic64_read(&ref->count) & PCPU_COUNT_MASK,
> > +			 (int) count);
> > +
> > +		atomic64_add((int) count, &ref->count);
> > +		smp_wmb();
> > +		/* Between setting global count and setting PCPU_REF_DEAD */
> > +		ref->pcpu_count = PCPU_REF_DEAD;
> 
> The coment explains what the code does, but not why ;)

That seems like a more straightforward barrier than most... we need the
refcount to be consistent before setting the state to dead :P

> I guess this is for percpu_ref_put(), and this wmb() pairs with implicit
> mb() implied by atomic64_dec_return().

Yeah. I expanded the comment there a bit...

> 
> > +		free_percpu((unsigned __percpu *) pcpu_count);
> 
> I guess it could be freed right after for_each_possible_cpu() above, but
> this doesn't matter.

I think that'd be better though, I'll switch it.

      parent reply	other threads:[~2013-01-28 18:07 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20130124232024.GA584@google.com>
2013-01-25 18:09 ` [PATCH] generic dynamic per cpu refcounting Oleg Nesterov
2013-01-25 18:29   ` Oleg Nesterov
2013-01-28 18:10     ` Kent Overstreet
2013-01-28 18:50       ` Oleg Nesterov
2013-01-25 19:11   ` Oleg Nesterov
2013-01-28 18:15     ` Kent Overstreet
2013-01-28 18:27       ` Tejun Heo
2013-01-28 18:49         ` Kent Overstreet
2013-01-28 18:55           ` Tejun Heo
2013-01-28 20:22             ` Kent Overstreet
2013-01-28 20:27               ` Tejun Heo
2013-01-28 20:55                 ` Kent Overstreet
2013-01-28 21:18                   ` Tejun Heo
2013-01-28 21:24                     ` Kent Overstreet
2013-01-28 21:28                       ` Tejun Heo
2013-01-28 21:36                         ` Tejun Heo
2013-01-28 21:48                           ` Kent Overstreet
2013-01-28 21:45                         ` Kent Overstreet
2013-01-28 21:50                           ` Tejun Heo
2013-01-29 16:39                             ` Kent Overstreet
2013-01-29 19:29                               ` Tejun Heo
2013-01-29 19:51                                 ` Kent Overstreet
2013-01-29 20:02                                   ` Tejun Heo
2013-01-29 21:45                                     ` Kent Overstreet
2013-01-29 22:06                                       ` Tejun Heo
2013-01-29 18:04                             ` [PATCH] module: Convert to generic percpu refcounts Kent Overstreet
2013-01-28 18:07   ` Kent Overstreet [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130128180735.GY26407@google.com \
    --to=koverstreet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=rusty@rustcorp.com.au \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.