From: Rusty Russell <rusty@rustcorp.com.au>
To: Kent Overstreet <koverstreet@google.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-aio@kvack.org
Cc: akpm@linux-foundation.org,
	Kent Overstreet <koverstreet@google.com>,
	Zach Brown <zab@redhat.com>, Felipe Balbi <balbi@ti.com>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Mark Fasheh <mfasheh@suse.com>, Joel Becker <jlbec@evilplan.org>,
	Jens Axboe <axboe@kernel.dk>,
	Asai Thambi S P <asamymuthupa@micron.com>,
	Selvan Mani <smani@micron.com>,
	Sam Bradshaw <sbradshaw@micron.com>,
	Jeff Moyer <jmoyer@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>,
	Benjamin LaHaise <bcrl@kvack.org>, Tejun Heo <tj@kernel.org>,
	Oleg Nesterov <oleg@redhat.com>,
	Christoph Lameter <cl@linux-foundation.org>,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH 04/21] Generic percpu refcounting
Date: Thu, 16 May 2013 09:56:19 +0930
Message-ID: <87y5bfzs5w.fsf@rustcorp.com.au>
In-Reply-To: <1368494338-7069-5-git-send-email-koverstreet@google.com>

Kent Overstreet <koverstreet@google.com> writes:
> This implements a refcount with similar semantics to
> atomic_get()/atomic_dec_and_test() - but percpu.

Ah!  This is why I was CC'd... Now I understand.  Thanks :)

Delighted to see someone chasing this.  I had an implementation of such
a thing last decade, but the slowmode pattern didn't make for trivial
kref conversions, so I dropped it.

Note: I haven't read the other feedback yet, so ignore if dups.

> +int percpu_ref_init(struct percpu_ref *ref);

Why not just run in slow mode when allocation fails?  Things which can't
fail make for simpler use.
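
A rough userspace sketch of what I mean, assuming a NULL per-CPU pointer
simply means "use the shared atomic".  Every name here (init_ref,
INIT_REF_CPUS, the simulate_alloc_failure flag) is made up for
illustration.  It also glosses over the fact that the patch as posted
uses a NULL pcpu_count to mean "killed", so a real version would need
some other way to tell the two states apart:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the number of possible CPUs. */
#define INIT_REF_CPUS 4

struct init_ref {
	atomic_int count;	/* shared counter, always usable */
	unsigned *pcpu_count;	/* NULL means "slow mode" in this model */
};

/*
 * Cannot fail: if the per-CPU array cannot be allocated, the ref just
 * stays in slow mode and every get goes through the shared atomic.
 */
static void init_ref_init(struct init_ref *ref, bool simulate_alloc_failure)
{
	atomic_init(&ref->count, 1);	/* the initial reference */
	ref->pcpu_count = simulate_alloc_failure
		? NULL
		: calloc(INIT_REF_CPUS, sizeof(*ref->pcpu_count));
}

/* The caller passes the CPU explicitly; the kernel would use this_cpu ops. */
static void init_ref_get(struct init_ref *ref, int cpu)
{
	if (ref->pcpu_count)
		ref->pcpu_count[cpu]++;
	else
		atomic_fetch_add(&ref->count, 1);
}
```

The caller never sees an error code; a failed allocation costs
scalability, not correctness.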

> +int percpu_ref_tryget(struct percpu_ref *ref);
> +int percpu_ref_put_initial_ref(struct percpu_ref *ref);

This is part of a slightly different pattern: the owned refcount.

In fact, I think that's the most sane pattern to use (but I could be
wrong; does the AIO stuff fit?).  If so, promote this to a first-class
citizen, and if necessary expose kill as __percpu_ref_kill()?

(I might suggest percpu_ref_owner_put() as a name, in fact).
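
To make the owned pattern concrete, here is a single-threaded userspace
model of the lifecycle.  It is only a sketch: the names (owned_ref,
owned_ref_owner_put, OWNED_REF_CPUS) are hypothetical, there are no
atomics or grace periods, and the assert() stands in for a BUG_ON().
The point is just the shape of the API: the owner holds the initial
reference, and the owner's put is the one place that leaves fast mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the number of possible CPUs. */
#define OWNED_REF_CPUS 2

struct owned_ref {
	long shared;			/* authoritative once fast == false */
	long percpu[OWNED_REF_CPUS];	/* per-CPU deltas while in fast mode */
	bool fast;
};

static void owned_ref_init(struct owned_ref *r)
{
	r->shared = 1;			/* the owner's initial reference */
	for (int cpu = 0; cpu < OWNED_REF_CPUS; cpu++)
		r->percpu[cpu] = 0;
	r->fast = true;
}

static void owned_ref_get(struct owned_ref *r, int cpu)
{
	if (r->fast)
		r->percpu[cpu]++;
	else
		r->shared++;
}

/* Returns true only if this put dropped the last reference. */
static bool owned_ref_put(struct owned_ref *r, int cpu)
{
	if (r->fast) {
		r->percpu[cpu]--;	/* may go negative; only the sum matters */
		return false;
	}
	return --r->shared == 0;
}

/*
 * The owner's put, called exactly once: leave fast mode, fold the
 * per-CPU deltas into the shared count, then drop the initial ref.
 */
static bool owned_ref_owner_put(struct owned_ref *r)
{
	assert(r->fast);		/* stand-in for BUG_ON(): owner puts once */
	r->fast = false;
	for (int cpu = 0; cpu < OWNED_REF_CPUS; cpu++)
		r->shared += r->percpu[cpu];
	return --r->shared == 0;
}
```

With this shape, ordinary users only ever see get and put, and kill
becomes an implementation detail of the owner's put.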

> +/**
> + * percpu_ref_get - increment a dynamic percpu refcount
> + *
> + * Analagous to atomic_inc().
> +  */
> +static inline void percpu_ref_get(struct percpu_ref *ref)
> +{
> +	unsigned __percpu *pcpu_count;
> +
> +	preempt_disable();
> +
> +	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> +
> +	if (pcpu_count)
> +		__this_cpu_inc(*pcpu_count);
> +	else
> +		atomic_inc(&ref->count);
> +
> +	preempt_enable();
> +}

s/preempt_disable()/rcu_read_lock()/ ?

> +/**
> + * percpu_ref_put - decrement a dynamic percpu refcount
> + *
> + * Returns true if the result is 0, otherwise false; only checks for the ref
> + * hitting 0 after percpu_ref_kill() has been called. Analagous to
> + * atomic_dec_and_test().
> + */
> +static inline int percpu_ref_put(struct percpu_ref *ref)
> +{
> +	unsigned __percpu *pcpu_count;
> +	int ret = 0;
> +
> +	preempt_disable();
> +
> +	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> +
> +	if (pcpu_count)
> +		__this_cpu_dec(*pcpu_count);
> +	else
> +		ret = atomic_dec_and_test(&ref->count);
> +
> +	preempt_enable();
> +
> +	return ret;
> +}

Here too.  And if you don't put unlikely() in this code, you lose kernel
hacker points :)

And int/true/false is for old-timers.
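
Roughly what I have in mind for the put side, as a userspace model.
This is a sketch under assumptions: put_ref is a hypothetical name,
there is a single counter slot instead of real per-CPU data, the
preempt/RCU protection is left out entirely, and unlikely() is defined
locally via the GCC/Clang builtin as a stand-in for the kernel macro:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's unlikely(); GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

struct put_ref {
	atomic_int count;	/* shared counter, used once killed */
	unsigned *pcpu_count;	/* one slot in this model; NULL once killed */
};

/* Returns true only when the shared counter drops to zero. */
static bool put_ref_put(struct put_ref *ref)
{
	unsigned *pcpu_count = ref->pcpu_count;

	/* Slow path: percpu mode is off, the shared counter is authoritative. */
	if (unlikely(!pcpu_count))
		return atomic_fetch_sub(&ref->count, 1) == 1;

	/* Fast path: never reports zero, exactly as in the patch. */
	(*pcpu_count)--;
	return false;
}
```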

> +
> +unsigned percpu_ref_count(struct percpu_ref *ref);
> +int percpu_ref_kill(struct percpu_ref *ref);
> +
> +/**
> + * percpu_ref_dead - check if a dynamic percpu refcount is shutting down
> + *
> + * Returns true if percpu_ref_kill() has been called on @ref, false otherwise.
> + */
> +static inline int percpu_ref_dead(struct percpu_ref *ref)
> +{
> +	return ref->pcpu_count == NULL;
> +}

Can you unexpose these?  I think percpu_ref_init(), ...get(), ...put()
and ...put_initial() are a nicer API.

> +int percpu_ref_kill(struct percpu_ref *ref)
> +{
> +	unsigned __percpu *pcpu_count;
> +	unsigned __percpu *old;
> +	unsigned count = 0;
> +	int cpu;
> +
> +	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> +
> +	do {
> +		if (!pcpu_count)
> +			return 0;
> +
> +		old = pcpu_count;
> +		pcpu_count = cmpxchg(&ref->pcpu_count, old, NULL);
> +	} while (pcpu_count != old);

This is more complex than it needs to be, no?


        pcpu_count = ACCESS_ONCE(ref->pcpu_count);
        if (!pcpu_count)
                return 0;
        if (cmpxchg(&ref->pcpu_count, pcpu_count, NULL) == NULL)
                return 0;

Of course, if all callers use the owner pattern, this is simply:

        pcpu_count = ACCESS_ONCE(ref->pcpu_count);
        BUG_ON(!pcpu_count);
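
For anyone who wants to poke at the single-attempt version outside the
kernel, here is a userspace model using C11 atomics in place of
ACCESS_ONCE() and cmpxchg().  The names (kill_ref, kill_ref_claim) are
hypothetical and the grace period and counter folding are left out.  It
relies on the same assumption as above: the pointer only ever goes from
non-NULL to NULL, so a failed exchange can only mean someone else
already cleared it and there is nothing to retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct kill_ref {
	_Atomic(unsigned *) pcpu_count;	/* non-NULL while in percpu mode */
};

/*
 * Returns true for the one caller that switched the ref out of percpu
 * mode, handing back the old pointer so that caller can sum and free it.
 */
static bool kill_ref_claim(struct kill_ref *ref, unsigned **claimed)
{
	unsigned *old = atomic_load(&ref->pcpu_count);

	if (!old)
		return false;		/* already killed */

	/* One attempt is enough: failure means another caller won. */
	if (!atomic_compare_exchange_strong(&ref->pcpu_count, &old,
					    (unsigned *)NULL))
		return false;

	*claimed = old;
	return true;
}
```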

> +	synchronize_sched();

synchronize_rcu() ?

> +	for_each_possible_cpu(cpu)
> +		count += *per_cpu_ptr(pcpu_count, cpu);
> +
> +	free_percpu(pcpu_count);
> +
> +	pr_debug("global %lli pcpu %i",
> +		 (int64_t) atomic_read(&ref->count), (int) count);
> +
> +	atomic_add((int) count - PCPU_COUNT_BIAS, &ref->count);
> +
> +	return 1;
> +}
> +
> +/**
> + * percpu_ref_put_initial_ref - safely drop the initial ref
> + *
> + * A percpu refcount needs a shutdown sequence before dropping the initial ref,
> + * to put it back into single atomic_t mode with the appropriate barriers so
> + * that percpu_ref_put() can safely check for it hitting 0 - this does so.
> + *
> + * Returns true if @ref hit 0.
> + */
> +int percpu_ref_put_initial_ref(struct percpu_ref *ref)
> +{
> +	if (percpu_ref_kill(ref)) {
> +		return percpu_ref_put(ref);
> +	} else {
> +		WARN_ON(1);
> +		return 0;
> +	}
> +}

Note that percpu_ref_restore_initial_ref() is also possible, and may be
useful for the module code... (or percpu_ref_owner_get).

Great stuff!
Rusty.