From: Rusty Russell <rusty@rustcorp.com.au>
To: Kent Overstreet <koverstreet@google.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-aio@kvack.org
Cc: akpm@linux-foundation.org,
Kent Overstreet <koverstreet@google.com>,
Zach Brown <zab@redhat.com>, Felipe Balbi <balbi@ti.com>,
Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Mark Fasheh <mfasheh@suse.com>, Joel Becker <jlbec@evilplan.org>,
Jens Axboe <axboe@kernel.dk>,
Asai Thambi S P <asamymuthupa@micron.com>,
Selvan Mani <smani@micron.com>,
Sam Bradshaw <sbradshaw@micron.com>,
Jeff Moyer <jmoyer@redhat.com>, Al Viro <viro@zeniv.linux.org.uk>,
Benjamin LaHaise <bcrl@kvack.org>, Tejun Heo <tj@kernel.org>,
Oleg Nesterov <oleg@redhat.com>,
Christoph Lameter <cl@linux-foundation.org>,
Ingo Molnar <mingo@redhat.com>
Subject: Re: [PATCH 04/21] Generic percpu refcounting
Date: Thu, 16 May 2013 09:56:19 +0930
Message-ID: <87y5bfzs5w.fsf@rustcorp.com.au>
In-Reply-To: <1368494338-7069-5-git-send-email-koverstreet@google.com>
Kent Overstreet <koverstreet@google.com> writes:
> This implements a refcount with similar semantics to
> atomic_get()/atomic_dec_and_test() - but percpu.
Ah! This is why I was CC'd... Now I understand. Thanks :)
Delighted to see someone chasing this. I had an implementation of such
a thing last decade, but the slowmode pattern didn't make for trivial
kref conversions, so I dropped it.
Note: I haven't read the other feedback yet, so ignore if dups.
> +int percpu_ref_init(struct percpu_ref *ref);
Why not just run in slow mode when allocation fails? Things which can't
fail make for simpler use.
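
A minimal userspace sketch of that fallback, not the patch's code:
`ref_model`, `NR_SLOTS` and the `simulate_oom` flag are hypothetical names,
and a plain array stands in for the percpu allocation. The init returns
void, and a failed allocation simply leaves the ref in shared-counter
(slow) mode.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NR_SLOTS 4	/* hypothetical stand-in for the number of CPUs */

struct ref_model {
	atomic_int count;	/* shared counter, always valid */
	unsigned *slots;	/* per-"CPU" counters; NULL means slow mode */
};

/* Cannot fail: if the fast-path array is unavailable, stay in slow mode. */
static void ref_model_init(struct ref_model *ref, int simulate_oom)
{
	atomic_init(&ref->count, 1);	/* the initial reference */
	ref->slots = simulate_oom ? NULL
				  : calloc(NR_SLOTS, sizeof(*ref->slots));
}

static void ref_model_get(struct ref_model *ref, unsigned slot)
{
	if (ref->slots)
		ref->slots[slot % NR_SLOTS]++;		/* fast path */
	else
		atomic_fetch_add(&ref->count, 1);	/* slow path */
}
```

Callers never see an error code; the only visible difference after a
failed allocation is that every get lands on the shared counter.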
> +int percpu_ref_tryget(struct percpu_ref *ref);
> +int percpu_ref_put_initial_ref(struct percpu_ref *ref);
This is part of a slightly different pattern: the owned refcount.
In fact, I think that's the most sane pattern to use (but I could be
wrong; does the AIO stuff fit?). If so, promote this to a first-class
citizen, and if necessary expose kill as __percpu_ref_kill()?
(I might suggest percpu_ref_owner_put() as a name, in fact).
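
A rough, single-threaded model of the owned-refcount pattern, assuming
the owner holds one reference from init until it calls the owner put.
All names here (`owned_ref` and its helpers) are hypothetical, and there
are no atomics or percpu counters; it only illustrates the lifecycle.

```c
#include <stdbool.h>

struct owned_ref {
	long count;	/* starts at 1: the owner's reference */
	bool live;	/* false once the owner has dropped its reference */
};

static void owned_ref_init(struct owned_ref *ref)
{
	ref->count = 1;
	ref->live = true;
}

static void owned_ref_get(struct owned_ref *ref)
{
	ref->count++;
}

/* Reports zero only after the owner has let go. */
static bool owned_ref_put(struct owned_ref *ref)
{
	ref->count--;
	return !ref->live && ref->count == 0;
}

/* The owner's put: switch to "dying" mode, then drop the initial ref. */
static bool owned_ref_owner_put(struct owned_ref *ref)
{
	ref->live = false;
	return owned_ref_put(ref);
}
```

While the owner still holds its reference the count cannot reach zero,
so ordinary puts never need to test for it; that is what would let the
fast path stay percpu.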
> +/**
> + * percpu_ref_get - increment a dynamic percpu refcount
> + *
> + * Analogous to atomic_inc().
> + */
> +static inline void percpu_ref_get(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count;
> +
> + preempt_disable();
> +
> + pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> +
> + if (pcpu_count)
> + __this_cpu_inc(*pcpu_count);
> + else
> + atomic_inc(&ref->count);
> +
> + preempt_enable();
> +}
s/preempt_disable()/rcu_read_lock()/ ?
> +/**
> + * percpu_ref_put - decrement a dynamic percpu refcount
> + *
> + * Returns true if the result is 0, otherwise false; only checks for the ref
> + * hitting 0 after percpu_ref_kill() has been called. Analogous to
> + * atomic_dec_and_test().
> + */
> +static inline int percpu_ref_put(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count;
> + int ret = 0;
> +
> + preempt_disable();
> +
> + pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> +
> + if (pcpu_count)
> + __this_cpu_dec(*pcpu_count);
> + else
> + ret = atomic_dec_and_test(&ref->count);
> +
> + preempt_enable();
> +
> + return ret;
> +}
Here too. And if you don't put unlikely() in this code, you lose kernel
hacker points :)
And int/true/false is for old-timers.
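
A userspace sketch of what the put might look like with both
suggestions applied, assuming GCC or Clang for `__builtin_expect`. The
`ref_model` struct is a hypothetical stand-in, the `slot` argument
replaces the current-CPU lookup, and the preempt/RCU protection is
omitted because it has no userspace equivalent here.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Same shape as the kernel macro; relies on a GCC/Clang builtin. */
#define unlikely(x) __builtin_expect(!!(x), 0)

struct ref_model {
	atomic_int count;	/* shared counter, used once killed */
	unsigned *slots;	/* per-"CPU" counters; NULL once killed */
};

/* Returns true only when the shared counter drops to zero. */
static inline bool ref_model_put(struct ref_model *ref, unsigned slot)
{
	unsigned *slots = ref->slots;

	if (unlikely(!slots))
		return atomic_fetch_sub(&ref->count, 1) == 1;

	slots[slot]--;	/* fast path: never reports zero */
	return false;
}
```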
> +
> +unsigned percpu_ref_count(struct percpu_ref *ref);
> +int percpu_ref_kill(struct percpu_ref *ref);
> +
> +/**
> + * percpu_ref_dead - check if a dynamic percpu refcount is shutting down
> + *
> + * Returns true if percpu_ref_kill() has been called on @ref, false otherwise.
> + */
> +static inline int percpu_ref_dead(struct percpu_ref *ref)
> +{
> + return ref->pcpu_count == NULL;
> +}
Can you unexpose these? I think percpu_ref_init(), ...get(), ...put()
and ...put_initial() are a nicer API.
> +int percpu_ref_kill(struct percpu_ref *ref)
> +{
> + unsigned __percpu *pcpu_count;
> + unsigned __percpu *old;
> + unsigned count = 0;
> + int cpu;
> +
> + pcpu_count = ACCESS_ONCE(ref->pcpu_count);
> +
> + do {
> + if (!pcpu_count)
> + return 0;
> +
> + old = pcpu_count;
> + pcpu_count = cmpxchg(&ref->pcpu_count, old, NULL);
> + } while (pcpu_count != old);
This is more complex than it needs to be, no?
	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
	if (!pcpu_count)
		return 0;
	if (cmpxchg(&ref->pcpu_count, pcpu_count, NULL) == NULL)
		return 0;
Of course, if all callers use the owner pattern, this is simply:
	pcpu_count = ACCESS_ONCE(ref->pcpu_count);
	BUG_ON(!pcpu_count);
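
The single-attempt form can be modelled in userspace with C11 atomics,
as a sketch rather than the kernel code: `ref_model` and
`ref_model_claim_kill` are hypothetical names, and
`atomic_compare_exchange_strong()` stands in for `cmpxchg()`. Because the
pointer only ever goes from non-NULL to NULL, a failed exchange can only
mean another caller already killed the ref, so no retry loop is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct ref_model {
	_Atomic(unsigned *) slots;	/* NULL once killed */
};

/* Returns true only for the single caller that performs the kill. */
static bool ref_model_claim_kill(struct ref_model *ref, unsigned **claimed)
{
	unsigned *slots = atomic_load(&ref->slots);

	if (!slots)
		return false;	/* already killed */

	/* One attempt suffices: failure means someone else won the race. */
	if (!atomic_compare_exchange_strong(&ref->slots, &slots,
					    (unsigned *)NULL))
		return false;

	*claimed = slots;	/* the winner sums and frees these */
	return true;
}
```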
> + synchronize_sched();
synchronize_rcu() ?
> + for_each_possible_cpu(cpu)
> + count += *per_cpu_ptr(pcpu_count, cpu);
> +
> + free_percpu(pcpu_count);
> +
> + pr_debug("global %lli pcpu %i",
> + (int64_t) atomic_read(&ref->count), (int) count);
> +
> + atomic_add((int) count - PCPU_COUNT_BIAS, &ref->count);
> +
> + return 1;
> +}
> +
> +/**
> + * percpu_ref_put_initial_ref - safely drop the initial ref
> + *
> + * A percpu refcount needs a shutdown sequence before dropping the initial ref,
> + * to put it back into single atomic_t mode with the appropriate barriers so
> + * that percpu_ref_put() can safely check for it hitting 0 - this does so.
> + *
> + * Returns true if @ref hit 0.
> + */
> +int percpu_ref_put_initial_ref(struct percpu_ref *ref)
> +{
> + if (percpu_ref_kill(ref)) {
> + return percpu_ref_put(ref);
> + } else {
> + WARN_ON(1);
> + return 0;
> + }
> +}
Note that percpu_ref_restore_initial_ref() is also possible, and may be
useful for the module code... (or percpu_ref_owner_get).
Great stuff!
Rusty.