linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Tejun Heo <tj@kernel.org>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org,
	linux-fsdevel@vger.kernel.org, zab@redhat.com, bcrl@kvack.org,
	jmoyer@redhat.com, axboe@kernel.dk, viro@zeniv.linux.org.uk,
	tytso@mit.edu, Oleg Nesterov <oleg@redhat.com>,
	srivatsa.bhat@linux.vnet.ibm.com,
	Christoph Lameter <cl@linux.com>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Rusty Russell <rusty@rustcorp.com.au>
Subject: Re: [PATCH 23/32] Generic dynamic per cpu refcounting
Date: Thu, 24 Jan 2013 16:51:36 -0800	[thread overview]
Message-ID: <20130125005136.GE2373@mtj.dyndns.org> (raw)
In-Reply-To: <1356573611-18590-26-git-send-email-koverstreet@google.com>

(cc'ing percpu / rcu crowd)

Hello, Kent.

On Wed, Dec 26, 2012 at 06:00:02PM -0800, Kent Overstreet wrote:
> This implements a refcount with similar semantics to
> atomic_get()/atomic_dec_and_test(), that starts out as just an atomic_t
> but dynamically switches to per cpu refcounting when the rate of
> gets/puts becomes too high.

I'm not sure this is necessary.  Percpu memory is expensive but not
that expensive.  Perpcu memories are tightly packed and if you
allocate, say, 4 bytes, it's really gonna be 4 bytes for each possible
CPU, and the number of possible CPUs is determined during boot to the
number of CPUs the platform may have while booted.  ie. On machines w/
8 CPU threads which don't have extra CPU sockets or don't support CPU
hotplugging (most don't), nr_possible_cpus would exactly be 8 and you
would be allocating 32 bytes of memory per each 4 byte percpu
allocation.

Memory size usually having very strong correlation with the number of
CPUs on the system, it usually isn't worth optimizing out percpu
allocation like this.  Especially not for a single counter.

Maybe this one is way more ambitious than I think but it seems quite a
bit over engineered.

> It also implements two stage shutdown, as we need it to tear down the
> percpu counts. Before dropping the initial refcount, you must call
> percpu_ref_kill(); this puts the refcount in "shutting down mode" and
> switches back to a single atomic refcount with the appropriate barriers
> (synchronize_rcu()).

Maybe if we have tryget() which only succeeds if the counter is alive,
we can replace moulde refcnt with this?  Rusty?

> +static void percpu_ref_alloc(struct percpu_ref *ref, unsigned __user *pcpu_count)
> +{
> +	unsigned __percpu *new;
> +	unsigned long last = (unsigned long) pcpu_count;
> +	unsigned long now = jiffies;
> +
> +	now <<= PCPU_STATUS_BITS;
> +	now |= PCPU_REF_NONE;
> +
> +	if (now - last <= HZ << PCPU_STATUS_BITS) {
> +		rcu_read_unlock();
> +		new = alloc_percpu(unsigned);
> +		rcu_read_lock();

I suppose RCU is used to make sure the dying status is visible while
trying to drain percpu counters?  Requiring rcu locking for refcnt is
very unusure and it would probably be better to use
synchronize_sched[_expedited]() instead in combination w/ preemp or
irq flipping.

> +		if (!new)
> +			goto update_time;
> +
> +		BUG_ON(((unsigned long) new) & PCPU_STATUS_MASK);
> +
> +		if (cmpxchg(&ref->pcpu_count, pcpu_count, new) != pcpu_count)
> +			free_percpu(new);
> +		else
> +			pr_debug("created");
> +	} else {
> +update_time:	new = (void *) now;
> +		cmpxchg(&ref->pcpu_count, pcpu_count, new);
> +	}
> +}

The above function needs a lot more documentation on synchronization
and just general operation.

Overall, I don't know.  It feels quite over-engineered.  I really
don't think dynamic allocation is justified.  It also makes the
interface prone to misuse.  It'll be easy to have mixed alloc and
noalloc sites and then lose alloc ones or just foget about the
distinction and end up with refcnts which never convert to percpu one
and there will be no way to easily identify those.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-aio' in
the body to majordomo@kvack.org.  For more info on Linux AIO,
see: http://www.kvack.org/aio/
Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>

  parent reply	other threads:[~2013-01-25  0:51 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-12-27  1:59 [PATCH 00/32] AIO performance improvements/cleanups, v3 Kent Overstreet
2012-12-27  1:59 ` [PATCH 01/32] mm: remove old aio use_mm() comment Kent Overstreet
2012-12-27  1:59 ` [PATCH 02/32] aio: remove dead code from aio.h Kent Overstreet
2012-12-27  1:59 ` [PATCH 03/32] gadget: remove only user of aio retry Kent Overstreet
2012-12-27  1:59 ` [PATCH 04/32] aio: remove retry-based AIO Kent Overstreet
2012-12-29  7:36   ` Hillf Danton
2013-01-07 22:12     ` Kent Overstreet
2012-12-29  7:47   ` Hillf Danton
2013-01-07 22:15     ` Kent Overstreet
2012-12-27  1:59 ` [PATCH 05/32] char: add aio_{read,write} to /dev/{null,zero} Kent Overstreet
2012-12-27  1:59 ` [PATCH 06/32] aio: Kill return value of aio_complete() Kent Overstreet
2012-12-27  1:59 ` [PATCH 07/32] aio: kiocb_cancel() Kent Overstreet
2012-12-27  1:59 ` [PATCH 08/32] aio: Move private stuff out of aio.h Kent Overstreet
2012-12-27  1:59 ` [PATCH 09/32] aio: dprintk() -> pr_debug() Kent Overstreet
2012-12-27  1:59 ` [PATCH 10/32] aio: do fget() after aio_get_req() Kent Overstreet
2012-12-27  1:59 ` [PATCH 11/32] aio: Make aio_put_req() lockless Kent Overstreet
2012-12-27  1:59 ` [PATCH 12/32] aio: Refcounting cleanup Kent Overstreet
2012-12-27  1:59 ` [PATCH 13/32] wait: Add wait_event_hrtimeout() Kent Overstreet
2012-12-27 10:37   ` Fubo Chen
2013-01-03 23:08   ` Andrew Morton
2013-01-08  0:09     ` Kent Overstreet
2012-12-27  1:59 ` [PATCH 14/32] aio: Make aio_read_evt() more efficient, convert to hrtimers Kent Overstreet
2013-01-03 23:19   ` Andrew Morton
2013-01-08  0:28     ` Kent Overstreet
2013-01-08  1:00       ` Andrew Morton
2013-01-08  1:28         ` Kent Overstreet
2012-12-27  1:59 ` [PATCH 15/32] aio: Use flush_dcache_page() Kent Overstreet
2012-12-27  1:59 ` [PATCH 16/32] aio: Use cancellation list lazily Kent Overstreet
2012-12-27  1:59 ` [PATCH 17/32] aio: Change reqs_active to include unreaped completions Kent Overstreet
2012-12-27  1:59 ` [PATCH 18/32] aio: Kill batch allocation Kent Overstreet
2012-12-27  1:59 ` [PATCH 19/32] aio: Kill struct aio_ring_info Kent Overstreet
2012-12-27  1:59 ` [PATCH 20/32] aio: Give shared kioctx fields their own cachelines Kent Overstreet
2013-01-03 23:25   ` Andrew Morton
2013-01-07 23:48     ` Kent Overstreet
2012-12-27  2:00 ` [PATCH 21/32] aio: reqs_active -> reqs_available Kent Overstreet
2012-12-27  2:00 ` [PATCH 22/32] aio: percpu reqs_available Kent Overstreet
2012-12-27  2:00 ` [PATCH 23/32] Generic dynamic per cpu refcounting Kent Overstreet
2013-01-03 22:48   ` Andrew Morton
2013-01-07 23:47     ` Kent Overstreet
2013-01-08  1:03       ` [PATCH] percpu-refcount: Sparse fixes Kent Overstreet
2013-01-25  0:51   ` Tejun Heo [this message]
2013-01-25  1:13     ` [PATCH 23/32] Generic dynamic per cpu refcounting Kent Overstreet
2013-01-25  2:03       ` Tejun Heo
2013-01-25  2:09         ` Tejun Heo
2013-01-28 17:48           ` Kent Overstreet
2013-01-28 18:18             ` Tejun Heo
2013-01-25  6:15     ` Rusty Russell
2013-01-28 17:53       ` Kent Overstreet
2013-01-28 17:59         ` Tejun Heo
2013-01-28 18:32           ` Kent Overstreet
2013-01-28 18:57             ` Christoph Lameter
2013-02-08 14:44   ` Tejun Heo
2013-02-08 14:49     ` Jens Axboe
2013-02-08 17:50       ` Andrew Morton
2013-02-08 21:27       ` Kent Overstreet
2013-02-11 14:21         ` Jeff Moyer
2013-02-08 21:17     ` Kent Overstreet
2012-12-27  2:00 ` [PATCH 24/32] aio: Percpu ioctx refcount Kent Overstreet
2012-12-27  2:00 ` [PATCH 25/32] aio: use xchg() instead of completion_lock Kent Overstreet
2013-01-03 23:34   ` Andrew Morton
2013-01-07 23:21     ` Kent Overstreet
2013-01-07 23:35       ` Andrew Morton
2013-01-08  0:01         ` Kent Overstreet
2012-12-27  2:00 ` [PATCH 26/32] aio: Don't include aio.h in sched.h Kent Overstreet
2012-12-27  2:00 ` [PATCH 27/32] aio: Kill ki_key Kent Overstreet
2012-12-27  2:00 ` [PATCH 28/32] aio: Kill ki_retry Kent Overstreet
2012-12-27  2:00 ` [PATCH 29/32] block, aio: Batch completion for bios/kiocbs Kent Overstreet
2013-01-04  9:22   ` Jens Axboe
2013-01-07 23:34     ` Kent Overstreet
2013-01-08 15:33       ` Jeff Moyer
2013-01-08 16:06         ` Kent Overstreet
2013-01-08 16:15           ` Jeff Moyer
2013-01-08 16:48             ` Kent Overstreet
2012-12-27  2:00 ` [PATCH 30/32] virtio-blk: Convert to batch completion Kent Overstreet
2012-12-27  2:00 ` [PATCH 31/32] mtip32xx: " Kent Overstreet
2012-12-27  2:00 ` [PATCH 32/32] aio: Smoosh struct kiocb Kent Overstreet
2013-01-04  9:22 ` [PATCH 00/32] AIO performance improvements/cleanups, v3 Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130125005136.GE2373@mtj.dyndns.org \
    --to=tj@kernel.org \
    --cc=axboe@kernel.dk \
    --cc=bcrl@kvack.org \
    --cc=cl@linux.com \
    --cc=jmoyer@redhat.com \
    --cc=koverstreet@google.com \
    --cc=linux-aio@kvack.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rusty@rustcorp.com.au \
    --cc=srivatsa.bhat@linux.vnet.ibm.com \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    --cc=zab@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).