From: Andrew Morton <akpm@linux-foundation.org>
To: Kent Overstreet <koverstreet@google.com>
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org,
	linux-fsdevel@vger.kernel.org, zab@redhat.com, bcrl@kvack.org,
	jmoyer@redhat.com, axboe@kernel.dk, viro@zeniv.linux.org.uk,
	tytso@mit.edu
Subject: Re: [PATCH 25/32] aio: use xchg() instead of completion_lock
Date: Mon, 7 Jan 2013 15:35:24 -0800
Message-ID: <20130107153524.eb2f1223.akpm@linux-foundation.org>
In-Reply-To: <20130107232115.GF26407@google.com>

On Mon, 7 Jan 2013 15:21:15 -0800
Kent Overstreet <koverstreet@google.com> wrote:

> On Thu, Jan 03, 2013 at 03:34:14PM -0800, Andrew Morton wrote:
> > On Wed, 26 Dec 2012 18:00:04 -0800
> > Kent Overstreet <koverstreet@google.com> wrote:
> > 
> > > So, for sticking kiocb completions on the kioctx ringbuffer, we need a
> > > lock - it unfortunately can't be lockless.
> > > 
> > > When the kioctx is shared between threads on different cpus and the rate
> > > of completions is high, this lock sees quite a bit of contention - in
> > > terms of cacheline contention it's the hottest thing in the aio
> > > subsystem.
> > > 
> > > That means, with a regular spinlock, we're going to take a cache miss
> > > to grab the lock, then another cache miss when we touch the data the
> > > lock protects - if it's on the same cacheline as the lock, other cpus
> > > spinning on the lock are going to be pulling it out from under us as
> > > we're using it.
> > > 
> > > So, we use an old trick to get rid of this second forced cache miss -
> > > make the data the lock protects be the lock itself, so we grab them both
> > > at once.
> > 
> > Boy I hope you got that right.
> > 
> > Did you consider using bit_spin_lock() on the upper bit of `tail'? 
> > We've done that in other places and we at least know that it works. 
> > And it has the optimisations for CONFIG_SMP=n, understands
> > CONFIG_DEBUG_SPINLOCK, has arch-specific optimisations, etc.
> 
> I hadn't thought of that - I think it'd suffer from the same problem as
> a regular spinlock, where you grab the lock, then go to grab your data
> but a different CPU grabbed the cacheline you need...

Either you didn't understand my suggestion or I didn't understand your
patch :(

I'm suggesting that we use the most significant bit *of the data* as
that data's lock.  Obviously, all uses of that data would then mask that
bit out.

That way, the data will be brought into CPU cache when the lock is
acquired.  And when other CPUs attempt to acquire the lock, they won't
steal the cacheline.

This is assuming that an unsuccessful test_and_set_bit_lock() won't
grab the cacheline, which is hopefully true but I don't know.  If this
turns out to be false then we could add a test_bit() loop to
bit_spin_lock(), or perhaps rework bit_spin_lock() to not do the
test_and_set_bit_lock() unless test_bit() has just returned 0.
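A minimal userspace sketch of the suggestion above, using C11 atomics rather than the kernel's bit_spin_lock()/test_and_set_bit_lock(). It assumes the protected value is an unsigned long whose most significant bit is never used by the data itself; the names msb_lock(), msb_unlock() and msb_peek() are hypothetical. The acquire path only attempts the atomic read-modify-write after a plain load has seen the lock bit clear (the test_bit()-first variant described above), and it returns the data with the lock bit masked out, so taking the lock and fetching the value happen in the same atomic operation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical layout: the top bit of the word is the lock, the rest is data. */
#define LOCK_BIT  (1UL << (sizeof(unsigned long) * CHAR_BIT - 1))
#define DATA_MASK (~LOCK_BIT)

/*
 * Acquire the lock and return the protected value (lock bit masked out).
 * Spin with plain loads first and only issue the atomic RMW once the bit
 * has been seen clear, so waiters are not repeatedly requesting the
 * cacheline for writing.
 */
static unsigned long msb_lock(_Atomic unsigned long *word)
{
	unsigned long old;

	for (;;) {
		while (atomic_load_explicit(word, memory_order_relaxed) & LOCK_BIT)
			;	/* held elsewhere; kernel code would cpu_relax() here */
		old = atomic_fetch_or_explicit(word, LOCK_BIT, memory_order_acquire);
		if (!(old & LOCK_BIT))
			return old & DATA_MASK;	/* we own it; old is the data */
	}
}

/* Publish the new value; storing it without LOCK_BIT also releases the lock. */
static void msb_unlock(_Atomic unsigned long *word, unsigned long data)
{
	assert(!(data & LOCK_BIT));	/* data must not overlap the lock bit */
	atomic_store_explicit(word, data, memory_order_release);
}

/* Lockless readers must mask the lock bit out as well. */
static unsigned long msb_peek(_Atomic unsigned long *word)
{
	return atomic_load_explicit(word, memory_order_relaxed) & DATA_MASK;
}
```

One design difference from bit_spin_unlock(), which only clears the bit: here the new data and the cleared lock bit go out in a single release store. The sketch makes no attempt at the CONFIG_SMP=n or CONFIG_DEBUG_SPINLOCK handling that the real primitive provides.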

> But the lock debugging would be nice. It'd probably work to make
> something generic like bit_spinlock() that also returns some value - or,
> the recent patches for making spinlocks back off will also help with
> this problem. So maybe between that and batch completion this patch
> could be dropped at some point.
> 
> So, yeah. The code's plenty tested and I went over the barriers, it
> already had all the needed barriers due to the ringbuffer... and I've
> done this sort of thing elsewhere too. But it certainly is a hack and I
> wouldn't be sad to see it go.

Yes, there are a lot of issues with adding a new locking primitive and
in some ways they get worse when they're open-coded like this.  If
there's any way at all of using a standard lock instead of KentLocks
then we should do this.
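For comparison, a minimal userspace sketch of the xchg()-style trick from the quoted patch description, where the value being protected is itself the lock. This is not the code from the patch: it assumes a reserved sentinel (TAIL_LOCKED, hypothetical) that the ring tail can never legitimately hold, and it uses C11 atomic_exchange in place of the kernel's xchg(). The exchange that takes the lock also returns the current tail, so the holder has both after one atomic operation, and unlocking is just the store of the new tail.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Hypothetical sentinel: a tail value meaning "locked", never a real index. */
#define TAIL_LOCKED UINT_MAX

/*
 * Swap the sentinel in; whatever comes back is the tail we now own.
 * If the sentinel comes back, someone else holds it. Swapping the
 * sentinel for itself changed nothing, so wait for a real value and retry.
 */
static unsigned int tail_lock(_Atomic unsigned int *tail)
{
	unsigned int old;

	for (;;) {
		old = atomic_exchange_explicit(tail, TAIL_LOCKED,
					       memory_order_acquire);
		if (old != TAIL_LOCKED)
			return old;
		while (atomic_load_explicit(tail, memory_order_relaxed) ==
		       TAIL_LOCKED)
			;	/* spin with plain loads until released */
	}
}

/* Storing the updated tail both publishes it and releases the lock. */
static void tail_unlock(_Atomic unsigned int *tail, unsigned int new_tail)
{
	assert(new_tail != TAIL_LOCKED);	/* sentinel is reserved */
	atomic_store_explicit(tail, new_tail, memory_order_release);
}
```

The costs discussed in this thread are visible in the sketch: the kernel's lock debugging knows nothing about this pattern, and any code that reads the tail without taking the lock has to know about the sentinel.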
