From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path: 
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754293Ab3AGXVV (ORCPT );
	Mon, 7 Jan 2013 18:21:21 -0500
Received: from mail-pb0-f54.google.com ([209.85.160.54]:62621 "EHLO
	mail-pb0-f54.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752470Ab3AGXVT (ORCPT );
	Mon, 7 Jan 2013 18:21:19 -0500
Date: Mon, 7 Jan 2013 15:21:15 -0800
From: Kent Overstreet
To: Andrew Morton
Cc: linux-kernel@vger.kernel.org, linux-aio@kvack.org,
	linux-fsdevel@vger.kernel.org, zab@redhat.com, bcrl@kvack.org,
	jmoyer@redhat.com, axboe@kernel.dk, viro@zeniv.linux.org.uk,
	tytso@mit.edu
Subject: Re: [PATCH 25/32] aio: use xchg() instead of completion_lock
Message-ID: <20130107232115.GF26407@google.com>
References: <1356573611-18590-1-git-send-email-koverstreet@google.com>
	<1356573611-18590-28-git-send-email-koverstreet@google.com>
	<20130103153414.23b0b913.akpm@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130103153414.23b0b913.akpm@linux-foundation.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: 
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 03, 2013 at 03:34:14PM -0800, Andrew Morton wrote:
> On Wed, 26 Dec 2012 18:00:04 -0800
> Kent Overstreet wrote:
> 
> > So, for sticking kiocb completions on the kioctx ringbuffer, we need a
> > lock - it unfortunately can't be lockless.
> > 
> > When the kioctx is shared between threads on different cpus and the rate
> > of completions is high, this lock sees quite a bit of contention - in
> > terms of cacheline contention it's the hottest thing in the aio
> > subsystem.
> > 
> > That means, with a regular spinlock, we're going to take a cache miss
> > to grab the lock, then another cache miss when we touch the data the
> > lock protects - if it's on the same cacheline as the lock, other cpus
> > spinning on the lock are going to be pulling it out from under us as
> > we're using it.
> > 
> > So, we use an old trick to get rid of this second forced cache miss -
> > make the data the lock protects be the lock itself, so we grab them both
> > at once.
> 
> Boy I hope you got that right.
> 
> Did you consider using bit_spin_lock() on the upper bit of `tail'?
> We've done that in other places and we at least know that it works.
> And it has the optimisations for CONFIG_SMP=n, understands
> CONFIG_DEBUG_SPINLOCK, has arch-specific optimisations, etc.

I hadn't thought of that - I think it'd suffer from the same problem as a
regular spinlock, where you grab the lock, then go to grab your data, but a
different CPU has grabbed the cacheline you need... But the lock debugging
would be nice.

It'd probably work to make something generic like bit_spin_lock() that also
returns some value - or, the recent patches for making spinlocks back off
will also help with this problem. So maybe between that and batch completion
this patch could be dropped at some point.

So, yeah. The code's plenty tested and I went over the barriers - it already
had all the needed barriers due to the ringbuffer... and I've done this sort
of thing elsewhere too. But it certainly is a hack and I wouldn't be sad to
see it go.