From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1755929AbYFFLxS (ORCPT ); Fri, 6 Jun 2008 07:53:18 -0400
Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1753416AbYFFLxJ (ORCPT ); Fri, 6 Jun 2008 07:53:09 -0400
Received: from mail.suse.de ([195.135.220.2]:46664 "EHLO mx1.suse.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753331AbYFFLxI (ORCPT ); Fri, 6 Jun 2008 07:53:08 -0400
Date: Fri, 6 Jun 2008 13:53:05 +0200
From: Nick Piggin
To: Linus Torvalds
Cc: Ingo Molnar, David Howells, Ulrich Drepper, Linux Kernel Mailing List, Andrew Morton
Subject: Re: [PATCH 0/3] 64-bit futexes: Intro
Message-ID: <20080606115305.GA20345@wotan.suse.de>
References: <4840CE51.9060109@redhat.com> <4840D63F.2090407@redhat.com> <20080602185433.GB4081@elte.hu> <20080606012749.GA12187@wotan.suse.de>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.9i
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jun 05, 2008 at 08:37:19PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 6 Jun 2008, Nick Piggin wrote:
> >
> > What you *could* maybe do, to slightly speed up the reader fastpath, at
> > the expense of the writer fastpath, is to also have the active writer add
> > 4 to the count too, so your unlock can start with a lock xadd -4, count
> > in order to get the write-intent on the cacheline straight up.
>
> Yes, nice idea. It avoids the possible unnecessary S->M transition, but
> the downside is that it effectively slows down the write unlock by making
> it do two atomic ops even for the fastpath. So if I were to _only_ care
> about the reader path, I think it would be a great idea, but as it is, the
> current non-contended write case is actually pretty close to optimal, and
> doing the unconditional xaddl on the unlock path would slow that one down.
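
To make the trade-off quoted above concrete, here is a minimal C11 sketch. It is not the code from the patch under discussion: the count layout (bit 0 marks an active writer, every holder adds 4) and all sketch_* names are assumptions made up for illustration, and only trylock paths are shown, with no futex waiting or wakeups.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical layout, assumed for this sketch only:
 * bit 0 = writer active; each holder (reader OR writer) adds HOLDER. */
#define WRITER 1u
#define HOLDER 4u

typedef struct { atomic_uint count; } sketch_rwlock;

/* Reader fastpath: one atomic add, backed out if a writer is active. */
static bool sketch_read_trylock(sketch_rwlock *l)
{
    unsigned old = atomic_fetch_add(&l->count, HOLDER);
    if (old & WRITER) {
        atomic_fetch_sub(&l->count, HOLDER);
        return false;
    }
    return true;
}

/* Writer fastpath: succeeds only on a completely idle lock.
 * The writer also counts itself as a holder (the "add 4 too" idea). */
static bool sketch_write_trylock(sketch_rwlock *l)
{
    unsigned expected = 0;
    return atomic_compare_exchange_strong(&l->count, &expected,
                                          WRITER | HOLDER);
}

/* Single unlock for both kinds of holder. The first access is already
 * a write (the "lock xadd -4"), so the cacheline is requested for
 * ownership straight away instead of being read first and upgraded. */
static void sketch_unlock(sketch_rwlock *l)
{
    unsigned old = atomic_fetch_sub(&l->count, HOLDER);
    if (old & WRITER)
        /* The cost: a writer needs a second atomic op to drop bit 0. */
        atomic_fetch_and(&l->count, ~WRITER);
}
```

Under these assumptions a read unlock is a single atomic op, while a write unlock always takes two, which is the slowdown being weighed against the reader-side gain.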

Yeah, it is a case of a large slowdown for write for a small speedup for
read (pity the API doesn't have explicit read and write unlocks -- were
they too lazy to type the last bit, or did they expect people to lose
track of whether they had a read or write lock? :P) Anyway, it's
obviously a tradeoff you'd just have to carefully benchmark in real
situations.

> > I'd be more interested to know why this code can't be evolved into a full
> > rwlock implementation? This is a rather standard (though neat) looking rwlock
> > -- so my question is what can the patented 64-bit futex locks do that this
> > can't, or what can they do faster?
>
> Quite frankly - and this was my argument the whole time - I do not believe
> consider things like timeouts etc. Timeouts are "hard" to handle because
> they mean that you cannot use any kind of trivially incrementing "ticket
> locks" with sequence numbers (because we may have to just avoid a sequence
> if it times out), so the sequence number approach that we now use for
> kernel spinlocks was not an option. I didn't actually *write* the timeout
> versions, of course, but given the structure of the locks they really
> should be very straightforward.
>
> [ Half-way subtle thing: a writer that times out needs to be very careful
> that it doesn't lose a wakeup event, but futexes actually make that part
> pretty easy - since FUTEX_WAIT returns whether you got woken up or not,
> you can just decide to wake up the next write-waiter if you cannot get
> the lock immediately and have to exit due to a timeout. ]
>
> But I really haven't tested my rwlocks very exhaustively, and I did not
> verify that they actually scale with lots of CPU's, for example. I
> literally only have dual-core CPU's in use at home, right now, nothing
> fancier. Somebody with dual-socket quads would be a lot better off, and
> the more the merrier, of course.

Well... a single lock is only going to be so scalable.
I don't see how it could be done really significantly better? Maybe a
small factor of improvement if you were to concentrate on the contended
case (but you wouldn't want to do that anyway).