From: Waiman Long
Subject: Re: [PATCH v7 1/4] qrwlock: A queue read/write lock implementation
Date: Fri, 22 Nov 2013 15:35:06 -0500
Message-ID: <528FBFFA.1000807@hp.com>
References: <1385147087-26588-1-git-send-email-Waiman.Long@hp.com>
 <1385147087-26588-2-git-send-email-Waiman.Long@hp.com>
To: Linus Torvalds
Cc: Thomas Gleixner, Ingo Molnar, "H. Peter Anvin", Arnd Bergmann,
 "linux-arch@vger.kernel.org", the arch/x86 maintainers,
 Linux Kernel Mailing List, Peter Zijlstra, Steven Rostedt,
 Andrew Morton, Michel Lespinasse, Andi Kleen, Rik van Riel,
 "Paul E. McKenney", Raghavendra K T, George Spelvin, Tim Chen,
 Aswin Chandramouleeswaran, Scott J Norton

On 11/22/2013 02:14 PM, Linus Torvalds wrote:
> On Fri, Nov 22, 2013 at 11:04 AM, Waiman Long wrote:
>> In terms of single-thread performance (no contention), a 256K
>> lock/unlock loop was run on 2.4GHz and 2.93GHz Westmere x86-64
>> CPUs. The following table shows the average time (in ns) for a single
>> lock/unlock sequence (including the looping and timing overhead):
>>
>> Lock Type           2.4GHz   2.93GHz
>> ---------           ------   -------
>> Ticket spinlock       14.9      12.3
>> Read lock             17.0      13.5
>> Write lock            17.0      13.5
>> Queue read lock       16.0      13.4
>> Queue write lock       9.2       7.8
>
> Can you verify for me that you re-did those numbers? Because it used
> to be that the fair queue write lock was slower than the numbers you
> now quote.
>
> Was the cost of the fair queue write lock purely in the extra
> conditional testing for whether the lock was supposed to be fair or
> not, and now that you dropped that, it's fast? If so, then that's an
> extra argument for the old conditional fair/unfair being complete
> garbage.

Yes, the extra latency of the fair lock in the earlier patch was due to
the need for a second cmpxchg(). That can be avoided by doing a read
first, but the extra read is not good for the cache. So I optimized it
for the default unfair lock. By supporting only one version, there is no
need to do a second cmpxchg anymore.

> Alternatively, maybe you just took the old timings, and the above
> numbers are for the old unfair code, and *not* for the actual patch
> you sent out?
>
> So please double-check and verify.
>
>               Linus

I reran the timing test on the 2.93GHz processor. The timing is
practically the same. I reused the old numbers for the 2.4GHz processor.

Regards,
Longman
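
[For illustration, here is a minimal user-space sketch of the point being
made above about the write-lock fast path: taking the lock with a single
cmpxchg() versus reading the lock word first and then doing the cmpxchg().
This is not the actual qrwlock patch code; the lock-word layout, the
_QW_LOCKED value, and the function names are hypothetical.]

/*
 * Sketch only: 0 means the lock word is completely free, _QW_LOCKED is
 * an assumed "writer holds it" value.
 */
#include <stdatomic.h>
#include <stdbool.h>

#define _QW_LOCKED 0xffu	/* hypothetical writer-held value */

typedef struct {
	atomic_uint cnts;	/* 0 when the lock is completely free */
} sketch_rwlock_t;

/* Fast path with one atomic op: try to go 0 -> _QW_LOCKED directly. */
static inline bool write_trylock_one_cmpxchg(sketch_rwlock_t *lock)
{
	unsigned int expected = 0;

	return atomic_compare_exchange_strong(&lock->cnts, &expected,
					      _QW_LOCKED);
}

/*
 * Variant that reads the lock word first. The extra load avoids issuing
 * a doomed cmpxchg when the lock is busy, but in the uncontended
 * single-thread case it only adds latency before the same atomic op.
 */
static inline bool write_trylock_read_then_cmpxchg(sketch_rwlock_t *lock)
{
	unsigned int expected = atomic_load(&lock->cnts);

	if (expected != 0)
		return false;
	return atomic_compare_exchange_strong(&lock->cnts, &expected,
					      _QW_LOCKED);
}

int main(void)
{
	sketch_rwlock_t lock = { .cnts = 0 };

	/* Uncontended case: the single-cmpxchg path takes the lock in one RMW. */
	return write_trylock_one_cmpxchg(&lock) ? 0 : 1;
}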