From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S936735Ab3DJOJc (ORCPT ); Wed, 10 Apr 2013 10:09:32 -0400
Received: from relay3.sgi.com ([192.48.152.1]:55747 "EHLO relay.sgi.com"
	rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
	id S936542Ab3DJOJa (ORCPT ); Wed, 10 Apr 2013 10:09:30 -0400
Date: Wed, 10 Apr 2013 09:09:25 -0500
From: Robin Holt
To: Linus Torvalds
Cc: Ingo Molnar , Waiman Long , Thomas Gleixner , Ingo Molnar ,
	"H. Peter Anvin" , "Paul E. McKenney" , David Howells ,
	Dave Jones , Clark Williams , Peter Zijlstra ,
	Davidlohr Bueso , Linux Kernel Mailing List ,
	"Chandramouleeswaran, Aswin" , Peter Zijlstra ,
	Andrew Morton , tony.luck@intel.com
Subject: Re: [PATCH RFC 1/3] mutex: Make more scalable by doing less atomic operations
Message-ID: <20130410140925.GD3672@sgi.com>
References: <1365087258-7169-1-git-send-email-Waiman.Long@hp.com>
	<1365087258-7169-2-git-send-email-Waiman.Long@hp.com>
	<20130408124223.GA10093@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 08, 2013 at 07:38:39AM -0700, Linus Torvalds wrote:
> On Mon, Apr 8, 2013 at 5:42 AM, Ingo Molnar wrote:
> >
> > AFAICS the main performance trade-off is the following: when the owner CPU unlocks
> > the mutex, we'll poll it via a read first, which turns the cacheline into
> > shared-read MESI state. Then we notice that its content signals 'lock is
> > available', and we attempt the trylock again.
> >
> > This increases lock latency in the few-contended-tasks case slightly - and we'd
> > like to know by precisely how much, not just for a generic '10-100 users' case
> > which does not tell much about the contention level.
>
> We had this problem for *some* lock where we used a "read + cmpxchg"
> in the hotpath and it caused us problems due to two cacheline state
> transitions (first to shared, then to exclusive). It was faster to
> just assume it was unlocked and try to do an immediate cmpxchg.
>
> But iirc it is a non-issue for this case, because this is only about
> the contended slow path.
>
> I forget where we saw the case where we should *not* read the initial
> value, though. Anybody remember?

I think you might be remembering ia64.  Fairly early on, I recall there
being a change in the spinlocks where we did not check them before just
trying to acquire.

Thanks,
Robin
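
For anyone following the thread, the two trylock shapes being contrasted above can be sketched with userspace C11 atomics. This is only an illustration, not the kernel mutex or the ia64 spinlock code: the type `sketch_lock`, the function names, and the encoding (1 = unlocked, 0 = locked) are all assumptions made up for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word for illustration: 1 = unlocked, 0 = locked. */
typedef struct {
	atomic_int count;
} sketch_lock;

/*
 * "read + cmpxchg": look at the lock word with a plain load first and
 * only attempt the atomic operation if it appears free. On a MESI
 * system the load can pull the cacheline in shared state, and the
 * cmpxchg then has to upgrade it to exclusive (two transitions).
 */
static inline bool trylock_read_then_cmpxchg(sketch_lock *l)
{
	int expected = 1;

	if (atomic_load_explicit(&l->count, memory_order_relaxed) != 1)
		return false;	/* looks held, skip the atomic op */

	return atomic_compare_exchange_strong_explicit(&l->count, &expected, 0,
			memory_order_acquire, memory_order_relaxed);
}

/*
 * "immediate cmpxchg": assume the lock is free and go straight to the
 * atomic operation, requesting the cacheline exclusively in one step.
 */
static inline bool trylock_cmpxchg_only(sketch_lock *l)
{
	int expected = 1;

	return atomic_compare_exchange_strong_explicit(&l->count, &expected, 0,
			memory_order_acquire, memory_order_relaxed);
}

/* Release the lock by storing the "unlocked" value. */
static inline void sketch_unlock(sketch_lock *l)
{
	atomic_store_explicit(&l->count, 1, memory_order_release);
}
```

Both variants give the same result; they differ only in the cacheline traffic they generate. Which one is faster depends on how often the lock is actually held at the time of the attempt, which is why the distinction matters in an uncontended hotpath and, as noted above, matters much less in a contended slow path.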