From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1753303AbcGSQxn (ORCPT ); Tue, 19 Jul 2016 12:53:43 -0400
Received: from mga03.intel.com ([134.134.136.65]:11652 "EHLO mga03.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752473AbcGSQxk (ORCPT ); Tue, 19 Jul 2016 12:53:40 -0400
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.28,390,1464678000"; d="scan'208";a="1019915373"
Message-ID: <1468947205.31332.40.camel@intel.com>
Subject: Re: [RFC] locking/mutex: Fix starvation of sleeping waiters
From: Imre Deak
Reply-To: imre.deak@intel.com
To: Jason Low, Peter Zijlstra
Cc: linux-kernel@vger.kernel.org, Ingo Molnar, Chris Wilson, Daniel Vetter, Ville Syrjälä, Waiman Long, Davidlohr Bueso, jason.low2@hp.com
Date: Tue, 19 Jul 2016 19:53:25 +0300
In-Reply-To: <1468864069.2367.21.camel@j-VirtualBox>
References: <1468858607-20481-1-git-send-email-imre.deak@intel.com> <20160718171537.GC6862@twins.programming.kicks-ass.net> <1468864069.2367.21.camel@j-VirtualBox>
Organization: Intel
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.18.2
Mime-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2016-07-18 at 10:47 -0700, Jason Low wrote:
> On Mon, 2016-07-18 at 19:15 +0200, Peter Zijlstra wrote:
> > On Mon, Jul 18, 2016 at 07:16:47PM +0300, Imre Deak wrote:
> > > Currently a thread sleeping on a mutex wait queue can be delayed
> > > indefinitely by other threads managing to steal the lock, that is
> > > acquiring the lock out-of-order before the sleepers.
> > > I noticed this via a testcase (see the Reference: below) where one
> > > CPU was unlocking/relocking a mutex in a tight loop while another
> > > CPU was delayed indefinitely, trying to wake up and get the lock
> > > but losing out to the first CPU and going back to sleep:
> > >
> > > CPU0:                        CPU1:
> > > mutex_lock->acquire
> > >                              mutex_lock->sleep
> > > mutex_unlock->wake CPU1
> > >                              wakeup
> > > mutex_lock->acquire
> > >                              trylock fail->sleep
> > > mutex_unlock->wake CPU1
> > >                              wakeup
> > > mutex_lock->acquire
> > >                              trylock fail->sleep
> > > ...                          ...
> > >
> > > To fix this we can make sure that CPU1 makes progress by avoiding
> > > the fastpath locking, optimistic spinning and trylocking if there
> > > is any waiter on the list.  The corresponding check can be done
> > > without holding wait_lock, since the goal is only to make sure
> > > sleepers make progress, not to guarantee that locking happens in
> > > FIFO order.
> >
> > I think we went over this before; that will also completely destroy
> > performance under a number of workloads.
>
> Yup, once a thread becomes a waiter, all other threads will need to
> follow suit, so this change would effectively disable optimistic
> spinning in some workloads.
>
> A few months ago, we worked on patches that allow the waiter to
> return to optimistic spinning to help reduce starvation. Longman sent
> out a version 3 patch set, and it sounded like we were fine with the
> concept.

Thanks, with the v4 he just sent I couldn't trigger the above problem.

However, this only works if mutex spinning is enabled; if it's
disabled, I still hit the problem due to the other forms of lock
stealing. So could we prevent those when mutex spinning is disabled
anyway?

--Imre