Date: Sun, 15 Mar 2015 23:49:07 +0200
From: Matthias Bonne
To: Davidlohr Bueso
CC: Yann Droneaud, kernelnewbies@kernelnewbies.org,
    linux-kernel@vger.kernel.org, Peter Zijlstra, Ingo Molnar
Subject: Re: Question on mutex code

On 03/15/15 03:09, Davidlohr Bueso wrote:
> On Sat, 2015-03-14 at 18:03 -0700, Davidlohr Bueso wrote:
>> Good analysis, but not quite accurate for one simple fact: mutex
>> trylocks _only_ use fastpaths (obviously just depend on the counter
>> cmpxchg to 0), so you never fallback to the slowpath you are
>> mentioning, thus the race is non existent. Please see the arch code.
>
> For debug we use the trylock slowpath, but so does everything else, so
> again you cannot hit this scenario.
>

You are correct of course - this is why I said that CONFIG_DEBUG_MUTEXES
must be enabled for this to happen. Can you explain why this scenario is
still not possible in the debug case?
The debug case uses mutex-null.h, which contains these macros:

#define __mutex_fastpath_lock(count, fail_fn)        fail_fn(count)
#define __mutex_fastpath_lock_retval(count)          (-1)
#define __mutex_fastpath_unlock(count, fail_fn)      fail_fn(count)
#define __mutex_fastpath_trylock(count, fail_fn)     fail_fn(count)
#define __mutex_slowpath_needs_to_unlock()           1

So both mutex_trylock() and mutex_unlock() always use the slow paths. The
slowpath for mutex_unlock() is __mutex_unlock_slowpath(), which simply
calls __mutex_unlock_common_slowpath(), and the latter starts like this:

        /*
         * As a performance measurement, release the lock before doing other
         * wakeup related duties to follow. This allows other tasks to acquire
         * the lock sooner, while still handling cleanups in past unlock calls.
         * This can be done as we do not enforce strict equivalence between the
         * mutex counter and wait_list.
         *
         * Some architectures leave the lock unlocked in the fastpath failure
         * case, others need to leave it locked. In the later case we have to
         * unlock it here - as the lock counter is currently 0 or negative.
         */
        if (__mutex_slowpath_needs_to_unlock())
                atomic_set(&lock->count, 1);

        spin_lock_mutex(&lock->wait_lock, flags);
        [...]

So the counter is set to 1 before taking the spinlock, which I think might
cause the race. Did I miss something?