From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Date: Mon, 4 Jan 2016 16:59:15 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Andy Lutomirski <luto@amacapital.net>
Cc: Dominique Martinet <dominique.martinet@cea.fr>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@kernel.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	V9FS Developers <v9fs-developer@lists.sourceforge.net>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>
Subject: Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a
 race
Message-ID: <20160104155915.GI6344@twins.programming.kicks-ass.net>
References: <CALCETrX8ukASsKfTgmN6SzDVG5L61GrSc6i26SvKu1OQt7_xTA@mail.gmail.com>
 <20151207224643.GA10531@nautica>
 <CALCETrU7SOhTi0g0=tvcirn_C4s9oTkhaV8BfbDf2F90+himJA@mail.gmail.com>
 <20151208023331.GJ20997@ZenIV.linux.org.uk>
 <CALCETrU1GKGXsp4gmnWJXh2Lp=dBp7K--qt1ViUOKyxpVWBtMQ@mail.gmail.com>
 <20151209062316.GA29917@nautica>
 <20151209064542.GW20997@ZenIV.linux.org.uk>
 <CALCETrX=-1hfy_MgZX5HswTOG8WqUHrMCN+c0KHoZYQSHHB_7Q@mail.gmail.com>
 <20151224105149.GA24863@nautica>
 <CALCETrW5bz0jmx+NP_UJGdVmHdPS_-hzTecRrec-Ed+8RY=tgQ@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALCETrW5bz0jmx+NP_UJGdVmHdPS_-hzTecRrec-Ed+8RY=tgQ@mail.gmail.com>
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-fsdevel.vger.kernel.org>

On Tue, Dec 29, 2015 at 10:43:26PM -0800, Andy Lutomirski wrote:
> [add cc's]
> 
> Hi scheduler people:
> 
> This is relatively easy for me to reproduce.  Any hints for debugging
> it?  Could we really have a bug in which processes that are
> schedulable as a result of mutex unlock aren't always reliably
> scheduled?

I would expect that to cause widespread failures; then again, virt is
known to tickle timing issues that are improbable on actual hardware, so
anything is possible.

Does it reproduce with DEBUG_MUTEXES set? (I'm not seeing a .config
here).
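
For checking a .config quickly, something like the sketch below would
do. It is only an illustration: the helper name mutex_config_flags() is
made up here, and it assumes the usual .config text format
("CONFIG_FOO=y" or "# CONFIG_FOO is not set").

```python
def mutex_config_flags(config_text,
                       names=("DEBUG_MUTEXES", "MUTEX_SPIN_ON_OWNER")):
    """Report which of the named Kconfig symbols are set to 'y'.

    config_text is the content of a kernel .config file. Symbols that
    are missing or commented out ("# CONFIG_FOO is not set") are
    reported as False.
    """
    wanted = {"CONFIG_" + name: name for name in names}
    flags = {name: False for name in names}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip comments (including "is not set" lines) and blank lines.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in wanted:
            flags[wanted[key]] = (value == "y")
    return flags
```

Feeding it the text of the .config used for the failing kernel shows at
a glance whether the mutex debug code or the spin-on-owner path is
built in.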

If it's really easy to reproduce, you could start by tracing
events/sched/sched_switch and events/sched/sched_wakeup; those are the
actual scheduling events.
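
A minimal sketch of switching those two events on, assuming the tracing
directory is mounted at /sys/kernel/debug/tracing (it may be elsewhere
on your setup) and that this runs as root. The helper name
set_sched_events() is made up for illustration; all it does is write
"1" or "0" into each event's "enable" file.

```python
from pathlib import Path

# The two scheduler tracepoints named above.
SCHED_EVENTS = ("sched_switch", "sched_wakeup")


def set_sched_events(tracing_root="/sys/kernel/debug/tracing", enabled=True):
    """Enable or disable the sched_switch and sched_wakeup tracepoints.

    tracing_root is an assumption about where the tracing directory is
    mounted; pass a different path if yours differs.
    Returns the list of 'enable' files that were written.
    """
    root = Path(tracing_root)
    value = "1" if enabled else "0"
    written = []
    for event in SCHED_EVENTS:
        enable_file = root / "events" / "sched" / event / "enable"
        enable_file.write_text(value)
        written.append(str(enable_file))
    return written
```

After calling set_sched_events(), reproduce the hang and read the
"trace" file in the same directory; call it again with enabled=False
to turn the events off.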

Without DEBUG_MUTEXES there's the MUTEX_SPIN_ON_OWNER code that could
still confuse things, but that's mutex-internal and not scheduler
related.

If it ends up being the SPIN_ON_OWNER bits we'll have to cook up some
extra debug patches.