From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Date: Mon, 4 Jan 2016 16:59:15 +0100
From: Peter Zijlstra
To: Andy Lutomirski
Cc: Dominique Martinet, Thomas Gleixner, Ingo Molnar, Al Viro,
	"linux-kernel@vger.kernel.org", V9FS Developers, Linux FS Devel
Subject: Re: [V9fs-developer] Hang triggered by udev coldplug, looks like a race
Message-ID: <20160104155915.GI6344@twins.programming.kicks-ass.net>
References: <20151207224643.GA10531@nautica>
	<20151208023331.GJ20997@ZenIV.linux.org.uk>
	<20151209062316.GA29917@nautica>
	<20151209064542.GW20997@ZenIV.linux.org.uk>
	<20151224105149.GA24863@nautica>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
List-ID:

On Tue, Dec 29, 2015 at 10:43:26PM -0800, Andy Lutomirski wrote:
> [add cc's]
>
> Hi scheduler people:
>
> This is relatively easy for me to reproduce.  Any hints for debugging
> it?  Could we really have a bug in which processes that are
> schedulable as a result of mutex unlock aren't always reliably
> scheduled?

I would expect that to cause wide-spread fail. Then again, virt is known
to tickle timing issues that are improbable on actual hardware, so
anything is possible.

Does it reproduce with DEBUG_MUTEXES set? (I'm not seeing a .config
here.)

If it's really easy, you could start by tracing
events/sched/sched_switch and events/sched/sched_wakeup; those would be
the actual scheduling events.

Without DEBUG_MUTEXES there's the MUTEX_SPIN_ON_OWNER code that could
still confuse things, but that's mutex internal and not scheduler
related.

If it ends up being the SPIN_ON_OWNER bits, we'll have to cook up some
extra debug patches.
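
For anyone following along, the tracing suggestion above could be scripted roughly as below. This is only a sketch, not something from the original mail: the helper name set_sched_events is made up for illustration, and it assumes tracefs is mounted and that each event directory exposes the usual "enable" file. The caller passes the tracefs mount point, typically /sys/kernel/tracing (or /sys/kernel/debug/tracing on older setups), and needs root to write there.

```python
from pathlib import Path

# The two scheduling events named in the mail, relative to the tracefs root.
SCHED_EVENTS = ("events/sched/sched_switch", "events/sched/sched_wakeup")


def set_sched_events(tracefs_root, enabled=True):
    """Write "1" or "0" to each event's 'enable' file.

    tracefs_root is an assumption supplied by the caller, e.g.
    "/sys/kernel/tracing". Returns the list of files written.
    """
    root = Path(tracefs_root)
    written = []
    for event in SCHED_EVENTS:
        enable_file = root / event / "enable"
        # Same effect as: echo 1 > <root>/<event>/enable
        enable_file.write_text("1" if enabled else "0")
        written.append(enable_file)
    return written
```

Calling set_sched_events("/sys/kernel/tracing") as root before reproducing the hang, then reading the "trace" file under the same root, should show the switch and wakeup records around the stuck task. Passing enabled=False turns the events off again.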