From mboxrd@z Thu Jan 1 00:00:00 1970
From: Josef Bacik
Subject: Re: worker list corruption crash
Date: Fri, 27 Apr 2012 09:41:02 -0400
Message-ID: <20120427134102.GA2088@localhost.localdomain>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Chris Mason, Josef Bacik, Linux BTRFS
To: Daniel J Blueman
Return-path:
In-Reply-To:
List-ID:

On Fri, Apr 27, 2012 at 10:26:27AM +0800, Daniel J Blueman wrote:
> In 3.4-rc4, I've come across worker list corruption while scrubbing,
> leading to (in two separate cases) warning [1] and crashing [2]. The
> connection with scrubbing is likely the increased rate of worker
> threads starting and stopping.
>
> In btrfs_stop_workers, access to worker->worker_list is done without
> holding worker->lock (it is in all other callsites). We can't take
> worker->lock there due to lock inversion deadlock (as it is the outer
> lock), and if we drop the workers->lock to acquire worker->lock and
> then workers->lock, we can't guarantee worker is still valid.
>
> It feels like a global workers list pointer should be used and its
> lock should be the outer one to avoid this scenario, or maybe I'm
> missing something?
>

I think you are missing something: as I read it, we're always holding
workers->lock when we touch the worker_list, so we should be safe. I
wonder what could be going on here...

Josef
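
For illustration only, here is a minimal userspace sketch of the invariant described in the reply above: list linkage is only ever touched with the pool-wide lock held, and a per-worker lock nests inside it when per-worker state is needed. This is not the btrfs code. struct pool, struct worker, pool_add, worker_queue and pool_stop_all are hypothetical names, and pthread mutexes stand in for the kernel spinlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the btrfs structures. */
struct worker {
	pthread_mutex_t lock;	/* per-worker lock, guards 'pending' */
	struct worker *next;	/* list linkage, guarded by pool->lock only */
	int pending;		/* queued work items */
};

struct pool {
	pthread_mutex_t lock;	/* pool-wide lock, guards 'head' and 'count' */
	struct worker *head;
	int count;
};

void pool_init(struct pool *p)
{
	pthread_mutex_init(&p->lock, NULL);
	p->head = NULL;
	p->count = 0;
}

/* Link a worker into the list; linkage changes happen under pool->lock. */
void pool_add(struct pool *p, struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	w->pending = 0;

	pthread_mutex_lock(&p->lock);
	w->next = p->head;
	p->head = w;
	p->count++;
	pthread_mutex_unlock(&p->lock);
}

/* Queue one item: pool->lock first, then worker->lock, never the reverse. */
void worker_queue(struct pool *p, struct worker *w)
{
	pthread_mutex_lock(&p->lock);
	pthread_mutex_lock(&w->lock);
	w->pending++;
	pthread_mutex_unlock(&w->lock);
	pthread_mutex_unlock(&p->lock);
}

/*
 * Unlink every worker. pool->lock is held for the whole walk, so the
 * list cannot change underneath us; worker->lock is taken in the same
 * nesting order as in worker_queue(). Returns the number of pending
 * items that were dropped.
 */
int pool_stop_all(struct pool *p)
{
	int dropped = 0;

	pthread_mutex_lock(&p->lock);
	while (p->head) {
		struct worker *w = p->head;

		p->head = w->next;
		w->next = NULL;
		p->count--;

		pthread_mutex_lock(&w->lock);
		dropped += w->pending;
		w->pending = 0;
		pthread_mutex_unlock(&w->lock);
	}
	pthread_mutex_unlock(&p->lock);
	return dropped;
}
```

In this sketch every path takes the two locks in the same order, so there is no inversion, and the list is never modified without the pool-wide lock. Whether every path in the real async-thread code honours the same rule is exactly the open question in this thread.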