From mboxrd@z Thu Jan 1 00:00:00 1970
From: Thomas Jarosch
Subject: Re: raid1 boot regression in 2.6.37 [bisected]
Date: Tue, 12 Apr 2011 16:05:52 +0200
Message-ID: <201104121605.52443.thomas.jarosch@intra2net.com>
References: <201103251725.21180.thomas.jarosch@intra2net.com>
 <20110405134629.664b946c@notabene.brown>
 <20110406101600.GB4142@mtj.dyndns.org>
Mime-Version: 1.0
Content-Type: Text/Plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To: <20110406101600.GB4142@mtj.dyndns.org>
Sender: linux-raid-owner@vger.kernel.org
To: Tejun Heo
Cc: NeilBrown, linux-raid@vger.kernel.org
List-Id: linux-raid.ids

Hello Neil,

On Wednesday, 6. April 2011 12:16:00 Tejun Heo wrote:
> > To put it another way matching your description Tejun, the put path has
> > a chance to run firstly while mddev_find is waiting for the spinlock,
> > and then while flush_workqueue is waiting for the rest of the put path
> > to complete.
>
> I don't think the logic is wrong per-se. It's more likely that the
> implemented code doesn't really follow the model described by the
> logic.
>
> Probably the best way would be reproducing the problem and throwing in
> some diagnostic code to tell the sequence of events? If work is being
> queued first but it still ends up busy looping, that would be a bug in
> flush_workqueue(), but I think it's more likely that the restart
> condition somehow triggers in an unexpected way without the work item
> queued as expected.

I can test any debug patch you want, the box is in a test lab anyway.

Best regards,
Thomas