From mboxrd@z Thu Jan 1 00:00:00 1970
From: Linus Torvalds
Subject: Re: 2.6.30-rc8 Oops whilst booting
Date: Mon, 8 Jun 2009 10:21:55 -0700 (PDT)
Message-ID:
References: <200906061959.55592.chris2553@googlemail.com>
 <200906062215.30571.chris2553@googlemail.com>
 <1244381140.30664.12.camel@ht.satnam>
 <1244413881.18742.31.camel@ht.satnam>
 <2f9e3044bafcae848f74a1492b0ea471.squirrel@neil.brown.name>
 <1244460875.12644.2.camel@ht.satnam>
 <1244479879.4079.284.camel@mulgrave.site>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Return-path:
Received: from smtp1.linux-foundation.org ([140.211.169.13]:49118
 "EHLO smtp1.linux-foundation.org" rhost-flags-OK-OK-OK-OK)
 by vger.kernel.org with ESMTP id S1750894AbZFHRXf (ORCPT );
 Mon, 8 Jun 2009 13:23:35 -0400
In-Reply-To: <1244479879.4079.284.camel@mulgrave.site>
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: James Bottomley
Cc: Chris Clayton, Jaswinder Singh Rajput, NeilBrown,
 linux-kernel@vger.kernel.org, scsi, Tejun Heo, Arjan van de Ven

On Mon, 8 Jun 2009, James Bottomley wrote:
>
> The root cause is a reordering of the devices caused by the async code.

That's NULL information.

OF COURSE the root cause is the async code. We know that. We're looking
for the specifics.

In particular, before that commit, at most you will wait for too _much_.
In other words, it's a "good" wait. Your commit caused it to wait for
less, and that then showed a bug.

Not all that surprising - it's now not waiting enough. You tried to avoid
a deadlock situation of waiting for too much, but you avoided the deadlock
by now waiting for too little.

I also think that your code is simply buggy. As far as I can tell, in the
case of having both running and pending events, you'll always pick the
pending cookie. But it's the _running_ cookie that has the lower event
number, isn't it?

I dunno. It all looks very fishy to me.

		Linus
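
To make the concern about running versus pending cookies concrete, here is a minimal userspace sketch. It is not the kernel's code: `cookie_t`, `pending_overrides_running()` and `lowest_in_progress()` are hypothetical names, the two queues are modelled as plain arrays assumed to be sorted in ascending cookie order, and `next_cookie` stands in for a "nothing in flight" sentinel. The first function models the behaviour suspected above, where a pending cookie overwrites a running one; the second returns the true minimum across both queues.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cookie type: a monotonically increasing event number. */
typedef unsigned long long cookie_t;

/*
 * Model of the suspected behaviour: when both queues are non-empty, the
 * first pending cookie overwrites the first running cookie.
 * Assumption: both arrays are sorted ascending, so element 0 is the
 * oldest (lowest-numbered) entry of each queue.
 */
static cookie_t pending_overrides_running(const cookie_t *running, size_t n_running,
                                          const cookie_t *pending, size_t n_pending,
                                          cookie_t next_cookie)
{
    cookie_t ret = next_cookie; /* sentinel: nothing in flight */

    assert(running != NULL || n_running == 0);
    assert(pending != NULL || n_pending == 0);

    if (n_running > 0)
        ret = running[0];
    if (n_pending > 0)
        ret = pending[0]; /* clobbers the lower running cookie */
    return ret;
}

/*
 * What a waiter presumably needs instead: the lowest cookie that is
 * still in flight, whether it is running or only pending.
 */
static cookie_t lowest_in_progress(const cookie_t *running, size_t n_running,
                                   const cookie_t *pending, size_t n_pending,
                                   cookie_t next_cookie)
{
    cookie_t ret = next_cookie; /* sentinel: nothing in flight */
    size_t i;

    assert(running != NULL || n_running == 0);
    assert(pending != NULL || n_pending == 0);

    for (i = 0; i < n_running; i++)
        if (running[i] < ret)
            ret = running[i];
    for (i = 0; i < n_pending; i++)
        if (pending[i] < ret)
            ret = pending[i];
    return ret;
}
```

With running cookies {3, 5} and pending cookies {7, 9}, the first function reports 7 and the second reports 3. A waiter that sleeps only until the reported value passes its own cookie would, with the first variant, stop waiting on cookie 5 while cookie 3 is still running, which is the "waiting for too little" case.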