From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752623Ab3GXVzV (ORCPT ); Wed, 24 Jul 2013 17:55:21 -0400
Received: from 1wt.eu ([62.212.114.60]:33422 "EHLO 1wt.eu"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751822Ab3GXVzT (ORCPT ); Wed, 24 Jul 2013 17:55:19 -0400
Date: Wed, 24 Jul 2013 23:55:15 +0200
From: Willy Tarreau
To: "Rich, Jason"
Cc: "linux-kernel@vger.kernel.org" , "Stoltenberg, Matthew" ,
	James.Bottomley@suse.de
Subject: Re: Panic at _blk_run_queue on 2.6.32
Message-ID: <20130724215515.GA31938@1wt.eu>
References: <636295BFF4A001418A00F46569A2CD2B161CE88B@US-PLNO-EXM01-P.global.tektronix.net>
	<20130710202729.GA18877@1wt.eu>
	<636295BFF4A001418A00F46569A2CD2B161DF7B0@US-PLNO-EXM01-P.global.tektronix.net>
	<636295BFF4A001418A00F46569A2CD2B161EEE7E@US-PLNO-EXM01-P.global.tektronix.net>
	<20130722090351.GB7957@1wt.eu>
	<636295BFF4A001418A00F46569A2CD2B161F4A7C@US-PLNO-EXM01-P.global.tektronix.net>
	<636295BFF4A001418A00F46569A2CD2B161F7FBC@US-PLNO-EXM01-P.global.tektronix.net>
	<20130724214847.GA31914@1wt.eu>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20130724214847.GA31914@1wt.eu>
User-Agent: Mutt/1.4.2.3i
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Jason,

> Thanks to you first. I'll look for any potentially missing patch after
> this one in newer kernels and will keep you informed. If I can't find
> anything, I'll need James' advice on the subject, and maybe we'll need
> more information about your setup, etc...
>
> > 0ccd644ce6a803b4f7ae5b3b4da614b8a51037cc is the first bad commit
> > commit 0ccd644ce6a803b4f7ae5b3b4da614b8a51037cc
> > Author: James Bottomley
> > Date: Fri Apr 22 10:39:59 2011 -0500
> > put stricter guards on queue dead checks
> >
> > commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b upstream.
(...)

I just found this patch from James, which was merged in 2.6.39 and then
backported to the 2.6.32 branch in 2.6.32.40:

  commit c055f5b2614b4f758ae6cc86733f31fa4c2c5844
  Author: James Bottomley
  Date:   Sun May 1 09:42:07 2011 -0500

      [SCSI] fix oops in scsi_run_queue()

      The recent commit closing the race window in device teardown:

      commit 86cbfb5607d4b81b1a993ff689bbd2addd5d3a9b
      Author: James Bottomley
      Date:   Fri Apr 22 10:39:59 2011 -0500

          [SCSI] put stricter guards on queue dead checks

      is causing a potential NULL deref in scsi_run_queue() because the
      q->queuedata may already be NULL by the time this function is called.
      Since we shouldn't be running a queue that is being torn down, simply
      add a NULL check in scsi_run_queue() to forestall this.

      Tested-by: Jim Schutt
      Cc: stable@kernel.org
      Signed-off-by: James Bottomley

So it is possible that your bisection stopped on this first bug, which
hides the real one, even though this first bug was already fixed in your
faulty kernel. I suggest that you retry on 2.6.32.40 alone and, if it
works, bisect again between 40 and 42 (which I seem to remember was the
first faulty one).

Best regards,
Willy
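
For illustration only, the kind of guard described in the commit message
above can be modelled in user space as follows. This is a minimal sketch
and not the actual patch: struct model_queue, struct model_device and
model_run_queue() are hypothetical stand-ins, and only the idea of
checking queuedata for NULL before using it comes from the commit message.

```c
#include <assert.h>  /* only needed for the checks in the usage note */
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real definitions. */
struct model_device {
    int runs;            /* how many times the queue was actually run */
};

struct model_queue {
    void *queuedata;     /* assumed to become NULL once teardown has started */
};

/*
 * Model of a run-queue helper that is guarded against teardown.
 * Returns 1 if the queue was run, 0 if it was skipped.
 */
static int model_run_queue(struct model_queue *q)
{
    struct model_device *dev = q->queuedata;

    if (dev == NULL)     /* queue is being torn down: nothing to run */
        return 0;

    dev->runs++;         /* safe: dev was checked above */
    return 1;
}
```

With a device attached, model_run_queue() returns 1 and increments the
counter; once queuedata has been set to NULL it returns 0 instead of
dereferencing the pointer.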