From mboxrd@z Thu Jan 1 00:00:00 1970
From: James Bottomley
Subject: Re: [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
Date: Sun, 09 Nov 2008 09:22:47 -0600
Message-ID: <1226244167.19841.3.camel@localhost.localdomain>
References:
Mime-Version: 1.0
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
Return-path:
Received: from accolon.hansenpartnership.com ([76.243.235.52]:39655 "EHLO
	accolon.hansenpartnership.com" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1754971AbYKIPWv (ORCPT );
	Sun, 9 Nov 2008 10:22:51 -0500
In-Reply-To:
Sender: linux-scsi-owner@vger.kernel.org
List-Id: linux-scsi@vger.kernel.org
To: bugme-daemon@bugzilla.kernel.org
Cc: linux-scsi@vger.kernel.org, yanmin_zhang@linux.intel.com

On Sat, 2008-11-08 at 19:50 -0800, bugme-daemon@bugzilla.kernel.org wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11990
>
> Summary: Kernel hang in spin_unlock_irq from scsi_request_fn from
> do_IRQ
> Product: IO/Storage
> Version: 2.5
> KernelVersion: 2.6.28-rc3
> Platform: All
> OS/Version: Linux
> Tree: Mainline
> Status: NEW
> Severity: normal
> Priority: P1
> Component: SCSI
> AssignedTo: linux-scsi@vger.kernel.org
> ReportedBy: vandrove@vc.cvut.cz
>
>
> Latest working kernel version: commit c8d7aa after 2.6.28-rc2
> Earliest failing kernel version: commit 920da6 after 2.6.28-rc2
> Distribution: Debian
> Hardware Environment: sata_sil24, amd64, 2cpu
> Software Environment: 64bit kernel, 32bit userspace, preemptible kernel
> Problem Description:
>
> When I/O is under stress, from time to time CPU1 hangs, most probably due to
> endless stream of interrupts. Backtrace printed either by kernel's softlockup
> detection or alt-sysrq-p is below (written down; I/O is dead when this
> happens).
>
> _spin_unlock_irq + 0x30 (after sti)
> scsi_request_fn + 0x1b9 (after spin_unlock_irq(shost->host_lock) at
> not_ready:)
> blk_invoke_request_fn
> __blk_runqueue
> scsi_run_queue
> scsi_next_command
> scsi_end_request
> scsi_io_completion
> scsi_finish_command
> scsi_softirq_done
> blk_done_softirq
> __do_softirq
> call_softirq
> do_softirq
> irqexit
> do_IRQ
> ret_from_intr
>
> native_safe_halt
> trace_hardirqs_on
> default_idle
> c1e_idle
> cpu_idle
> start_secondary
>
> Steps to reproduce:
>
> It seems to occur under heavy I/O (updatedb, dumping core from ~3GB app), but I
> was not able to trigger it reliably - most reliable is hard resetting box, then
> it occurs in ~80% cases when replaying journals on disks connected to
> sata_sil24 (through PMP, but problem does not seem to occur on 2.6.28-rc2 with
> Jens's PMP patches).

This looks identical to

http://bugzilla.kernel.org/show_bug.cgi?id=11898

Could you see if this refinement of the discussed patches fixes it for
you?

Thanks,

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f5d3b96..e09a661 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -606,6 +606,7 @@ static void scsi_run_queue(struct request_queue *q)
 		}
 
 		list_del_init(&sdev->starved_entry);
+		starved_head = NULL;
 		spin_unlock(shost->host_lock);
 
 		spin_lock(sdev->request_queue->queue_lock);
@@ -620,6 +621,12 @@ static void scsi_run_queue(struct request_queue *q)
 		spin_unlock(sdev->request_queue->queue_lock);
 
 		spin_lock(shost->host_lock);
+		if (unlikely(!list_empty(&sdev->starved_entry)))
+			/*
+			 * sdev got put back on the starved list
+			 * so finish starved handling
+			 */
+			break;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 