[Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do

public inbox for linux-scsi@vger.kernel.org

* [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
@ 2008-11-09  3:50 bugme-daemon
  2008-11-09  3:51 ` [Bug 11990] " bugme-daemon
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: bugme-daemon @ 2008-11-09  3:50 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11990

           Summary: Kernel hang in spin_unlock_irq from scsi_request_fn from
                    do_IRQ
           Product: IO/Storage
           Version: 2.5
     KernelVersion: 2.6.28-rc3
          Platform: All
        OS/Version: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: SCSI
        AssignedTo: linux-scsi@vger.kernel.org
        ReportedBy: vandrove@vc.cvut.cz


Latest working kernel version: commit c8d7aa after 2.6.28-rc2
Earliest failing kernel version: commit 920da6 after 2.6.28-rc2
Distribution: Debian
Hardware Environment: sata_sil24, amd64, 2cpu
Software Environment: 64bit kernel, 32bit userspace, preemptible kernel
Problem Description:

When I/O is under stress, CPU1 hangs from time to time, most probably due to an
endless stream of interrupts.  The backtrace printed either by the kernel's
softlockup detection or by alt-sysrq-p is below (written down; I/O is dead when
this happens).

_spin_unlock_irq + 0x30  (after sti)
scsi_request_fn + 0x1b9  (after spin_unlock_irq(shost->host_lock) at
not_ready:)
blk_invoke_request_fn
__blk_runqueue
scsi_run_queue
scsi_next_command
scsi_end_request
scsi_io_completion
scsi_finish_command
scsi_softirq_done
blk_done_softirq
__do_softirq
call_softirq
do_softirq
irqexit
do_IRQ
ret_from_intr
<EOI>
native_safe_halt
trace_hardirqs_on
default_idle
c1e_idle
cpu_idle
start_secondary

Steps to reproduce:

It seems to occur under heavy I/O (updatedb, dumping core from a ~3GB app), but
I was not able to trigger it reliably.  The most reliable way is to hard-reset
the box; it then occurs in ~80% of cases when replaying journals on disks
connected to sata_sil24 (through PMP, but the problem does not seem to occur on
2.6.28-rc2 with Jens's PMP patches).


-- 
Configure bugmail: http://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.


* [Bug 11990] Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
  2008-11-09  3:50 [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ bugme-daemon
@ 2008-11-09  3:51 ` bugme-daemon
  2008-11-09  9:57 ` bugme-daemon
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: bugme-daemon @ 2008-11-09  3:51 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11990


vandrove@vc.cvut.cz changed:

                What    |Removed                     |Added
---------------------------------------------------------------------------------
OtherBugsDependingOnThis|                            |11808
              Regression|0                           |1





* [Bug 11990] Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
  2008-11-09  3:50 [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ bugme-daemon
  2008-11-09  3:51 ` [Bug 11990] " bugme-daemon
@ 2008-11-09  9:57 ` bugme-daemon
  2008-11-09 15:22 ` [Bug 11990] New: " James Bottomley
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: bugme-daemon @ 2008-11-09  9:57 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11990





------- Comment #1 from vandrove@vc.cvut.cz  2008-11-09 01:57 -------
Apparently my lower-bound test kernel did not have Jens's patch to use tagged
queuing from the SCSI layer applied.  I've discovered a reliable test case (write
4GB of data concurrently to every attached disk), and found that reverting all 4
SATA tagged-queuing-related check-ins gets rid of the problem:

43a49cbdf31e812c0d8f553d433b09b421f5d52c
3070f69b66b7ab2f02d8a2500edae07039c38508
e013e13bf605b9e6b702adffbe2853cfc60e7806
2fca5ccf97d2c28bcfce44f5b07d85e74e3cd18e



* Re: [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
  2008-11-09  3:50 [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ bugme-daemon
  2008-11-09  3:51 ` [Bug 11990] " bugme-daemon
  2008-11-09  9:57 ` bugme-daemon
@ 2008-11-09 15:22 ` James Bottomley
  2008-11-09 15:23 ` [Bug 11990] " bugme-daemon
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: James Bottomley @ 2008-11-09 15:22 UTC (permalink / raw)
  To: bugme-daemon; +Cc: linux-scsi, yanmin_zhang

On Sat, 2008-11-08 at 19:50 -0800, bugme-daemon@bugzilla.kernel.org
wrote:
> http://bugzilla.kernel.org/show_bug.cgi?id=11990
> 
>            Summary: Kernel hang in spin_unlock_irq from scsi_request_fn from
>                     do_IRQ
>            Product: IO/Storage
>            Version: 2.5
>      KernelVersion: 2.6.28-rc3
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: SCSI
>         AssignedTo: linux-scsi@vger.kernel.org
>         ReportedBy: vandrove@vc.cvut.cz
> 
> 
> Latest working kernel version: commit c8d7aa after 2.6.28-rc2
> Earliest failing kernel version: commit 920da6 after 2.6.28-rc2
> Distribution: Debian
> Hardware Environment: sata_sil24, amd64, 2cpu
> Software Environment: 64bit kernel, 32bit userspace, preemptible kernel
> Problem Description:
> 
> When I/O is under stress, CPU1 hangs from time to time, most probably due to an
> endless stream of interrupts.  The backtrace printed either by the kernel's
> softlockup detection or by alt-sysrq-p is below (written down; I/O is dead when
> this happens).
> 
> _spin_unlock_irq + 0x30  (after sti)
> scsi_request_fn + 0x1b9  (after spin_unlock_irq(shost->host_lock) at
> not_ready:)
> blk_invoke_request_fn
> __blk_runqueue
> scsi_run_queue
> scsi_next_command
> scsi_end_request
> scsi_io_completion
> scsi_finish_command
> scsi_softirq_done
> blk_done_softirq
> __do_softirq
> call_softirq
> do_softirq
> irqexit
> do_IRQ
> ret_from_intr
> <EOI>
> native_safe_halt
> trace_hardirqs_on
> default_idle
> c1e_idle
> cpu_idle
> start_secondary
> 
> Steps to reproduce:
> 
> It seems to occur under heavy I/O (updatedb, dumping core from a ~3GB app), but
> I was not able to trigger it reliably.  The most reliable way is to hard-reset
> the box; it then occurs in ~80% of cases when replaying journals on disks
> connected to sata_sil24 (through PMP, but the problem does not seem to occur on
> 2.6.28-rc2 with Jens's PMP patches).

This looks identical to

http://bugzilla.kernel.org/show_bug.cgi?id=11898

Could you see if this refinement of the discussed patches fixes it for
you?

Thanks,

James

---

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f5d3b96..e09a661 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -606,6 +606,7 @@ static void scsi_run_queue(struct request_queue *q)
 		}
 
 		list_del_init(&sdev->starved_entry);
+		starved_head = NULL;
 		spin_unlock(shost->host_lock);
 
 		spin_lock(sdev->request_queue->queue_lock);
@@ -620,6 +621,12 @@ static void scsi_run_queue(struct request_queue *q)
 		spin_unlock(sdev->request_queue->queue_lock);
 
 		spin_lock(shost->host_lock);
+		if (unlikely(!list_empty(&sdev->starved_entry)))
+			/* 
+			 * sdev got put back on the starved list
+			 * so finish starved handling
+			 */
+			break;
 	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 

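To illustrate the loop that the patch guards against, here is a simplified
user-space model, not the kernel code.  It assumes, as the patch comment
states, that running a device's queue can put that device back on the starved
list.  The names list_node, model_dev, model_run_device and model_run_starved
are hypothetical, and the max_iter cap exists only so that the demonstration
terminates.  The model covers only the second hunk (the break); it does not
model starved_head.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of the kernel's list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *n)
{
    n->prev = n->next = n;
}

static bool list_is_empty(const struct list_node *n)
{
    return n->next == n;
}

/* Unlink n and leave it self-linked, like list_del_init(). */
static void list_remove_init(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Add n at the tail of the list headed by head. */
static void list_append(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Hypothetical stand-in for a SCSI device on a host's starved list. */
struct model_dev {
    struct list_node starved_entry; /* first member, so a node pointer
                                       can be cast back to the device */
    bool still_starved;             /* true: running it re-queues it */
};

/* Stand-in for running the device's queue: a device that still cannot
 * make progress puts itself back on the starved list. */
static void model_run_device(struct model_dev *d, struct list_node *starved)
{
    if (d->still_starved)
        list_append(&d->starved_entry, starved);
}

/* Model of the starved-list loop.  With use_check set, stop as soon as
 * the device just run is found back on the list (the patch's break).
 * Returns how many devices were run. */
static int model_run_starved(struct list_node *starved, bool use_check,
                             int max_iter)
{
    int iterations = 0;

    assert(max_iter > 0);
    while (!list_is_empty(starved) && iterations < max_iter) {
        struct model_dev *d = (struct model_dev *)starved->next;

        list_remove_init(&d->starved_entry);
        model_run_device(d, starved);
        iterations++;
        if (use_check && !list_is_empty(&d->starved_entry))
            break;  /* device was re-queued: finish starved handling */
    }
    return iterations;
}
```

With a single device that always re-queues itself,
model_run_starved(&starved, false, 1000) runs until the cap of 1000, whereas
passing true stops after the first pass, which is the behaviour the added
break is meant to give.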



* [Bug 11990] Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
  2008-11-09  3:50 [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ bugme-daemon
                   ` (2 preceding siblings ...)
  2008-11-09 15:22 ` [Bug 11990] New: " James Bottomley
@ 2008-11-09 15:23 ` bugme-daemon
  2008-11-09 19:01 ` bugme-daemon
  2008-11-09 19:01 ` bugme-daemon
  5 siblings, 0 replies; 7+ messages in thread
From: bugme-daemon @ 2008-11-09 15:23 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11990





------- Comment #2 from anonymous@kernel-bugs.osdl.org  2008-11-09 07:22 -------
Reply-To: James.Bottomley@HansenPartnership.com


* [Bug 11990] Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
  2008-11-09  3:50 [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ bugme-daemon
                   ` (3 preceding siblings ...)
  2008-11-09 15:23 ` [Bug 11990] " bugme-daemon
@ 2008-11-09 19:01 ` bugme-daemon
  2008-11-09 19:01 ` bugme-daemon
  5 siblings, 0 replies; 7+ messages in thread
From: bugme-daemon @ 2008-11-09 19:01 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11990


rjw@sisk.pl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |DUPLICATE




------- Comment #3 from rjw@sisk.pl  2008-11-09 11:01 -------


*** This bug has been marked as a duplicate of bug 11898 ***



* [Bug 11990] Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ
  2008-11-09  3:50 [Bug 11990] New: Kernel hang in spin_unlock_irq from scsi_request_fn from do_IRQ bugme-daemon
                   ` (4 preceding siblings ...)
  2008-11-09 19:01 ` bugme-daemon
@ 2008-11-09 19:01 ` bugme-daemon
  5 siblings, 0 replies; 7+ messages in thread
From: bugme-daemon @ 2008-11-09 19:01 UTC (permalink / raw)
  To: linux-scsi

http://bugzilla.kernel.org/show_bug.cgi?id=11990


rjw@sisk.pl changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED



