linux-scsi.vger.kernel.org archive mirror
* 2.6.22 oops kernel BUG at block/elevator.c:366!
@ 2007-08-29 10:41 Arkadiusz Miskiewicz
  2007-08-29 13:15 ` Arkadiusz Miskiewicz
  0 siblings, 1 reply; 5+ messages in thread
From: Arkadiusz Miskiewicz @ 2007-08-29 10:41 UTC (permalink / raw)
  To: linux-scsi

Hello,

I'm trying to get a stable kernel for Promise SuperTrak
X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 have oopsed
like this (while doing rsync):

kernel BUG at block/elevator.c:366!
invalid opcode: 0000 [1] SMP
CPU 1
Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs scsi_wait_scan sd_mod stex scsi_mod
Pid: 1139:#0, comm: xfsbufd Not tainted 2.6.22.5-0.2 #1
RIP: 0010:[<ffffffff8033f5da>]  [<ffffffff8033f5da>] elv_rb_del+0x3a/0x40
RSP: 0000:ffff8100759b1c00  EFLAGS: 00010046
RAX: ffff81000d1f5428 RBX: ffff81000d1f5428 RCX: ffff81007c1a1a00
RDX: 0000000000000000 RSI: ffff81000d1f53b0 RDI: ffff81007c102af0
RBP: ffff81000d1f53b0 R08: ffff81004a9dab50 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff880072c0 R12: ffff81007c102ac0
R13: ffff81007c1a1a00 R14: 0000000000000004 R15: ffff81007c102b18
FS:  00002ba2cafc9be0(0000) GS:ffff81007d0a5b40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002ba2cab5a158 CR3: 000000003c5ce000 CR4: 00000000000006e0
Process xfsbufd (pid: 1139[#0], threadinfo ffff8100759b0000, task ffff81007cac1040)
Stack:  0000000000000001 ffff81007c102ac0 ffff81000d1f53b0 ffffffff8034abe8
 0000000000000246 ffff81000d1f53b0 ffff81007c1a1a00 ffff81007c102ac0
 ffff81007c0f2d08 0000000000000004 ffff81007c102b18 ffffffff8034ad55
Call Trace:
 [<ffffffff8034abe8>] cfq_remove_request+0x78/0x1b0
 [<ffffffff8034ad55>] cfq_dispatch_insert+0x35/0x70
 [<ffffffff8034b61f>] cfq_dispatch_requests+0x1bf/0x3a0
 [<ffffffff8033f11f>] elv_next_request+0x3f/0x150
 [<ffffffff80243b04>] lock_timer_base+0x34/0x70
 [<ffffffff88007329>] :scsi_mod:scsi_request_fn+0x69/0x3d0
 [<ffffffff80343d46>] __make_request+0xe6/0x5d0
 [<ffffffff8034158b>] generic_make_request+0x18b/0x230
 [<ffffffff8034438a>] submit_bio+0x5a/0xf0
 [<ffffffff8808d1e9>] :xfs:_xfs_buf_ioapply+0x199/0x340
 [<ffffffff8808e099>] :xfs:xfs_buf_iorequest+0x29/0x80
 [<ffffffff88092fbb>] :xfs:xfs_bdstrat_cb+0x3b/0x50
 [<ffffffff8808e3c2>] :xfs:xfsbufd+0x92/0x140
 [<ffffffff8808e330>] :xfs:xfsbufd+0x0/0x140
 [<ffffffff8024fa3b>] kthread+0x4b/0x80
 [<ffffffff8020b0a8>] child_rip+0xa/0x12
 [<ffffffff8024f9f0>] kthread+0x0/0x80
 [<ffffffff8020b09e>] child_rip+0x0/0x12


Code: 0f 0b eb fe 66 90 48 83 ec 08 49 89 f8 48 89 f8 31 c9 eb 09
RIP  [<ffffffff8033f5da>] elv_rb_del+0x3a/0x40
 RSP <ffff8100759b1c00>


I can reproduce it without much trouble.


Here are the same oopses on 2.6.20:
http://paste.stgraber.org/3138

This is a single dual-core Athlon 64 on an ASUS M2NPV mainboard with 2GB RAM.
There is only hardware RAID on the SuperTrak X16350 (no software RAID).

Has anyone seen this?

Going to try without cfq.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
  2007-08-29 10:41 Arkadiusz Miskiewicz
@ 2007-08-29 13:15 ` Arkadiusz Miskiewicz
  2007-08-29 17:10   ` Arkadiusz Miskiewicz
  0 siblings, 1 reply; 5+ messages in thread
From: Arkadiusz Miskiewicz @ 2007-08-29 13:15 UTC (permalink / raw)
  To: linux-scsi

On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> Hello,
>
> I'm trying to get stable kernel for Promise SuperTrak
> X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 oopsed
> like this (while doing rsync):

With anticipatory:

berta login: ------------[ cut here ]------------
kernel BUG at block/as-iosched.c:1084!
invalid opcode: 0000 [1] SMP
CPU 1
Modules linked in: softdog sch_sfq forcedeth ext3 jbd mbcache dm_mod xfs scsi_wait_scan sd_mod stex scsi_mod
Pid: 32:#0, comm: kblockd/1 Not tainted 2.6.22.5-0.2 #1
RIP: 0010:[<ffffffff80349028>]  [<ffffffff80349028>] as_dispatch_request+0x438/0x460
RSP: 0018:ffff81007d1fddc0  EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff81007c765a00 RCX: 00000000ffffffff
RDX: ffff81007c765a28 RSI: 0000000000000000 RDI: ffff81007c54ad08
RBP: 0000000000000000 R08: ffffffffffffffff R09: ffff81006a289d80
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000001 R14: 0000000000000000 R15: ffff81007cf85048
FS:  00002ba4421e8b00(0000) GS:ffff81007d0a5b40(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 00002ba46298f000 CR3: 0000000050951000 CR4: 00000000000006e0
Process kblockd/1 (pid: 32[#0], threadinfo ffff81007d1fc000, task ffff81007d1db040)
Stack:  ffff81007c54ad08 ffff81007cf85000 ffff81007cf7e000 ffff81007d1fde00
 ffff81006a289cc0 ffffffff8033f11f 0000000000000287 ffffffff88000fa8
 ffff81001646a6f8 0000000000000000 ffff81007cf85000 ffff81007cf7e000
Call Trace:
 [<ffffffff8033f11f>] elv_next_request+0x3f/0x150
 [<ffffffff88000fa8>] :scsi_mod:scsi_dispatch_cmd+0x1c8/0x310
 [<ffffffff88007329>] :scsi_mod:scsi_request_fn+0x69/0x3d0
 [<ffffffff80347b30>] as_work_handler+0x0/0x50
 [<ffffffff80347b5c>] as_work_handler+0x2c/0x50
 [<ffffffff8024b94c>] run_workqueue+0xcc/0x170
 [<ffffffff8024c3a0>] worker_thread+0x0/0x110
 [<ffffffff8024c3a0>] worker_thread+0x0/0x110
 [<ffffffff8024c443>] worker_thread+0xa3/0x110
 [<ffffffff8024fe10>] autoremove_wake_function+0x0/0x30
 [<ffffffff8024c3a0>] worker_thread+0x0/0x110
 [<ffffffff8024c3a0>] worker_thread+0x0/0x110
 [<ffffffff8024fa3b>] kthread+0x4b/0x80
 [<ffffffff8020b0a8>] child_rip+0xa/0x12
 [<ffffffff8024f9f0>] kthread+0x0/0x80
 [<ffffffff8020b09e>] child_rip+0x0/0x12


Code: 0f 0b eb fe 0f 0b eb fe 31 ed c7 83 b8 00 00 00 01 00 00 00
RIP  [<ffffffff80349028>] as_dispatch_request+0x438/0x460
 RSP <ffff81007d1fddc0>

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

* Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
  2007-08-29 13:15 ` Arkadiusz Miskiewicz
@ 2007-08-29 17:10   ` Arkadiusz Miskiewicz
  0 siblings, 0 replies; 5+ messages in thread
From: Arkadiusz Miskiewicz @ 2007-08-29 17:10 UTC (permalink / raw)
  To: linux-scsi

On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> On Wednesday 29 of August 2007, Arkadiusz Miskiewicz wrote:
> > Hello,
> >
> > I'm trying to get stable kernel for Promise SuperTrak
> > X16350 hardware. So far 2.6.20, 2.6.21 and 2.6.22 oopsed
> > like this (while doing rsync):
>
> With anticipatory:
>
> berta login: ------------[ cut here ]------------
> kernel BUG at block/as-iosched.c:1084!

One more piece of information: I have been running 2.6.19 for a few hours now
and the oops hasn't happened. Looks like a regression introduced between
2.6.19 and 2.6.20.

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

* Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
       [not found]   ` <20070829181648.GD7932@kernel.dk>
@ 2007-08-30 11:29     ` Arkadiusz Miskiewicz
  2007-09-10  9:17       ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Arkadiusz Miskiewicz @ 2007-08-30 11:29 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel, linux-scsi, Ed Lin

On Wednesday 29 of August 2007, Jens Axboe wrote:
> On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > > I guess I should sent these here since it looks like not scsi bug
> > > > > > anyway.
> > > > >
> > > > > It's stex, right? It seems to have some issues with multiple
> > > > > completions of commands, which craps out the block layer of course.
> > > >
> > > > Yes, stex. I'm staying with 2.6.19 in that case since it works fine
> > > > in that version.
> > > >
> > > > So scsi bug ... 8-)
> > >
> > > And you based that conclusion on what exactly?
> >
> > Isn't drivers/scsi/* handled by linux-scsi@? (that's what I mean)
>
> Yep indeed, I thought you meant that it was a scsi bug (and not an stex
> one). You could try and copy the 2.6.19 stex driver into 2.6.20 and see
> if that works, though.

Looks like this bug has been known for months :-(

Ed Lin pointed to http://lkml.org/lkml/2007/1/23/268, which has a possible
patch (one that unfortunately serialises access to storage devices, well...)

There is also: http://bugzilla.kernel.org/show_bug.cgi?id=7842

I'm running 2.6.22 with that patch now. I did a huge (few hours) rsync that
previously caused oopses and now everything works properly.

Can we get some form of this patch into Linus' tree?

-- 
Arkadiusz Miśkiewicz        PLD/Linux Team
arekm / maven.pl            http://ftp.pld-linux.org/

* Re: 2.6.22 oops kernel BUG at block/elevator.c:366!
  2007-08-30 11:29     ` 2.6.22 oops kernel BUG at block/elevator.c:366! Arkadiusz Miskiewicz
@ 2007-09-10  9:17       ` Andrew Morton
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Morton @ 2007-09-10  9:17 UTC (permalink / raw)
  To: Arkadiusz Miskiewicz
  Cc: Jens Axboe, linux-kernel, linux-scsi, Ed Lin, Jeff Garzik,
	James Bottomley

On Thu, 30 Aug 2007 13:29:37 +0200 Arkadiusz Miskiewicz <arekm@maven.pl> wrote:

> On Wednesday 29 of August 2007, Jens Axboe wrote:
> > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > On Wednesday 29 of August 2007, Jens Axboe wrote:
> > > > > > On Wed, Aug 29 2007, Arkadiusz Miskiewicz wrote:
> > > > > > > I guess I should sent these here since it looks like not scsi bug
> > > > > > > anyway.
> > > > > >
> > > > > > It's stex, right? It seems to have some issues with multiple
> > > > > > completions of commands, which craps out the block layer of course.
> > > > >
> > > > > Yes, stex. I'm staying with 2.6.19 in that case since it works fine
> > > > > in that version.
> > > > >
> > > > > So scsi bug ... 8-)
> > > >
> > > > And you based that conclusion on what exactly?

Could be viewed as a scsi deficiency at least.  Is it unheard of for
"independent" queues to have shared resources?  If so, then yeah, perhaps
some driver-private locking as James suggested is appropriate.  But if
other drivers face similar problems then perhaps it is something which scsi
core should offer support for.

But whatever.  The situation is that Ed suggested a fix eight months ago,
James suggested enhancements and afaict nobody did anything more, and
machines which use this driver are still crashing.

<checks>

OK, Ed's email client breaks message threading, so you need to hyperjump to
a "different" thread a few days later, in which Ed points out that qla4xxx
also has a shared tag queue.

Ed's email client proceeds to splatter the discussion all over the Jan 2007
archive.  Ed finds a possible bug in qla4xxx.  Jens proposes a block patch.
Ed disagrees, Jeff agrees with Ed, discussion dies, driver still
crashing..


> > > Isn't drivers/scsi/* handled by linux-scsi@? (that's what I mean)
> >
> > Yep indeed, I thought you meant that it was a scsi bug (and not an stex
> > one). You could try and copy the 2.6.19 stex driver into 2.6.20 and see
> > if that works, though.
> 
> Looks like this bug is known for months :-(
> 
> Ed Lin pointed to http://lkml.org/lkml/2007/1/23/268 with possible patch (that 
> unfortunately serialises access to storage devices, well...)
> 
> There is also: http://bugzilla.kernel.org/show_bug.cgi?id=7842
> 
> I'm running 2.6.22 with that patch now, did huge (few hours) rsync that 
> previously caused oopses and now everything works properly.
> 
> Can we get some form of this patch into Linus tree?

Here's Ed's patch again.  As a suboptimal driver is better than a crashing
one, perhaps we should merge it until we can sort out something better?



From: "Ed Lin" <ed.lin@promise.com>

The block layer uses a lock to protect the request queue.  Every scsi device
has a unique request queue, and the queue lock is the default lock in struct
request_queue.  This is good for normal cases.  But for a host with a shared
queue tag (e.g.  stex controllers), a queue lock per device means the
shared queue tag is not protected when multiple devices are accessed at the
same time.  This patch is a simple fix for this situation: it introduces a
host queue lock to protect the shared queue tag.  Without this patch we
see various kernel panics (including the BUG() and kernel errors in
blk_queue_start_tag and blk_queue_end_tag of ll_rw_blk.c) when multiple
devices are accessed at once on SMP kernels.

Signed-off-by: Ed Lin <ed.lin@promise.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 drivers/scsi/scsi_lib.c  |    2 +-
 drivers/scsi/stex.c      |    2 ++
 include/scsi/scsi_host.h |    3 +++
 3 files changed, 6 insertions(+), 1 deletion(-)

diff -puN drivers/scsi/scsi_lib.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host
+++ a/drivers/scsi/scsi_lib.c
@@ -1670,7 +1670,7 @@ struct request_queue *__scsi_alloc_queue
 {
 	struct request_queue *q;
 
-	q = blk_init_queue(request_fn, NULL);
+	q = blk_init_queue(request_fn, shost->req_q_lock);
 	if (!q)
 		return NULL;
 
diff -puN drivers/scsi/stex.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host drivers/scsi/stex.c
--- a/drivers/scsi/stex.c~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host
+++ a/drivers/scsi/stex.c
@@ -1234,6 +1234,8 @@ stex_probe(struct pci_dev *pdev, const s
 	if (err)
 		goto out_free_irq;
 
+	spin_lock_init(&host->__req_q_lock);
+	host->req_q_lock = &host->__req_q_lock;
 	err = scsi_init_shared_tag_map(host, host->can_queue);
 	if (err) {
 		printk(KERN_ERR DRV_NAME "(%s): init shared queue failed\n",
diff -puN include/scsi/scsi_host.h~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host include/scsi/scsi_host.h
--- a/include/scsi/scsi_host.h~scsi-use-lock-per-host-instead-of-per-device-for-shared-queue-tag-host
+++ a/include/scsi/scsi_host.h
@@ -503,6 +503,9 @@ struct Scsi_Host {
 	spinlock_t		default_lock;
 	spinlock_t		*host_lock;
 
+	spinlock_t		__req_q_lock;
+	spinlock_t		*req_q_lock;/* protect shared block queue tag */
+
 	struct mutex		scan_mutex;/* serialize scanning activity */
 
 	struct list_head	eh_cmd_q;
_
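
The locking idea in the patch description above can be sketched in user
space.  This is only an illustration with pthreads, not kernel code, and it
is not part of Ed's patch: struct host, struct device, queue_init(),
tag_start(), tag_end() and run_shared_host() are all hypothetical names made
up for the sketch.  The one thing taken from the patch is the convention that
a queue uses its own default lock unless the caller hands it a different one,
which here is a single host-wide lock guarding the shared tag map.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MAX_TAGS       32
#define NUM_DEVICES     4
#define OPS_PER_DEVICE 10000

/* Hypothetical stand-in for a host whose devices share one tag map. */
struct host {
    pthread_mutex_t req_q_lock;     /* host-wide lock, as in the patch */
    int in_use[MAX_TAGS];           /* shared tag map */
    long errors;                    /* inconsistencies seen in the map */
};

/* Hypothetical stand-in for a device and its request queue. */
struct device {
    struct host *host;
    pthread_mutex_t own_lock;       /* default per-queue lock */
    pthread_mutex_t *queue_lock;    /* lock the queue actually uses */
    long completed;
};

/* Same convention as the patch relies on: NULL selects the queue's own lock. */
static void queue_init(struct device *d, struct host *h, pthread_mutex_t *lock)
{
    d->host = h;
    d->completed = 0;
    pthread_mutex_init(&d->own_lock, NULL);
    d->queue_lock = lock ? lock : &d->own_lock;
}

/* Caller must hold the lock that protects h->in_use. */
static int tag_start(struct host *h)
{
    for (int t = 0; t < MAX_TAGS; t++) {
        if (!h->in_use[t]) {
            h->in_use[t] = 1;
            return t;
        }
    }
    return -1;
}

/* Caller must hold the lock that protects h->in_use. */
static void tag_end(struct host *h, int tag)
{
    assert(tag >= 0 && tag < MAX_TAGS);
    if (!h->in_use[tag])
        h->errors++;                /* a tag was released twice */
    h->in_use[tag] = 0;
}

static void *device_worker(void *arg)
{
    struct device *d = arg;

    for (int i = 0; i < OPS_PER_DEVICE; i++) {
        pthread_mutex_lock(d->queue_lock);
        int tag = tag_start(d->host);
        pthread_mutex_unlock(d->queue_lock);
        if (tag < 0)
            continue;               /* map full: skip this request */

        pthread_mutex_lock(d->queue_lock);
        tag_end(d->host, tag);
        pthread_mutex_unlock(d->queue_lock);
        d->completed++;
    }
    return NULL;
}

/* Runs one worker per device, all sharing the host lock. */
static long run_shared_host(long *errors_out)
{
    struct host h;
    struct device dev[NUM_DEVICES];
    pthread_t tid[NUM_DEVICES];
    long total = 0;

    memset(&h, 0, sizeof(h));
    pthread_mutex_init(&h.req_q_lock, NULL);

    for (int i = 0; i < NUM_DEVICES; i++) {
        queue_init(&dev[i], &h, &h.req_q_lock);
        pthread_create(&tid[i], NULL, device_worker, &dev[i]);
    }
    for (int i = 0; i < NUM_DEVICES; i++) {
        pthread_join(tid[i], NULL);
        total += dev[i].completed;
    }
    *errors_out = h.errors;
    return total;
}
```

Calling run_shared_host() from a small main() and building with -pthread is
enough to try it.  Passing NULL instead of &h.req_q_lock to queue_init()
would give every device its own lock, which corresponds to the situation the
patch description complains about: the shared map is then touched under
different locks and is effectively unprotected.  The cost of the shared lock
is the one already noted in this thread, namely that access to all devices
behind the host is serialised.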

