* [parisc-linux] 53c700.c problems with tags?
From: Jochen Friedrich @ 2004-08-31 10:39 UTC
To: parisc
Hi,
When doing heavy I/O on my 715/64, I sometimes get the following panic:
scsi0 (0:0) Target is suffering from tag starvation.
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 f2 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x00 00 00 00 00 00
scsi0: Bus Reset detected, executing command 00000000, slot 00000000, dsp 00358]
failing command because of reset, slot 00008788, cmnd 103c3000
failing command because of reset, slot 000088bc, cmnd 103c3200
failing command because of reset, slot 000089f0, cmnd 103c4800
failing command because of reset, slot 00008d8c, cmnd 103c3a00
failing command because of reset, slot 00008ff4, cmnd 103c3a00
failing command because of reset, slot 0000925c, cmnd 103c3400
failing command because of reset, slot 00009390, cmnd 103c4c00
failing command because of reset, slot 00009994, cmnd 103c4e00
failing command because of reset, slot 00009bfc, cmnd 100f8c00
failing command because of reset, slot 00009f98, cmnd 103c4600
failing command because of reset, slot 0000a334, cmnd 103c4200
failing command because of reset, slot 0000a468, cmnd 103c4a00
failing command because of reset, slot 0000a6d0, cmnd 103c3800
failing command because of reset, slot 0000a938, cmnd 103c4400
failing command because of reset, slot 0000aa6c, cmnd 100f8e00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 4a 00 00 08 00
scsi0: (0:0) Synchronous at offset 8, period 100ns
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 9a 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 65 ca 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 01 4c c0 00 00 10 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 ca 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 7a 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 3a 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 8a 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 aa 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 e2 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 78 e2 00 00 20 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 01 4c 98 00 00 20 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 79 0a 00 00 08 00
scsi0 (0:0) New error handler wants to abort command
0x28 00 00 18 66 2a 00 00 08 00
scsi0 (0:0) New error handler wants device reset
0x28 00 00 18 66 f2 00 00 08 00
scsi0 (0:0) New error handler wants BUS reset, cmd 103c3a00
0x28 00 00 18 66 f2 00 00 08 00
scsi0: Bus Reset detected, executing command 00000000, slot 00000000, dsp 00358]
scsi0: (0:0) Synchronous at offset 8, period 100ns
SLOTS FULL, but count is 8, should be 64
Stack Dump:
10370980: 0004ff0e 101ba834 10370900 10264aa0
10370970: 00000000 101ba548 103b8a00 103b8a68
10370960: 00000010 00000020 1010e2c8 0000001f
10370950: 90080000 ffffffba 00000000 00000002
10370940: f000b858 f0000704 00000000 00000001
10370930: 102f6660 100fa474 100fa49c 1027637c
Kernel addresses on the stack:
[<101ba834>] [<101ba548>] [<1010e2c8>] [<101b319c>]
[<101ab4a0>] [<101ac228>] [<101b3b80>] [<101204ac>]
[<101b319c>] [<1013df08>] [<101bb920>] [<10118604>]
[<1013f3d4>] [<101ac228>] [<101abf70>] [<1014e684>]
[<101ba834>] [<101205fc>] [<101204ac>] [<101b30e4>]
[<101201f4>] [<101207c4>] [<10106c4c>] [<10106cf4>]
[<101352bc>] [<101628d4>] [<101355b8>] [<10162b5c>]
[<10121580>] [<10162b5c>] [<1012071c>] [<1012150c>]
[<10129124>] [<1011f4bc>]
Kernel Fault: Code=26 regs=10370980 (Addr=00000114)
YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00000000000001001111111100001110 Not tainted
r00-03 00000000 10264810 101ba548 101abdb4
r04-07 10375c00 103b9e00 10311810 00000000
r08-11 103b9eb8 1027637c 100fa49c 100fa474
r12-15 102f6660 00000001 00000000 f0000704
r16-19 f000b858 00000002 00000000 00000000
r20-23 00000001 0000001f 102d13a0 102f6010
r24-27 00000001 00000001 00000003 10254010
r28-31 00000000 ffffe531 10370980 1011b954
sr0-3 00000000 00000129 00000000 00000129
sr4-7 00000000 00000000 00000000 00000000
IASQ: 00000000 00000000 IAOQ: 101ba548 101ba54c
IIR: 6b850228 ISR: 00000000 IOR: 00000114
CPU: 0 CR30: 10370000 CR31: 102e0000
ORIG_R28: 00000002
I tried decreasing the number of tags in 53c700.h, but that made the
problem even worse. However, increasing the number to an insane 128 fixed
the problem for me (the number of active tags really does go up to 80
during heavy I/O and drops back to 0 at the end, so I don't blame the
devices for the problem). For now I guess there might be something fishy
with the "starving tag" detection and the attempt to "fix" the situation.
Kernel: 2.4.26-pa7, but any older kernel (tested since 2.4.21) shows the
exact same behaviour.
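For readers following the log: the ten-byte sequences printed after each abort message start with opcode 0x28, which is the SCSI READ(10) command, and the six zero bytes are a TEST UNIT READY (opcode 0x00). A minimal sketch of decoding such a CDB is given here; the names `struct read10` and `decode_read10` are made up for illustration and do not come from 53c700.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 10-byte SCSI READ(10) command descriptor block (CDB). */
struct read10 {
    uint32_t lba;     /* logical block address: CDB bytes 2..5, big-endian */
    uint16_t blocks;  /* transfer length in blocks: CDB bytes 7..8 */
};

/* Hypothetical helper: returns 0 on success, -1 if the opcode is not
 * READ(10) (0x28). */
int decode_read10(const uint8_t cdb[10], struct read10 *out)
{
    assert(cdb != NULL && out != NULL);
    if (cdb[0] != 0x28)
        return -1;
    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->blocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Applied to the first aborted command (0x28 00 00 18 66 f2 00 00 08 00), this gives a read of 8 blocks starting at LBA 0x001866f2.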
Thanks,
Jochen
_______________________________________________
parisc-linux mailing list
parisc-linux@lists.parisc-linux.org
http://lists.parisc-linux.org/mailman/listinfo/parisc-linux
* Re: [parisc-linux] 53c700.c problems with tags?
From: Grant Grundler @ 2004-09-02 3:18 UTC
To: Jochen Friedrich; +Cc: parisc
On Tue, Aug 31, 2004 at 12:39:12PM +0200, Jochen Friedrich wrote:
> Hi,
>
> When doing heavy I/O on my 715/64, I sometimes get the following panic:
>
> scsi0 (0:0) Target is suffering from tag starvation.
Searching for "tag starvation" at
http://lists.parisc-linux.org/
yields:
http://lists.parisc-linux.org/pipermail/parisc-linux/2004-August/024357.html
hth,
grant
* Re: [parisc-linux] 53c700.c problems with tags?
From: Jochen Friedrich @ 2004-09-02 10:48 UTC
To: Grant Grundler; +Cc: parisc
Hi Grant,
> Searching for "tag starvation" at
> http://lists.parisc-linux.org/
>
> yields:
> http://lists.parisc-linux.org/pipermail/parisc-linux/2004-August/024357.html
Thanks. However, my post was not a question about how to get rid of the problem
(my servers are running fine) but just a hint that there might be a bug in
the tag handling code of 53c700.c. I didn't really look at the code hard
enough, but to me it looks like if the tag queue is full, the most likely
next free tag is searched. If it isn't free, but another tag is, the
message is printed and some badness happens and the kernel halts.
Setting the queue depth to 1 always fixes the problem, because the drive
has no chance to reorder the tags.
Setting the queue depth to something as big as 128 also fixes the problem,
because the queue never gets that full.
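To illustrate the failure mode guessed at above, here is a minimal sketch of a tag allocator that treats the "most likely next free tag" only as a starting hint and falls back to scanning every tag, so a busy hinted slot is not mistaken for an error when the drive completes tags out of order. This is not the code in 53c700.c; `MAX_TAGS`, `struct tag_pool`, `tag_alloc` and `tag_free` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_TAGS 64  /* hypothetical queue depth, not the value in 53c700.h */

struct tag_pool {
    uint64_t in_use;  /* bit n set => tag n is outstanding on the device */
    unsigned next;    /* hint only: where the next search starts */
};

/* Returns a free tag in [0, MAX_TAGS), or -1 if every tag is outstanding.
 * The hint is just a starting point; all tags are examined before giving up. */
int tag_alloc(struct tag_pool *p)
{
    for (unsigned i = 0; i < MAX_TAGS; i++) {
        unsigned tag = (p->next + i) % MAX_TAGS;
        uint64_t bit = UINT64_C(1) << tag;
        if ((p->in_use & bit) == 0) {
            p->in_use |= bit;
            p->next = (tag + 1) % MAX_TAGS;
            return (int)tag;
        }
    }
    return -1;
}

/* Releases a tag when its command completes, in whatever order that is. */
void tag_free(struct tag_pool *p, unsigned tag)
{
    assert(tag < MAX_TAGS);
    assert((p->in_use & (UINT64_C(1) << tag)) != 0);
    p->in_use &= ~(UINT64_C(1) << tag);
}
```

With this shape, a full pool is reported only when all `MAX_TAGS` bits are set, regardless of which tag numbers the drive happened to complete first.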
Thanks,
Jochen
* Re: [parisc-linux] 53c700.c problems with tags?
From: Grant Grundler @ 2004-09-02 15:45 UTC
To: Jochen Friedrich; +Cc: parisc
On Thu, Sep 02, 2004 at 12:48:01PM +0200, Jochen Friedrich wrote:
> Thanks. However, my post was not a question about how to get rid of the problem
> (my servers are running fine) but just a hint that there might be a bug in
> the tag handling code of 53c700.c.
Ah, ok. I didn't realize you were trying to share another clue as to
what the problem might be. We know there were (are?) problems
with 53c700 tag handling.
> I didn't really look at the code hard enough, but to me it looks like
Well, I haven't looked at that driver recently...
> if the tag queue is full, the most likely
> next free tag is searched. If it isn't free, but another tag is, the
> message is printed
The tag number is no guarantee of ordering when using unordered
queue tags (LIFO and FIFO are the other two types of tags).
The driver needs to use some sort of time stamp to detect tag
starvation, rather than relying on which tag numbers happen to be in use.
The message shouldn't be printed unless a tag fails to complete for a while,
e.g. 3 seconds or more after other tagged I/Os (which started later) have
completed.
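A rough sketch of that time-stamp idea, under stated assumptions: each tag records when it was issued, a completion marks every older outstanding tag as "overtaken", and a tag counts as starved only once it has stayed overtaken for the threshold. All names here (`struct tag_clock`, `tag_issue`, `tag_complete`, `tag_starved`, `STARVE_MS`) are hypothetical, the millisecond clock is supplied by the caller, and none of this is taken from the 53c700 driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_TAGS  64    /* hypothetical queue depth */
#define STARVE_MS 3000  /* the "3 seconds" example threshold */

struct tag_state {
    bool     busy;          /* command with this tag is outstanding */
    bool     overtaken;     /* a later-issued command has already completed */
    uint64_t issued_ms;     /* when this command was issued */
    uint64_t overtaken_ms;  /* when it was first overtaken */
};

struct tag_clock { struct tag_state t[MAX_TAGS]; };

void tag_issue(struct tag_clock *c, unsigned tag, uint64_t now_ms)
{
    assert(tag < MAX_TAGS && !c->t[tag].busy);
    c->t[tag] = (struct tag_state){ .busy = true, .issued_ms = now_ms };
}

/* On completion, mark every older outstanding tag as overtaken.
 * Only issue times are compared; tag numbers play no role. */
void tag_complete(struct tag_clock *c, unsigned tag, uint64_t now_ms)
{
    assert(tag < MAX_TAGS && c->t[tag].busy);
    uint64_t issued = c->t[tag].issued_ms;
    c->t[tag].busy = false;
    for (unsigned i = 0; i < MAX_TAGS; i++) {
        struct tag_state *s = &c->t[i];
        if (s->busy && !s->overtaken && s->issued_ms < issued) {
            s->overtaken = true;
            s->overtaken_ms = now_ms;
        }
    }
}

/* Returns a starved tag number, or -1 if none has been overtaken
 * for at least STARVE_MS. */
int tag_starved(const struct tag_clock *c, uint64_t now_ms)
{
    for (unsigned i = 0; i < MAX_TAGS; i++) {
        const struct tag_state *s = &c->t[i];
        if (s->busy && s->overtaken && now_ms - s->overtaken_ms >= STARVE_MS)
            return (int)i;
    }
    return -1;
}
```

A command that is merely slow, with nothing issued later having completed, is never flagged by this check.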
> and some badness happens and the kernel halts.
My guess is that the "tag starvation" message means the driver's assumptions
about how unordered tags work are wrong. Someone needs to review
the allocation/deallocation of I/O requests to make sure the driver
doesn't assume the next/prev tag is unused.
> Setting the queue depth to 1 always fixes the problem, because the drive
> has no chance to reorder the tags.
>
> Setting the queue depth to something as big as 128 also fixes the problem,
> because the queue never gets that full.
ok - thanks for the additional info.
grant