* aic7xxx & st: BUG at include/asm/dma-mapping.h:37
@ 2003-08-07 3:14 Adam Kropelin
2003-08-07 21:00 ` Kai Makisara
0 siblings, 1 reply; 8+ messages in thread
From: Adam Kropelin @ 2003-08-07 3:14 UTC
To: linux-scsi; +Cc: gibbs, Kai.Makisara
When trying to read from my SCSI tape drive using the wrong block size I
get the BUG trace shown below. I ran into this by accident after writing
a tape with variable blocksize and then trying to 'dd' from it using a
fixed blocksize.
The BUG trace is from 2.6.0-test2, but it's also reproducible on -test1
and test2-mm3. The box is running SMP + PREEMPT. SCSI boot-time messages
are shown below.
Steps to reproduce:
mt -f /dev/st0 setblk 0 # Set variable block size
dd if=/dev/zero of=/dev/st0 bs=1237 count=1 # Write an unusual block
mt -f /dev/st0 setblk 512 # Set block size to 512 fixed
dd if=/dev/st0 bs=512 # BUG
--Adam
kernel BUG at include/asm/dma-mapping.h:37!
invalid operand: 0000 [#1]
CPU: 0
EIP: 0060:[<c02b2eaf>] Not tainted
EFLAGS: 00010046
EIP is at ahc_linux_run_device_queue+0x3ef/0x8d0
eax: dfd88820 ebx: 00000001 ecx: dfd837e0 edx: 00000000
esi: dffa4038 edi: dfd400c6 ebp: dfd44068 esp: de0d3cf0
ds: 007b es: 007b ss: 0068
Process dd (pid: 1462, threadinfo=de0d2000 task=dfcf0d00)
Stack: 00000040 dfd837e0 dfd837e0 dfd400c0 c02b310a dfd40080 dfd400c0 00000040
00000040 dfd837e0 00000000 dfd837e0 c02ae6e2 c03e3d60 00000246 dfd400c0
dfd8ec00 00000000 00000001 dfd837e0 c02ae6e2 dfd8ec00 dfd362a0 00000000
Call Trace:
[<c02b310a>] ahc_linux_run_device_queue+0x64a/0x8d0
[<c02ae6e2>] ahc_linux_queue+0x222/0x270
[<c02ae6e2>] ahc_linux_queue+0x222/0x270
[<c01249f1>] add_timer+0x81/0xc0
[<c029355b>] scsi_dispatch_cmd+0x15b/0x1b0
[<c02936f0>] scsi_done+0x0/0x70
[<c0297f37>] scsi_request_fn+0x257/0x320
[<c025cf38>] blk_insert_request+0x78/0xb0
[<c025cf42>] blk_insert_request+0x82/0xb0
[<c0296d96>] scsi_insert_special_req+0x26/0x30
[<c0296e91>] scsi_do_req+0x71/0x80
[<c0292fda>] scsi_allocate_request+0x1a/0x60
[<c02b69dc>] st_do_scsi+0x10c/0x150
[<c02b6820>] st_sleep_done+0x0/0xb0
[<c02b9a24>] st_int_ioctl+0x6d4/0xa40
[<c02b8676>] read_tape+0x266/0x3b0
[<c02b7bce>] setup_buffering+0x6e/0x100
[<c02b8a35>] st_read+0x275/0x3b0
[<c0146076>] do_brk+0x116/0x1e0
[<c01510ea>] vfs_read+0xaa/0xe0
[<c01512df>] sys_read+0x2f/0x50
[<c01090ef>] syscall_call+0x7/0xb
Code: 0f 0b 25 00 40 cd 37 c0 85 db 74 38 31 c9 89 da 90 8b 04 0e
<6>note: dd[1462] exited with preempt_count 1
-------------------
SCSI dmesg (scsi0 is another aic7xxx but is not involved in this
scenario, AFAICT):
scsi1 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 6.2.35
<Adaptec 2940 SCSI adapter>
aic7870: Single Channel A, SCSI Id=7, 16/253 SCBs
(scsi1:A:6): 10.000MB/s transfers (10.000MHz, offset 15)
Vendor: Quantum Model: DLT4000 Rev: D473
Type: Sequential-Access ANSI SCSI revision: 02
st: Version 20030622, fixed bufsize 32768, s/g segs 256
Attached scsi tape st0 at scsi1, channel 0, id 6, lun 0
st0: try direct i/o: yes, max page reachable by HBA 1048575
* Re: aic7xxx & st: BUG at include/asm/dma-mapping.h:37
2003-08-07 3:14 aic7xxx & st: BUG at include/asm/dma-mapping.h:37 Adam Kropelin
@ 2003-08-07 21:00 ` Kai Makisara
2003-08-08 0:19 ` Adam Kropelin
0 siblings, 1 reply; 8+ messages in thread
From: Kai Makisara @ 2003-08-07 21:00 UTC
To: Adam Kropelin; +Cc: linux-scsi, gibbs
No solution but some test results...
On Wed, 6 Aug 2003, Adam Kropelin wrote:
> When trying to read from my SCSI tape drive using the wrong block size I
> get the BUG trace shown below. I ran into this by accident after writing
> a tape with variable blocksize and then trying to 'dd' from it using a
> fixed blocksize.
>
> The BUG trace is from 2.6.0-test2, but it's also reproducible on -test1
> and test2-mm3. The box is running SMP + PREEMPT. SCSI boot-time messages
> are shown below.
>
> Steps to reproduce:
> mt -f /dev/st0 setblk 0 # Set variable block size
> dd if=/dev/zero of=/dev/st0 bs=1237 count=1 # Write an unusual block
> mt -f /dev/st0 setblk 512 # Set block size to 512 fixed
> dd if=/dev/st0 bs=512 # BUG
>
I tried this with a DDS4 drive and sym53c8xx_2 (with bs=1236 because the
sym driver does not like odd counts on wide bus). The result was correct,
i.e., the last dd fails and there is a message in syslog about incorrect
block size.
> --Adam
>
> kernel BUG at include/asm/dma-mapping.h:37!
This comes from 'BUG_ON(direction == DMA_NONE)' in dma_map_sg(). I
inserted a printk into st_do_scsi() in st.c to see if DMA_NONE is used for
some command that should transfer data. No errors seen in these printk
results.
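
For readers unfamiliar with the check, the condition described above can be sketched in user space. This is not the kernel's dma_map_sg(); the enum, its values, and the function model_map_sg() are assumptions made for illustration, and the model returns an error code where the kernel would BUG.

```c
#include <assert.h>

/* Illustrative stand-ins only; the names echo the kernel's, the values are assumed. */
enum model_dma_direction {
	MODEL_DMA_BIDIRECTIONAL,
	MODEL_DMA_TO_DEVICE,
	MODEL_DMA_FROM_DEVICE,
	MODEL_DMA_NONE
};

/*
 * Hypothetical model of the entry check: mapping a scatter/gather list
 * makes no sense for a command that claims to transfer no data.
 * Returns the number of entries "mapped", or -1 where the kernel would BUG.
 */
int model_map_sg(int nents, enum model_dma_direction direction)
{
	assert(nents >= 0);
	if (direction == MODEL_DMA_NONE)
		return -1;	/* kernel: BUG_ON(direction == DMA_NONE) */
	return nents;
}
```

In this model, a request that carries scatter/gather entries but is marked MODEL_DMA_NONE is exactly the combination that trips the check.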
--
Kai
* Re: aic7xxx & st: BUG at include/asm/dma-mapping.h:37
2003-08-07 21:00 ` Kai Makisara
@ 2003-08-08 0:19 ` Adam Kropelin
2003-08-08 4:32 ` Kai Makisara
2003-08-08 17:30 ` Kai Mäkisara
0 siblings, 2 replies; 8+ messages in thread
From: Adam Kropelin @ 2003-08-08 0:19 UTC
To: Kai Makisara; +Cc: linux-scsi, gibbs
On Fri, Aug 08, 2003 at 12:00:56AM +0300, Kai Makisara wrote:
> No solution but some test results...
>
> On Wed, 6 Aug 2003, Adam Kropelin wrote:
>
> > Steps to reproduce:
> > mt -f /dev/st0 setblk 0 # Set variable block size
> > dd if=/dev/zero of=/dev/st0 bs=1237 count=1 # Write an unusual block
> > mt -f /dev/st0 setblk 512 # Set block size to 512 fixed
> > dd if=/dev/st0 bs=512 # BUG
> >
> I tried this with a DDS4 drive and sym53c8xx_2 (with bs=1236 because the
> sym driver does not like odd counts on wide bus). The result was correct,
The exact block size doesn't seem to matter; I picked 1237 pretty much
at random. Anything different from the fixed block size appears to cause the
problem (e.g., 510, 511, 513 and 514 all BUG here, but 512 does not).
> i.e., the last dd fails and there is a message in syslog about incorrect
> block size.
I do occasionally see a report from st about an incorrect block size
immediately before the oops. Most of the time the oops prevents that
message from displaying (I think) but I've seen it a couple of times.
My knowledge of the SCSI subsystem is weak, so forgive me if this is a
stupid question. If st knows the block size to be invalid, why does the
io ever get submitted at all? Perhaps the io gets issued and
completed, at which time st learns the block size was wrong, warns about
it, and somehow the next command thru the SCSI layer hoses up? I'm
really just grasping at straws here (as you can tell).
> > kernel BUG at include/asm/dma-mapping.h:37!
>
> This comes from 'BUG_ON(direction == DMA_NONE)' in dma_map_sg(). I
> inserted a printk into st_do_scsi() in st.c to see if DMA_NONE is used for
> some command that should transfer data. No errors seen in these printk
> results.
I tried aic7xxx_old and hit the same BUG so Justin's driver would seem
to be exonerated. The new backtrace is below. The commonality is clearly
the st driver and the SCSI midlayer. Perhaps I'll try to verbosify
things a bit in that area and see what comes up.
--Adam
kernel BUG at include/asm/dma-mapping.h:37!
invalid operand: 0000 [#1]
CPU: 1
EIP: 0060:[<c02aa956>] Not tainted
EFLAGS: 00010046
EIP is at aic7xxx_buildscb+0x196/0x300
eax: 00000001 ebx: e7dd6000 ecx: e7da8000 edx: e7de91a0
esi: e7de91fa edi: e7d95038 ebp: e7de4800 esp: e4657d48
ds: 007b es: 007b ss: 0068
Process dd (pid: 1436, threadinfo=e4656000 task=e4b38d40)
Stack: 00000001 e7da8000 e7de9260 00000246 e4656000 c01249f1 e7de4800 e7de91a0
00000000 e7de6dbc c02aabb7 e7de6dbc e7de91a0 e7de4800 e4656000 00000297
e7de91a0 e7de6c00 c029355b e7de91a0 c02936f0 00000001 e7de91a0 e7dd6000
Call Trace:
[<c01249f1>] add_timer+0x81/0xc0
[<c02aabb7>] aic7xxx_queue+0xf7/0x140
[<c029355b>] scsi_dispatch_cmd+0x15b/0x1b0
[<c02936f0>] scsi_done+0x0/0x70
[<c0297f37>] scsi_request_fn+0x257/0x320
[<c025cf38>] blk_insert_request+0x78/0xb0
[<c025cf42>] blk_insert_request+0x82/0xb0
[<c0296d96>] scsi_insert_special_req+0x26/0x30
[<c0296e91>] scsi_do_req+0x71/0x80
[<c0292fda>] scsi_allocate_request+0x1a/0x60
[<c02ad70c>] st_do_scsi+0x10c/0x150
[<c02ad550>] st_sleep_done+0x0/0xb0
[<c02b0754>] st_int_ioctl+0x6d4/0xa40
[<c02af3a6>] read_tape+0x266/0x3b0
[<c02ae8fe>] setup_buffering+0x6e/0x100
[<c02af765>] st_read+0x275/0x3b0
[<c0146076>] do_brk+0x116/0x1e0
[<c01510ea>] vfs_read+0xaa/0xe0
[<c01512df>] sys_read+0x2f/0x50
[<c01090ef>] syscall_call+0x7/0xb
Code: 0f 0b 25 00 80 37 37 c0 8b 04 24 85 c0 74 3e 8b 0c 24 31 d2
<6>note: dd[1436] exited with preempt_count 1
* Re: aic7xxx & st: BUG at include/asm/dma-mapping.h:37
2003-08-08 0:19 ` Adam Kropelin
@ 2003-08-08 4:32 ` Kai Makisara
2003-08-08 17:30 ` Kai Mäkisara
1 sibling, 0 replies; 8+ messages in thread
From: Kai Makisara @ 2003-08-08 4:32 UTC
To: Adam Kropelin; +Cc: Kai Makisara, linux-scsi, gibbs
On Thu, 7 Aug 2003, Adam Kropelin wrote:
> On Fri, Aug 08, 2003 at 12:00:56AM +0300, Kai Makisara wrote:
...
> My knowledge of the SCSI subsystem is weak, so forgive me if this is a
> stupid question. If st knows the block size to be invalid, why does the
> io ever get submitted at all? Perhaps the io gets issued and
> completed, at which time st learns the block size was wrong, warns about
> it, and somehow the next command thru the SCSI layer hoses up? I'm
> really just grasping at straws here (as you can tell).
>
The size of the next block on tape is not known until the block is read.
The read is issued, it returns 'check condition', and the sense data says
what happened. st then logs this and returns an error for the read. This is
standard behavior for tapes.
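
As a rough illustration of "the sense data says what happened", the sketch below decodes the incorrect-length indication from fixed-format sense data (response code 0x70 or 0x71, VALID in bit 7 of byte 0, ILI in bit 5 of byte 2, information field in bytes 3 to 6). It is not the st driver's code; the function name sense_incorrect_length() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Sketch: detect an incorrect-length read from fixed-format sense data.
 * Returns 1 and stores the signed residual (requested minus actual) when
 * the ILI bit is set and the information field is valid; returns 0 otherwise.
 */
int sense_incorrect_length(const uint8_t *sense, size_t len, int32_t *residual)
{
	uint32_t info;

	assert(sense != NULL && residual != NULL);
	if (len < 7)
		return 0;			/* too short to hold the information field */
	if ((sense[0] & 0x7f) != 0x70 && (sense[0] & 0x7f) != 0x71)
		return 0;			/* not fixed-format sense data */
	if (!(sense[0] & 0x80))
		return 0;			/* VALID bit clear: information field unusable */
	if (!(sense[2] & 0x20))
		return 0;			/* ILI bit clear: length was as requested */

	/* Information field is big-endian and two's complement. */
	info = ((uint32_t)sense[3] << 24) | ((uint32_t)sense[4] << 16) |
	       ((uint32_t)sense[5] << 8) | (uint32_t)sense[6];
	*residual = (info & 0x80000000u) ? -(int32_t)(~info) - 1 : (int32_t)info;
	return 1;
}
```

A caller would run this on the sense buffer returned with the check condition and, on a non-zero result, log the mismatch and fail the read.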
> > > kernel BUG at include/asm/dma-mapping.h:37!
> >
> > This comes from 'BUG_ON(direction == DMA_NONE)' in dma_map_sg(). I
> > inserted a printk into st_do_scsi() in st.c to see if DMA_NONE is used for
> > some command that should transfer data. No errors seen in these printk
> > results.
>
> I tried aic7xxx_old and hit the same BUG so Justin's driver would seem
> to be exonerated. The new backtrace is below. The commonality is clearly
> the st driver and the SCSI midlayer. Perhaps I'll try to verbosify
> things a bit in that area and see what comes up.
>
I will try to get my old Adaptec working tonight or during the weekend and
see if I can reproduce the error. It seems to be dependent on the SCSI
adapter.
--
Kai
* Re: aic7xxx & st: BUG at include/asm/dma-mapping.h:37
2003-08-08 0:19 ` Adam Kropelin
2003-08-08 4:32 ` Kai Makisara
@ 2003-08-08 17:30 ` Kai Mäkisara
2003-08-08 17:58 ` Mr. James W. Laferriere
2003-08-11 17:41 ` Adam Kropelin
1 sibling, 2 replies; 8+ messages in thread
From: Kai Mäkisara @ 2003-08-08 17:30 UTC
To: Adam Kropelin; +Cc: linux-scsi, gibbs
Thus spake Adam Kropelin (akropel1@rochester.rr.com):
> On Fri, Aug 08, 2003 at 12:00:56AM +0300, Kai Makisara wrote:
...
> I tried aic7xxx_old and hit the same BUG so Justin's driver would seem
> to be exonerated. The new backtrace is below. The commonality is clearly
> the st driver and the SCSI midlayer. Perhaps I'll try to verbosify
> things a bit in that area and see what comes up.
>
I reproduced the problem with an aha2940. The patch at the end of this
message fixes the problem but I must think a little more before
sending a permanent fix (at least some comments must be added).
In case someone wonders what this has to do with the symptoms: the bug
led to sr_use_sg being set to one even when the byte count was zero and
the DMA direction was DMA_NONE, while the byte count in the first s/g
segment was non-zero. Different SCSI HBA drivers handle this inconsistency
differently, which is why I had not seen this in my tests.
This bug has been in st for quite a long time. Thanks for finally
detecting it :-)
--
Kai
----------------------------8<--------------------------------------
--- linux-2.6/drivers/scsi/st.c.2.6t2	2003-08-08 19:17:54.000000000 +0300
+++ linux-2.6/drivers/scsi/st.c	2003-08-08 19:56:49.000000000 +0300
@@ -2340,6 +2340,7 @@
 	int timeout;
 	long ltmp;
 	int ioctl_result;
+	int saved_do_dio;
 	int chg_eof = TRUE;
 	unsigned char cmd[MAX_COMMAND_SIZE];
 	Scsi_Request *SRpnt;
@@ -2609,8 +2610,15 @@
 		return (-ENOSYS);
 	}
 
+	/* Save the direct i/o state in case this is called from
+	   error recovery */
+	saved_do_dio = STp->buffer->do_dio;
+	STp->buffer->do_dio = 0;
+
 	SRpnt = st_do_scsi(NULL, STp, cmd, datalen, direction,
 			   timeout, MAX_RETRIES, TRUE);
+
+	STp->buffer->do_dio = saved_do_dio;
 	if (!SRpnt)
 		return (STp->buffer)->syscall_result;
* Re: aic7xxx & st: BUG at include/asm/dma-mapping.h:37
2003-08-08 17:30 ` Kai Mäkisara
@ 2003-08-08 17:58 ` Mr. James W. Laferriere
2003-08-09 7:09 ` Kai Makisara
2003-08-11 17:41 ` Adam Kropelin
1 sibling, 1 reply; 8+ messages in thread
From: Mr. James W. Laferriere @ 2003-08-08 17:58 UTC
To: Kai Mäkisara; +Cc: Linux Scsi maillist
Hello Kai, is this (or something similar) necessary for 2.4.x?
TIA, JimL
On Fri, 8 Aug 2003, Kai Mäkisara wrote:
> Thus spake Adam Kropelin (akropel1@rochester.rr.com):
> > On Fri, Aug 08, 2003 at 12:00:56AM +0300, Kai Makisara wrote:
> ...
> > I tried aic7xxx_old and hit the same BUG so Justin's driver would seem
> > to be exonerated. The new backtrace is below. The commonality is clearly
> > the st driver and the SCSI midlayer. Perhaps I'll try to verbosify
> > things a bit in that area and see what comes up.
> I reproduced the problem with an aha2940. The patch at the end of this
> message fixes the problem but I must think a little more before
> sending a permanent fix (at least some comments must be added).
> In case someone wonders what this has to do with the symptoms: the bug
> led to setting sr_use_sg to one even when the byte count was zero and
> DMA direction DMA_NONE. The byte count in the first s/g segment was
> non-zero. Different SCSI HBA drivers handle this inconsistency
> differently and this is why I have not seen this in my tests.
> This bug has been in st for quite a long time. Thanks for finally
> detecting it :-)
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+
-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: aic7xxx & st: BUG at include/asm/dma-mapping.h:37
2003-08-08 17:58 ` Mr. James W. Laferriere
@ 2003-08-09 7:09 ` Kai Makisara
0 siblings, 0 replies; 8+ messages in thread
From: Kai Makisara @ 2003-08-09 7:09 UTC
To: Mr. James W. Laferriere; +Cc: Linux Scsi maillist
On Fri, 8 Aug 2003, Mr. James W. Laferriere wrote:
> Hello Kai, is this (or something similar) necessary for 2.4.x?
No. This bug crept in when direct transfers between the user buffer and
the SCSI HBA were added in 2.5.32.
--
Kai
* Re: aic7xxx & st: BUG at include/asm/dma-mapping.h:37
2003-08-08 17:30 ` Kai Mäkisara
2003-08-08 17:58 ` Mr. James W. Laferriere
@ 2003-08-11 17:41 ` Adam Kropelin
1 sibling, 0 replies; 8+ messages in thread
From: Adam Kropelin @ 2003-08-11 17:41 UTC
To: Kai Mäkisara, linux-scsi
[Justin Gibbs dropped from the CC: list because he probably doesn't want
this thread filling his inbox any longer...]
On Fri, Aug 08, 2003 at 08:30:03PM +0300, Kai Mäkisara wrote:
> Thus spake Adam Kropelin (akropel1@rochester.rr.com):
>
> > On Fri, Aug 08, 2003 at 12:00:56AM +0300, Kai Makisara wrote:
> ...
> > I tried aic7xxx_old and hit the same BUG so Justin's driver would seem
> > to be exonerated. The new backtrace is below. The commonality is clearly
> > the st driver and the SCSI midlayer. Perhaps I'll try to verbosify
> > things a bit in that area and see what comes up.
> >
> I reproduced the problem with an aha2940. The patch at the end of this
> message fixes the problem but I must think a little more before
> sending a permanent fix (at least some comments must be added).
This does indeed solve the problem on my end (tested on -test3). I'd be
happy to test any future patch you might come up with.
> This bug has been in st for quite a long time. Thanks for finally
> detecting it :-)
No problem...thanks for taking such quick action!
--Adam