public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* BUG() in asm/pci.h:142 with 2.4.13
@ 2001-10-25 10:07 Christian Hammers
  2001-10-25 10:18 ` Jens Axboe
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Christian Hammers @ 2001-10-25 10:07 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christian Hammers

Hello

My system crashed several times now with 2.4.11-pre6 and 2.4.13
(pre6 because it was the first one I got that fixed some 2GB RAM memory
allocation bug).

2.4.13 was the easiest one to reproduce: when starting the tape backup
to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller 
(Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
two PIII and 2GB of RAM it crashed immediately with the error attached
below. The machine was under "stresstest-simulation" load at this time.

The tape_backup.pl uses the "mt" and "cpio" commands to access /dev/nst0.

Maybe worth noting is, that the system crashed another time yesterday 
after replacing the external SCSI RAID Chassis/Controller (not the
disks in it) and just this moment with another message (see below).

Any help or hints appreciated! 
[please keep me Cc'ed as I'm not subscribed to this list]

bye,

 -christian-


kernel: kernel BUG at /usr/local/src/kernel/linux-2.4.13/include/asm/pci.h:142!
kernel: invalid operand: 0000
kernel: CPU:    1
kernel: EIP:    0010:[ahc_linux_run_device_queue+899/2144]    Not tainted
kernel: EFLAGS: 00010082
kernel: eax: 00000048   ebx: f7bb5650   ecx: c0275a88   edx: 00010071
kernel: esi: c5915a30   edi: 00000000   ebp: c5915a30   esp: e9ae3e14
kernel: ds: 0018   es: 0018   ss: 0018
kernel: Process tape_backup.pl (pid: 4366, stackpage=e9ae3000)
kernel: Stack: c024e100 0000008e f7bbec00 e9ae3e6c 00000000 00000000 f5358de0 0000000e 
kernel:        f7bbec10 00000007 00000007 401af000 41ffffff 00000004 c5915600 c01b0e09  
kernel:        f7bbec00 c301fee0 00000202 d35ce1d4 c5915600 f7bbfa20 00000096 c01a5f76     
kernel: Call Trace: [ahc_linux_queue+361/424] [scsi_dispatch_cmd+354/632] [scsi_done+0/200] [scsi_request_fn+752/820] [__scsi_insert_special+110/128]  
kernel:    [scsi_insert_special_req+26/32] [scsi_do_req+284/324] [<f8a8940b>] [<f8a89240>] [<f8a8aad1>] [sys_write+143/196] 
kernel:    [system_call+51/56] 
kernel: 
kernel: Code: 0f 0b eb 18 90 83 7e 04 00 75 14 68 90 00 00 00 68 00 e1 24 

#
# The output from the other SCSI crash. This came from remote syslogging
# and console.
#

kernel: scsi0:0:0:0: Attempting to queue an ABORT message
kernel: (scsi0:A:0:0): Queuing a recovery SCB
kernel: scsi0:0:0:0: Device is disconnected, re-queuing SCB  
kernel: Recovery code sleeping
kernel: (scsi0:A:0:0): Abort Tag Message Sent
kernel: (scsi0:A:0:0): SCB 153 - Abort Tag Completed.
kernel: Recovery SCB completes
kernel: Recovery code awake   
kernel: aic7xxx_abort returns 8194
kernel: scsi0:0:0:0: Attempting to queue an ABORT message



Some more debugging help:

mtv-server:/usr/local/src/kernel/linux-2.4.13/include/asm# lspci    
00:00.0 Host bridge: VIA Technologies, Inc. VT82C691 [Apollo PRO] (rev c4)
00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598 [Apollo MVP3 AGP]
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super] (rev 40)
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586 IDE [Apollo] (rev
06)
00:07.2 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 1a)
00:07.3 USB Controller: VIA Technologies, Inc. VT82C586B USB (rev 1a)
00:07.4 SMBus: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
00:0a.0 Ethernet controller: 3Com Corporation 3c905B 100BaseTX [Cyclone]
(rev 30)
00:0c.0 SCSI storage controller: Adaptec 7892A (rev 02)
01:00.0 VGA compatible controller: ATI Technologies Inc Rage XL AGP (rev
27)

mtv-server:~$ cat /proc/scsi/scsi 
Attached devices: 
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: easyRAID Model:  U3              Rev: 0001
  Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 02 Lun: 00
  Vendor: HP       Model: C1537A           Rev: L708
  Type:   Sequential-Access                ANSI SCSI revision: 02


-- 
Christian Hammers    WESTEND GmbH - Aachen und Dueren     Tel 0241/701333-0
ch@westend.com     Internet & Security for Professionals    Fax 0241/911879
           WESTEND ist CISCO Systems Partner - Premium Certified


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG() in asm/pci.h:142 with 2.4.13
  2001-10-25 10:07 BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
@ 2001-10-25 10:18 ` Jens Axboe
  2001-10-25 11:11 ` Jens Axboe
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2001-10-25 10:18 UTC (permalink / raw)
  To: Christian Hammers; +Cc: linux-kernel

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: multipart/mixed; boundary="0F1p//8PRICkK4MW", Size: 29 bytes --]

<<< No Message Collected >>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG() in asm/pci.h:142 with 2.4.13
  2001-10-25 10:07 BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
  2001-10-25 10:18 ` Jens Axboe
@ 2001-10-25 11:11 ` Jens Axboe
  2001-10-25 17:23   ` Christian Hammers
  2001-10-25 20:10 ` BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
  2001-10-30 14:25 ` BUG() in asm/pci.h:142 with 2.4.13 (cause found!) Christian Hammers
  3 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2001-10-25 11:11 UTC (permalink / raw)
  To: Christian Hammers; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 964 bytes --]

On Thu, Oct 25 2001, Christian Hammers wrote:
> Hello
> 
> My system crashed several times now with 2.4.11-pre6 and 2.4.13
> (pre6 because it was the first one I got that fixed some 2GB RAM memory
> allocation bug).
> 
> 2.4.13 was the easiest one to reproduce: when starting the tape backup
> to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller 
> (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> two PIII and 2GB of RAM it crashed immediately with the error attached
> below. The machine was under "stresstest-simulation" load at this time.
> 
> The tape_backup.pl uses the "mt" and "cpio" commands to access /dev/nst0.
> 
> Maybe worth noting is, that the system crashed another time yesterday 
> after replacing the external SCSI RAID Chassis/Controller (not the
> disks in it) and just this moment with another message (see below).

Could you try this patch and see if it fixes the pci.h BUG at least?

-- 
Jens Axboe


[-- Attachment #2: sg-page-1 --]
[-- Type: text/plain, Size: 327 bytes --]

--- drivers/scsi/scsi_merge.c~	Thu Oct 25 12:15:35 2001
+++ drivers/scsi/scsi_merge.c	Thu Oct 25 12:16:20 2001
@@ -943,6 +943,7 @@
 		}
 		count++;
 		sgpnt[count - 1].address = bh->b_data;
+		sgpnt[count - 1].page = NULL;
 		sgpnt[count - 1].length += bh->b_size;
 		if (!dma_host) {
 			SCpnt->request_bufflen += bh->b_size;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG() in asm/pci.h:142 with 2.4.13
  2001-10-25 11:11 ` Jens Axboe
@ 2001-10-25 17:23   ` Christian Hammers
  2001-10-25 17:32     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Christian Hammers @ 2001-10-25 17:23 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Hello

On Thu, Oct 25, 2001 at 01:11:07PM +0200, Jens Axboe wrote:
> > 2.4.13 was the easiest one to reproduce: when starting the tape backup
> > to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller 
> > (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> > two PIII and 2GB of RAM it crashed immediately with the error attached
> > below. The machine was under "stresstest-simulation" load at this time.

> Could you try this patch and see if it fixes the pci.h BUG at least?
This patch did not prevent the crash. Again immediately after rewinding the
tape when it began to write. I'll try now the 2.4.12-ac6... and it works.

> Jens Axboe
bye,

 -christian- (happy about Alan having forked the kernel tree once ago..)

-- 
Christian Hammers    WESTEND GmbH - Aachen und Dueren     Tel 0241/701333-0
ch@westend.com     Internet & Security for Professionals    Fax 0241/911879
           WESTEND ist CISCO Systems Partner - Premium Certified


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG() in asm/pci.h:142 with 2.4.13
  2001-10-25 17:23   ` Christian Hammers
@ 2001-10-25 17:32     ` Jens Axboe
  2001-10-25 17:47       ` Christian Hammers
  2001-10-26  0:25       ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) David S. Miller
  0 siblings, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2001-10-25 17:32 UTC (permalink / raw)
  To: Christian Hammers; +Cc: linux-kernel

On Thu, Oct 25 2001, Christian Hammers wrote:
> Hello
> 
> On Thu, Oct 25, 2001 at 01:11:07PM +0200, Jens Axboe wrote:
> > > 2.4.13 was the easiest one to reproduce: when starting the tape backup
> > > to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller 
> > > (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> > > two PIII and 2GB of RAM it crashed immediately with the error attached
> > > below. The machine was under "stresstest-simulation" load at this time.
> 
> > Could you try this patch and see if it fixes the pci.h BUG at least?
> This patch did not prevent the crash. Again immediately after rewinding the
> tape when it began to write. I'll try now the 2.4.12-ac6... and it works.

Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
look.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG() in asm/pci.h:142 with 2.4.13
  2001-10-25 17:32     ` Jens Axboe
@ 2001-10-25 17:47       ` Christian Hammers
  2001-10-26  0:25       ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) David S. Miller
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Hammers @ 2001-10-25 17:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: linux-kernel

Hello

On Thu, Oct 25, 2001 at 07:32:48PM +0200, Jens Axboe wrote:
> > This patch did not prevent the crash. Again immediately after rewinding the
> > tape when it began to write. I'll try now the 2.4.12-ac6... and it works.
> Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
> look.

The 2.4.12-ac6 crashed, too, when I killed the dd and cpio processes 
with SIGKILL. I got the extra scsi queue debug information on the console
but it was too much to write down. I now have a serial connection to
another computer and did "ln /dev/ttyS1 /dev/console" in the hope to be
able to save you all kernel output.

 -christian-

-- 
Christian Hammers    WESTEND GmbH - Aachen und Dueren     Tel 0241/701333-0
ch@westend.com     Internet & Security for Professionals    Fax 0241/911879
           WESTEND ist CISCO Systems Partner - Premium Certified


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG() in asm/pci.h:142 with 2.4.13
  2001-10-25 10:07 BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
  2001-10-25 10:18 ` Jens Axboe
  2001-10-25 11:11 ` Jens Axboe
@ 2001-10-25 20:10 ` Christian Hammers
  2001-10-30 14:25 ` BUG() in asm/pci.h:142 with 2.4.13 (cause found!) Christian Hammers
  3 siblings, 0 replies; 16+ messages in thread
From: Christian Hammers @ 2001-10-25 20:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-scsi

Hello

Now it crashed again when writing the tape but after writing 1GB to it.
This time I could capture the output of the extra new-queue debugging
output that I enabled in the kernel configuration (this time 2.4.12-ac6).

For the linux-scsi guys: if you need more information please take a look
at my previous posts to linux-kernel (same subject) or contact me directly.

bye,

 -christian-

On Thu, Oct 25, 2001 at 12:07:01PM +0200, Christian Hammers wrote:
> 2.4.13 was the easiest one to reproduce: when starting the tape backup
> to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller 
> (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> two PIII and 2GB of RAM it crashed immediately with the error attached
> below. The machine was under "stresstest-simulation" load at this time.

#
# console dump via minicom and serial line
#
 /USR/SBIN/CRON[10591]: (root) CMD
(/usr/local/maint/watchdog)
 kernel: scsi0:0:0:0: Attempting to queue an
ABORT message
 kernel: scsi0: Dumping Card State while idle, at
SEQADDR 0x8
 kernel: ACCUM = 0x0, SINDEX = 0x20, DINDEX =
0xe4, ARG_2 = 0x0
 kernel: HCNT = 0x0
 kernel: SCSISEQ = 0x12, SBLKCTL = 0xa
 kernel:  DFCNTRL = 0x0, DFSTATUS = 0x89
 kernel: LASTPHASE = 0x1, SCSISIGI = 0x0,
SXFRCTL0 = 0x80
 kernel: SSTAT0 = 0x0, SSTAT1 = 0x8
 kernel: SCSIPHASE = 0x0
 kernel: STACK == 0x3, 0x108, 0x160, 0x0
 kernel: SCB count = 248
 kernel: Kernel NEXTQSCB = 62
 kernel: Card NEXTQSCB = 62
 kernel: QINFIFO entries: 
 kernel: Waiting Queue entries: 
 kernel: Disconnected Queue entries: 6:150 12:210
2:178 29:28 23:181 13:61 24:7 
 kernel: QOUTFIFO entries: 
 kernel: Sequencer Free SCB List: 22 26 9 4 11 19
10 16 28 1 8 5 27 7 20 31 0 30 
25 17 21 3 14 15 18 
 kernel: Pending list: 150, 210, 178, 28, 181,
61, 7
 kernel: Kernel Free SCB list: 32 77 79 149 198
171 223 152 140 105 189 151 78 66
 199 88 6 224 138 177 67 84 194 191 23 246 215 24 160 185 225 230 93 174 49
241 110 2 20 147 170 240 33 59 
243 99 54 175 19 176 18 192 76 100 190 238 108 8 159 208 207 60 242 217 56
221 1 17 213 92 127 70 162 74 19
7 142 239 196 82 124 29 235 134 232 123 179 218 139 211 117 3 119 57 219
125 122 209 101 44 155 45 39 212 1
28 233 202 158 91 187 46 0 180 182 201 109 118 228 131 12 4 112 229 200 236
173 132 247 97 186 148 55 216 1
33 144 113 231 30 63 37 137 206 156 83 146 135 141 161 64 165 98 35 234 166
81 9 10 214 43 58 111 71 115 10
6 85 183 72 11 204 172 157 130 47 154 188 226 90 220 96 107 27 227 145 40
87 22 94 129 48 205 65 120 73 163
 69 26 41 86 103 68 169 53 5 237 167 42 51 195 15 38 80 13 168 21 89 52 16
114 50 193 36 136 75 25 34 95 14
 153 203 126 222 116 31 143 104 164 121 102 184 245 244 
 kernel: DevQ(0:0:0): 0 waiting
 kernel: DevQ(0:2:0): 0 waiting
 kernel: (scsi0:A:0:0): Queuing a recovery SCB
 kernel: scsi0:0:0:0: Device is disconnected,
re-queuing SCB
 kernel: Recovery code sleeping
 kernel: (scsi0:A:0:0): Abort Tag Message Sent
 kernel: (scsi0:A:0:0): SCB 7 - Abort Tag
Completed.
 kernel: Recovery SCB completes
 kernel: Recovery code awake
 kernel: aic7xxx_abort returns 0x2002
 sendmail[349]: rejecting connections on daemon
MTA: load average: 83
<80>xüxÀ<80>xÀ<80>x<ð<80>xx<xþ<80><80><80>øx<øxÀ<80>xü<80>x<ðøxüøxÀøx<80><80><80><80><80>xÀøø<80><80><80>


-- 
Christian Hammers    WESTEND GmbH - Aachen und Dueren     Tel 0241/701333-0
ch@westend.com     Internet & Security for Professionals    Fax 0241/911879
           WESTEND ist CISCO Systems Partner - Premium Certified


^ permalink raw reply	[flat|nested] 16+ messages in thread

* SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)
  2001-10-25 17:32     ` Jens Axboe
  2001-10-25 17:47       ` Christian Hammers
@ 2001-10-26  0:25       ` David S. Miller
  2001-10-26  2:26         ` Jeff V. Merkey
  2001-10-28  1:34         ` Pete Harlan
  1 sibling, 2 replies; 16+ messages in thread
From: David S. Miller @ 2001-10-26  0:25 UTC (permalink / raw)
  To: axboe; +Cc: ch, harlan, linux-kernel

   From: Jens Axboe <axboe@suse.de>
   Date: Thu, 25 Oct 2001 19:32:48 +0200

   On Thu, Oct 25 2001, Christian Hammers wrote:
   > This patch did not prevent the crash. Again immediately after rewinding the
   > tape when it began to write. I'll try now the 2.4.12-ac6... and it works.
   
   Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
   look.

Can people try out this patch?  I believe this will fix the bug.

--- drivers/scsi/st.c.~1~	Sun Oct 21 02:47:53 2001
+++ drivers/scsi/st.c	Thu Oct 25 17:23:45 2001
@@ -3233,6 +3233,7 @@
 				break;
 			}
 		}
+		tb->sg[0].page = NULL;
 		if (tb->sg[segs].address == NULL) {
 			kfree(tb);
 			tb = NULL;
@@ -3264,6 +3265,7 @@
 					tb = NULL;
 					break;
 				}
+				tb->sg[segs].page = NULL;
 				tb->sg[segs].length = b_size;
 				got += b_size;
 				segs++;
@@ -3337,6 +3339,7 @@
 			normalize_buffer(STbuffer);
 			return FALSE;
 		}
+		STbuffer->sg[segs].page = NULL;
 		STbuffer->sg[segs].length = b_size;
 		STbuffer->sg_segs += 1;
 		got += b_size;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: SCSI tape crashes
  2001-10-26  2:26         ` Jeff V. Merkey
@ 2001-10-26  1:32           ` David S. Miller
  2001-10-26  3:56             ` Jeff V. Merkey
  2001-10-26  1:33           ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) Christian Hammers
  1 sibling, 1 reply; 16+ messages in thread
From: David S. Miller @ 2001-10-26  1:32 UTC (permalink / raw)
  To: jmerkey; +Cc: axboe, ch, harlan, linux-kernel

   From: "Jeff V. Merkey" <jmerkey@vger.timpanogas.org>
   Date: Thu, 25 Oct 2001 19:26:48 -0700
   
   Is this waht's causing the earlier bug I reported in 2.4.10?  If so 
   where is this patch so I can see if it fixes the problem.
   
The patch was in the email, can't you read? :-)
(You even quoted the patch in your reply!)

Anyways, my patch isn't relevant to your problem since the
bug I am fixing only can exist in 2.4.13 and later kernels.
Sorry.

   
   On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
   > Can people try out this patch?  I believe this will fix the bug.
   > 
   > --- drivers/scsi/st.c.~1~	Sun Oct 21 02:47:53 2001
   > +++ drivers/scsi/st.c	Thu Oct 25 17:23:45 2001

Franks a lot,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)
  2001-10-26  2:26         ` Jeff V. Merkey
  2001-10-26  1:32           ` SCSI tape crashes David S. Miller
@ 2001-10-26  1:33           ` Christian Hammers
  1 sibling, 0 replies; 16+ messages in thread
From: Christian Hammers @ 2001-10-26  1:33 UTC (permalink / raw)
  To: Jeff V. Merkey; +Cc: David S. Miller, axboe, harlan, linux-kernel

Hi

Some addition: the kernel (at least the 2.4.11-pre6 worked well with the 
tape streamer before the day I replaced the external RAID chassis (broken
display) with a new one. My personal guess in this case is that the 
new RAID has a different firmware and maybe a bug that triggers the crash-
condition whenever a second device (here the scsi tape) tries to use the
bus, too.

Would this scenario fit into your idea of the bug?  

bye,

 -christian-

On Thu, Oct 25, 2001 at 07:26:48PM -0700, Jeff V. Merkey wrote:
> >    Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
> >    look.
> > 
> > Can people try out this patch?  I believe this will fix the bug.
> > 
> > --- drivers/scsi/st.c.~1~	Sun Oct 21 02:47:53 2001
> > +++ drivers/scsi/st.c	Thu Oct 25 17:23:45 2001
> > @@ -3233,6 +3233,7 @@

-- 
Christian Hammers    WESTEND GmbH - Aachen und Dueren     Tel 0241/701333-0
ch@westend.com     Internet & Security for Professionals    Fax 0241/911879
           WESTEND ist CISCO Systems Partner - Premium Certified


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)
  2001-10-26  0:25       ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) David S. Miller
@ 2001-10-26  2:26         ` Jeff V. Merkey
  2001-10-26  1:32           ` SCSI tape crashes David S. Miller
  2001-10-26  1:33           ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) Christian Hammers
  2001-10-28  1:34         ` Pete Harlan
  1 sibling, 2 replies; 16+ messages in thread
From: Jeff V. Merkey @ 2001-10-26  2:26 UTC (permalink / raw)
  To: David S. Miller; +Cc: axboe, ch, harlan, linux-kernel


David,

Is this waht's causing the earlier bug I reported in 2.4.10?  If so 
where is this patch so I can see if it fixes the problem.

Thanks,

Jeff


On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
>    From: Jens Axboe <axboe@suse.de>
>    Date: Thu, 25 Oct 2001 19:32:48 +0200
> 
>    On Thu, Oct 25 2001, Christian Hammers wrote:
>    > This patch did not prevent the crash. Again immediately after rewinding the
>    > tape when it began to write. I'll try now the 2.4.12-ac6... and it works.
>    
>    Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
>    look.
> 
> Can people try out this patch?  I believe this will fix the bug.
> 
> --- drivers/scsi/st.c.~1~	Sun Oct 21 02:47:53 2001
> +++ drivers/scsi/st.c	Thu Oct 25 17:23:45 2001
> @@ -3233,6 +3233,7 @@
>  				break;
>  			}
>  		}
> +		tb->sg[0].page = NULL;
>  		if (tb->sg[segs].address == NULL) {
>  			kfree(tb);
>  			tb = NULL;
> @@ -3264,6 +3265,7 @@
>  					tb = NULL;
>  					break;
>  				}
> +				tb->sg[segs].page = NULL;
>  				tb->sg[segs].length = b_size;
>  				got += b_size;
>  				segs++;
> @@ -3337,6 +3339,7 @@
>  			normalize_buffer(STbuffer);
>  			return FALSE;
>  		}
> +		STbuffer->sg[segs].page = NULL;
>  		STbuffer->sg[segs].length = b_size;
>  		STbuffer->sg_segs += 1;
>  		got += b_size;
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: SCSI tape crashes
  2001-10-26  1:32           ` SCSI tape crashes David S. Miller
@ 2001-10-26  3:56             ` Jeff V. Merkey
  0 siblings, 0 replies; 16+ messages in thread
From: Jeff V. Merkey @ 2001-10-26  3:56 UTC (permalink / raw)
  To: David S. Miller; +Cc: axboe, ch, harlan, linux-kernel



David,

Thanks.  I'll wait to hear from Kai since it looks like something
related to his code.  I did find the patch, but was using mutt,
and you were correct, it was attached at the end of the email.

:-)

Jeff


On Thu, Oct 25, 2001 at 06:32:48PM -0700, David S. Miller wrote:
>    From: "Jeff V. Merkey" <jmerkey@vger.timpanogas.org>
>    Date: Thu, 25 Oct 2001 19:26:48 -0700
>    
>    Is this waht's causing the earlier bug I reported in 2.4.10?  If so 
>    where is this patch so I can see if it fixes the problem.
>    
> The patch was in the email, can't you read? :-)
> (You even quoted the patch in your reply!)
> 
> Anyways, my patch isn't relevant to your problem since the
> bug I am fixing only can exist in 2.4.13 and later kernels.
> Sorry.
> 
>    
>    On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
>    > Can people try out this patch?  I believe this will fix the bug.
>    > 
>    > --- drivers/scsi/st.c.~1~	Sun Oct 21 02:47:53 2001
>    > +++ drivers/scsi/st.c	Thu Oct 25 17:23:45 2001
> 
> Franks a lot,
> David S. Miller
> davem@redhat.com

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)
  2001-10-26  0:25       ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) David S. Miller
  2001-10-26  2:26         ` Jeff V. Merkey
@ 2001-10-28  1:34         ` Pete Harlan
  1 sibling, 0 replies; 16+ messages in thread
From: Pete Harlan @ 2001-10-28  1:34 UTC (permalink / raw)
  To: David S. Miller; +Cc: axboe, ch, linux-kernel

On Thu, Oct 25, 2001 at 05:25:41PM -0700, David S. Miller wrote:
>    From: Jens Axboe <axboe@suse.de>
>    Date: Thu, 25 Oct 2001 19:32:48 +0200
> 
>    On Thu, Oct 25 2001, Christian Hammers wrote:
>    > This patch did not prevent the crash. Again immediately after
>    > rewinding the 
>    > tape when it began to write. I'll try now the 2.4.12-ac6... and
>    > it works.
>    
>    Ok, someone else is meddling with the scatterlist then. I'll take a 2nd
>    look.
> 
> Can people try out this patch?  I believe this will fix the bug.

Yessiree, that fixed the scsi tape lockups we had in 2.4.13.

Many thanks,

--Pete harlan@artselect.com


> --- drivers/scsi/st.c.~1~	Sun Oct 21 02:47:53 2001
> +++ drivers/scsi/st.c	Thu Oct 25 17:23:45 2001
> @@ -3233,6 +3233,7 @@
>  				break;
>  			}
>  		}
> +		tb->sg[0].page = NULL;
>  		if (tb->sg[segs].address == NULL) {
>  			kfree(tb);
>  			tb = NULL;
> @@ -3264,6 +3265,7 @@
>  					tb = NULL;
>  					break;
>  				}
> +				tb->sg[segs].page = NULL;
>  				tb->sg[segs].length = b_size;
>  				got += b_size;
>  				segs++;
> @@ -3337,6 +3339,7 @@
>  			normalize_buffer(STbuffer);
>  			return FALSE;
>  		}
> +		STbuffer->sg[segs].page = NULL;
>  		STbuffer->sg[segs].length = b_size;
>  		STbuffer->sg_segs += 1;
>  		got += b_size;

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: BUG() in asm/pci.h:142 with 2.4.13 (cause found!)
  2001-10-25 10:07 BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
                   ` (2 preceding siblings ...)
  2001-10-25 20:10 ` BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
@ 2001-10-30 14:25 ` Christian Hammers
  3 siblings, 0 replies; 16+ messages in thread
From: Christian Hammers @ 2001-10-30 14:25 UTC (permalink / raw)
  To: linux-kernel

Hello

The cause for my problems with crashing kernels when accessing the tape
drive were differences between the original external RAID that belongs
to the machine and a temporarily, nearly equal, RAID that we attached while
the original was send away for repair. 
The temp. RAID was Ultra3 160 and /proc/scsi/aic7xxx/0 showed me a cur.
speed of 160 while the "old" and working one only did Ultra2 with 80Mbit/s.

I don't know if it's a hardware incompatibility or if the Linux kernel 
drivers cannot handle this specific case.

The problem exists in 2.4.11-pre6, 2.4.13, 2.4.12-ac6 and 2.4.13 with
patched from axboa(?) and D. Miller.
 
bye,

 -christian-


On Thu, Oct 25, 2001 at 12:07:01PM +0200, Christian Hammers wrote:
...
> 2.4.13 was the easiest one to reproduce: when starting the tape backup
> to a HP DDS3/DAT Streamer (C1537A) via a Adaptec SCSI Controller 
> (Adaptec 7892A in /proc/pci) on a Gigabyte GA-6VTXD Dual Motherboard with
> two PIII and 2GB of RAM it crashed immediately with the error attached
> below. The machine was under "stresstest-simulation" load at this time.
...
> kernel: kernel BUG at /usr/local/src/kernel/linux-2.4.13/include/asm/pci.h:142!
...
> kernel: scsi0:0:0:0: Attempting to queue an ABORT message
> kernel: (scsi0:A:0:0): Queuing a recovery SCB
> kernel: scsi0:0:0:0: Device is disconnected, re-queuing SCB  
...

-- 
Christian Hammers    WESTEND GmbH - Aachen und Dueren     Tel 0241/701333-0
ch@westend.com     Internet & Security for Professionals    Fax 0241/911879
           WESTEND ist CISCO Systems Partner - Premium Certified


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)
       [not found] ` <fa.j17q3gv.m6e1ju@ifi.uio.no>
@ 2001-10-30 16:58   ` Dan Maas
  2001-10-31  8:33     ` Jens Axboe
  0 siblings, 1 reply; 16+ messages in thread
From: Dan Maas @ 2001-10-30 16:58 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

> Can people try out this patch?  I believe this will fix the bug.
> + tb->sg[0].page = NULL;
>   if (tb->sg[segs].address == NULL) {

For the sake of making this clear to other kernel hackers (I got bitten by
it too) - starting with 2.4.13 you must zero out the fields of struct
scatterlist that you are not using. i.e. it is no longer sufficient to
simply set sg.address and sg.length, because junk might still be present in
the new sg.page field, and pci_map_*() will BUG() if both sg.address and
sg.page are non-zero.

Regards,
Dan



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13)
  2001-10-30 16:58   ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) Dan Maas
@ 2001-10-31  8:33     ` Jens Axboe
  0 siblings, 0 replies; 16+ messages in thread
From: Jens Axboe @ 2001-10-31  8:33 UTC (permalink / raw)
  To: Dan Maas; +Cc: David S. Miller, linux-kernel

On Tue, Oct 30 2001, Dan Maas wrote:
> > Can people try out this patch?  I believe this will fix the bug.
> > + tb->sg[0].page = NULL;
> >   if (tb->sg[segs].address == NULL) {
> 
> For the sake of making this clear to other kernel hackers (I got bitten by
> it too) - starting with 2.4.13 you must zero out the fields of struct
> scatterlist that you are not using. i.e. it is no longer sufficient to
> simply set sg.address and sg.length, because junk might still be present in
> the new sg.page field, and pci_map_*() will BUG() if both sg.address and
> sg.page are non-zero.

True, perhaps we should add a init_sg or something like that.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2001-10-31  8:34 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2001-10-25 10:07 BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
2001-10-25 10:18 ` Jens Axboe
2001-10-25 11:11 ` Jens Axboe
2001-10-25 17:23   ` Christian Hammers
2001-10-25 17:32     ` Jens Axboe
2001-10-25 17:47       ` Christian Hammers
2001-10-26  0:25       ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) David S. Miller
2001-10-26  2:26         ` Jeff V. Merkey
2001-10-26  1:32           ` SCSI tape crashes David S. Miller
2001-10-26  3:56             ` Jeff V. Merkey
2001-10-26  1:33           ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) Christian Hammers
2001-10-28  1:34         ` Pete Harlan
2001-10-25 20:10 ` BUG() in asm/pci.h:142 with 2.4.13 Christian Hammers
2001-10-30 14:25 ` BUG() in asm/pci.h:142 with 2.4.13 (cause found!) Christian Hammers
     [not found] <fa.cdhetrv.1828dgd@ifi.uio.no>
     [not found] ` <fa.j17q3gv.m6e1ju@ifi.uio.no>
2001-10-30 16:58   ` SCSI tape crashes (was Re: BUG() in asm/pci.h:142 with 2.4.13) Dan Maas
2001-10-31  8:33     ` Jens Axboe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox