* PROBLEM: kernel crashes on RAID1 drive error
From: Mark Rustad @ 2004-10-20 22:08 UTC
To: linux-raid, linux-scsi
Folks,
I have been having trouble with kernel crashes resulting from RAID1
component device failures. I have been testing the robustness of an
embedded system and have been using a drive that is known to fail after
a time under load. When this device returns a media error, I always
wind up with either a kernel hang or reboot. In this environment, each
drive has four partitions, each of which is part of a RAID1 with its
partner on the other device. Swap is on md2 so even it should be
robust.
I have gotten this result with the SuSE standard i386 smp kernels
2.6.5-7.97 and 2.6.5-7.108. I also get these failures with the
kernel.org kernels 2.6.8.1, 2.6.9-rc4 and 2.6.9.
The hardware setup is a two-CPU Nocona system with an Adaptec 7902 SCSI
controller with two Seagate drives on a SAF-TE bus. I run three or four
dd commands copying /dev/md0 to /dev/null to provide the activity that
stimulates the failure.
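(The read load itself is nothing exotic; a minimal C equivalent of
those dd invocations is sketched below, assuming /dev/md0 and a 64KB
read size. The actual test simply ran dd as described above.)
/* Sketch of one sequential reader, roughly "dd if=/dev/md0 of=/dev/null";
 * run three or four instances in parallel. The device path and buffer
 * size are assumptions, not taken from the original test. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/md0";
        static char buf[65536];
        ssize_t n;
        int fd = open(dev, O_RDONLY);

        if (fd < 0) {
                perror(dev);
                return 1;
        }
        while ((n = read(fd, buf, sizeof(buf))) > 0)
                ;       /* discard the data; only the I/O matters */
        if (n < 0)
                perror("read");
        close(fd);
        return 0;
}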
I suspect that something is going wrong in the retry of the failed I/O
operations, but I'm really not familiar with any of this area of the
kernel at all.
In one failure, I get the following messages from kernel 2.6.9:
raid1: Disk failure on sdb1, disabling device.
raid1: sdb1: rescheduling sector 176
raid1: sda1: redirecting sector 176 to another mirror
raid1: sdb1: rescheduling sector 184
raid1: sda1: redirecting sector 184 to another mirror
Incorrect number of segments after building list
counted 2, received 1
req nr_sec 0, cur_nr_sec 7
raid1: sda1: rescheduling sector 176
raid1: sda1: redirecting sector 176 to another mirror
Incorrect number of segments after building list
counted 2, received 1
req nr_sec 0, cur_nr_sec 7
raid1: sda1: rescheduling sector 184
raid1: sda1: redirecting sector 184 to another mirror
Incorrect number of segments after building list
counted 3, received 1
req nr_sec 0, cur_nr_sec 7
raid1: sda1: rescheduling sector 176
raid1: sda1: redirecting sector 176 to another mirror
Incorrect number of segments after building list
counted 2, received 1
---
The above messages go on essentially forever, or at least until this
activity itself causes something to wedge.
The other failure I get is an oops. Here is the output from ksymoops:
ksymoops 2.4.9 on i686 2.6.5-7.97-bigsmp. Options used
-v vmlinux (specified)
-K (specified)
-L (specified)
-O (specified)
-M (specified)
kernel BUG at /usr/src/linux-2.6.9/fs/buffer.c:614!
invalid operand: 0000 [#1]
CPU: 1
EIP: 0060:[<c014faf9>] Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246 (2.6.9-3d-1)
eax: 00000019 ebx: c0dc695c ecx: c0dc695c edx: 00000001
esi: 00000001 edi: 00000000 ebp: 00000000 esp: df9f7d30
ds: 007b es: 007b ss: 0068
Stack: dec21540 c0152128 00000000 00000000 c015214b dec21540 c0153338 c0152956
       c02f26b9 f7cf1d80 df8aea00 f7cf1dc0 f7cf1dc0 df8aea00 c02f2738 c013637e
       f7cf1dc0 00000001 df8aea00 00000000 c02f2815 00002002 d2a0ab00 df9f7d94
Call Trace:
[<c0152128>] end_bio_bh_io_sync+0x0/0x3b
[<c015214b>] end_bio_bh_io_sync+0x23/0x3b
[<c0153338>] bio_endio+0x3b/0x65
[<c0152956>] bio_put+0x21/0x2d
[<c02f26b9>] put_all_bios+0x3d/0x57
[<c02f2738>] raid_end_bio_io+0x22/0xb8
[<c013637e>] mempool_free+0x6c/0x73
[<c02f2815>] raid1_end_read_request+0x47/0xcb
[<c02a846d>] scsi_softirq+0xbf/0xcd
[<c0136257>] mempool_alloc+0x66/0x121
[<c02f27ce>] raid1_end_read_request+0x0/0xcb
[<c0153338>] bio_endio+0x3b/0x65
[<c0279dd4>] __end_that_request_first+0xe3/0x22d
[<c011e537>] prepare_to_wait_exclusive+0x15/0x4c
[<c02ac212>] scsi_end_request+0x1b/0xa6
[<c02ac56d>] scsi_io_completion+0x16a/0x4a3
[<c011d2d5>] __wake_up+0x32/0x43
[<c02a851e>] scsi_finish_command+0x7d/0xd1
[<c02a846d>] scsi_softirq+0xbf/0xcd
[<c0124342>] __do_softirq+0x62/0xcd
[<c01243da>] do_softirq+0x2d/0x35
[<c0108b38>] do_IRQ+0x112/0x129
[<c0106cc0>] common_interrupt+0x18/0x20
[<c027007b>] uart_block_til_ready+0x18e/0x193
[<c02f2b60>] unplug_slaves+0x95/0x97
[<c02f3b29>] raid1d+0x186/0x18e
[<c02f85ac>] md_thread+0x174/0x19a
[<c011e5b9>] autoremove_wake_function+0x0/0x37
[<c011e5b9>] autoremove_wake_function+0x0/0x37
[<c02f8438>] md_thread+0x0/0x19a
[<c01047fd>] kernel_thread_helper+0x5/0xb
Code: ff f0 0f ba 2f 01 eb a0 8b 02 a8 04 74 2a 5b 89 ea b8 f4 28 3e c0 5e 5f 5d
>>EIP; c014faf9 <end_buffer_async_read+a4/bb> <=====
>>ebx; c0dc695c <pg0+83995c/3fa71400>
>>ecx; c0dc695c <pg0+83995c/3fa71400>
>>esp; df9f7d30 <pg0+1f46ad30/3fa71400>
Trace; c0152128 <end_bio_bh_io_sync+0/3b>
Trace; c015214b <end_bio_bh_io_sync+23/3b>
Trace; c0153338 <bio_endio+3b/65>
Trace; c0152956 <bio_put+21/2d>
Trace; c02f26b9 <put_all_bios+3d/57>
Trace; c02f2738 <raid_end_bio_io+22/b8>
Trace; c013637e <mempool_free+6c/73>
Trace; c02f2815 <raid1_end_read_request+47/cb>
Trace; c02a846d <scsi_softirq+bf/cd>
Trace; c0136257 <mempool_alloc+66/121>
Trace; c02f27ce <raid1_end_read_request+0/cb>
Trace; c0153338 <bio_endio+3b/65>
Trace; c0279dd4 <__end_that_request_first+e3/22d>
Trace; c011e537 <prepare_to_wait_exclusive+15/4c>
Trace; c02ac212 <scsi_end_request+1b/a6>
Trace; c02ac56d <scsi_io_completion+16a/4a3>
Trace; c011d2d5 <__wake_up+32/43>
Trace; c02a851e <scsi_finish_command+7d/d1>
Trace; c02a846d <scsi_softirq+bf/cd>
Trace; c0124342 <__do_softirq+62/cd>
Trace; c01243da <do_softirq+2d/35>
Trace; c0108b38 <do_IRQ+112/129>
Trace; c0106cc0 <common_interrupt+18/20>
Trace; c027007b <uart_block_til_ready+18e/193>
Trace; c02f2b60 <unplug_slaves+95/97>
Trace; c02f3b29 <raid1d+186/18e>
Trace; c02f85ac <md_thread+174/19a>
Trace; c011e5b9 <autoremove_wake_function+0/37>
Trace; c011e5b9 <autoremove_wake_function+0/37>
Trace; c02f8438 <md_thread+0/19a>
Trace; c01047fd <kernel_thread_helper+5/b>
Code; c014faf9 <end_buffer_async_read+a4/bb>
00000000 <_EIP>:
Code; c014faf9 <end_buffer_async_read+a4/bb> <=====
0: ff f0 push %eax <=====
Code; c014fafb <end_buffer_async_read+a6/bb>
2: 0f ba 2f 01 btsl $0x1,(%edi)
Code; c014faff <end_buffer_async_read+aa/bb>
6: eb a0 jmp ffffffa8 <_EIP+0xffffffa8>
Code; c014fb01 <end_buffer_async_read+ac/bb>
8: 8b 02 mov (%edx),%eax
Code; c014fb03 <end_buffer_async_read+ae/bb>
a: a8 04 test $0x4,%al
Code; c014fb05 <end_buffer_async_read+b0/bb>
c: 74 2a je 38 <_EIP+0x38>
Code; c014fb07 <end_buffer_async_read+b2/bb>
e: 5b pop %ebx
Code; c014fb08 <end_buffer_async_read+b3/bb>
f: 89 ea mov %ebp,%edx
Code; c014fb0a <end_buffer_async_read+b5/bb>
11: b8 f4 28 3e c0 mov $0xc03e28f4,%eax
Code; c014fb0f <end_buffer_async_read+ba/bb>
16: 5e pop %esi
Code; c014fb10 <end_buffer_async_write+0/de>
17: 5f pop %edi
Code; c014fb11 <end_buffer_async_write+1/de>
18: 5d pop %ebp
<0>Kernel panic - not syncing: Fatal exception in interrupt
---
In these cases, the kernel is a monolithic kernel - no modules at all.
Since the problem also happens with the standard SuSE smp kernel, which
does have modules, I don't believe that that is a factor. We just don't
need modules in our embedded system.
I don't know if the problem is in the raid1 code, in the general SCSI
code or in the Adaptec driver somewhere. Does anyone have a clue?
Note that using mdadm to fail a drive is utterly unlike this and seems
to work ok. It seems to take an honest-to-goodness broken drive to get
this failure. Of course, the whole point of RAID1 is to handle a
failing drive, so this is kind of a serious problem.
--
Mark Rustad, MRustad@aol.com
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Jens Axboe @ 2004-10-21 8:45 UTC
To: Mark Rustad; +Cc: linux-raid, linux-scsi
On Wed, Oct 20 2004, Mark Rustad wrote:
> <snip>
> In one failure, I get the following messages from kernel 2.6.9:
>
> raid1: Disk failure on sdb1, disabling device.
> raid1: sdb1: rescheduling sector 176
> raid1: sda1: redirecting sector 176 to another mirror
> raid1: sdb1: rescheduling sector 184
> raid1: sda1: redirecting sector 184 to another mirror
> Incorrect number of segments after building list
> counted 2, received 1
> req nr_sec 0, cur_nr_sec 7
This should be fixed by this patch, can you test it?
===== drivers/block/ll_rw_blk.c 1.273 vs edited =====
--- 1.273/drivers/block/ll_rw_blk.c 2004-10-19 11:40:18 +02:00
+++ edited/drivers/block/ll_rw_blk.c 2004-10-20 17:06:12 +02:00
@@ -2766,22 +2767,36 @@
 {
         struct bio *bio, *prevbio = NULL;
         int nr_phys_segs, nr_hw_segs;
+        unsigned int phys_size, hw_size;
+        request_queue_t *q = rq->q;
 
         if (!rq->bio)
                 return;
 
-        nr_phys_segs = nr_hw_segs = 0;
+        phys_size = hw_size = nr_phys_segs = nr_hw_segs = 0;
         rq_for_each_bio(bio, rq) {
                 /* Force bio hw/phys segs to be recalculated. */
                 bio->bi_flags &= ~(1 << BIO_SEG_VALID);
 
-                nr_phys_segs += bio_phys_segments(rq->q, bio);
-                nr_hw_segs += bio_hw_segments(rq->q, bio);
+                nr_phys_segs += bio_phys_segments(q, bio);
+                nr_hw_segs += bio_hw_segments(q, bio);
                 if (prevbio) {
-                        if (blk_phys_contig_segment(rq->q, prevbio, bio))
+                        int pseg = phys_size + prevbio->bi_size + bio->bi_size;
+                        int hseg = hw_size + prevbio->bi_size + bio->bi_size;
+
+                        if (blk_phys_contig_segment(q, prevbio, bio) &&
+                            pseg <= q->max_segment_size) {
                                 nr_phys_segs--;
-                        if (blk_hw_contig_segment(rq->q, prevbio, bio))
+                                phys_size += prevbio->bi_size + bio->bi_size;
+                        } else
+                                phys_size = 0;
+
+                        if (blk_hw_contig_segment(q, prevbio, bio) &&
+                            hseg <= q->max_segment_size) {
                                 nr_hw_segs--;
+                                hw_size += prevbio->bi_size + bio->bi_size;
+                        } else
+                                hw_size = 0;
                 }
                 prevbio = bio;
         }
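To illustrate the accounting issue this addresses, here is a
user-space toy with made-up sizes (it is not kernel code): each
adjacent pair of contiguous chunks can fit under the segment size
limit while a whole run of merges does not, so a recount that merges
on contiguity alone reports fewer segments than a size-limited build
produces later. That is the "counted 2, received 1" pattern above.
/* Toy model (made-up numbers, not kernel code): count segments for a
 * run of physically contiguous chunks, (a) merging on contiguity alone
 * and (b) merging only while the accumulated size stays within the
 * segment size limit, which is what the hardware-facing code does. */
#include <stdio.h>

#define MAX_SEGMENT_SIZE 65536

static int count_segments(const unsigned int *sizes, int n, int respect_limit)
{
        int segs = 0;
        unsigned int run = 0;

        for (int i = 0; i < n; i++) {
                /* all chunks are assumed physically contiguous here */
                if (run == 0 ||
                    (respect_limit && run + sizes[i] > MAX_SEGMENT_SIZE)) {
                        segs++;                 /* start a new segment */
                        run = sizes[i];
                } else {
                        run += sizes[i];        /* merge into current segment */
                }
        }
        return segs;
}

int main(void)
{
        /* three contiguous 28KB chunks: each pair fits in 64KB,
         * but all three together do not */
        unsigned int sizes[] = { 28672, 28672, 28672 };
        int n = sizeof(sizes) / sizeof(sizes[0]);

        printf("merged on contiguity alone : %d segment(s)\n",
               count_segments(sizes, n, 0));
        printf("merged with size limit     : %d segment(s)\n",
               count_segments(sizes, n, 1));
        return 0;
}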
--
Jens Axboe
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Paul Clements @ 2004-10-21 13:52 UTC
To: Mark Rustad; +Cc: Jens Axboe, linux-raid, linux-scsi
Jens Axboe wrote:
> On Wed, Oct 20 2004, Mark Rustad wrote:
>
>><snip>
>
>
> This should be fixed by this patch, can you test it?
There may well be two problems here, but the original problem you're
seeing (infinite read retries, and failures) is due to a bug in raid1.
Basically the bio handling on read error retry was not quite right. Neil
Brown just posted the patch to correct this a couple of days ago:
http://marc.theaimsgroup.com/?l=linux-raid&m=109824318202358&w=2
Please try that. (If you need a patch that applies to SUSE 2.6.5, I also
have a version of the patch which should apply to that).
Please be aware that there are several other bugs in the SUSE 2.6.5-7.97
kernel in md and raid1 (basically it's a matter of that kernel being
somewhat behind mainline, where most of these bugs are now fixed). I've
sent several patches to SUSE to fix these issues, that hopefully will
get into their SP1 release that should be forthcoming soon...
--
Paul
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Jens Axboe @ 2004-10-21 13:55 UTC
To: Paul Clements; +Cc: Mark Rustad, linux-raid, linux-scsi
On Thu, Oct 21 2004, Paul Clements wrote:
> Jens Axboe wrote:
> >On Wed, Oct 20 2004, Mark Rustad wrote:
> >><snip>
> >This should be fixed by this patch, can you test it?
>
> There may well be two problems here, but the original problem you're
> seeing (infinite read retries, and failures) is due to a bug in raid1.
> Basically the bio handling on read error retry was not quite right. Neil
> Brown just posted the patch to correct this a couple of days ago:
>
> http://marc.theaimsgroup.com/?l=linux-raid&m=109824318202358&w=2
>
> Please try that. (If you need a patch that applies to SUSE 2.6.5, I also
> have a version of the patch which should apply to that).
Is 2.6.9 not uptodate wrt those raid1 patches?!
> Please be aware that there are several other bugs in the SUSE 2.6.5-7.97
> kernel in md and raid1 (basically it's a matter of that kernel being
> somewhat behind mainline, where most of these bugs are now fixed). I've
> sent several patches to SUSE to fix these issues, that hopefully will
> get into their SP1 release that should be forthcoming soon...
-97 is the release kernel, -111 is the current update kernel. And it has
those raid1 issues fixed already, at least the ones that are known. The
scsi segment issue is not, however.
--
Jens Axboe
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Paul Clements @ 2004-10-21 14:01 UTC
To: Jens Axboe; +Cc: Mark Rustad, linux-raid, linux-scsi
Jens Axboe wrote:
> On Thu, Oct 21 2004, Paul Clements wrote:
>
>>Jens Axboe wrote:
>>
>>>On Wed, Oct 20 2004, Mark Rustad wrote:
>>>><snip>
>>>This should be fixed by this patch, can you test it?
>>
>>There may well be two problems here, but the original problem you're
>>seeing (infinite read retries, and failures) is due to a bug in raid1.
>>Basically the bio handling on read error retry was not quite right. Neil
>>Brown just posted the patch to correct this a couple of days ago:
>>
>>http://marc.theaimsgroup.com/?l=linux-raid&m=109824318202358&w=2
>>
>>Please try that. (If you need a patch that applies to SUSE 2.6.5, I also
>>have a version of the patch which should apply to that).
>
>
> Is 2.6.9 not uptodate wrt those raid1 patches?!
Unfortunately, no. This latest problem (the one he's reporting) is not
fixed in mainline. I discovered the problem a month or so ago while
testing with SLES 9. I posted a patch and Neil expanded on it (to
include raid10, which is now in mainline, and also suffers from the same
problem). Neil just posted the patch two days ago to linux-raid, so I
expect it's in -mm now.
>>Please be aware that there are several other bugs in the SUSE 2.6.5-7.97
>>kernel in md and raid1 (basically it's a matter of that kernel being
>>somewhat behind mainline, where most of these bugs are now fixed). I've
>>sent several patches to SUSE to fix these issues, that hopefully will
>>get into their SP1 release that should be forthcoming soon...
>
>
> -97 is the release kernel, -111 is the current update kernel. And it has
> those raid1 issues fixed already, at least the ones that are known. The
> scsi segment issue is not, however.
Thanks. Good to know that. -111 is currently available to customers? We
may recommend that our customers use that, rather than patching -97
ourselves.
--
Paul
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Jens Axboe @ 2004-10-21 14:02 UTC
To: Paul Clements; +Cc: Mark Rustad, linux-raid, linux-scsi
On Thu, Oct 21 2004, Paul Clements wrote:
> Jens Axboe wrote:
> >On Thu, Oct 21 2004, Paul Clements wrote:
> ><snip>
> >Is 2.6.9 not uptodate wrt those raid1 patches?!
>
> Unfortunately, no. This latest problem (the one he's reporting) is not
> fixed in mainline. I discovered the problem a month or so ago while
> testing with SLES 9. I posted a patch and Neil expanded on it (to
> include raid10, which is now in mainline, and also suffers from the same
> problem). Neil just posted the patch two days ago to linux-raid, so I
> expect it's in -mm now.
Irk, that's too bad. So we are now looking at probably a month before
mainline has a stable release with that fixed too :/
> >>Please be aware that there are several other bugs in the SUSE 2.6.5-7.97
> >>kernel in md and raid1 (basically it's a matter of that kernel being
> >>somewhat behind mainline, where most of these bugs are now fixed). I've
> >>sent several patches to SUSE to fix these issues, that hopefully will
> >>get into their SP1 release that should be forthcoming soon...
> >
> >
> >-97 is the release kernel, -111 is the current update kernel. And it has
> >those raid1 issues fixed already, at least the ones that are known. The
> >scsi segment issue is not, however.
>
> Thanks. Good to know that. -111 is currently available to customers? We
> may recommend that our customers use that, rather than patching -97
> ourselves.
Yes it is, it's generally available through the online updates.
--
Jens Axboe
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Mark Rustad @ 2004-10-21 16:31 UTC
To: Jens Axboe; +Cc: linux-raid
Jens,
On Oct 21, 2004, at 3:45 AM, Jens Axboe wrote:
> On Wed, Oct 20 2004, Mark Rustad wrote:
>> Folks,
>>
>> I have been having trouble with kernel crashes resulting from RAID1
>> component device failures. I have been testing the robustness of an
>> embedded system and have been using a drive that is known to fail
>> after
>> a time under load. When this device returns a media error, I always
>> wind up with either a kernel hang or reboot. In this environment, each
>> drive has four partitions, each of which is part of a RAID1 with its
>> partner on the other device. Swap is on md2 so even it should be
>> robust.
<snip>
> This should be fixed by this patch, can you test it?
>
> ===== drivers/block/ll_rw_blk.c 1.273 vs edited =====
> --- 1.273/drivers/block/ll_rw_blk.c 2004-10-19 11:40:18 +02:00
> +++ edited/drivers/block/ll_rw_blk.c 2004-10-20 17:06:12 +02:00
<snip>
I applied this patch and the raid1/raid10 patch referenced in another
message. I had to mess with this patch a bit to get it to apply, but
because there was such good context, I know that I got the correct end
result. The raid1/raid10 patch applied cleanly unchanged. Unfortunately
I still get the oops. As I was looking at this I realized that I am
running with elevator=cfq simply because that is how SuSE sets things
up (just in case that has some bearing on things).
Because of the differences in the patch compared to the 2.6.9 base I
was applying it to, I wonder if there are other changes required.
Anyway, here is the oops that I now get:
ksymoops 2.4.9 on i686 2.6.5-7.97-bigsmp. Options used
-v vmlinux (specified)
-K (specified)
-L (specified)
-O (specified)
-m System.map (specified)
kernel BUG at /usr/src/linux-2.6.9/fs/buffer.c:614!
invalid operand: 0000 [#1]
CPU: 1
EIP: 0060:[<c014faf9>] Not tainted VLI
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010246 (2.6.9-3d-1)
eax: 00000019 ebx: c0b24adc ecx: c0b24adc edx: 00000001
esi: 00000001 edi: 00000000 ebp: 00000001 esp: df9f7cc8
ds: 007b es: 007b ss: 0068
Stack: ded502c0 c0152128 00000000 00000001 c015214b ded502c0 c0153338 00000000
       00000000 f7cb65b0 df9f7d14 f7cd1240 f7cd1240 df8ada00 c02f2738 df9b5300
       f7cd1240 00000001 df8ada00 00000001 c02f2815 c1814220 deced450 df5c7be4
Call Trace:
[<c0152128>] end_bio_bh_io_sync+0x0/0x3b
[<c015214b>] end_bio_bh_io_sync+0x23/0x3b
[<c0153338>] bio_endio+0x3b/0x65
[<c02f2738>] raid_end_bio_io+0x22/0xb8
[<c02f2815>] raid1_end_read_request+0x47/0xcb
[<c011bb08>] try_to_wake_up+0x1f4/0x273
[<c02f27ce>] raid1_end_read_request+0x0/0xcb
[<c0153338>] bio_endio+0x3b/0x65
[<c0279dd4>] __end_that_request_first+0xe3/0x22d
[<c011d280>] __wake_up_common+0x35/0x58
[<c02ac212>] scsi_end_request+0x1b/0xa6
[<c02ac56d>] scsi_io_completion+0x16a/0x4a3
[<c0136257>] mempool_alloc+0x66/0x121
[<c02a851e>] scsi_finish_command+0x7d/0xd1
[<c02a846d>] scsi_softirq+0xbf/0xcd
[<c0124342>] __do_softirq+0x62/0xcd
[<c01243da>] do_softirq+0x2d/0x35
[<c0108b38>] do_IRQ+0x112/0x129
[<c0106cc0>] common_interrupt+0x18/0x20
[<c027007b>] uart_block_til_ready+0x18e/0x193
[<c0279627>] __make_request+0x244/0x4ac
[<c027994e>] generic_make_request+0xbf/0x16c
[<c011d2d5>] __wake_up+0x32/0x43
[<c02f2ab5>] read_balance+0x16b/0x181
[<c0120c64>] __printk_ratelimit+0x8a/0xa5
[<c02f3ab6>] raid1d+0x113/0x18e
[<c02f85ac>] md_thread+0x174/0x19a
[<c011e5b9>] autoremove_wake_function+0x0/0x37
[<c011e5b9>] autoremove_wake_function+0x0/0x37
[<c02f8438>] md_thread+0x0/0x19a
[<c01047fd>] kernel_thread_helper+0x5/0xb
Code: ff f0 0f ba 2f 01 eb a0 8b 02 a8 04 74 2a 5b 89 ea b8 f4 28 3e c0 5e 5f 5d
>>EIP; c014faf9 <__find_get_block_slow+112/128> <=====
>>ebx; c0b24adc <pg0+593adc/3fa6d400>
>>ecx; c0b24adc <pg0+593adc/3fa6d400>
>>esp; df9f7cc8 <pg0+1f466cc8/3fa6d400>
Trace; c0152128 <block_write_full_page+8/fa>
Trace; c015214b <block_write_full_page+2b/fa>
Trace; c0153338 <bio_dirty_fn+35/4d>
Trace; c02f2738 <r1buf_pool_alloc+6b/11d>
Trace; c02f2815 <r1buf_pool_free+2b/72>
Trace; c011bb08 <try_to_wake_up+a4/273>
Trace; c02f27ce <r1buf_pool_alloc+101/11d>
Trace; c0153338 <bio_dirty_fn+35/4d>
Trace; c0279dd4 <blk_recalc_rq_segments+10b/154>
Trace; c011d280 <scheduler_tick+343/452>
Trace; c02ac212 <scsi_single_lun_run+35/ce>
Trace; c02ac56d <scsi_release_buffers+d/83>
Trace; c0136257 <mempool_resize+b7/158>
Trace; c02a851e <scsi_init_cmd_from_req+159/15e>
Trace; c02a846d <scsi_init_cmd_from_req+a8/15e>
Trace; c0124342 <sys_adjtimex+2/4e>
Trace; c01243da <getnstimeofday+b/22>
Trace; c0108b38 <do_IRQ+112/198>
Trace; c0106cc0 <common_interrupt+18/20>
Trace; c027007b <uart_block_til_ready+6e/193>
Trace; c0279627 <__make_request+124/4ac>
Trace; c027994e <__make_request+44b/4ac>
Trace; c011d2d5 <scheduler_tick+398/452>
Trace; c02f2ab5 <raid1_end_write_request+3c/b1>
Trace; c0120c64 <unregister_console+3/85>
Trace; c02f3ab6 <sync_request_write+17e/24b>
Trace; c02f85ac <md_open+3/5d>
Trace; c011e5b9 <add_wait_queue+27/30>
Trace; c011e5b9 <add_wait_queue+27/30>
Trace; c02f8438 <md_ioctl+558/6c9>
Trace; c01047fd <kernel_thread_helper+5/b>
Code; c014faf9 <__find_get_block_slow+112/128>
00000000 <_EIP>:
Code; c014faf9 <__find_get_block_slow+112/128> <=====
0: ff f0 push %eax <=====
Code; c014fafb <__find_get_block_slow+114/128>
2: 0f ba 2f 01 btsl $0x1,(%edi)
Code; c014faff <__find_get_block_slow+118/128>
6: eb a0 jmp ffffffa8 <_EIP+0xffffffa8>
Code; c014fb01 <__find_get_block_slow+11a/128>
8: 8b 02 mov (%edx),%eax
Code; c014fb03 <__find_get_block_slow+11c/128>
a: a8 04 test $0x4,%al
Code; c014fb05 <__find_get_block_slow+11e/128>
c: 74 2a je 38 <_EIP+0x38>
Code; c014fb07 <__find_get_block_slow+120/128>
e: 5b pop %ebx
Code; c014fb08 <__find_get_block_slow+121/128>
f: 89 ea mov %ebp,%edx
Code; c014fb0a <__find_get_block_slow+123/128>
11: b8 f4 28 3e c0 mov $0xc03e28f4,%eax
Code; c014fb0f <invalidate_bdev+0/17>
16: 5e pop %esi
Code; c014fb10 <invalidate_bdev+1/17>
17: 5f pop %edi
Code; c014fb11 <invalidate_bdev+2/17>
18: 5d pop %ebp
<0>Kernel panic - not syncing: Fatal exception in interrupt
--
Mark Rustad, MRustad@aol.com
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Mark Rustad @ 2004-10-22 16:00 UTC
To: Jens Axboe; +Cc: linux-raid, Paul Clements, linux-scsi
Jens,
On Oct 21, 2004, at 9:02 AM, Jens Axboe wrote:
>>> -97 is the release kernel, -111 is the current update kernel. And it
>>> has
>>> those raid1 issues fixed already, at least the ones that are known.
>>> The
>>> scsi segment issue is not, however.
>>
>> Thanks. Good to know that. -111 is currently available to customers?
>> We
>> may recommend that our customers use that, rather than patching -97
>> ourselves.
>
> Yes it is, it's generally available through the online updates.
FWIW, I tried the -111 kernel and got a crash with my failing drive.
The messages out of the kernel were:
raid1: Disk failure on sdb1, disabling device.
raid1: sdb1: rescheduling sector 176
raid1: sda1: redirecting sector 176 to another mirror
raid1: sdb1: rescheduling sector 184
raid1: sda1: redirecting sector 184 to another mirror
Oct 22 10:42:03 linux kernel: scsi0: ERROR on channel 0, id 5, lun 0, CDB: Read (10) 00 00 00 00 bf 00 01 00 00
Oct 22 10:42:03 linux kernel: Info fld=0xf3, Current sdb: sense key Medium Error
Oct 22 10:42:03 linux kernel: Additional sense: Unrecovered read error
Oct 22 10:42:03 linux kernel: end_request: I/O error, dev sdb, sector 240
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
*pde = 00000000
Oops: 0000 [#1]
SMP
CPU: 0
EIP: 0060:[<c01559a4>] Tainted: G U
EFLAGS: 00010286 (2.6.5-7.111-smp)
EIP is at page_address+0x14/0xc0
eax: 00000000 ebx: 00000000 ecx: d0e50ac0 edx: f782a970
esi: f7d7cd00 edi: 00000001 ebp: 00000008 esp: f7e65e90
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 220, threadinfo=f7e64000 task=f7e1acb0)
Stack: 00000000 f7d7cd00 00000001 00000008 c0249501 c0127b7a 00000001 d0e50ac0
       00000000 00000e00 c0249bee c035b0f4 f7eb5e8c 000000ef 00000000 00000001
       fffffffb 00000e00 00000007 f7d7cd00 f7d7cd00 f71cce00 00000000 f7def200
Call Trace:
[<c0249501>] blk_recalc_rq_sectors+0xa1/0x110
[<c0127b7a>] printk+0x18a/0x1a0
[<c0249bee>] __end_that_request_first+0x1be/0x240
[<f883fb99>] scsi_end_request+0x29/0xe0 [scsi_mod]
[<f883ff74>] scsi_io_completion+0x324/0x4c0 [scsi_mod]
[<f883a3b2>] scsi_finish_command+0x82/0xf0 [scsi_mod]
[<c0127b7a>] printk+0x18a/0x1a0
[<f883e687>] scsi_error_handler+0x987/0xed0 [scsi_mod]
[<f883dd00>] scsi_error_handler+0x0/0xed0 [scsi_mod]
[<c0107005>] kernel_thread_helper+0x5/0x10
Code: 8b 00 f6 c4 01 75 26 a1 0c fb 47 c0 29 c3 c1 fb 05 c1 e3 0c
<1>Unable to handle kernel NULL pointer dereference at virtual address
00000000
printing eip:
f88584be
*pde = 00000000
Oops: 0002 [#2]
SMP
CPU: 0
EIP: 0060:[<f88584be>] Tainted: G U
EFLAGS: 00010046 (2.6.5-7.111-smp)
EIP is at dump_block_silence+0x1e/0xc0 [dump_blockdev]
eax: 00000000 ebx: f7d86c00 ecx: f8875810 edx: 00000000
esi: f8859740 edi: f7e65e5c ebp: 00000000 esp: f7e65d28
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 220, threadinfo=f7e64000 task=f7e1acb0)
Stack: 00000000 00000000 00000000 00000000 00000000 00000000 f8870ae9 00000000
       00000000 00000000 f8870c49 00000000 00000000 00000000 f8870d05 00000000
       c0358f00 00000202 f886f852 ffffffef c010aed3 00000000 c010af28 c03552c0
Call Trace:
[<f8870ae9>] dump_begin+0x59/0xd0 [dump]
[<f8870c49>] dump_execute_savedump+0x9/0x50 [dump]
[<f8870d05>] dump_generic_execute+0x75/0x80 [dump]
[<f886f852>] dump_execute+0x52/0xa0 [dump]
[<c010aed3>] die+0x133/0x1b0
[<c010af28>] die+0x188/0x1b0
[<c011dc40>] do_page_fault+0x0/0x54d
[<c011df81>] do_page_fault+0x341/0x54d
[<f88c9c20>] ahd_linux_queue_cmd_complete+0xe0/0x2a0 [aic79xx]
[<c011dc40>] do_page_fault+0x0/0x54d
[<c010a28d>] error_code+0x2d/0x40
[<c01559a4>] page_address+0x14/0xc0
[<c0249501>] blk_recalc_rq_sectors+0xa1/0x110
[<c0127b7a>] printk+0x18a/0x1a0
[<c0249bee>] __end_that_request_first+0x1be/0x240
[<f883fb99>] scsi_end_request+0x29/0xe0 [scsi_mod]
[<f883ff74>] scsi_io_completion+0x324/0x4c0 [scsi_mod]
[<f883a3b2>] scsi_finish_command+0x82/0xf0 [scsi_mod]
[<c0127b7a>] printk+0x18a/0x1a0
[<f883e687>] scsi_error_handler+0x987/0xed0 [scsi_mod]
[<f883dd00>] scsi_error_handler+0x0/0xed0 [scsi_mod]
[<c0107005>] kernel_thread_helper+0x5/0x10
Code: 86 02 84 c0 ba f0 ff ff ff 7f 0e 8b 5c 24 10 89 d0 8b 74 24
<6>LKCD dump already in progress
------------[ cut here ]------------
kernel BUG at kernel/exit.c:833!
invalid operand: 0000 [#3]
SMP
CPU: 0
EIP: 0060:[<c012a108>] Tainted: G U
EFLAGS: 00010282 (2.6.5-7.111-smp)
EIP is at do_exit+0x968/0xb60
eax: 00000001 ebx: 00000000 ecx: 00000000 edx: 00000001
esi: f7fa17c0 edi: f7e1acb0 ebp: f7fa17c0 esp: f7e65bd8
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 220, threadinfo=f7e64000 task=f7e1acb0)
Stack: 00017e5a 00000282 f7e65cf4 c0431a41 00000246 f7e1ad08 00000002 f7e1ad48
       f7e65c10 00000202 00000002 f7e1ad08 f7e64000 00000002 f7e65cf4 00000002
       c010af50 0000000b c034405a 00000002 00000002 f7e1acb0 c034405a 00000000
Call Trace:
[<c010af50>] do_simd_coprocessor_error+0x0/0xb0
[<c011dc40>] do_page_fault+0x0/0x54d
[<c011df81>] do_page_fault+0x341/0x54d
[<f886fdfe>] dump_lcrash_save_context+0x2e/0x60 [dump]
[<c0119fa1>] dump_send_ipi+0x11/0x20
[<f88710e4>] __dump_save_other_cpus+0xb4/0xe0 [dump]
[<f88700ce>] dump_lcrash_configure_header+0x29e/0x2c0 [dump]
[<c011dc40>] do_page_fault+0x0/0x54d
[<c010a28d>] error_code+0x2d/0x40
[<f88584be>] dump_block_silence+0x1e/0xc0 [dump_blockdev]
[<f8870ae9>] dump_begin+0x59/0xd0 [dump]
[<f8870c49>] dump_execute_savedump+0x9/0x50 [dump]
[<f8870d05>] dump_generic_execute+0x75/0x80 [dump]
[<f886f852>] dump_execute+0x52/0xa0 [dump]
[<c010aed3>] die+0x133/0x1b0
[<c010af28>] die+0x188/0x1b0
[<c011dc40>] do_page_fault+0x0/0x54d
[<c011df81>] do_page_fault+0x341/0x54d
[<f88c9c20>] ahd_linux_queue_cmd_complete+0xe0/0x2a0 [aic79xx]
[<c011dc40>] do_page_fault+0x0/0x54d
[<c010a28d>] error_code+0x2d/0x40
[<c01559a4>] page_address+0x14/0xc0
[<c0249501>] blk_recalc_rq_sectors+0xa1/0x110
[<c0127b7a>] printk+0x18a/0x1a0
[<c0249bee>] __end_that_request_first+0x1be/0x240
[<f883fb99>] scsi_end_request+0x29/0xe0 [scsi_mod]
[<f883ff74>] scsi_io_completion+0x324/0x4c0 [scsi_mod]
[<f883a3b2>] scsi_finish_command+0x82/0xf0 [scsi_mod]
[<c0127b7a>] printk+0x18a/0x1a0
[<f883e687>] scsi_error_handler+0x987/0xed0 [scsi_mod]
[<f883dd00>] scsi_error_handler+0x0/0xed0 [scsi_mod]
[<c0107005>] kernel_thread_helper+0x5/0x10
Code: 0f 0b 41 03 37 43 34 c0 eb fe 8b 6f 10 85 ed 74 ac eb 9b 8b
<6>LKCD dump already in progress
*** everything beyond removed, because cpu 0 continued to fault over
and over
--
Mark Rustad, MRustad@aol.com
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Mark Rustad @ 2004-10-28 19:35 UTC
To: linux-raid
I've still been trying to resolve this problem. I have reproduced the
kernel crash on a RAID1 drive error on kernels all the way up to
2.6.10-rc1-bk6. Seeing a patch to fix a problem in the sg device, I
turned off sgraidmon as well as mdadmd to have less going on in my
environment and took the SCSI generic driver out of the kernel. I also
turned on frame pointers in the kernel, just to aid in walking the
stack.
With those changes I still get a kernel crash on a drive error. In this
case I got the following:
raid1: Disk failure on sdb1, disabling device.
raid1: sdb1: rescheduling sector 176
raid1: sda1: redirecting sector 176 to another mirror
Oct 28 10:18:23 xio3d-x3C kernel: SCSI error : <0 0 5 0> return code = 0x8000002
Oct 28 10:18:23 xio3d-x3C kernel: Info fld=0xf3, Current sdb: sense key Medium Error
Oct 28 10:18:23 xio3d-x3C kernel: Additional sense: Unrecovered read error
Oct 28 10:18:23 xio3d-x3C kernel: end_request: I/O error, dev sdb, sector 240
------------[ cut here ]------------
kernel BUG at /usr/src/linux-2.6.10-rc1-bk6/drivers/scsi/scsi_lib.c:572!
invalid operand: 0000 [#1]
SMP
CPU: 0
EIP: 0060:[<c02ca8fb>] Not tainted VLI
EFLAGS: 00010046 (2.6.10-rc1-3d-x)
EIP is at scsi_alloc_sgtable+0xe0/0xed
eax: 00000000 ebx: df45899c ecx: 00000000 edx: ded49e00
esi: ded49e00 edi: ded49e00 ebp: f7cd3d98 esp: f7cd3d80
ds: 007b es: 007b ss: 0068
Process scsi_eh_0 (pid: 18, threadinfo=f7cd2000 task=f7ca2a20)
Stack: c0582b94 c05720c0 00000000 df45899c ded49e00 ded49e00 f7cd3db8 c02caf3a
       ded49e00 00000020 00010000 ded49e00 ded49e00 df45899c f7cd3de0 c02cb0ee
       ded49e00 00000001 00000000 00000000 f7cacc00 df45899c f7cacc00 f7c44030
Call Trace:
[<c0107a93>] show_stack+0xaf/0xb7
[<c0107c13>] show_registers+0x158/0x1cd
[<c0107e0f>] die+0xfa/0x182
[<c0108325>] do_invalid_op+0x108/0x10a
[<c01076ed>] error_code+0x2d/0x38
[<c02caf3a>] scsi_init_io+0x62/0x123
[<c02cb0ee>] scsi_prep_fn+0xae/0x205
[<c02932ae>] elv_next_request+0x65/0xf7
[<c02cb465>] scsi_request_fn+0x220/0x429
[<c02959d5>] blk_insert_request+0xaf/0xe3
[<c02ca6cd>] scsi_requeue_command+0x40/0x4d
[<c02ca7b0>] scsi_end_request+0x6c/0xd7
[<c02cac20>] scsi_io_completion+0x278/0x530
[<c02faf37>] sd_rw_intr+0x84/0x2ab
[<c02c5ebe>] scsi_finish_command+0x83/0xe4
[<c02c99b8>] scsi_eh_flush_done_q+0xb0/0x105
[<c02c9aa2>] scsi_unjam_host+0x95/0x1eb
[<c02c9cc8>] scsi_error_handler+0xd0/0x172
[<c0104ff5>] kernel_thread_helper+0x5/0xb
Code: a0 00 00 00 02 00 eb 8b 66 c7 82 a0 00 00 00 03 00 eb 80 66 c7 82 a0 00 00 00 00 00 e9 72 ff ff ff 31 c0 83 c4 0c 5b 5e 5f 5d c3 <0f> 0b 3c 02 f0 ab 3f c0 e9 2f ff ff ff 55 89 e5 8b 45 0c 8b 55
The BUG is caused by the nr_phys_segments field being zero, resulting
in an attempt to build an sg list with no elements. Since I have been
stimulating the drive failure with "dd of=/dev/null bs=512
if=/dev/md0", I got to wondering whether the single-sector reads
contribute to the problem, so I changed bs=512 to bs=4096. Then I got
the following crash:
raid1: Disk failure on sdb1, disabling device.
raid1: sdb1: rescheduling sector 176
raid1: sda1: redirecting sector 176 to another mirror
Oct 28 13:30:11 xio3d-x3C kernel: SCSI error : <0 0 5 0> return code = 0x8000002
Oct 28 13:30:11 xio3d-x3C kernel: Info fld=0xf3, Current sdb: sense key Medium Error
Oct 28 13:30:11 xio3d-x3C kernel: Additional sense: Unrecovered read error
Oct 28 13:30:11 xio3d-x3C kernel: end_request: I/O error, dev sdb, sector 240
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
*pde = 00000000
Oops: 0000 [#1]
SMP
CPU: 0
EIP: 0060:[<c0144cfc>] Not tainted VLI
EFLAGS: 00010292 (2.6.10-rc1-3d-x)
EIP is at page_address+0xc/0x93
eax: 00000000 ebx: 00000000 ecx: c1936c10 edx: c1936c10
esi: c6f11100 edi: 00000000 ebp: c04f7e1c esp: c04f7e04
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c04f6000 task=c040db40)
Stack: fffffffb 00000000 c04f7e2c ded86d70 c6f11100 00000000 c04f7e2c c0296c60
       00000000 00000e00 c04f7e64 c0296e36 ded86d70 00000007 fffffffb 00000000
       c6f11100 00000001 fffffffb 00000e00 00000007 ded86d70 decbee00 f7c448b8
Call Trace:
[<c0107a93>] show_stack+0xaf/0xb7
[<c0107c13>] show_registers+0x158/0x1cd
[<c0107e0f>] die+0xfa/0x182
[<c0118f05>] do_page_fault+0x432/0x5fe
[<c01076ed>] error_code+0x2d/0x38
[<c0296c60>] blk_recalc_rq_sectors+0x87/0xa0
[<c0296e36>] __end_that_request_first+0x1bd/0x23d
[<c02ca772>] scsi_end_request+0x2e/0xd7
[<c02cac20>] scsi_io_completion+0x278/0x530
[<c02faf37>] sd_rw_intr+0x84/0x2ab
[<c02c5ebe>] scsi_finish_command+0x83/0xe4
[<c02c5dee>] scsi_softirq+0xd3/0xe7
[<c01269f5>] __do_softirq+0x65/0xd3
[<c0126a94>] do_softirq+0x31/0x33
[<c0109090>] do_IRQ+0x2c/0x33
[<c01075d0>] common_interrupt+0x18/0x20
[<c0104e23>] cpu_idle+0x31/0x40
[<c04f8ada>] start_kernel+0x19b/0x20b
[<c0100211>] 0xc0100211
Code: da 01 74 ec 3c c0 eb d8 55 89 e5 69 45 08 01 00 37 9e 5d c1 e8 19 c1 e0 07 05 00 ed 58 c0 c3 55 89 e5 57 56 53 83 ec 0c 8b 5d 08 <8b> 03 f6 c4 01 75 1a 2b 1d 10 6d 59 c0 c1 fb 05 c1 e3 0c 8d 83
<0>Kernel panic - not syncing: Fatal exception in interrupt
<0>Rebooting in 5 seconds..
In this case, there was about 30 seconds between the time that the disk
error was reported and the kernel finally crashed.
Right now I am wondering if this comment in scsi_lib.c could have
something to do with this problem...
        /*
         * XXX: Following is probably broken since deferred errors
         * fall through [dpg 20040827]
         */
--
Mark Rustad, MRustad@aol.com
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Mark Rustad @ 2004-11-04 18:56 UTC
To: linux-raid, linux-scsi
I came up with a patch that seems to fix the kernel crashing on a RAID1
device failure problem. I do not have confidence that it really is the
correct fix, but I'm pretty sure that it is touching the right area.
I'm sure someone more familiar with this area of the kernel will
recognize from this what the correct fix is, if this is not it.
Submitted for your consideration:
--- a/drivers/block/ll_rw_blk.c	2004-11-01 14:28:48.000000000 -0600
+++ b/drivers/block/ll_rw_blk.c	2004-11-01 14:29:13.000000000 -0600
@@ -2865,10 +2865,7 @@
          * if the request wasn't completed, update state
          */
         if (bio_nbytes) {
                 bio_endio(bio, bio_nbytes, error);
-                bio->bi_idx += next_idx;
-                bio_iovec(bio)->bv_offset += nr_bytes;
-                bio_iovec(bio)->bv_len -= nr_bytes;
         }
 
         blk_recalc_rq_sectors(req, total_bytes >> 9);
With this applied, my kernel does not crash on media errors on one of
the devices and just keeps on running on the other device. In my case,
the code just above these lines was taking the path that was walking
through the bio_iovec array.
--
Mark Rustad, MRustad@mac.com
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Lars Marowsky-Bree @ 2004-11-16 15:51 UTC
To: Mark Rustad, linux-raid, linux-scsi
On 2004-11-04T12:56:01, Mark Rustad <mrustad@mac.com> wrote:
> I came up with a patch that seems to fix the kernel crashing on a RAID1
> device failure problem. I do not have confidence that it really is the
> correct fix, but I'm pretty sure that it is touching the right area.
> I'm sure someone more familiar with this area of the kernel will
> recognize from this what the correct fix is, if this is not it.
> Submitted for your consideration:
Hi Mark, just checking back: has any better fix been proposed since?
Sincerely,
Lars Marowsky-Brée <lmb@suse.de>
--
High Availability & Clustering
SUSE Labs, Research and Development
SUSE LINUX Products GmbH - A Novell Business
* Re: PROBLEM: kernel crashes on RAID1 drive error
From: Mark Rustad @ 2004-11-16 16:40 UTC
To: Lars Marowsky-Bree; +Cc: linux-raid, linux-scsi
Lars,
On Nov 16, 2004, at 9:51 AM, Lars Marowsky-Bree wrote:
> On 2004-11-04T12:56:01, Mark Rustad <mrustad@mac.com> wrote:
>
>> I came up with a patch that seems to fix the kernel crashing on a
>> RAID1
>> device failure problem. I do not have confidence that it really is the
>> correct fix, but I'm pretty sure that it is touching the right area.
>> I'm sure someone more familiar with this area of the kernel will
>> recognize from this what the correct fix is, if this is not it.
>> Submitted for your consideration:
>
> Hi Mark, checking back whether any better fix has since been proposed?
My earlier attempt was clearly broken, though it did cure the kernel
crashes. Below is the patch I am currently applying against ll_rw_blk.c
in SuSE kernel 2.6.5-7.111. The last two hunks of the patch are
relevant to this problem (the remainder of the patch should look
familiar).
It looks to me like this problem was introduced in 2.6.2 by what was
thought to be a simple rearrangement of code. I believe that that
change did not take into account all of the ways that the loop could
exit.
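(Purely as an illustration of that kind of hazard, with made-up names
rather than the kernel code: a fixup placed after a loop is only
correct for the exit paths it was written with in mind.)
/* Illustration only (not ll_rw_blk.c): the post-loop fixup assumes the
 * loop stopped partway into items[idx], but the loop has a second exit
 * path at which that assumption no longer holds. */
#include <stdio.h>

struct item { unsigned int off, len; };

static void consume(struct item *it, int nitems, int max_items,
                    unsigned int nbytes)
{
        int idx = 0;

        while (idx < nitems) {
                if (idx == max_items)
                        break;          /* exit path B: some unrelated limit */
                if (nbytes < it[idx].len)
                        break;          /* exit path A: stopped mid-item */
                nbytes -= it[idx].len;  /* item fully consumed */
                idx++;
        }
        /*
         * Only valid for exit path A.  On exit path B, nbytes may still
         * cover whole later items, and applying it here corrupts it[idx].
         */
        if (idx < nitems) {
                it[idx].off += nbytes;
                it[idx].len -= nbytes;
        }
}

int main(void)
{
        struct item a[] = { { 0, 512 }, { 0, 512 }, { 0, 512 } };

        consume(a, 3, 1, 1536);         /* forces exit path B after one item */
        printf("item 1 now off=%u len=%u (len wrapped around)\n",
               a[1].off, a[1].len);
        return 0;
}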
Unfortunately, the drive I had that would power up and work for 20
minutes and then return media errors has now totally died, so I am not
currently able to convince myself that this fix is really right. I am
continuing to try to reproduce a similar event.
My current patch is below:
--- a/drivers/block/ll_rw_blk.c	2004-11-15 08:35:19.000000000 -0600
+++ a/drivers/block/ll_rw_blk.c	2004-11-15 08:41:37.000000000 -0600
@@ -2654,19 +2654,40 @@
 
 void blk_recalc_rq_segments(struct request *rq)
 {
-        struct bio *bio;
+        struct bio *bio, *prevbio = NULL;
         int nr_phys_segs, nr_hw_segs;
+        unsigned int phys_size, hw_size;
+        request_queue_t *q = rq->q;
 
         if (!rq->bio)
                 return;
 
-        nr_phys_segs = nr_hw_segs = 0;
+        phys_size = hw_size = nr_phys_segs = nr_hw_segs = 0;
         rq_for_each_bio(bio, rq) {
                 /* Force bio hw/phys segs to be recalculated. */
                 bio->bi_flags &= ~(1 << BIO_SEG_VALID);
 
-                nr_phys_segs += bio_phys_segments(rq->q, bio);
-                nr_hw_segs += bio_hw_segments(rq->q, bio);
+                nr_phys_segs += bio_phys_segments(q, bio);
+                nr_hw_segs += bio_hw_segments(q, bio);
+                if (prevbio) {
+                        int pseg = phys_size + prevbio->bi_size + bio->bi_size;
+                        int hseg = hw_size + prevbio->bi_size + bio->bi_size;
+
+                        if (blk_phys_contig_segment(q, prevbio, bio) &&
+                            pseg <= q->max_segment_size) {
+                                nr_phys_segs--;
+                                phys_size += prevbio->bi_size + bio->bi_size;
+                        } else
+                                phys_size = 0;
+
+                        if (blk_hw_contig_segment(q, prevbio, bio) &&
+                            hseg <= q->max_segment_size) {
+                                nr_hw_segs--;
+                                hw_size += prevbio->bi_size + bio->bi_size;
+                        } else
+                                hw_size = 0;
+                }
+                prevbio = bio;
         }
 
         rq->nr_phys_segments = nr_phys_segs;
@@ -2762,6 +2783,8 @@
                  * not a complete bvec done
                  */
                 if (unlikely(nbytes > nr_bytes)) {
+                        bio_iovec_idx(bio, idx)->bv_offset += nr_bytes;
+                        bio_iovec_idx(bio, idx)->bv_len -= nr_bytes;
                         bio_nbytes += nr_bytes;
                         total_bytes += nr_bytes;
                         break;
@@ -2798,8 +2821,6 @@
         if (bio_nbytes) {
                 bio_endio(bio, bio_nbytes, error);
                 bio->bi_idx += next_idx;
-                bio_iovec(bio)->bv_offset += nr_bytes;
-                bio_iovec(bio)->bv_len -= nr_bytes;
         }
 
         blk_recalc_rq_sectors(req, total_bytes >> 9);
--
Mark Rustad, MRustad@mac.com
* Problem: kernel crashes on RAID1 drive error
From: bernd @ 2004-12-28 12:00 UTC
To: linux-raid
Hi raid gurus,
Can anybody tell me if there is a reliable fix for this problem, and if
not, when one will be available for SuSE 9.2?
We would appreciate it if the fix were bundled with an online update for
SuSE 9.2 systems. Maybe someone from SuSE Labs can give us more
information, too.
We just installed the latest online update from SuSE on our 9.2 machines,
which brought in kernel 2.6.8-24.10, but the problem still exists.
With this serious bug, RAID1 is pretty meaningless ...
Greetings Bernd Rieke