* Re: raid1d crash at boot
[not found] <20111119134139.GA30570@rere.qmqm.pl>
@ 2011-11-21 1:37 ` NeilBrown
2011-11-21 7:04 ` James Bottomley
2012-01-07 12:53 ` Michał Mirosław
1 sibling, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-11-21 1:37 UTC (permalink / raw)
To: Michał Mirosław; +Cc: linux-raid, linux-scsi
[-- Attachment #1: Type: text/plain, Size: 4215 bytes --]
Thanks for the report.
However, as this crash is clearly in the SCSI layer it makes sense to report
it to linux-scsi - so I have cc:ed this reply there.
NeilBrown
On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
wrote:
> I get the following BUG_ON tripped while booting, before the rootfs is mounted by
> Debian's initrd. This started happening with kernels sometime
> during 3.1-rcX.
>
> [ 6.246170] ------------[ cut here ]------------
> [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> [ 6.246347] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 6.246558] CPU 5
> [ 6.246614] Modules linked in: usb_storage uas firewire_ohci firewire_core crc_itu_t xhci_hcd [last unloaded: scsi_wait_scan]
> [ 6.247131]
> [ 6.247194] Pid: 288, comm: md1_raid1 Not tainted 3.2.0-rc2mq+ #5 System manufacturer System Product Name/P8Z68-V PRO
> [ 6.247422] RIP: 0010:[<ffffffff812443a1>] [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [ 6.247563] RSP: 0018:ffff8804140d1bd0 EFLAGS: 00010046
> [ 6.247634] RAX: 0000000000000000 RBX: ffff88041d463800 RCX: 00000000ffffffff
> [ 6.247710] RDX: 00000000ffffffff RSI: ffff8804142fd600 RDI: ffff88041d463800
> [ 6.247785] RBP: ffff8804142fd600 R08: 00000000ffffffff R09: 0000000000017a00
> [ 6.247861] R10: ffff88041d464000 R11: ffff88041d464000 R12: 0000000000000800
> [ 6.247936] R13: 0000000000000001 R14: ffff88041d463800 R15: 0000000000000000
> [ 6.248013] FS: 0000000000000000(0000) GS:ffff88042fb40000(0000) knlGS:0000000000000000
> [ 6.248104] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6.248176] CR2: 000000000042b200 CR3: 0000000001605000 CR4: 00000000000406e0
> [ 6.248252] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 6.248328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 6.248404] Process md1_raid1 (pid: 288, threadinfo ffff8804140d0000, task ffff88041539a4c0)
> [ 6.248495] Stack:
> [ 6.248557] 0000000000000000 ffff8804142fd600 ffff8804142fd600 ffffffff8124a9be
> [ 6.248819] ffff8804142fe3a0 ffff8804142fd600 ffff88041d463848 ffffffff811a5d67
> [ 6.249084] ffff8804142fe3a0 ffff880415452400 ffff8804156f0000 00000000fffffa2b
> [ 6.249346] Call Trace:
> [ 6.249414] [<ffffffff8124a9be>] ? sd_prep_fn+0x2cd/0xb72
> [ 6.249490] [<ffffffff811a5d67>] ? cfq_dispatch_requests+0x6f2/0x82c
> [ 6.249567] [<ffffffff8119a168>] ? blk_peek_request+0xc8/0x1bf
> [ 6.249638] [<ffffffff81243d83>] ? scsi_request_fn+0x64/0x406
> [ 6.249708] [<ffffffff8119a526>] ? blk_flush_plug_list+0x186/0x1b7
> [ 6.249780] [<ffffffff8119a562>] ? blk_finish_plug+0xb/0x2a
> [ 6.249849] [<ffffffff812a400f>] ? raid1d+0x91/0xb22
> [ 6.249919] [<ffffffff81031729>] ? get_parent_ip+0x9/0x1b
> [ 6.249990] [<ffffffff813a5c9e>] ? sub_preempt_count+0x83/0x94
> [ 6.250060] [<ffffffff813a202a>] ? schedule+0x73f/0x772
> [ 6.250129] [<ffffffff813a5d49>] ? add_preempt_count+0x9a/0x9c
> [ 6.250199] [<ffffffff813a330b>] ? _raw_spin_lock_irqsave+0x13/0x31
> [ 6.250271] [<ffffffff812a9bb4>] ? md_thread+0xfe/0x11c
> [ 6.250340] [<ffffffff8104f6c6>] ? add_wait_queue+0x3c/0x3c
> [ 6.250410] [<ffffffff812a9ab6>] ? signal_pending+0x17/0x17
> [ 6.250479] [<ffffffff8104f045>] ? kthread+0x76/0x7e
> [ 6.250548] [<ffffffff813a8c34>] ? kernel_thread_helper+0x4/0x10
> [ 6.250618] [<ffffffff8104efcf>] ? kthread_worker_fn+0x139/0x139
> [ 6.250688] [<ffffffff813a8c30>] ? gs_change+0xb/0xb
> [ 6.250754] Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 50 48 85 c0 74 0c 48 89 ee 48 89 df ff d0 85 c0 75 44 66 83 bd d0 00 00 00 00 75 02 <0f> 0b 48 89 ee 48 89 df e8 b6 e9 ff ff 48 85 c0 48 89 c2 74 20
> [ 6.253544] RIP [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [ 6.253658] RSP <ffff8804140d1bd0>
> [ 6.253722] ---[ end trace 533b0b5008dd7cee ]---
> [ 6.253788] note: md1_raid1[288] exited with preempt_count 1
>
> I'm attaching dmesg log I could catch with netcosole (there's some part missing).
>
> Best Regards,
> Michał Mirosław
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: raid1d crash at boot
2011-11-21 1:37 ` raid1d crash at boot NeilBrown
@ 2011-11-21 7:04 ` James Bottomley
2011-11-21 8:27 ` NeilBrown
0 siblings, 1 reply; 15+ messages in thread
From: James Bottomley @ 2011-11-21 7:04 UTC (permalink / raw)
To: NeilBrown; +Cc: Michał Mirosław, linux-raid, linux-scsi
On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> Thank for the report.
> However as this crash is clearly in the SCSI layer it makes sense to reported
> it to linux-scsi - so I have cc:ed this reply there.
>
> NeilBrown
>
>
> On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> wrote:
>
> > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > Debian's initrd. This started to happen for kernels since sometime
> > during 3.1-rcX.
> >
> > [ 6.246170] ------------[ cut here ]------------
> > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
I can tell you what it is:
/*
* Filesystem requests must transfer data.
*/
BUG_ON(!req->nr_phys_segments);
But the fault is in the layer above SCSI. It means something sent a
request with REQ_TYPE_FS but no actual data attached ... this is
supposed to be impossible, hence the bug on.
James
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: raid1d crash at boot
2011-11-21 7:04 ` James Bottomley
@ 2011-11-21 8:27 ` NeilBrown
2011-11-22 0:50 ` Michał Mirosław
0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-11-21 8:27 UTC (permalink / raw)
To: James Bottomley; +Cc: Michał Mirosław, linux-raid, linux-scsi
[-- Attachment #1: Type: text/plain, Size: 1377 bytes --]
On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
<James.Bottomley@HansenPartnership.com> wrote:
> On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> > Thank for the report.
> > However as this crash is clearly in the SCSI layer it makes sense to reported
> > it to linux-scsi - so I have cc:ed this reply there.
> >
> > NeilBrown
> >
> >
> > On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > wrote:
> >
> > > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > > Debian's initrd. This started to happen for kernels since sometime
> > > during 3.1-rcX.
> > >
> > > [ 6.246170] ------------[ cut here ]------------
> > > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
>
> I can tell you what it is:
>
> /*
> * Filesystem requests must transfer data.
> */
> BUG_ON(!req->nr_phys_segments);
>
> But the fault is in the layer above SCSI. It means something sent a
> request with REQ_TYPE_FS but no actual data attached ... this is
> supposed to be impossible, hence the bug on.
>
> James
>
Thanks.... that sounds strangely familiar, but I cannot be sure and Google
doesn't help.
Michał: what are you using on the RAID1 - some filesystem (and which one), or swap, or something else?
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
* Re: raid1d crash at boot
2011-11-21 8:27 ` NeilBrown
@ 2011-11-22 0:50 ` Michał Mirosław
2011-11-22 1:26 ` NeilBrown
0 siblings, 1 reply; 15+ messages in thread
From: Michał Mirosław @ 2011-11-22 0:50 UTC (permalink / raw)
To: NeilBrown; +Cc: James Bottomley, linux-raid, linux-scsi
On Mon, Nov 21, 2011 at 07:27:45PM +1100, NeilBrown wrote:
> On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
> <James.Bottomley@HansenPartnership.com> wrote:
> > On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> > > Thank for the report.
> > > However as this crash is clearly in the SCSI layer it makes sense to reported
> > > it to linux-scsi - so I have cc:ed this reply there.
> > >
> > > On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > wrote:
> > > > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > > > Debian's initrd. This started to happen for kernels since sometime
> > > > during 3.1-rcX.
> > > >
> > > > [ 6.246170] ------------[ cut here ]------------
> > > > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> >
> > I can tell you what it is:
> >
> > /*
> > * Filesystem requests must transfer data.
> > */
> > BUG_ON(!req->nr_phys_segments);
> >
> > But the fault is in the layer above SCSI. It means something sent a
> > request with REQ_TYPE_FS but no actual data attached ... this is
> > supposed to be impossible, hence the bug on.
>
> Thanks.... that sounds strangely familiar, but I cannot be sure and google
> doesn't help.
>
> Michał: what are you using on the RAID1 - some filesystem (which one)or swap or something else?
The whole stack is: ext4 over lvm over dm-crypt over md-raid1 over SATA
drives. The boot doesn't survive to the point where the initrd script asks
for dm-crypt's key password.
Best Regards,
Michał Mirosław
* Re: raid1d crash at boot
2011-11-22 0:50 ` Michał Mirosław
@ 2011-11-22 1:26 ` NeilBrown
2011-11-22 12:03 ` Michał Mirosław
0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2011-11-22 1:26 UTC (permalink / raw)
To: Michał Mirosław
Cc: James Bottomley, linux-raid, linux-scsi,
device-mapper development
[-- Attachment #1: Type: text/plain, Size: 2869 bytes --]
On Tue, 22 Nov 2011 01:50:37 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
wrote:
> On Mon, Nov 21, 2011 at 07:27:45PM +1100, NeilBrown wrote:
> > On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
> > <James.Bottomley@HansenPartnership.com> wrote:
> > > On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> > > > Thank for the report.
> > > > However as this crash is clearly in the SCSI layer it makes sense to reported
> > > > it to linux-scsi - so I have cc:ed this reply there.
> > > >
> > > > On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > > wrote:
> > > > > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > > > > Debian's initrd. This started to happen for kernels since sometime
> > > > > during 3.1-rcX.
> > > > >
> > > > > [ 6.246170] ------------[ cut here ]------------
> > > > > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> > >
> > > I can tell you what it is:
> > >
> > > /*
> > > * Filesystem requests must transfer data.
> > > */
> > > BUG_ON(!req->nr_phys_segments);
> > >
> > > But the fault is in the layer above SCSI. It means something sent a
> > > request with REQ_TYPE_FS but no actual data attached ... this is
> > > supposed to be impossible, hence the bug on.
> >
> > Thanks.... that sounds strangely familiar, but I cannot be sure and google
> > doesn't help.
> >
> > Michał: what are you using on the RAID1 - some filesystem (which one)or swap or something else?
>
> The whole stack is: ext4 over lvm over dm-crypt over md-raid1 over SATA
> drives. The boot doesn't survive to the point where the initrd script asks
> for md-crypt's key password.
>
That gives us lots of room for pointing the finger of blame, doesn't it?
I think it is -> his problem. :-)
From the md part of the stack trace it looks most like a write request. It
could be a retried read, but that is extremely unlikely that early in boot.
So presumably it is some sort of zero-length REQ_FLUSH or something like that.
md/raid1 will just pass those down unchanged.
My guess is that ext4 is generating this and something in the stack is
stripping the REQ_FLUSH .... though why it even tries before asking for a
password is beyond me.
Maybe someone on dm-devel can help?
If not we might need to try a debugging patch like this:
diff --git a/block/blk-core.c b/block/blk-core.c
index f43c8a5..59cb2ad 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1560,7 +1560,7 @@ generic_make_request_checks(struct bio *bio)
goto end_io;
}
}
-
+ WARN_ON((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && nr_sectors == 0);
if ((bio->bi_rw & REQ_DISCARD) &&
(!blk_queue_discard(q) ||
((bio->bi_rw & REQ_SECURE) &&
NeilBrown
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
* Re: raid1d crash at boot
2011-11-22 1:26 ` NeilBrown
@ 2011-11-22 12:03 ` Michał Mirosław
2011-11-22 12:10 ` Michał Mirosław
0 siblings, 1 reply; 15+ messages in thread
From: Michał Mirosław @ 2011-11-22 12:03 UTC (permalink / raw)
To: NeilBrown
Cc: James Bottomley, linux-raid, linux-scsi,
device-mapper development
On Tue, Nov 22, 2011 at 12:26:57PM +1100, NeilBrown wrote:
> On Tue, 22 Nov 2011 01:50:37 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> wrote:
>
> > On Mon, Nov 21, 2011 at 07:27:45PM +1100, NeilBrown wrote:
> > > On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
> > > <James.Bottomley@HansenPartnership.com> wrote:
> > > > On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> > > > > Thank for the report.
> > > > > However as this crash is clearly in the SCSI layer it makes sense to reported
> > > > > it to linux-scsi - so I have cc:ed this reply there.
> > > > >
> > > > > On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > > > wrote:
> > > > > > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > > > > > Debian's initrd. This started to happen for kernels since sometime
> > > > > > during 3.1-rcX.
> > > > > >
> > > > > > [ 6.246170] ------------[ cut here ]------------
> > > > > > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> > > >
> > > > I can tell you what it is:
> > > >
> > > > /*
> > > > * Filesystem requests must transfer data.
> > > > */
> > > > BUG_ON(!req->nr_phys_segments);
> > > >
> > > > But the fault is in the layer above SCSI. It means something sent a
> > > > request with REQ_TYPE_FS but no actual data attached ... this is
> > > > supposed to be impossible, hence the bug on.
> > >
> > > Thanks.... that sounds strangely familiar, but I cannot be sure and google
> > > doesn't help.
> > >
> > > Michał: what are you using on the RAID1 - some filesystem (which one)or swap or something else?
> >
> > The whole stack is: ext4 over lvm over dm-crypt over md-raid1 over SATA
> > drives. The boot doesn't survive to the point where the initrd script asks
> > for md-crypt's key password.
> >
>
> That gives us lots of room for pointing the finger of blame, doesn't it?
> I think it is -> his problem. :-)
>
> From the md part of the stack trace it looks most like a write request. It
> could be a retried read, but that is extremely unlike that early in boot.
>
> So presumably it is some sort of zero-length REQ_FLUSH or something like that.
> md/raid1 will just pass those unchanged down.
> My guess is that ext4 is generating this and something in the stack is
> stripping the REQ_FLUSH .... though why it even tries before asking for a
> password is beyond me.
I pointed the finger at md because when dm-crypt is not yet set up,
the only thing working is the array. All filesystems need the
dm-crypt mapping first.
From the dmesg on 3.0, I see that NCQ is enabled but FUA is not:
[ 2.269487] ata1: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25100 irq 64
[ 2.588395] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.588979] ata1.00: ATA-8: KINGSTON SV100S264G, D110225a, max UDMA/100
[ 2.589037] ata1.00: 125045424 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.589321] ata1.00: configured for UDMA/100
[ 2.589440] scsi 1:0:0:0: Direct-Access ATA KINGSTON SV100S2 D110 PQ: 0 ANSI: 5
[ 2.631113] sd 1:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)
[ 2.631265] sd 1:0:0:0: [sda] Write Protect is off
[ 2.631267] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 2.631296] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.632119] sd 1:0:0:0: [sda] Attached SCSI disk
[ 2.269557] ata2: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25180 irq 64
[ 2.588916] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.628336] ata2.00: ATA-8: ST9500420AS, 0002SDM1, max UDMA/133
[ 2.628396] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[ 2.630143] ata2.00: configured for UDMA/133
[ 2.630238] scsi 2:0:0:0: Direct-Access ATA ST9500420AS 0002 PQ: 0 ANSI: 5
[ 2.631236] sd 2:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[ 2.631792] sd 2:0:0:0: [sdb] Write Protect is off
[ 2.632031] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 2.632050] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 2.636038] sd 2:0:0:0: [sdb] Attached SCSI disk
There are two RAID1 arrays across both disks, and one more RAID1 (with its
second leg missing) on sdb.
> diff --git a/block/blk-core.c b/block/blk-core.c
> index f43c8a5..59cb2ad 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1560,7 +1560,7 @@ generic_make_request_checks(struct bio *bio)
> goto end_io;
> }
> }
> -
> + WARN_ON((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && nr_sectors == 0);
> if ((bio->bi_rw & REQ_DISCARD) &&
> (!blk_queue_discard(q) ||
> ((bio->bi_rw & REQ_SECURE) &&
I'll try that. I hope it can be caught through netconsole.
Best Regards,
Michał Mirosław
* Re: raid1d crash at boot
2011-11-22 12:03 ` Michał Mirosław
@ 2011-11-22 12:10 ` Michał Mirosław
0 siblings, 0 replies; 15+ messages in thread
From: Michał Mirosław @ 2011-11-22 12:10 UTC (permalink / raw)
To: NeilBrown
Cc: James Bottomley, linux-raid, linux-scsi,
device-mapper development
On Tue, Nov 22, 2011 at 01:03:37PM +0100, Michał Mirosław wrote:
> On Tue, Nov 22, 2011 at 12:26:57PM +1100, NeilBrown wrote:
> > On Tue, 22 Nov 2011 01:50:37 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > wrote:
> >
> > > On Mon, Nov 21, 2011 at 07:27:45PM +1100, NeilBrown wrote:
> > > > On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
> > > > <James.Bottomley@HansenPartnership.com> wrote:
> > > > > On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> > > > > > Thank for the report.
> > > > > > However as this crash is clearly in the SCSI layer it makes sense to reported
> > > > > > it to linux-scsi - so I have cc:ed this reply there.
> > > > > >
> > > > > > On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> > > > > > wrote:
> > > > > > > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > > > > > > Debian's initrd. This started to happen for kernels since sometime
> > > > > > > during 3.1-rcX.
> > > > > > >
> > > > > > > [ 6.246170] ------------[ cut here ]------------
> > > > > > > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> > > > >
> > > > > I can tell you what it is:
> > > > >
> > > > > /*
> > > > > * Filesystem requests must transfer data.
> > > > > */
> > > > > BUG_ON(!req->nr_phys_segments);
> > > > >
> > > > > But the fault is in the layer above SCSI. It means something sent a
> > > > > request with REQ_TYPE_FS but no actual data attached ... this is
> > > > > supposed to be impossible, hence the bug on.
> > > >
> > > > Thanks.... that sounds strangely familiar, but I cannot be sure and google
> > > > doesn't help.
> > > >
> > > > Michał: what are you using on the RAID1 - some filesystem (which one)or swap or something else?
> > >
> > > The whole stack is: ext4 over lvm over dm-crypt over md-raid1 over SATA
> > > drives. The boot doesn't survive to the point where the initrd script asks
> > > for md-crypt's key password.
> > >
> >
> > That gives us lots of room for pointing the finger of blame, doesn't it?
> > I think it is -> his problem. :-)
> >
> > From the md part of the stack trace it looks most like a write request. It
> > could be a retried read, but that is extremely unlike that early in boot.
> >
> > So presumably it is some sort of zero-length REQ_FLUSH or something like that.
> > md/raid1 will just pass those unchanged down.
> > My guess is that ext4 is generating this and something in the stack is
> > stripping the REQ_FLUSH .... though why it even tries before asking for a
> > password is beyond me.
>
> I pointed finger at md because when dm-crypt is not yet set up
> then only thing working is the array. All filesystems need the
> dm-crypt mapping first.
>
> From the dmesg on 3.0, I see that NCQ is enabled but FUA is not:
>
> [ 2.269487] ata1: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25100 irq 64
> [ 2.588395] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 2.588979] ata1.00: ATA-8: KINGSTON SV100S264G, D110225a, max UDMA/100
> [ 2.589037] ata1.00: 125045424 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
> [ 2.589321] ata1.00: configured for UDMA/100
> [ 2.589440] scsi 1:0:0:0: Direct-Access ATA KINGSTON SV100S2 D110 PQ: 0 ANSI: 5
> [ 2.631113] sd 1:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)
> [ 2.631265] sd 1:0:0:0: [sda] Write Protect is off
> [ 2.631267] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
> [ 2.631296] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [ 2.632119] sd 1:0:0:0: [sda] Attached SCSI disk
>
> [ 2.269557] ata2: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25180 irq 64
> [ 2.588916] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> [ 2.628336] ata2.00: ATA-8: ST9500420AS, 0002SDM1, max UDMA/133
> [ 2.628396] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32)
> [ 2.630143] ata2.00: configured for UDMA/133
> [ 2.630238] scsi 2:0:0:0: Direct-Access ATA ST9500420AS 0002 PQ: 0 ANSI: 5
> [ 2.631236] sd 2:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
> [ 2.631792] sd 2:0:0:0: [sdb] Write Protect is off
> [ 2.632031] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
> [ 2.632050] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> [ 2.636038] sd 2:0:0:0: [sdb] Attached SCSI disk
>
> There are two RAID1 arrays across both disks, and one more RAID1 (with its
> second leg missing) on sdb.
I just remembered that the sdb leg of the main array has the write-mostly flag
set. I checked /proc/mdstat on the running system and it turns out that now
I have both legs marked so. Does this ring a bell?
cat /proc/mdstat
Personalities : [raid1]
md2 : active (auto-read-only) raid1 sdb3[0]
425862712 blocks super 1.2 [2/1] [U_]
md1 : active raid1 sda2[3](W) sdb2[2](W)
62396688 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
123892 blocks super 1.2 [2/2] [UU]
unused devices: <none>
Best Regards,
Michał Mirosław
* raid 1 bug with write-mostly and administrative failed disk
@ 2012-01-05 21:30 Art -kwaak- van Breemen
2012-01-05 21:39 ` Art -kwaak- van Breemen
` (2 more replies)
0 siblings, 3 replies; 15+ messages in thread
From: Art -kwaak- van Breemen @ 2012-01-05 21:30 UTC (permalink / raw)
To: linux-raid
Hi,
Please Cc: me too as I am trying to subscribe to the list.
Anyway: I found a small bug in raid1, with write-behind and
write-mostly, occurring at least on 3.1.4 and 3.2.
This is the test setup:
mdadm --stop /dev/md5
mdadm --zero-superblock /dev/sda8
mdadm --zero-superblock /dev/sdb8
mdadm --create -l 1 -n 2 --metadata=0.90 --bitmap=internal --bitmap-chunk=1024 --write-behind=2048 /dev/md5 /dev/sdb8 -W /dev/sda8
(wait until finished)
mdadm --fail /dev/md5 /dev/sdb8
# And this to trigger the bug:
dd if=/dev/md5 of=/dev/null bs=10k count=1
Transcript of the session:
================================================================================
root@skipper:~# mdadm --zero-superblock /dev/sda8
root@skipper:~# mdadm --zero-superblock /dev/sdb8
root@skipper:~# mdadm --create -l 1 -n 2 --metadata=0.90 --bitmap=internal --bitmap-chunk=1024 --write-behind=2048 /dev/md5 /dev/sdb8 -W /dev/sda8
mdadm: /dev/sdb8 appears to contain an ext2fs file system
size=228074688K mtime=Tue Jan 3 20:37:01 2012
mdadm: largest drive (/dev/sda8) exceeds size (228074688K) by
more than 1%
Continue creating array? yes
md: bind<sdb8>
md: bind<sda8>
md/raid1:md5: not clean -- starting background reconstruction
md/raid1:md5: active with 2 out of 2 mirrors
md5: bitmap file is out of date (0 < 1) -- forcing full
recovery
created bitmap (109 pages) for device md5
md5: bitmap file is out of date, doing full recovery
md5: bitmap initialized from disk: read 7/7 pages, set 222730 of
222730 bits
md5: detected capacity change from 0 to 233548480512
mdadm: array /dev/md5 started.
md: resync of RAID array md5
md5: unknown partition table
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for resync.
md: using 128k window, over a total of 228074688k.
# Now waiting until raid array rebuild finishes :-(
root@skipper:~# md: md5: resync done.
# I will now paste as I got it from the serial console :-)
root@skipper:~# dd if=/dev/sda8 of=/dev/null bs=10k count=1
1+0 records in
1+0 records out
10240 bytes (10 kB) copied, 2.008e-05 s, 510 MB/s
root@skipper:~# dd if=/dev/sdb8 of=/dev/null bs=10k count=1
1+0 records in
1+0 records out
10240 bytes (10 kB) copied, 0.00303616 s, 3.4 MB/s
root@skipper:~# dd if=/dev/md5 of=/dev/null bs=10k count=1
1+0 records in
1+0 records out
10240 bytes (10 kB) copied, 0.00942157 s, 1.1 MB/s
root@skipper:~# mdadm --fail /dev/md5 /dev/sdb8
md/raid1:md5: Disk failure on sdb8, disabling device.
md/raid1:md5: Operation continuing on 1 devices.
mdadm: set /dev/sdb8 faulty in /dev/md5
root@skipper:~# dd if=/dev/sda8 of=/dev/null bs=10k count=1
1+0 records in
1+0 records out
10240 bytes (10 kB) copied, 3.0578e-05 s, 335 MB/s
root@skipper:~# dd if=/dev/sdb8 of=/dev/null bs=10k count=1
1+0 records in
1+0 records out
10240 bytes (10 kB) copied, 2.937e-05 s, 349 MB/s
root@skipper:~# dd if=/dev/md5 of=/dev/null bs=10k count=1
------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi_lib.c:1153!
invalid opcode: 0000 [#1] SMP
CPU 4
Modules linked in: 8021q bonding e1000 dcdbas bnx2 acpi_power_meter evdev hed

Pid: 2932, comm: md5_raid1 Not tainted 3.2.0-d64-i7 #1 Dell Inc. PowerEdge M610/0V56FN
RIP: 0010:[<ffffffff8136f90e>] [<ffffffff8136f90e>] scsi_setup_fs_cmnd+0xae/0xf0
RSP: 0018:ffff88061b1b5b70 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff88061cfaa330 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff88061cfaa330 RDI: ffff88031d5de000
RBP: ffff88031d5de000 R08: 0000000000000086 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88061cfaa330
R13: ffff88031d5de000 R14: ffff88061c193400 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff88062fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f0ca82304f8 CR3: 0000000001745000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process md5_raid1 (pid: 2932, threadinfo ffff88061b1b4000, task ffff88061ca78280)
Stack:
ffff88031b54d418 ffff88061cfaa330 ffff88061be4d7c8 ffffffff813bd5ec
0000000008100000 000000010006aa55 01ff88061cfaa330 0000000000000000
0000000000000000 ffff88031b54d418 ffff88061be6a8c8 ffff88061cfaa330
Call Trace:
[<ffffffff813bd5ec>] ? sd_prep_fn+0x15c/0xe10
[<ffffffff812a6a2f>] ? blk_peek_request+0xbf/0x220
[<ffffffff8136ed50>] ? scsi_request_fn+0x60/0x570
[<ffffffff812a7229>] ? queue_unplugged+0x49/0xd0
[<ffffffff812a7492>] ? blk_flush_plug_list+0x1e2/0x230
[<ffffffff812a74eb>] ? blk_finish_plug+0xb/0x30
[<ffffffff8143e17c>] ? raid1d+0x76c/0xec0
[<ffffffff81093063>] ? lock_timer_base+0x33/0x70
[<ffffffff81458187>] ? md_thread+0x117/0x150
[<ffffffff810a4d40>] ? wake_up_bit+0x40/0x40
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff810a4836>] ? kthread+0x96/0xa0
[<ffffffff815750f4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff810a47a0>] ? kthread_worker_fn+0x180/0x180
[<ffffffff815750f0>] ? gs_change+0xb/0xb
Code: 00 00 0f 1f 00 48 83 c4 08 5b 5d c3 90 48 89 ef be 20 00 00 00 e8 83 93 ff ff 48 89 c7 48 85 c0 74 db 48 89 83 e8 00 00 00 eb 91 <0f> 0b eb fe 48 8b 00 48 85 c0 0f 84 67 ff ff ff 48 8b 40 50 48
RIP [<ffffffff8136f90e>] scsi_setup_fs_cmnd+0xae/0xf0
RSP <ffff88061b1b5b70>
---[ end trace 9e2209ca727bd89d ]---
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
Hardware name: PowerEdge M610
Watchdog detected hard LOCKUP on cpu 4
Modules linked in: 8021q bonding e1000 dcdbas bnx2 acpi_power_meter evdev hed
Pid: 2932, comm: md5_raid1 Tainted: G D 3.2.0-d64-i7 #1
Call Trace:
<NMI> [<ffffffff8108454b>] ? warn_slowpath_common+0x7b/0xc0
[<ffffffff81084645>] ? warn_slowpath_fmt+0x45/0x50
[<ffffffff810d2bf8>] ? watchdog_overflow_callback+0x98/0xc0
[<ffffffff810fc99a>] ? __perf_event_overflow+0x9a/0x1f0
[<ffffffff81052db9>] ? intel_pmu_handle_irq+0x149/0x280
[<ffffffff81042b78>] ? do_nmi+0x108/0x360
[<ffffffff8157384a>] ? nmi+0x1a/0x20
[<ffffffff81573052>] ? _raw_spin_lock_irqsave+0x22/0x30
<<EOE>> [<ffffffff812b7d82>] ? cfq_exit_single_io_context+0x32/0x90
[<ffffffff812b7e04>] ? cfq_exit_io_context+0x24/0x40
[<ffffffff812aa7df>] ? exit_io_context+0x4f/0x70
[<ffffffff81088aaa>] ? do_exit+0x58a/0x850
[<ffffffff81042652>] ? oops_end+0x72/0xa0
[<ffffffff810403a4>] ? do_invalid_op+0x84/0xa0
================================================================================
I can try variations of the test, but maybe it's easier if I add some debugging
to the kernel?
Anyway: it seems to be the same bug as:
http://marc.info/?l=linux-raid&m=132196390925943&w=2
So I guess it's a bug in handling write-mostly when there are no normal disks
left in the array.
I am going to look further tomorrow, now it's time to go home ;-).
Regards,
Ard van Breemen
* Re: raid 1 bug with write-mostly and administrative failed disk
2012-01-05 21:30 raid 1 bug with write-mostly and administrative failed disk Art -kwaak- van Breemen
@ 2012-01-05 21:39 ` Art -kwaak- van Breemen
2012-01-06 21:41 ` Art -kwaak- van Breemen
2012-01-09 1:34 ` NeilBrown
2 siblings, 0 replies; 15+ messages in thread
From: Art -kwaak- van Breemen @ 2012-01-05 21:39 UTC (permalink / raw)
To: linux-raid
On Thu, Jan 05, 2012 at 10:30:23PM +0100, Art -kwaak- van Breemen wrote:
> Please Cc: me too as I am trying to subscribe to the list.
Never mind that, already subscribed ;-).
* Re: raid 1 bug with write-mostly and administrative failed disk
2012-01-05 21:30 raid 1 bug with write-mostly and administrative failed disk Art -kwaak- van Breemen
2012-01-05 21:39 ` Art -kwaak- van Breemen
@ 2012-01-06 21:41 ` Art -kwaak- van Breemen
2012-01-09 1:34 ` NeilBrown
2 siblings, 0 replies; 15+ messages in thread
From: Art -kwaak- van Breemen @ 2012-01-06 21:41 UTC (permalink / raw)
To: linux-raid
On Thu, Jan 05, 2012 at 10:30:23PM +0100, Art -kwaak- van Breemen wrote:
> This is the test setup:
> mdadm --stop /dev/md5
> mdadm --zero-superblock /dev/sda8
> mdadm --zero-superblock /dev/sdb8
> mdadm --create -l 1 -n 2 --metadata=0.90 --bitmap=internal --bitmap-chunk=1024 --write-behind=2048 /dev/md5 /dev/sdb8 -W /dev/sda8
> (wait until finished)
> mdadm --fail /dev/md5 /dev/sdb8
> # And this to trigger the bug:
> dd if=/dev/md5 of=/dev/null bs=10k count=1
Original test:
- size b < size a; a == write-mostly; write-behind; metadata
0.90; disk b "fails"
Alright, variations:
- metadata 1.2 -> crash
- size a == size b -> crash
- no write-mostly disks -> OK
- fail disk a instead of disk b -> OK
- no write-behind or bitmap-chunk options, just write-mostly -> crash
The failure is persistent across reboots. Once you only have
write-mostly disks, you are in trouble.
This leaves us with a minimal set of test options:
mdadm --create -l 1 -n 2 --bitmap=internal /dev/md3 /dev/sdb6 -W /dev/sda6
# wait for the rebuild to finish
mdadm --fail /dev/md3 /dev/sdb6
dd if=/dev/md3 of=/dev/null bs=10k count=1
- tested this on 2.6.37 -> OK
- tested this on 2.6.38.8 -> OK
- tested this on 3.0.9 -> OK
- tested this on 3.1.4 -> crash
- tested this on 3.2 -> crash
So this is a (major!) regression between 3.0.9 and 3.1.4.
Alright: I've managed to make the test even smaller:
mdadm --create -l 1 -n 2 --bitmap=internal /dev/md3 -W /dev/sdb6 /dev/sda6
Basically I think it boils down to this: if we only have
write-mostly, we probably do not have disks to read from.
Some more debugging info: after the fail (as seen in my first
post), the processors start to lock up hard one by one.
So again: first:
------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi_lib.c:1153!
invalid opcode: 0000 [#1] SMP
CPU 2
Modules linked in: e1000 bnx2 dcdbas psmouse evdev
Pid: 2768, comm: md3_raid1 Not tainted 3.2.0-d64-i7 #1 Dell Inc. PowerEdge 1950/0DT097
RIP: 0010:[<ffffffff8136f90e>] [<ffffffff8136f90e>] scsi_setup_fs_cmnd+0xae/0xf0
RSP: 0018:ffff880222f4db70 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880221e2d600 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880221e2d600 RDI: ffff880222f99000
RBP: ffff880222f99000 R08: 0000000000000086 R09: 0000000000000001
R10: 4000000000000000 R11: 0000000000000000 R12: ffff880221e2d600
R13: ffff880222f99000 R14: ffff880221bf9c00 R15: 0000000000000800
FS: 0000000000000000(0000) GS:ffff88022fc80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000a31008 CR3: 0000000220ee4000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process md3_raid1 (pid: 2768, threadinfo ffff880222f4c000, task ffff880220ebcb30)
Stack:
ffff880220d51ef8 ffff880221e2d600 ffff880222f960b8 ffffffff813bd5ec
ffff880222ffd810 0000000000000000 0100000000000000 ffffffff00000000
0000000000000002 ffff880220d51ef8 ffff880222824908 ffff880221e2d600
Call Trace:
[<ffffffff813bd5ec>] ? sd_prep_fn+0x15c/0xe10
[<ffffffff812a6a2f>] ? blk_peek_request+0xbf/0x220
[<ffffffff8136ed50>] ? scsi_request_fn+0x60/0x570
[<ffffffff812a7229>] ? queue_unplugged+0x49/0xd0
[<ffffffff812a7492>] ? blk_flush_plug_list+0x1e2/0x230
[<ffffffff812a74eb>] ? blk_finish_plug+0xb/0x30
[<ffffffff8143e17c>] ? raid1d+0x76c/0xec0
[<ffffffff81093063>] ? lock_timer_base+0x33/0x70
[<ffffffff81458187>] ? md_thread+0x117/0x150
[<ffffffff810a4d40>] ? wake_up_bit+0x40/0x40
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff810a4836>] ? kthread+0x96/0xa0
[<ffffffff815750f4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff810a47a0>] ? kthread_worker_fn+0x180/0x180
[<ffffffff815750f0>] ? gs_change+0xb/0xb
Code: 00 00 0f 1f 00 48 83 c4 08 5b 5d c3 90 48 89 ef be 20 00 00 00 e8 83 93 ff ff 48 89 c7 48 85 c0 74 db 48 89 83 e8 00 00 00 eb 91 <0f> 0b eb fe 48 8b 00 48 85 c0 0f 84 67 ff ff ff 48 8b 40 50 48
RIP [<ffffffff8136f90e>] scsi_setup_fs_cmnd+0xae/0xf0
RSP <ffff880222f4db70>
---[ end trace 9045ba4c41e91f50 ]---
And then we get:
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
Hardware name: PowerEdge 1950
Watchdog detected hard LOCKUP on cpu 2
Modules linked in: e1000 bnx2 dcdbas psmouse evdev
Pid: 2768, comm: md3_raid1 Tainted: G D 3.2.0-d64-i7 #1
Call Trace:
<NMI> [<ffffffff8108454b>] ? warn_slowpath_common+0x7b/0xc0
[<ffffffff81084645>] ? warn_slowpath_fmt+0x45/0x50
[<ffffffff810d2bf8>] ? watchdog_overflow_callback+0x98/0xc0
[<ffffffff810fc99a>] ? __perf_event_overflow+0x9a/0x1f0
[<ffffffff810aa905>] ? sched_clock_local+0x15/0x80
[<ffffffff81052db9>] ? intel_pmu_handle_irq+0x149/0x280
[<ffffffff81042b78>] ? do_nmi+0x108/0x360
[<ffffffff8157384a>] ? nmi+0x1a/0x20
[<ffffffff81573052>] ? _raw_spin_lock_irqsave+0x22/0x30
<<EOE>> [<ffffffff812b7d82>] ? cfq_exit_single_io_context+0x32/0x90
[<ffffffff812b7e04>] ? cfq_exit_io_context+0x24/0x40
[<ffffffff812aa7df>] ? exit_io_context+0x4f/0x70
[<ffffffff81088aaa>] ? do_exit+0x58a/0x850
[<ffffffff815705e4>] ? printk+0x40/0x45
[<ffffffff81042652>] ? oops_end+0x72/0xa0
[<ffffffff810403a4>] ? do_invalid_op+0x84/0xa0
[<ffffffff8136f90e>] ? scsi_setup_fs_cmnd+0xae/0xf0
[<ffffffff812b8687>] ? cfq_init_prio_data+0x67/0x120
[<ffffffff812b8d73>] ? cfq_get_queue+0x523/0x5b0
[<ffffffff81574f75>] ? invalid_op+0x15/0x20
[<ffffffff8136f90e>] ? scsi_setup_fs_cmnd+0xae/0xf0
[<ffffffff813bd5ec>] ? sd_prep_fn+0x15c/0xe10
[<ffffffff812a6a2f>] ? blk_peek_request+0xbf/0x220
[<ffffffff8136ed50>] ? scsi_request_fn+0x60/0x570
[<ffffffff812a7229>] ? queue_unplugged+0x49/0xd0
[<ffffffff812a7492>] ? blk_flush_plug_list+0x1e2/0x230
[<ffffffff812a74eb>] ? blk_finish_plug+0xb/0x30
[<ffffffff8143e17c>] ? raid1d+0x76c/0xec0
[<ffffffff81093063>] ? lock_timer_base+0x33/0x70
[<ffffffff81458187>] ? md_thread+0x117/0x150
[<ffffffff810a4d40>] ? wake_up_bit+0x40/0x40
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff810a4836>] ? kthread+0x96/0xa0
[<ffffffff815750f4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff810a47a0>] ? kthread_worker_fn+0x180/0x180
[<ffffffff815750f0>] ? gs_change+0xb/0xb
---[ end trace 9045ba4c41e91f51 ]---
And:
------------[ cut here ]------------
WARNING: at kernel/watchdog.c:241 watchdog_overflow_callback+0x98/0xc0()
Hardware name: PowerEdge 1950
Watchdog detected hard LOCKUP on cpu 3
Modules linked in: e1000 bnx2 dcdbas psmouse evdev
Pid: 1256, comm: md0_raid1 Tainted: G D W 3.2.0-d64-i7 #1
Call Trace:
<NMI> [<ffffffff8108454b>] ? warn_slowpath_common+0x7b/0xc0
[<ffffffff81084645>] ? warn_slowpath_fmt+0x45/0x50
[<ffffffff810d2bf8>] ? watchdog_overflow_callback+0x98/0xc0
[<ffffffff810fc99a>] ? __perf_event_overflow+0x9a/0x1f0
[<ffffffff810aa905>] ? sched_clock_local+0x15/0x80
[<ffffffff81052db9>] ? intel_pmu_handle_irq+0x149/0x280
[<ffffffff81042b78>] ? do_nmi+0x108/0x360
[<ffffffff8157384a>] ? nmi+0x1a/0x20
[<ffffffff8157307a>] ? _raw_spin_lock_irq+0x1a/0x30
<<EOE>> [<ffffffff812a75d5>] ? blk_queue_bio+0xc5/0x350
[<ffffffff812a581f>] ? generic_make_request+0xaf/0xe0
[<ffffffff812a58be>] ? submit_bio+0x6e/0xf0
[<ffffffff81458f37>] ? md_super_write+0x67/0xc0
[<ffffffff814592a6>] ? md_update_sb+0x316/0x560
[<ffffffff8145a97a>] ? md_check_recovery+0x29a/0x6a0
[<ffffffff8143da42>] ? raid1d+0x32/0xec0
[<ffffffff81458187>] ? md_thread+0x117/0x150
[<ffffffff810a4d40>] ? wake_up_bit+0x40/0x40
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff81458070>] ? md_register_thread+0x100/0x100
[<ffffffff810a4836>] ? kthread+0x96/0xa0
[<ffffffff815750f4>] ? kernel_thread_helper+0x4/0x10
[<ffffffff810a47a0>] ? kthread_worker_fn+0x180/0x180
[<ffffffff815750f0>] ? gs_change+0xb/0xb
---[ end trace 9045ba4c41e91f52 ]---
I think it means something in the block layer gets locked up.
Anyway, off to home.
Regards,
Ard
* Re: raid1d crash at boot
[not found] <20111119134139.GA30570@rere.qmqm.pl>
2011-11-21 1:37 ` raid1d crash at boot NeilBrown
@ 2012-01-07 12:53 ` Michał Mirosław
2012-01-09 1:35 ` NeilBrown
1 sibling, 1 reply; 15+ messages in thread
From: Michał Mirosław @ 2012-01-07 12:53 UTC (permalink / raw)
To: linux-raid; +Cc: Neil Brown
On Sat, Nov 19, 2011 at 02:41:39PM +0100, Michał Mirosław wrote:
> I get following BUG_ON tripped while booting, before rootfs is mounted by
> Debian's initrd. This started to happen for kernels since sometime
> during 3.1-rcX.
>
> [ 6.246170] ------------[ cut here ]------------
> [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> [ 6.246347] invalid opcode: 0000 [#1] PREEMPT SMP
> [ 6.246558] CPU 5
> [ 6.246614] Modules linked in: usb_storage uas firewire_ohci firewire_core crc_itu_t xhci_hcd [last unloaded: scsi_wait_scan]
> [ 6.247131]
> [ 6.247194] Pid: 288, comm: md1_raid1 Not tainted 3.2.0-rc2mq+ #5 System manufacturer System Product Name/P8Z68-V PRO
> [ 6.247422] RIP: 0010:[<ffffffff812443a1>] [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [ 6.247563] RSP: 0018:ffff8804140d1bd0 EFLAGS: 00010046
> [ 6.247634] RAX: 0000000000000000 RBX: ffff88041d463800 RCX: 00000000ffffffff
> [ 6.247710] RDX: 00000000ffffffff RSI: ffff8804142fd600 RDI: ffff88041d463800
> [ 6.247785] RBP: ffff8804142fd600 R08: 00000000ffffffff R09: 0000000000017a00
> [ 6.247861] R10: ffff88041d464000 R11: ffff88041d464000 R12: 0000000000000800
> [ 6.247936] R13: 0000000000000001 R14: ffff88041d463800 R15: 0000000000000000
> [ 6.248013] FS: 0000000000000000(0000) GS:ffff88042fb40000(0000) knlGS:0000000000000000
> [ 6.248104] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 6.248176] CR2: 000000000042b200 CR3: 0000000001605000 CR4: 00000000000406e0
> [ 6.248252] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 6.248328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [ 6.248404] Process md1_raid1 (pid: 288, threadinfo ffff8804140d0000, task ffff88041539a4c0)
> [ 6.248495] Stack:
> [ 6.248557] 0000000000000000 ffff8804142fd600 ffff8804142fd600 ffffffff8124a9be
> [ 6.248819] ffff8804142fe3a0 ffff8804142fd600 ffff88041d463848 ffffffff811a5d67
> [ 6.249084] ffff8804142fe3a0 ffff880415452400 ffff8804156f0000 00000000fffffa2b
> [ 6.249346] Call Trace:
> [ 6.249414] [<ffffffff8124a9be>] ? sd_prep_fn+0x2cd/0xb72
> [ 6.249490] [<ffffffff811a5d67>] ? cfq_dispatch_requests+0x6f2/0x82c
> [ 6.249567] [<ffffffff8119a168>] ? blk_peek_request+0xc8/0x1bf
> [ 6.249638] [<ffffffff81243d83>] ? scsi_request_fn+0x64/0x406
> [ 6.249708] [<ffffffff8119a526>] ? blk_flush_plug_list+0x186/0x1b7
> [ 6.249780] [<ffffffff8119a562>] ? blk_finish_plug+0xb/0x2a
> [ 6.249849] [<ffffffff812a400f>] ? raid1d+0x91/0xb22
> [ 6.249919] [<ffffffff81031729>] ? get_parent_ip+0x9/0x1b
> [ 6.249990] [<ffffffff813a5c9e>] ? sub_preempt_count+0x83/0x94
> [ 6.250060] [<ffffffff813a202a>] ? schedule+0x73f/0x772
> [ 6.250129] [<ffffffff813a5d49>] ? add_preempt_count+0x9a/0x9c
> [ 6.250199] [<ffffffff813a330b>] ? _raw_spin_lock_irqsave+0x13/0x31
> [ 6.250271] [<ffffffff812a9bb4>] ? md_thread+0xfe/0x11c
> [ 6.250340] [<ffffffff8104f6c6>] ? add_wait_queue+0x3c/0x3c
> [ 6.250410] [<ffffffff812a9ab6>] ? signal_pending+0x17/0x17
> [ 6.250479] [<ffffffff8104f045>] ? kthread+0x76/0x7e
> [ 6.250548] [<ffffffff813a8c34>] ? kernel_thread_helper+0x4/0x10
> [ 6.250618] [<ffffffff8104efcf>] ? kthread_worker_fn+0x139/0x139
> [ 6.250688] [<ffffffff813a8c30>] ? gs_change+0xb/0xb
> [ 6.250754] Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 50 48 85 c0 74 0c 48 89 ee 48 89 df ff d0 85 c0 75 44 66 83 bd d0 00 00 00 00 75 02 <0f> 0b 48 89 ee 48 89 df e8 b6 e9 ff ff 48 85 c0 48 89 c2 74 20
> [ 6.253544] RIP [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [ 6.253658] RSP <ffff8804140d1bd0>
> [ 6.253722] ---[ end trace 533b0b5008dd7cee ]---
> [ 6.253788] note: md1_raid1[288] exited with preempt_count 1
I've bisected this to the following commit. It's not trivially revertible on v3.2,
but I'll try a few things with it.
Best Regards,
Michał Mirosław
---
commit d2eb35acfdccbe2a3622ed6cc441a5482148423b
Author: NeilBrown <neilb@suse.de>
Date: Thu Jul 28 11:31:48 2011 +1000
md/raid1: avoid reading from known bad blocks.
Now that we have a bad block list, we should not read from those
blocks.
There are several main parts to this:
1/ read_balance needs to check for bad blocks, and return not only
the chosen device, but also how many good blocks are available
there.
2/ fix_read_error needs to avoid trying to read from bad blocks.
3/ read submission must be ready to issue multiple reads to
different devices as different bad blocks on different devices
could mean that a single large read cannot be served by any one
device, but can still be served by the array.
This requires keeping count of the number of outstanding requests
per bio. This count is stored in 'bi_phys_segments'
4/ retrying a read needs to also be ready to submit a smaller read
and queue another request for the rest.
This does not yet handle bad blocks when reading to perform resync,
recovery, or check.
'md_trim_bio' will also be used for RAID10, so put it in md.c and
export it.
Signed-off-by: NeilBrown <neilb@suse.de>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
* Re: raid 1 bug with write-mostly and administrative failed disk
2012-01-05 21:30 raid 1 bug with write-mostly and administrative failed disk Art -kwaak- van Breemen
2012-01-05 21:39 ` Art -kwaak- van Breemen
2012-01-06 21:41 ` Art -kwaak- van Breemen
@ 2012-01-09 1:34 ` NeilBrown
2012-01-09 13:25 ` Art -kwaak- van Breemen
2 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2012-01-09 1:34 UTC (permalink / raw)
To: Art -kwaak- van Breemen; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 1640 bytes --]
On Thu, 5 Jan 2012 22:30:23 +0100 Art -kwaak- van Breemen
<ard@telegraafnet.nl> wrote:
> Hi,
>
> Please Cc: me too as I am trying to subscribe to the list.
>
> Anyway: I found a small bug in raid1, with write-behind and
> write-mostly, occuring at least on 3.1.4 and 3.2 .
>
> This is the test setup:
> mdadm --stop /dev/md5
> mdadm --zero-superblock /dev/sda8
> mdadm --zero-superblock /dev/sdb8
> mdadm --create -l 1 -n 2 --metadata=0.90 --bitmap=internal --bitmap-chunk=1024 --write-behind=2048 /dev/md5 /dev/sdb8 -W /dev/sda8
> (wait until finished)
> mdadm --fail /dev/md5 /dev/sdb8
> # And this to trigger the bug:
> dd if=/dev/md5 of=/dev/null bs=10k count=1
>
Thanks for the excellent bug report.
I never tested write-behind :-(
Please confirm that this patch fixes it for you.
NeilBrown
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc24f0c..a368db2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -531,8 +531,17 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
if (test_bit(WriteMostly, &rdev->flags)) {
/* Don't balance among write-mostly, just
* use the first as a last resort */
- if (best_disk < 0)
+ if (best_disk < 0) {
+ if (is_badblock(rdev, this_sector, sectors,
+ &first_bad, &bad_sectors)) {
+ if (first_bad < this_sector)
+ /* Cannot use this */
+ continue;
+ best_good_sectors = first_bad - this_sector;
+ } else
+ best_good_sectors = sectors;
best_disk = disk;
+ }
continue;
}
/* This is a reasonable device to use. It might
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
* Re: raid1d crash at boot
2012-01-07 12:53 ` Michał Mirosław
@ 2012-01-09 1:35 ` NeilBrown
2012-01-09 20:30 ` Michał Mirosław
0 siblings, 1 reply; 15+ messages in thread
From: NeilBrown @ 2012-01-09 1:35 UTC (permalink / raw)
To: Michał Mirosław; +Cc: linux-raid
[-- Attachment #1: Type: text/plain, Size: 5197 bytes --]
On Sat, 7 Jan 2012 13:53:04 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
wrote:
> On Sat, Nov 19, 2011 at 02:41:39PM +0100, Michał Mirosław wrote:
> > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > Debian's initrd. This started to happen for kernels since sometime
> > during 3.1-rcX.
> >
> > [ 6.246170] ------------[ cut here ]------------
> > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> > [ 6.246347] invalid opcode: 0000 [#1] PREEMPT SMP
> > [ 6.246558] CPU 5
> > [ 6.246614] Modules linked in: usb_storage uas firewire_ohci firewire_core crc_itu_t xhci_hcd [last unloaded: scsi_wait_scan]
> > [ 6.247131]
> > [ 6.247194] Pid: 288, comm: md1_raid1 Not tainted 3.2.0-rc2mq+ #5 System manufacturer System Product Name/P8Z68-V PRO
> > [ 6.247422] RIP: 0010:[<ffffffff812443a1>] [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> > [ 6.247563] RSP: 0018:ffff8804140d1bd0 EFLAGS: 00010046
> > [ 6.247634] RAX: 0000000000000000 RBX: ffff88041d463800 RCX: 00000000ffffffff
> > [ 6.247710] RDX: 00000000ffffffff RSI: ffff8804142fd600 RDI: ffff88041d463800
> > [ 6.247785] RBP: ffff8804142fd600 R08: 00000000ffffffff R09: 0000000000017a00
> > [ 6.247861] R10: ffff88041d464000 R11: ffff88041d464000 R12: 0000000000000800
> > [ 6.247936] R13: 0000000000000001 R14: ffff88041d463800 R15: 0000000000000000
> > [ 6.248013] FS: 0000000000000000(0000) GS:ffff88042fb40000(0000) knlGS:0000000000000000
> > [ 6.248104] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 6.248176] CR2: 000000000042b200 CR3: 0000000001605000 CR4: 00000000000406e0
> > [ 6.248252] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 6.248328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > [ 6.248404] Process md1_raid1 (pid: 288, threadinfo ffff8804140d0000, task ffff88041539a4c0)
> > [ 6.248495] Stack:
> > [ 6.248557] 0000000000000000 ffff8804142fd600 ffff8804142fd600 ffffffff8124a9be
> > [ 6.248819] ffff8804142fe3a0 ffff8804142fd600 ffff88041d463848 ffffffff811a5d67
> > [ 6.249084] ffff8804142fe3a0 ffff880415452400 ffff8804156f0000 00000000fffffa2b
> > [ 6.249346] Call Trace:
> > [ 6.249414] [<ffffffff8124a9be>] ? sd_prep_fn+0x2cd/0xb72
> > [ 6.249490] [<ffffffff811a5d67>] ? cfq_dispatch_requests+0x6f2/0x82c
> > [ 6.249567] [<ffffffff8119a168>] ? blk_peek_request+0xc8/0x1bf
> > [ 6.249638] [<ffffffff81243d83>] ? scsi_request_fn+0x64/0x406
> > [ 6.249708] [<ffffffff8119a526>] ? blk_flush_plug_list+0x186/0x1b7
> > [ 6.249780] [<ffffffff8119a562>] ? blk_finish_plug+0xb/0x2a
> > [ 6.249849] [<ffffffff812a400f>] ? raid1d+0x91/0xb22
> > [ 6.249919] [<ffffffff81031729>] ? get_parent_ip+0x9/0x1b
> > [ 6.249990] [<ffffffff813a5c9e>] ? sub_preempt_count+0x83/0x94
> > [ 6.250060] [<ffffffff813a202a>] ? schedule+0x73f/0x772
> > [ 6.250129] [<ffffffff813a5d49>] ? add_preempt_count+0x9a/0x9c
> > [ 6.250199] [<ffffffff813a330b>] ? _raw_spin_lock_irqsave+0x13/0x31
> > [ 6.250271] [<ffffffff812a9bb4>] ? md_thread+0xfe/0x11c
> > [ 6.250340] [<ffffffff8104f6c6>] ? add_wait_queue+0x3c/0x3c
> > [ 6.250410] [<ffffffff812a9ab6>] ? signal_pending+0x17/0x17
> > [ 6.250479] [<ffffffff8104f045>] ? kthread+0x76/0x7e
> > [ 6.250548] [<ffffffff813a8c34>] ? kernel_thread_helper+0x4/0x10
> > [ 6.250618] [<ffffffff8104efcf>] ? kthread_worker_fn+0x139/0x139
> > [ 6.250688] [<ffffffff813a8c30>] ? gs_change+0xb/0xb
> > [ 6.250754] Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 50 48 85 c0 74 0c 48 89 ee 48 89 df ff d0 85 c0 75 44 66 83 bd d0 00 00 00 00 75 02 <0f> 0b 48 89 ee 48 89 df e8 b6 e9 ff ff 48 85 c0 48 89 c2 74 20
> > [ 6.253544] RIP [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> > [ 6.253658] RSP <ffff8804140d1bd0>
> > [ 6.253722] ---[ end trace 533b0b5008dd7cee ]---
> > [ 6.253788] note: md1_raid1[288] exited with preempt_count 1
>
> I've bisected this to following commit. It's not trivially revertable on v3.2,
> but I'll do some tries with it.
Thanks for doing that - it is a great help.
And you were right - the write-mostly flag is relevant.
Please test this patch - it should fix the problem.
Thanks,
NeilBrown
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index cc24f0c..a368db2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -531,8 +531,17 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
if (test_bit(WriteMostly, &rdev->flags)) {
/* Don't balance among write-mostly, just
* use the first as a last resort */
- if (best_disk < 0)
+ if (best_disk < 0) {
+ if (is_badblock(rdev, this_sector, sectors,
+ &first_bad, &bad_sectors)) {
+ if (first_bad < this_sector)
+ /* Cannot use this */
+ continue;
+ best_good_sectors = first_bad - this_sector;
+ } else
+ best_good_sectors = sectors;
best_disk = disk;
+ }
continue;
}
/* This is a reasonable device to use. It might
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]
* Re: raid 1 bug with write-mostly and administrative failed disk
2012-01-09 1:34 ` NeilBrown
@ 2012-01-09 13:25 ` Art -kwaak- van Breemen
0 siblings, 0 replies; 15+ messages in thread
From: Art -kwaak- van Breemen @ 2012-01-09 13:25 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
Hi,
On Mon, Jan 09, 2012 at 12:34:03PM +1100, NeilBrown wrote:
> On Thu, 5 Jan 2012 22:30:23 +0100 Art -kwaak- van Breemen
> <ard@telegraafnet.nl> wrote:
>
> > Anyway: I found a small bug in raid1, with write-behind and
> > write-mostly, occuring at least on 3.1.4 and 3.2 .
>
> I never tested write-behind :-(
Well, I (un?)fortunately had a test case where I was exchanging the disks
in a RAID array for SSDs.
> Please confirm that this patch fixes it for you.
My latest testcase:
mdadm --create -l 1 -n 2 --bitmap=internal /dev/md3 -W /dev/sdb6 /dev/sda6
works like a charm now for 3.2 :-).
Proof:
md3 : active raid1 sda6[1](W) sdb6[0](W)
2103478 blocks super 1.2 [2/2] [UU]
bitmap: 1/1 pages [4KB], 65536KB chunk
Failing disk 1 and then reading it works.
Thanks Neil! I can be very happy now (3.1 is a very stable
series so I guess 3.2 is stable too :-) ).
But trying to understand the patch: we had selected a disk, but we
had not determined the number of readable sectors on that
disk, so it defaulted to 0?
Thanks again!
Regards,
Ard van Breemen
* Re: raid1d crash at boot
2012-01-09 1:35 ` NeilBrown
@ 2012-01-09 20:30 ` Michał Mirosław
0 siblings, 0 replies; 15+ messages in thread
From: Michał Mirosław @ 2012-01-09 20:30 UTC (permalink / raw)
To: NeilBrown; +Cc: linux-raid
On Mon, Jan 09, 2012 at 12:35:52PM +1100, NeilBrown wrote:
> On Sat, 7 Jan 2012 13:53:04 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> wrote:
> > On Sat, Nov 19, 2011 at 02:41:39PM +0100, Michał Mirosław wrote:
> > > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > > Debian's initrd. This started to happen for kernels since sometime
> > > during 3.1-rcX.
> > >
> > > [ 6.246170] ------------[ cut here ]------------
> > > [ 6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
[...]
> > > [ 6.249414] [<ffffffff8124a9be>] ? sd_prep_fn+0x2cd/0xb72
> > > [ 6.249490] [<ffffffff811a5d67>] ? cfq_dispatch_requests+0x6f2/0x82c
> > > [ 6.249567] [<ffffffff8119a168>] ? blk_peek_request+0xc8/0x1bf
> > > [ 6.249638] [<ffffffff81243d83>] ? scsi_request_fn+0x64/0x406
> > > [ 6.249708] [<ffffffff8119a526>] ? blk_flush_plug_list+0x186/0x1b7
> > > [ 6.249780] [<ffffffff8119a562>] ? blk_finish_plug+0xb/0x2a
> > > [ 6.249849] [<ffffffff812a400f>] ? raid1d+0x91/0xb22
[...]
> > I've bisected this to following commit. It's not trivially revertable on v3.2,
> > but I'll do some tries with it.
> Thanks for doing that - it is a great help.
> And you were right - the write-mostly flag is relevant.
> Please test this patch - it should fix the problem.
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index cc24f0c..a368db2 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -531,8 +531,17 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
> if (test_bit(WriteMostly, &rdev->flags)) {
> /* Don't balance among write-mostly, just
> * use the first as a last resort */
> - if (best_disk < 0)
> + if (best_disk < 0) {
> + if (is_badblock(rdev, this_sector, sectors,
> + &first_bad, &bad_sectors)) {
> + if (first_bad < this_sector)
> + /* Cannot use this */
> + continue;
> + best_good_sectors = first_bad - this_sector;
> + } else
> + best_good_sectors = sectors;
> best_disk = disk;
> + }
> continue;
> }
> /* This is a reasonable device to use. It might
>
Built and booted fine. Thanks!
Michał Mirosław
end of thread, other threads:[~2012-01-09 20:30 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-05 21:30 raid 1 bug with write-mostly and administrative failed disk Art -kwaak- van Breemen
2012-01-05 21:39 ` Art -kwaak- van Breemen
2012-01-06 21:41 ` Art -kwaak- van Breemen
2012-01-09 1:34 ` NeilBrown
2012-01-09 13:25 ` Art -kwaak- van Breemen
[not found] <20111119134139.GA30570@rere.qmqm.pl>
2011-11-21 1:37 ` raid1d crash at boot NeilBrown
2011-11-21 7:04 ` James Bottomley
2011-11-21 8:27 ` NeilBrown
2011-11-22 0:50 ` Michał Mirosław
2011-11-22 1:26 ` NeilBrown
2011-11-22 12:03 ` Michał Mirosław
2011-11-22 12:10 ` Michał Mirosław
2012-01-07 12:53 ` Michał Mirosław
2012-01-09 1:35 ` NeilBrown
2012-01-09 20:30 ` Michał Mirosław