Re: raid1d crash at boot

public inbox for linux-scsi@vger.kernel.org

* Re: raid1d crash at boot
       [not found] <20111119134139.GA30570@rere.qmqm.pl>
@ 2011-11-21  1:37 ` NeilBrown
  2011-11-21  7:04   ` James Bottomley
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2011-11-21  1:37 UTC (permalink / raw)
  To: Michał Mirosław; +Cc: linux-raid, linux-scsi


Thanks for the report.
However, as this crash is clearly in the SCSI layer, it makes sense to report
it to linux-scsi, so I have cc:ed this reply there.

NeilBrown


On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
wrote:

> I get the following BUG_ON tripped while booting, before the rootfs is mounted by
> Debian's initrd. This started happening with kernels from sometime
> during 3.1-rcX.
> 
> [    6.246170] ------------[ cut here ]------------
> [    6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!
> [    6.246347] invalid opcode: 0000 [#1] PREEMPT SMP
> [    6.246558] CPU 5
> [    6.246614] Modules linked in: usb_storage uas firewire_ohci firewire_core crc_itu_t xhci_hcd [last unloaded: scsi_wait_scan]
> [    6.247131]
> [    6.247194] Pid: 288, comm: md1_raid1 Not tainted 3.2.0-rc2mq+ #5 System manufacturer System Product Name/P8Z68-V PRO
> [    6.247422] RIP: 0010:[<ffffffff812443a1>]  [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [    6.247563] RSP: 0018:ffff8804140d1bd0  EFLAGS: 00010046
> [    6.247634] RAX: 0000000000000000 RBX: ffff88041d463800 RCX: 00000000ffffffff
> [    6.247710] RDX: 00000000ffffffff RSI: ffff8804142fd600 RDI: ffff88041d463800
> [    6.247785] RBP: ffff8804142fd600 R08: 00000000ffffffff R09: 0000000000017a00
> [    6.247861] R10: ffff88041d464000 R11: ffff88041d464000 R12: 0000000000000800
> [    6.247936] R13: 0000000000000001 R14: ffff88041d463800 R15: 0000000000000000
> [    6.248013] FS:  0000000000000000(0000) GS:ffff88042fb40000(0000) knlGS:0000000000000000
> [    6.248104] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [    6.248176] CR2: 000000000042b200 CR3: 0000000001605000 CR4: 00000000000406e0
> [    6.248252] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    6.248328] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [    6.248404] Process md1_raid1 (pid: 288, threadinfo ffff8804140d0000, task ffff88041539a4c0)
> [    6.248495] Stack:
> [    6.248557]  0000000000000000 ffff8804142fd600 ffff8804142fd600 ffffffff8124a9be
> [    6.248819]  ffff8804142fe3a0 ffff8804142fd600 ffff88041d463848 ffffffff811a5d67
> [    6.249084]  ffff8804142fe3a0 ffff880415452400 ffff8804156f0000 00000000fffffa2b
> [    6.249346] Call Trace:
> [    6.249414]  [<ffffffff8124a9be>] ? sd_prep_fn+0x2cd/0xb72
> [    6.249490]  [<ffffffff811a5d67>] ? cfq_dispatch_requests+0x6f2/0x82c
> [    6.249567]  [<ffffffff8119a168>] ? blk_peek_request+0xc8/0x1bf
> [    6.249638]  [<ffffffff81243d83>] ? scsi_request_fn+0x64/0x406
> [    6.249708]  [<ffffffff8119a526>] ? blk_flush_plug_list+0x186/0x1b7
> [    6.249780]  [<ffffffff8119a562>] ? blk_finish_plug+0xb/0x2a
> [    6.249849]  [<ffffffff812a400f>] ? raid1d+0x91/0xb22
> [    6.249919]  [<ffffffff81031729>] ? get_parent_ip+0x9/0x1b
> [    6.249990]  [<ffffffff813a5c9e>] ? sub_preempt_count+0x83/0x94
> [    6.250060]  [<ffffffff813a202a>] ? schedule+0x73f/0x772
> [    6.250129]  [<ffffffff813a5d49>] ? add_preempt_count+0x9a/0x9c
> [    6.250199]  [<ffffffff813a330b>] ? _raw_spin_lock_irqsave+0x13/0x31
> [    6.250271]  [<ffffffff812a9bb4>] ? md_thread+0xfe/0x11c
> [    6.250340]  [<ffffffff8104f6c6>] ? add_wait_queue+0x3c/0x3c
> [    6.250410]  [<ffffffff812a9ab6>] ? signal_pending+0x17/0x17
> [    6.250479]  [<ffffffff8104f045>] ? kthread+0x76/0x7e
> [    6.250548]  [<ffffffff813a8c34>] ? kernel_thread_helper+0x4/0x10
> [    6.250618]  [<ffffffff8104efcf>] ? kthread_worker_fn+0x139/0x139
> [    6.250688]  [<ffffffff813a8c30>] ? gs_change+0xb/0xb
> [    6.250754] Code: 85 c0 74 1d 48 8b 00 48 85 c0 74 15 48 8b 40 50 48 85 c0 74 0c 48 89 ee 48 89 df ff d0 85 c0 75 44 66 83 bd d0 00 00 00 00 75 02 <0f> 0b 48 89 ee 48 89 df e8 b6 e9 ff ff 48 85 c0 48 89 c2 74 20
> [    6.253544] RIP  [<ffffffff812443a1>] scsi_setup_fs_cmnd+0x45/0x83
> [    6.253658]  RSP <ffff8804140d1bd0>
> [    6.253722] ---[ end trace 533b0b5008dd7cee ]---
> [    6.253788] note: md1_raid1[288] exited with preempt_count 1
> 
> I'm attaching the dmesg log I could catch with netconsole (some part of it is missing).
> 
> Best Regards,
> Michał Mirosław



* Re: raid1d crash at boot
  2011-11-21  1:37 ` raid1d crash at boot NeilBrown
@ 2011-11-21  7:04   ` James Bottomley
  2011-11-21  8:27     ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: James Bottomley @ 2011-11-21  7:04 UTC (permalink / raw)
  To: NeilBrown; +Cc: Michał Mirosław, linux-raid, linux-scsi

On Mon, 2011-11-21 at 12:37 +1100, NeilBrown wrote:
> Thank for the report.
> However as this crash is clearly in the SCSI layer it makes sense to reported
> it to linux-scsi - so I have cc:ed this reply there.
> 
> NeilBrown
> 
> 
> On Sat, 19 Nov 2011 14:41:39 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
> wrote:
> 
> > I get following BUG_ON tripped while booting, before rootfs is mounted by
> > Debian's initrd. This started to happen for kernels since sometime
> > during 3.1-rcX.
> > 
> > [    6.246170] ------------[ cut here ]------------
> > [    6.246246] kernel BUG at /mnt/src-tmp/jaja/git/qmqm/drivers/scsi/scsi_lib.c:1153!

I can tell you what it is:

        /*
         * Filesystem requests must transfer data.
         */
        BUG_ON(!req->nr_phys_segments);

But the fault is in the layer above SCSI.  It means something sent a
request with REQ_TYPE_FS but no actual data attached ... this is
supposed to be impossible, hence the BUG_ON.
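
As a rough illustration of the condition described above, here is a
userspace sketch (not the kernel code). The `model_*` types and names are
hypothetical stand-ins; only `nr_phys_segments` and the idea of a
filesystem-type request come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's request types (not the real definitions). */
enum model_cmd_type { MODEL_REQ_TYPE_FS, MODEL_REQ_TYPE_BLOCK_PC };

struct model_request {
    enum model_cmd_type cmd_type;
    unsigned short nr_phys_segments; /* number of data segments attached */
};

/*
 * Returns true when a request would trip the "filesystem requests must
 * transfer data" check: it is a filesystem request with no data segments.
 */
static bool model_fs_request_would_bug(const struct model_request *req)
{
    assert(req != NULL);
    return req->cmd_type == MODEL_REQ_TYPE_FS && req->nr_phys_segments == 0;
}
```

In this model, a filesystem request with zero segments is flagged, while one
with at least one segment, or a non-filesystem request, is not.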

James


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* Re: raid1d crash at boot
  2011-11-21  7:04   ` James Bottomley
@ 2011-11-21  8:27     ` NeilBrown
  2011-11-22  0:50       ` Michał Mirosław
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2011-11-21  8:27 UTC (permalink / raw)
  To: James Bottomley; +Cc: Michał Mirosław, linux-raid, linux-scsi

On Mon, 21 Nov 2011 08:04:30 +0100 James Bottomley
<James.Bottomley@HansenPartnership.com> wrote:

> I can tell you what it is:
> 
>         /*
>          * Filesystem requests must transfer data.
>          */
>         BUG_ON(!req->nr_phys_segments);
> 
> But the fault is in the layer above SCSI.  It means something sent a
> request with REQ_TYPE_FS but no actual data attached ... this is
> supposed to be impossible, hence the bug on.
> 
> James
> 

Thanks... that sounds strangely familiar, but I cannot be sure, and Google
doesn't help.

Michał: what are you using on the RAID1: some filesystem (which one?), swap, or something else?

NeilBrown


* Re: raid1d crash at boot
  2011-11-21  8:27     ` NeilBrown
@ 2011-11-22  0:50       ` Michał Mirosław
  2011-11-22  1:26         ` NeilBrown
  0 siblings, 1 reply; 7+ messages in thread
From: Michał Mirosław @ 2011-11-22  0:50 UTC (permalink / raw)
  To: NeilBrown; +Cc: James Bottomley, linux-raid, linux-scsi

On Mon, Nov 21, 2011 at 07:27:45PM +1100, NeilBrown wrote:
> Thanks.... that sounds strangely familiar, but I cannot be sure and google
> doesn't help.
> 
> Michał: what are you using on the RAID1 - some filesystem (which one)or swap or something else?

The whole stack is: ext4 over LVM over dm-crypt over md-raid1 over SATA
drives.  The boot doesn't survive to the point where the initrd script asks
for dm-crypt's key password.

Best Regards,
Michał Mirosław

* Re: raid1d crash at boot
  2011-11-22  0:50       ` Michał Mirosław
@ 2011-11-22  1:26         ` NeilBrown
  2011-11-22 12:03           ` Michał Mirosław
  0 siblings, 1 reply; 7+ messages in thread
From: NeilBrown @ 2011-11-22  1:26 UTC (permalink / raw)
  To: Michał Mirosław
  Cc: James Bottomley, linux-raid, linux-scsi,
	device-mapper development

On Tue, 22 Nov 2011 01:50:37 +0100 Michał Mirosław <mirq-linux@rere.qmqm.pl>
wrote:

> The whole stack is: ext4 over lvm over dm-crypt over md-raid1 over SATA
> drives.  The boot doesn't survive to the point where the initrd script asks
> for md-crypt's key password.
>

That gives us lots of room for pointing the finger of blame, doesn't it?
I think it is -> his problem. :-)

From the md part of the stack trace, it looks most like a write request.  It
could be a retried read, but that is extremely unlikely that early in boot.

So presumably it is some sort of zero-length REQ_FLUSH or something like that.
md/raid1 will just pass those down unchanged.
My guess is that ext4 is generating this and something in the stack is
stripping the REQ_FLUSH ... though why it even tries before asking for a
password is beyond me.

Maybe someone on dm-devel can help?

If not, we might need to try a debugging patch like this:


diff --git a/block/blk-core.c b/block/blk-core.c
index f43c8a5..59cb2ad 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1560,7 +1560,7 @@ generic_make_request_checks(struct bio *bio)
 			goto end_io;
 		}
 	}
-
+	WARN_ON((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && nr_sectors == 0);
 	if ((bio->bi_rw & REQ_DISCARD) &&
 	    (!blk_queue_discard(q) ||
 	     ((bio->bi_rw & REQ_SECURE) &&
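
To make the intent of the warning condition easier to see, here is a small
userspace sketch of the same predicate. The `model_bio` struct and the
`MODEL_REQ_*` bit values are assumptions made for illustration; they are not
the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for illustration; the kernel's real values differ. */
#define MODEL_REQ_FLUSH (1UL << 0)
#define MODEL_REQ_FUA   (1UL << 1)

/* Hypothetical stand-in for the fields of a bio that the check reads. */
struct model_bio {
    unsigned long bi_rw;     /* request flags */
    unsigned int nr_sectors; /* payload size in sectors */
};

/* Mirrors the patch's condition: a FLUSH or FUA bio that carries no sectors. */
static bool model_warn_zero_length_flush(const struct model_bio *bio)
{
    assert(bio != NULL);
    return (bio->bi_rw & (MODEL_REQ_FLUSH | MODEL_REQ_FUA)) && bio->nr_sectors == 0;
}
```

The predicate only fires while the FLUSH or FUA flag is still set, so a
warning here would show who submits such a bio before anything lower in the
stack has a chance to strip the flag.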


NeilBrown



* Re: raid1d crash at boot
  2011-11-22  1:26         ` NeilBrown
@ 2011-11-22 12:03           ` Michał Mirosław
  2011-11-22 12:10             ` Michał Mirosław
  0 siblings, 1 reply; 7+ messages in thread
From: Michał Mirosław @ 2011-11-22 12:03 UTC (permalink / raw)
  To: NeilBrown
  Cc: James Bottomley, linux-raid, linux-scsi,
	device-mapper development

On Tue, Nov 22, 2011 at 12:26:57PM +1100, NeilBrown wrote:
> That gives us lots of room for pointing the finger of blame, doesn't it?
> I think it is -> his problem. :-)
> 
> From the md part of the stack trace it looks most like a write request.  It
> could be a retried read, but that is extremely unlike that early in boot.
> 
> So presumably it is some sort of zero-length REQ_FLUSH or something like that.
> md/raid1 will just pass those unchanged down. 
> My guess is that ext4 is generating this and something in the stack is
> stripping the REQ_FLUSH .... though why it even tries before asking for a
> password is beyond me.

I pointed the finger at md because, while dm-crypt is not yet set up,
the only thing working is the array.  All filesystems need the
dm-crypt mapping first.

From the dmesg on 3.0, I see that NCQ is enabled but FUA is not:

[    2.269487] ata1: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25100 irq 64
[    2.588395] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.588979] ata1.00: ATA-8: KINGSTON SV100S264G, D110225a, max UDMA/100
[    2.589037] ata1.00: 125045424 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    2.589321] ata1.00: configured for UDMA/100
[    2.589440] scsi 1:0:0:0: Direct-Access     ATA      KINGSTON SV100S2 D110 PQ: 0 ANSI: 5
[    2.631113] sd 1:0:0:0: [sda] 125045424 512-byte logical blocks: (64.0 GB/59.6 GiB)
[    2.631265] sd 1:0:0:0: [sda] Write Protect is off
[    2.631267] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.631296] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.632119] sd 1:0:0:0: [sda] Attached SCSI disk

[    2.269557] ata2: SATA max UDMA/133 abar m2048@0xfbd25000 port 0xfbd25180 irq 64
[    2.588916] ata2: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[    2.628336] ata2.00: ATA-8: ST9500420AS, 0002SDM1, max UDMA/133
[    2.628396] ata2.00: 976773168 sectors, multi 16: LBA48 NCQ (depth 31/32)
[    2.630143] ata2.00: configured for UDMA/133
[    2.630238] scsi 2:0:0:0: Direct-Access     ATA      ST9500420AS      0002 PQ: 0 ANSI: 5
[    2.631236] sd 2:0:0:0: [sdb] 976773168 512-byte logical blocks: (500 GB/465 GiB)
[    2.631792] sd 2:0:0:0: [sdb] Write Protect is off
[    2.632031] sd 2:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    2.632050] sd 2:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.636038] sd 2:0:0:0: [sdb] Attached SCSI disk
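
For checking a longer log, the FUA status can be picked out of the "Write
cache" lines mechanically. This is a minimal sketch that matches the exact
wording seen in the log above; other kernel versions may word the message
differently, and the function name is made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when an sd "Write cache" log line reports that the drive lacks DPO/FUA. */
static bool sd_line_reports_no_fua(const char *line)
{
    assert(line != NULL);
    return strstr(line, "Write cache:") != NULL &&
           strstr(line, "doesn't support DPO or FUA") != NULL;
}
```

Both the sda and sdb "Write cache" lines above would match; the "Attached
SCSI disk" lines would not.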

There are two RAID1 arrays spanning both disks, and one more RAID1 (with its second
leg missing) on sdb.

> diff --git a/block/blk-core.c b/block/blk-core.c
> index f43c8a5..59cb2ad 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -1560,7 +1560,7 @@ generic_make_request_checks(struct bio *bio)
>  			goto end_io;
>  		}
>  	}
> -
> +	WARN_ON((bio->bi_rw & (REQ_FLUSH | REQ_FUA)) && nr_sectors == 0);
>  	if ((bio->bi_rw & REQ_DISCARD) &&
>  	    (!blk_queue_discard(q) ||
>  	     ((bio->bi_rw & REQ_SECURE) &&

I'll try that. I hope it can be caught through netconsole.

Best Regards,
Michał Mirosław

* Re: raid1d crash at boot
  2011-11-22 12:03           ` Michał Mirosław
@ 2011-11-22 12:10             ` Michał Mirosław
  0 siblings, 0 replies; 7+ messages in thread
From: Michał Mirosław @ 2011-11-22 12:10 UTC (permalink / raw)
  To: NeilBrown
  Cc: James Bottomley, linux-raid, linux-scsi,
	device-mapper development

On Tue, Nov 22, 2011 at 01:03:37PM +0100, Michał Mirosław wrote:
> There's two RAID1 array on both of the disks, and one more RAID1 (with second
> leg missing) on sdb.

I just remembered that the sdb leg of the main array has the write-mostly flag
set. I checked /proc/mdstat on the running system, and it turns out that
both legs are now marked that way. Does this ring a bell?

cat /proc/mdstat
Personalities : [raid1]
md2 : active (auto-read-only) raid1 sdb3[0]
      425862712 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sda2[3](W) sdb2[2](W)
      62396688 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      123892 blocks super 1.2 [2/2] [UU]

unused devices: <none>
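
The "(W)" suffix in the md1 line is the write-mostly marker. A quick way to
spot arrays where every leg carries it is to count the markers per array
line. This is only a sketch; the helper name is hypothetical and it assumes
the "(W)" notation shown in the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts members flagged "(W)" (write-mostly) in one /proc/mdstat array line. */
static size_t count_write_mostly(const char *line)
{
    size_t count = 0;

    assert(line != NULL);
    /* Step past each match (3 characters) before searching again. */
    for (const char *p = strstr(line, "(W)"); p != NULL; p = strstr(p + 3, "(W)"))
        count++;
    return count;
}
```

Applied to the md1 line above it counts two write-mostly members, which
matches the number of legs in that array; the md0 and md2 lines count zero.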

Best Regards,
Michał Mirosław
