Linux RAID subsystem development
* MD does not wait for drives on start-up with kernels 3.8+
From: Roman Mamedov @ 2013-03-30 13:05 UTC
  To: linux-raid

Hello,

I got a bizarre problem after trying to upgrade from 3.7.10 (which works fine)
to 3.8.3 or 3.9-rc3 (both fail in the same way):

It looks like on the newer kernels MD does not wait for all drives to finish
registering before trying to bring up the arrays. As a result, on newer
kernels my arrays always come up with only 5 out of 7 members active.

dmesg 3.7.10:  http://romanrm.ru/dl/mdadm/dmesg37.txt
dmesg 3.9-rc3: http://romanrm.ru/dl/mdadm/dmesg39.txt

As you can see, on 3.7.10 the drives continue to come online until about 3.89 s
(Hitachi) and 4.26 s (card reader, unrelated), and only THEN, at 5.15 s, does
md0 start; it has all its drives and everything is good.

On 3.9-rc3 (and 3.8.3), md0 tries to become active VERY EARLY, at 2.64 s into
boot-up!!! Of course it misses the drives that appear later, and two of those,
the Hitachi drives coming up at 3.33 s and 3.87 s, should have been its members.
So it starts degraded with only 5 of 7 devices.
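The ordering described above can be checked mechanically by comparing dmesg timestamps. A minimal sketch, assuming timestamps in the `[    3.873014]` form seen in the logs in this thread, that each whole disk logs "Attached SCSI disk" when registered (so the card reader's "Attached SCSI removable disk" line is ignored), and that md assembly logs "md: bind<...>" lines; the function name check_md_order is hypothetical, and it only considers sd disks, not every block device.

```shell
# check_md_order: read dmesg-style lines on stdin and report whether md began
# binding array members before the last SCSI disk finished attaching.
# Assumption: lines start with "[    seconds.micro]" as in the logs above.
check_md_order() {
    awk '
    # Extract the number between the leading "[" and "]" of a log line.
    function ts(line,   s) {
        s = line
        sub(/^\[ */, "", s)
        sub(/\].*$/, "", s)
        return s + 0
    }
    /Attached SCSI disk/ { last_disk = ts($0) }
    /md: bind</ && first_bind == "" { first_bind = ts($0) }
    END {
        if (first_bind == "") { print "no md bind lines found"; exit 2 }
        printf "last disk attached at %.6f, first md bind at %.6f\n", last_disk, first_bind
        if (first_bind < last_disk) { print "md started before all disks were attached"; exit 1 }
        print "ok: md started after all disks were attached"
    }'
}
```

Usage: `dmesg | check_md_order`; an exit status of 1 suggests the race described here.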

What's up with this? I understand this might not be an md problem but a
udev(?) one; but how do I go about solving this?

-- 
With respect,
Roman

(resending this message since the previous one with largish attachments
didn't seem to make it to the list; also used a different dmesg for 3.7)

* Re: MD does not wait for drives on start-up with kernels 3.8+
From: CoolCold @ 2013-03-31  5:36 UTC
  To: Roman Mamedov; +Cc: linux-raid

In a similar situation (but on 2.6.2x kernels AFAIR, with SCSI drives)
I've used the rootdelay parameter -
http://lists.debian.org/debian-user/2012/01/msg00748.html - to make md
assembly succeed. Hope this will help you too.
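On a Debian system booted with GRUB 2, a kernel parameter such as rootdelay is usually added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, after which update-grub regenerates the boot configuration. A minimal sketch of an idempotent helper for that edit; the function name add_rootdelay is hypothetical, it assumes the variable is defined on a single line in double quotes, and it relies on GNU `sed -i`.

```shell
# add_rootdelay FILE SECONDS
# Adds rootdelay=SECONDS to the GRUB_CMDLINE_LINUX_DEFAULT="..." line of FILE
# (normally /etc/default/grub) unless a rootdelay= setting is already there.
# Assumption: the variable is on one line and double-quoted; uses GNU sed -i.
add_rootdelay() {
    file=$1
    secs=$2
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*rootdelay=' "$file"; then
        echo "rootdelay already set in $file"
        return 0
    fi
    # Insert the parameter just before the closing double quote.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"/\1 rootdelay=$secs\"/" "$file"
}
```

Usage (as root): `add_rootdelay /etc/default/grub 10 && update-grub`.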

P.S. to Neil B.
I'm just wondering why md prints something like "md: md0 stopped": how can it
be stopped without having been started? This confuses me every time. Is this
message just meant to indicate that the array isn't assembled, i.e. not started?
If so, I'd suggest changing it to something like:
"md: md0 is in stopped state"

-- 
Best regards,
[COOLCOLD-RIPN]

* Re: MD does not wait for drives on start-up with kernels 3.8+
From: Roman Mamedov @ 2013-04-15  8:45 UTC
  To: CoolCold; +Cc: linux-raid

On Sun, 31 Mar 2013 09:36:30 +0400
CoolCold <coolthecold@gmail.com> wrote:

> In a similar situation (but on 2.6.2x kernels AFAIR, with SCSI drives)
> I've used the rootdelay parameter -
> http://lists.debian.org/debian-user/2012/01/msg00748.html - to make md
> assembly succeed. Hope this will help you too.

Thanks, rootdelay=10 seems to have solved the problem.

...
[    3.860538] scsi 12:0:0:0: Direct-Access     ATA      Hitachi HDS5C302 ML6O PQ: 0 ANSI: 5
[    3.860716] sd 12:0:0:0: Attached scsi generic sg8 type 0
[    3.860879] sd 12:0:0:0: [sdi] 3907029168 512-byte logical blocks: (2.00 TB/1.81 TiB)
[    3.860906] sd 12:0:0:0: [sdi] Write Protect is off
[    3.860908] sd 12:0:0:0: [sdi] Mode Sense: 00 3a 00 00
[    3.860920] sd 12:0:0:0: [sdi] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    3.872747]  sdi: sdi1 sdi2 sdi3
[    3.873014] sd 12:0:0:0: [sdi] Attached SCSI disk
[    4.144402] scsi 14:0:0:0: Direct-Access     ChipsBnk   Multi-Reader   4082 PQ: 0 ANSI: 2
[    4.145294] sd 14:0:0:0: Attached scsi generic sg9 type 0
[    4.147579] sd 14:0:0:0: [sdj] Attached SCSI removable disk
[    4.179650] ata14: SATA link down (SStatus 0 SControl 300)
[    4.601802] hid-generic 0003:051D:0002.0001: hiddev0,hidraw0: USB HID v1.10 Device [American Power Conversion Back-UPS RS 500 FW:30.j5.I USB FW:j5] on usb-0000:00:12.0-3/input0
[    4.601987] input: Logitech HID compliant keyboard as /devices/pci0000:00/0000:00:12.2/usb1/1-6/1-6.1/1-6.1:1.0/input/input2
[    4.602051] hid-generic 0003:046D:C30E.0002: input,hidraw1: USB HID v1.10 Keyboard [Logitech HID compliant keyboard] on usb-0000:00:12.2-6.1/input0
[    4.606121] input: Logitech HID compliant keyboard as /devices/pci0000:00/0000:00:12.2/usb1/1-6/1-6.1/1-6.1:1.1/input/input3
[    4.606647] hid-generic 0003:046D:C30E.0003: input,hidraw2: USB HID v1.10 Device [Logitech HID compliant keyboard] on usb-0000:00:12.2-6.1/input1
[   12.537344] nbd: registered device at major 43
[   12.599970] md: md0 stopped.
[   12.603855] md: bind<sde3>
[   12.604029] md: bind<sda3>
[   12.604139] md: bind<sdi3>
[   12.604290] md: bind<sdc3>
[   12.604618] md: bind<sdb3>
[   12.604779] md: bind<sdd3>
[   12.604982] md: bind<sdh3>
...

This was NOT required on any of the previous kernels, so I wonder why all of a
sudden with the 3.8+ kernels my system needs an extra "crutch" just to keep
starting up properly.


-- 
With respect,
Roman

* Re: MD does not wait for drives on start-up with kernels 3.8+
From: NeilBrown @ 2013-04-15 10:34 UTC
  To: Roman Mamedov; +Cc: CoolCold, linux-raid

On Mon, 15 Apr 2013 14:45:30 +0600 Roman Mamedov <rm@romanrm.ru> wrote:

> On Sun, 31 Mar 2013 09:36:30 +0400
> CoolCold <coolthecold@gmail.com> wrote:
> 
> > In a similar situation (but on 2.6.2x kernels AFAIR, with SCSI drives)
> > I've used the rootdelay parameter -
> > http://lists.debian.org/debian-user/2012/01/msg00748.html - to make md
> > assembly succeed. Hope this will help you too.
> 
> Thanks, rootdelay=10 seems to have solved the problem.
> 
> This was NOT required on any of the previous kernels, so I wonder why all of a
> sudden with the 3.8+ kernels my system needs an extra "crutch" just to keep
> starting up properly.
> 
> 

This is almost certainly not directly related to the kernel.  It seems clear
that some change in the kernel has resulted in the difference, but it is
probably indirect and really a bug elsewhere.

What distro are you using?  Is systemd in use?

systemd runs lots of things in parallel which is theoretically good, but
tends to expose races.

Possibly inserting "udevadm settle" somewhere before "mdadm" runs (or maybe
even "udevadm trigger ; udevadm settle") would do it.
This would need to be in the initrd, of course.

NeilBrown

* Re: MD does not wait for drives on start-up with kernels 3.8+
From: Roman Mamedov @ 2013-04-15 13:11 UTC
  To: NeilBrown; +Cc: CoolCold, linux-raid

On Mon, 15 Apr 2013 20:34:49 +1000
NeilBrown <neilb@suse.de> wrote:

> > This was NOT required on any of the previous kernels, so I wonder why all of a
> > sudden with the 3.8+ kernels my system needs an extra "crutch" just to keep
> > starting up properly.
> 
> This is almost certainly not directly related to the kernel.  It seems clear
> that some change in the kernel has resulted in the difference, but it is
> probably indirect and really a bug elsewhere.
> 
> What distro are you using?  Is systemd in use?

Debian Testing, no systemd.

-- 
With respect,
Roman

* Re: MD does not wait for drives on start-up with kernels 3.8+
From: NeilBrown @ 2013-04-24  7:12 UTC
  To: Roman Mamedov; +Cc: CoolCold, linux-raid

On Mon, 15 Apr 2013 19:11:57 +0600 Roman Mamedov <rm@romanrm.ru> wrote:

> On Mon, 15 Apr 2013 20:34:49 +1000
> NeilBrown <neilb@suse.de> wrote:
> 
> > > This was NOT required on any of the previous kernels, so I wonder why all of a
> > > sudden with the 3.8+ kernels my system needs an extra "crutch" just to keep
> > > starting up properly.
> > 
> > This is almost certainly not directly related to the kernel.  It seems clear
> > that some change in the kernel has resulted in the difference, but it is
> > probably indirect and really a bug elsewhere.
> > 
> > What distro are you using?  Is systemd in use?
> 
> Debian Testing, no systemd.
> 

Can you try something for me?

Edit   /usr/share/initramfs-tools/scripts/local-top/mdadm

and just before the line:

  verbose && log_begin_msg "Assembling all MD arrays"

insert

  /sbin/udevadm settle

Then try booting without the 'rootdelay=10'.

If it doesn't help (and the odds that it won't are at least even), then we have
an awkward situation. The kernel discovers devices asynchronously, and it isn't
clear that there is any way to wait for that discovery to "finish".
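Since there may be no general "discovery finished" signal, one workaround (an assumption here, not something proposed in this thread) is to wait only for the specific devices the arrays are known to need, with an upper time limit, rather than sleeping a fixed rootdelay. A minimal sketch that polls once per second; the function name wait_for_devices is hypothetical, and it only tests that each path exists.

```shell
# wait_for_devices LIMIT PATH...
# Polls once per second until every PATH exists or LIMIT seconds have elapsed.
# Returns 0 if all paths appeared, 1 on timeout.
wait_for_devices() {
    limit=$1
    shift
    elapsed=0
    while :; do
        missing=0
        for dev in "$@"; do
            [ -e "$dev" ] || missing=1
        done
        [ "$missing" -eq 0 ] && return 0
        [ "$elapsed" -ge "$limit" ] && return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}
```

Hypothetical usage in an initramfs script: `wait_for_devices 30 /dev/sdh3 /dev/sdi3; mdadm --assemble --scan`. The sdX names are taken from the log above only for illustration; they are not guaranteed to be stable across boots, so stable /dev/disk/by-id paths would be a safer choice.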

NeilBrown
