linux-raid.vger.kernel.org archive mirror
From: NeilBrown <neilb@suse.de>
To: Hans-Peter Jansen <hpj@urpla.net>
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: Persistent failures with simple md setup
Date: Mon, 4 Mar 2013 10:33:57 +1100
Message-ID: <20130304103357.45b2cad2@notabene.brown>
In-Reply-To: <4291349.FrQcKOnicQ@xrated>


On Sun, 03 Mar 2013 20:31:57 +0100 Hans-Peter Jansen <hpj@urpla.net> wrote:

> On Thursday, 28 February 2013, 23:16:01, Hans-Peter Jansen wrote:
> > On Friday, 1 March 2013, 08:25:20, NeilBrown wrote:
> > > On Thu, 28 Feb 2013 11:49:53 +0100 Hans-Peter Jansen <hpj@urpla.net>
> > > wrote:
> > > 
> > > This is what I asked for:
> > > > > It really feels like a udev bug, but it is hard to be sure.
> > > > > Could you edit /etc/init.d/boot.md and add
> > > > > 
> > > > >    echo BEFORE mdadm -IRs > /dev/kmsg
> > > > > 
> > > > > just before the call to "$mdadm_BIN -IRs",
> > > > > 
> > > > >    echo AFTER mdadm -IRs > /dev/kmsg
> > > > > 
> > > > > just after that call, and
> > > > > 
> > > > >    echo AFTER mdadm -Asc > /dev/kmsg
> > > > > 
> > > > > just after the call to "$mdadm_BIN -A -s -c $mdadm_CONFIG"
> > > > > 
> > > > > And maybe put similar messages just before and after the
> > > > > 
> > > > >     /sbin/udevadm settle ....
> > > > > 
> > > > > call.
> > > > > 
> > > > > If you can then reproduce the problem, the extra logging might tell us
> > > > > something useful.
> > > 
> > > This is not at all the same thing:
> > > > I've enabled initrd (linuxrc=trace) and boot.md (#!/bin/sh -x)
> > > > debugging,
> > > > hence we should see the full story next time.
> > > 
> > > It might help, but I'm far from certain that it will be nearly as useful.
> 
> I see, sh debug messages do not end up in /var/log/something; they're routed
> somewhere else. I'm not sure how I like that systemd-implied silliness.
> 
> > Okay, okay, I'll do it both ways now.
> > 
> > Let's see when it triggers.
> 
> I'm not sure how the basic kernel boot contributes to boiling this issue down,
> but here we go: (not censored this time, hope it will go through.)



Thanks.  The interesting bit is here:

> Mar  3 09:11:59 zaphkiel kernel: [   28.781703] md: bind<sdb4>

It seems that udev is running "mdadm -I" on sdb4 here.

> Mar  3 09:11:59 zaphkiel kernel: [   28.799939] tveeprom 1-0050: audio processor is MSP3415 (idx 6)
> Mar  3 09:11:59 zaphkiel kernel: [   28.862699] sr 7:0:0:0: Attached scsi generic sg6 type 5
> Mar  3 09:11:59 zaphkiel kernel: [   28.894667] tveeprom 1-0050: has radio
> Mar  3 09:11:59 zaphkiel kernel: [   28.917194] bttv0: Hauppauge eeprom indicates model#44354
> Mar  3 09:11:59 zaphkiel kernel: [   28.984037] bttv0: tuner type=20
> Mar  3 09:11:59 zaphkiel kernel: [   29.003670] md: bind<sdb1>

and "mdadm -I" on sdb1 here.

> Mar  3 09:11:59 zaphkiel kernel: [   29.026123] input: HDA Digital PCBeep as /devices/pci0000:00/0000:00:07.0/input/input5
> Mar  3 09:11:59 zaphkiel kernel: [   29.098905] i2c-core: driver [msp3400] using legacy suspend method
> Mar  3 09:11:59 zaphkiel kernel: [   29.139227] AFTER udevadm settle --timeout=60

and now udev is "settled".  This is much less than 60 seconds after the
"BEFORE udevadm settle", so it didn't time out - it must have exhausted the
queue.

> Mar  3 09:11:59 zaphkiel kernel: [   29.211092] i2c-core: driver [msp3400] using legacy resume method
> Mar  3 09:11:59 zaphkiel kernel: [   29.276147] msp3400 1-0040: MSP3415D-B3 found @ 0x80 (bt878 #0 [sw])
> Mar  3 09:11:59 zaphkiel kernel: [   29.325203] md: bind<sdb2>

But look - it seems that "mdadm -I /dev/sdb2" was just run - presumably by
udev.

> Mar  3 09:11:59 zaphkiel kernel: [   29.391360] msp3400 1-0040: msp3400 supports nicam, BEFORE mdadm -IRs

We managed to get two lines blended together here, but it is clear that this
is just before "mdadm -IRs" runs.

> Mar  3 09:11:59 zaphkiel kernel: [   29.455183] mode is autodetect
> Mar  3 09:11:59 zaphkiel kernel: [   29.525553] md/raid1:md0: active with 1 out of 2 mirrors
> 

And unsurprisingly, md0 is assembled with only one mirror because we have seen
sdb1 but not sda1.

> Mar  3 09:11:59 zaphkiel kernel: [   29.572395] i2c-core: driver [tuner] using legacy suspend method
> Mar  3 09:11:59 zaphkiel kernel: [   29.642012] i2c-core: driver [tuner] using legacy resume method
> Mar  3 09:11:59 zaphkiel kernel: [   29.688786] md: bind<sda4>

and here comes sda4.  Presumably sda1 is attempted a little later, but as md0
is already assembled, mdadm refuses to do anything with it.
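
To make that ordering easier to check on the next failing boot, a small
filter along the following lines could be run over the log. It is only a
sketch: the function name binds_around_activation is made up for
illustration, and the sample input is just the md lines from the log quoted
above.

```sh
# Split "md: bind<...>" events into those seen before and after the first
# "active with" message for the named array. Reads a kernel log on stdin.
# (binds_around_activation is a hypothetical helper, not part of mdadm.)
binds_around_activation() {
    awk -v array="$1" '
        index($0, array ": active with") { started = 1; next }
        match($0, /md: bind<[^>]+>/) {
            # "md: bind<" is 9 characters; drop it and the trailing ">".
            dev = substr($0, RSTART + 9, RLENGTH - 10)
            print (started ? "after" : "before"), dev
        }
    '
}

# Sample input: the md-related lines copied from the log above.
sample_log='Mar  3 09:11:59 zaphkiel kernel: [   28.781703] md: bind<sdb4>
Mar  3 09:11:59 zaphkiel kernel: [   29.003670] md: bind<sdb1>
Mar  3 09:11:59 zaphkiel kernel: [   29.325203] md: bind<sdb2>
Mar  3 09:11:59 zaphkiel kernel: [   29.525553] md/raid1:md0: active with 1 out of 2 mirrors
Mar  3 09:11:59 zaphkiel kernel: [   29.688786] md: bind<sda4>'

result=$(printf '%s\n' "$sample_log" | binds_around_activation md0)
printf '%s\n' "$result"
# prints:
#   before sdb4
#   before sdb1
#   before sdb2
#   after sda4
```

Any sda partition listed as "after" arrived too late to be included when
md0 was started.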

So "mdadm -IRs" is being run too early, apparently because "udevadm settle" is
exiting too early.  However I have looked closely at the udev code, and run
various tests, and cannot see how it could possibly exit early.  I can
imagine it exiting a bit late in some anomalous situations, but not early.

The only way I can explain this is to suggest that something other than udev
is running "mdadm -I".  But I cannot imagine what would be doing that.

If you put a "sleep 10" before the "mdadm -IRs", I suspect your problems
would go away.  It isn't really a nice solution, but it is the only one I can
think of at the moment.
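
As a rough illustration of that workaround (not the actual contents of
boot.md, which may be laid out differently), the delay could be wrapped
like this. The function name assemble_after_delay and the SETTLE_DELAY
variable are made up for the sketch; only "$mdadm_BIN -IRs" comes from the
script discussed above.

```sh
# Hypothetical helper: pause before the incremental-assembly call so that
# late block-device events have a chance to arrive. A workaround, not a fix.
assemble_after_delay() {
    # SETTLE_DELAY is an assumed knob (seconds); defaults to the suggested 10.
    sleep "${SETTLE_DELAY:-10}"
    "$mdadm_BIN" -IRs
}
```

In boot.md this would stand in for the bare call to "$mdadm_BIN -IRs",
after the "udevadm settle" line.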

NeilBrown


