From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: "Daniel L. Miller" <dmiller@amfes.com>, linux-raid@vger.kernel.org
Subject: Re: Raid-10 mount at startup always has problem
Date: Thu, 25 Oct 2007 10:46:56 -0400
Message-ID: <4720AC60.2040506@tmr.com>
In-Reply-To: <18208.13247.106651.142652@notabene.brown>

Neil Brown wrote:
> On Wednesday October 24, dmiller@amfes.com wrote:
>   
>> Current mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>>
>> still have the problem where on boot one drive is not part of the 
>> array.  Is there a log file I can check to find out WHY a drive is not 
>> being added?  It's been a while since the reboot, but I did find some 
>> entries in dmesg - I'm appending both the md lines and the physical disk 
>> related lines.  The bottom shows one disk not being added (this time it 
>> was sda) - and the disk that gets skipped on each boot seems to be 
>> random - there's no consistent failure:
>>     
>
> Odd.... but interesting.
> Does it sometimes fail to start the array altogether?
>
>   
>> md: md0 stopped.
>> md: md0 stopped.
>> md: bind<sdc>
>> md: bind<sdd>
>> md: bind<sdb>
>> md: md0: raid array is not clean -- starting background reconstruction
>> raid10: raid set md0 active with 3 out of 4 devices
>> md: couldn't update array info. -22
>>     
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is the most surprising line, and hence the one most likely to
> convey helpful information.
>
> This message is generated when a process calls "SET_ARRAY_INFO" on an
> array that is already running, and the changes implied by the new
> "array_info" are not supportable.
>
> The only way I can see this happening is if two copies of "mdadm" are
> running at exactly the same time and both are trying to assemble
> the same array.  The first calls SET_ARRAY_INFO and assembles the
> (partial) array.  The second calls SET_ARRAY_INFO and gets this error.
> Not all devices are included because when one mdadm went to look
> at a device, the other had it locked, and so the first just
> ignored it.
>
> I just tried that, and sometimes it worked, but sometimes it assembled
> with 3 out of 4 devices.  I didn't get the "couldn't update array info"
> message, but that doesn't prove I'm wrong.
>
> I cannot imagine how that might be happening (two at once) unless
> maybe 'udev' had been configured to do something as soon as devices
> were discovered.... seems unlikely.
>
> It might be worth finding out where mdadm is being run in the init
> scripts, adding a "-v" flag, and redirecting stdout/stderr to some log
> file.
> e.g.
>    mdadm -As  -v > /var/log/mdadm-$$ 2>&1
>
> And see if that leaves something useful in the log file.
>
> BTW, I don't think your problem has anything to do with the fact that
> you are using whole partitions.
>   

You don't think the "unknown partition table" message on sdd is related?
I read it as a sure sign that the system is not treating the drive as
one without a partition table, and is therefore not looking for the
superblock on the whole device. And as Doug pointed out, once something
decides that there is a partition table, lots of things might try to
use it.

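
To make concrete what kind of check is involved, here is a rough sketch
of testing a first sector for the MS-DOS (MBR) boot signature, the bytes
0x55 0xAA at offset 510. It is only an illustration under that one
assumption: the kernel tries several partition formats, this is not its
code, and the function name is made up for the example.

```python
def has_dos_partition_signature(sector0: bytes) -> bool:
    """True if a 512-byte first sector ends with the MBR signature 0x55 0xAA."""
    return len(sector0) >= 512 and sector0[510:512] == b"\x55\xaa"


# Two synthetic sectors: all zeros, and zeros plus the signature bytes.
blank = bytes(512)
with_sig = bytes(510) + b"\x55\xaa"

print(has_dos_partition_signature(blank))     # False
print(has_dos_partition_signature(with_sig))  # True
```

To try it on a real drive, pass in the first 512 bytes read from the
device node instead of the synthetic buffers.
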
> While it is debatable whether that is a good idea or not (I like the
> idea, but Doug doesn't and I respect his opinion) I doubt it would
> contribute to the current problem.
>
>
> Your description makes me nearly certain that there is some sort of
> race going on (that is the easiest way to explain randomly differing
> behaviours).   The race is probably between different code 'locking'
> (opening with O_EXCL) the various devices.  Given the above error
> message, two different 'mdadm's seems most likely, but an mdadm and a
> mount-by-label scan could probably do it too.
>   
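
The race described above can be modelled in a few lines. This is a toy
sketch, not mdadm's code: the class and function names are invented, and
the "lock" is an in-memory stand-in for an O_EXCL open that fails
immediately instead of waiting. It shows how a scanner that silently
skips busy devices ends up with 3 of 4 members.

```python
class ExclusiveDevices:
    """Toy stand-in for devices opened with O_EXCL: one holder at a time."""

    def __init__(self, names):
        self.holder = {name: None for name in names}

    def try_claim(self, name, who):
        # Like an exclusive open failing with EBUSY: refuse, do not wait.
        if self.holder[name] not in (None, who):
            return False
        self.holder[name] = who
        return True

    def release(self, name, who):
        if self.holder[name] == who:
            self.holder[name] = None


def assemble(devices, names, who):
    """Collect every device 'who' can claim; busy ones are silently skipped."""
    return [n for n in names if devices.try_claim(n, who)]


names = ["sda", "sdb", "sdc", "sdd"]
devs = ExclusiveDevices(names)

# A second scanner happens to hold sda at the moment the first one looks.
devs.try_claim("sda", "scanner-2")
members = assemble(devs, names, "scanner-1")
print(members)  # ['sdb', 'sdc', 'sdd']
```

Which device gets skipped depends only on what the other scanner holds
at that instant, which would fit the skipped disk being random from boot
to boot.
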
-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


