From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: "Daniel L. Miller" <dmiller@amfes.com>, linux-raid@vger.kernel.org
Subject: Re: Raid-10 mount at startup always has problem
Date: Thu, 25 Oct 2007 10:46:56 -0400
Message-ID: <4720AC60.2040506@tmr.com>
In-Reply-To: <18208.13247.106651.142652@notabene.brown>

Neil Brown wrote:
> On Wednesday October 24, dmiller@amfes.com wrote:
>   
>> Current mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4 UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>>
>> still have the problem where on boot one drive is not part of the 
>> array.  Is there a log file I can check to find out WHY a drive is not 
>> being added?  It's been a while since the reboot, but I did find some 
>> entries in dmesg - I'm appending both the md lines and the physical disk 
>> related lines.  The bottom shows one disk not being added (this time it 
>> was sda) - and the disk that gets skipped on each boot seems to be 
>> random - there's no consistent failure:
>>     
>
> Odd.... but interesting.
> Does it sometimes fail to start the array altogether?
>
>   
>> md: md0 stopped.
>> md: md0 stopped.
>> md: bind<sdc>
>> md: bind<sdd>
>> md: bind<sdb>
>> md: md0: raid array is not clean -- starting background reconstruction
>> raid10: raid set md0 active with 3 out of 4 devices
>> md: couldn't update array info. -22
>>     
>   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is the most surprising line, and hence the one most likely to
> convey helpful information.
>
> This message is generated when a process calls "SET_ARRAY_INFO" on an
> array that is already running, and the changes implied by the new
> "array_info" are not supportable.
>
> The only way I can see this happening is if two copies of "mdadm" are
> running at exactly the same time and both are trying to assemble
> the same array.  The first calls SET_ARRAY_INFO and assembles the
> (partial) array.  The second calls SET_ARRAY_INFO and gets this error.
> Not all devices are included because when one mdadm went to look at
> a device, the other had it locked, and so the first just ignored it.
>
> I just tried that, and sometimes it worked, but sometimes it assembled
> with 3 out of 4 devices.  I didn't get the "couldn't update array info"
> message, but that doesn't prove I'm wrong.
>
> I cannot imagine how that might be happening (two at once) unless
> maybe 'udev' had been configured to do something as soon as devices
> were discovered.... seems unlikely.
>
> It might be worth finding out where mdadm is being run in the init
> scripts, adding a "-v" flag, and redirecting stdout/stderr to some log
> file.
> e.g.
>    mdadm -As  -v > /var/log/mdadm-$$ 2>&1
>
> And see if that leaves something useful in the log file.
>
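For what it's worth, the dmesg excerpt quoted above already says which member was skipped, and that check can be scripted while waiting for a verbose mdadm log. A minimal Python sketch, assuming the four members are sda through sdd (my assumption from the quoted text, not something the kernel reports); summarize_md_assembly is a made-up helper name, not part of mdadm:

```python
import re


def summarize_md_assembly(log_text, expected):
    """Report which expected members were bound according to md kernel messages.

    `expected` is the list of member names we believe belong to the array;
    that list is an assumption supplied by the caller, not read from the log.
    """
    # Members the kernel actually bound, in the order they were logged.
    bound = re.findall(r"md: bind<(\w+)>", log_text)
    # The "active with N out of M devices" summary line, if present.
    m = re.search(r"active with (\d+) out of (\d+) devices", log_text)
    active, total = (int(m.group(1)), int(m.group(2))) if m else (None, None)
    missing = sorted(set(expected) - set(bound))
    return bound, missing, active, total


# The md lines from the quoted dmesg output.
SAMPLE = """\
md: md0 stopped.
md: md0 stopped.
md: bind<sdc>
md: bind<sdd>
md: bind<sdb>
md: md0: raid array is not clean -- starting background reconstruction
raid10: raid set md0 active with 3 out of 4 devices
md: couldn't update array info. -22
"""

bound, missing, active, total = summarize_md_assembly(
    SAMPLE, ["sda", "sdb", "sdc", "sdd"]
)
print("bound:", bound, "missing:", missing, "active:", active, "of", total)
```

Run against the dmesg of several boots, this would at least show whether the skipped member really is random.
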
> BTW, I don't think your problem has anything to do with the fact that
> you are using whole partitions.
>   

You don't think the "unknown partition table" message on sdd is related? 
I read that as a sure indication that the system is not treating the 
drive as one without a partition table, and therefore is not looking for 
the superblock on the whole device. And as Doug pointed out, once 
something decides that there is a partition table, lots of things might 
try to use it.

> While it is debatable whether that is a good idea or not (I like the
> idea, but Doug doesn't and I respect his opinion) I doubt it would
> contribute to the current problem.
>
>
> Your description makes me nearly certain that there is some sort of
> race going on (that is the easiest way to explain randomly differing
> behaviours).   The race is probably between different code 'locking'
> (opening with O_EXCL) the various devices.  Given the above error
> message, two different 'mdadm's seems most likely, but an mdadm and a
> mount-by-label scan could probably do it too.
>   
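The race described above can be illustrated with a toy model. This is only a sketch in Python of the idea (a scanner that finds a device exclusively held by someone else and silently skips it); it is not mdadm's code, and replay, the event tuples, and the scanner names "A" and "B" are all hypothetical:

```python
def replay(events):
    """Replay an interleaving of exclusive open/close attempts.

    Each event is (who, action, device). An "open" on a device that another
    scanner currently holds is treated like EBUSY and the device is skipped,
    which is the behaviour the quoted explanation assumes.
    """
    holder = {}  # device -> scanner currently holding it exclusively
    seen = {}    # scanner -> devices it managed to examine
    for who, action, dev in events:
        seen.setdefault(who, [])
        if action == "open":
            if dev in holder:
                continue  # busy: skipped without retrying
            holder[dev] = who
            seen[who].append(dev)
        elif action == "close" and holder.get(dev) == who:
            del holder[dev]
    return seen


# One unlucky interleaving: B is still holding sda when A comes to look at it.
EVENTS = [
    ("B", "open", "sda"),
    ("A", "open", "sda"),   # fails, so A never sees sda
    ("B", "close", "sda"),
    ("A", "open", "sdb"), ("A", "close", "sdb"),
    ("A", "open", "sdc"), ("A", "close", "sdc"),
    ("A", "open", "sdd"), ("A", "close", "sdd"),
]

seen = replay(EVENTS)
print("A would assemble with:", seen["A"])
print("B examined:", seen["B"])
```

In this interleaving scanner A ends up with three of the four members, and which member it loses depends only on timing, which would match a different disk being skipped on each boot.
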
-- 
bill davidsen <davidsen@tmr.com>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979

