From: Bill Davidsen <davidsen@tmr.com>
To: Neil Brown <neilb@suse.de>
Cc: "Daniel L. Miller" <dmiller@amfes.com>, linux-raid@vger.kernel.org
Subject: Re: Raid-10 mount at startup always has problem
Date: Thu, 25 Oct 2007 10:46:56 -0400
Message-ID: <4720AC60.2040506@tmr.com>
In-Reply-To: <18208.13247.106651.142652@notabene.brown>
Neil Brown wrote:
> On Wednesday October 24, dmiller@amfes.com wrote:
>
>> Current mdadm.conf:
>> DEVICE partitions
>> ARRAY /dev/.static/dev/md0 level=raid10 num-devices=4
>> UUID=9d94b17b:f5fac31a:577c252b:0d4c4b2a auto=part
>>
>> still have the problem where on boot one drive is not part of the
>> array. Is there a log file I can check to find out WHY a drive is not
>> being added? It's been a while since the reboot, but I did find some
>> entries in dmesg - I'm appending both the md lines and the physical disk
>> related lines. The bottom shows one disk not being added (this time it
>> was sda) - and the disk that gets skipped on each boot seems to be
>> random - there's no consistent failure:
>>
>
> Odd.... but interesting.
> Does it sometimes fail to start the array altogether?
>
>
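A minimal sketch for tracking which member goes missing on each boot, assuming the kernel log contains "md: bind<NAME>" lines like the ones quoted in this thread. The function name missing_members is hypothetical, and the sketch does not distinguish between arrays, so on a machine with more than one md array the log would need filtering first.

```shell
# Hypothetical helper: read kernel log text on stdin and print every
# expected member (given as arguments) that has no "md: bind<...>" line.
missing_members() {
    # Extract the device names from lines such as "md: bind<sdc>".
    bound=$(sed -n 's/.*md: bind<\([^>]*\)>.*/\1/p')
    for dev in "$@"; do
        # Unquoted $bound collapses newlines into single spaces.
        case " $(echo $bound) " in
            *" $dev "*) ;;            # device was bound, nothing to report
            *) echo "$dev" ;;         # device never appeared in a bind line
        esac
    done
}

# Possible usage after a boot:
#   dmesg | missing_members sda sdb sdc sdd
```

Saving the output on every boot would show whether the skipped disk really is random.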
>> md: md0 stopped.
>> md: md0 stopped.
>> md: bind<sdc>
>> md: bind<sdd>
>> md: bind<sdb>
>> md: md0: raid array is not clean -- starting background reconstruction
>> raid10: raid set md0 active with 3 out of 4 devices
>> md: couldn't update array info. -22
>>
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This is the most surprising line, and hence the one most likely to
> convey helpful information.
>
> This message is generated when a process calls "SET_ARRAY_INFO" on an
> array that is already running, and the changes implied by the new
> "array_info" are not supportable.
>
> The only way I can see this happening is if two copies of "mdadm" are
> running at exactly the same time and are both trying to assemble
> the same array. The first calls SET_ARRAY_INFO and assembles the
> (partial) array. The second calls SET_ARRAY_INFO and gets this error.
> Not all devices are included because when one mdadm went to look
> at a device, the other had it locked, and so the first just
> ignored it.
>
> I just tried that, and sometimes it worked, but sometimes it assembled
> with 3 out of 4 devices. I didn't get the "couldn't update array info"
> message, but that doesn't prove I'm wrong.
>
> I cannot imagine how that might be happening (two at once) unless
> maybe 'udev' had been configured to do something as soon as devices
> were discovered.... seems unlikely.
>
> It might be worth finding out where mdadm is being run in the init
> scripts, adding a "-v" flag, and redirecting stdout/stderr to some log
> file.
> e.g.
> mdadm -As -v > /var/log/mdadm-$$ 2>&1
>
> And see if that leaves something useful in the log file.
>
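A sketch of how that logging could be wrapped so that each invocation leaves its own file, which would make two concurrent runs visible as two logs. The function name log_assemble and the log file naming are assumptions, not anything the init scripts already provide.

```shell
# Hypothetical wrapper: run a command with stdout/stderr captured in a
# per-process log file, recording start time and exit status.
# Usage: log_assemble LOGDIR COMMAND [ARGS...]
log_assemble() {
    logdir=$1
    shift
    log="$logdir/mdadm-$$.log"        # $$ is the PID of the calling shell
    {
        echo "start pid=$$ ppid=$PPID at $(date '+%Y-%m-%d %H:%M:%S')"
        "$@"
        echo "exit status=$?"
    } > "$log" 2>&1
}

# Possible use in an init script:
#   log_assemble /var/log mdadm -As -v
```

If two such log files appear with overlapping start times, that would support the idea that two assemblies are racing.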
> BTW, I don't think your problem has anything to do with the fact that
> you are using whole partitions.
>
You don't think the "unknown partition table" on sdd is related? I read
that as a sure indication that the system isn't treating the drive as one
without a partition table, and therefore isn't looking for the superblock
on the whole device. And as Doug pointed out, once the system decides that
there is a partition table, lots of things might try to use it.
> While it is debatable whether that is a good idea or not (I like the
> idea, but Doug doesn't and I respect his opinion) I doubt it would
> contribute to the current problem.
>
>
> Your description makes me nearly certain that there is some sort of
> race going on (that is the easiest way to explain randomly differing
> behaviours). The race is probably between different code 'locking'
> (opening with O_EXCL) the various devices. Given the above error
> message, two different 'mdadm's seems most likely, but an mdadm and a
> mount-by-label scan could probably do it too.
>
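One way to test the two-mdadm theory would be to serialize every assembly attempt behind a lock file and see whether the problem goes away. This is only a sketch: it assumes the flock(1) utility from util-linux is installed, the lock path and the function name serialized are made up for illustration, and it does nothing about other openers such as a mount-by-label scan.

```shell
# Hypothetical lock file; any path writable at boot time would do.
LOCK=${LOCK:-/var/lock/md-assemble.lock}

# Run a command while holding an exclusive lock, so that a second caller
# waits until the first has finished instead of racing it.
serialized() {
    flock -x "$LOCK" "$@"
}

# Possible use wherever the init scripts call mdadm:
#   serialized mdadm -As -v
```

It only helps if every place that starts mdadm goes through the same wrapper.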
--
bill davidsen <davidsen@tmr.com>
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979