Re: Some md/mdadm bugs

From: Asdo <asdo@shiftmail.org>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid <linux-raid@vger.kernel.org>
Subject: Re: Some md/mdadm bugs
Date: Tue, 07 Feb 2012 18:13:05 +0100
Message-ID: <4F315BA1.9010300@shiftmail.org>
In-Reply-To: <20120207093156.290bd52b@notabene.brown>

On 02/06/12 23:31, NeilBrown wrote:
> On Mon, 06 Feb 2012 19:47:38 +0100 Asdo<asdo@shiftmail.org>  wrote:
>
>> One or two more bug(s) in 3.2.2
>> (note: my latest mail I am replying to is still valid)
>>
>> AUTO line in mdadm.conf does not appear to work any longer in 3.2.2
>> compared to mdadm 3.1.4
>> Now this line
>>
>> "AUTO -all"
>>
>> still autoassembles every array.
>> There are many arrays not declared in my mdadm.conf, and which are not
>> for this host (hostname is different)
>> but mdadm still autoassembles everything, e.g.:
>>
>> # mdadm -I /dev/sdr8
>> mdadm: /dev/sdr8 attached to /dev/md/perftest:r0d24, not enough to start
>> (1).
>>
>> (note: "perftest" is even not the hostname)
> Odd.. it works for me:
>
> # cat /etc/mdadm.conf
> AUTO -all
> # mdadm -Iv /dev/sda
> mdadm: /dev/sda has metadata type 1.x for which auto-assembly is disabled
> # mdadm -V
> mdadm - v3.2.2 - 17th June 2011
> #
>
> Can you show the complete output of the same commands (with sdr8 in place of sda of course :-)

I confirm the bug exists in 3.2.2.
To make sure, I compiled 3.2.2 from source from your git
("git checkout mdadm-3.2.2" and then "make"):

# ./mdadm -Iv /dev/sdat1
mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough to start (1).
# ./mdadm --version
mdadm - v3.2.2 - 17th June 2011
# cat /etc/mdadm/mdadm.conf
AUTO -all


However, the good news is that the bug is gone in 3.2.3 (also built from your git):

# ./mdadm -Iv /dev/sdat1
mdadm: /dev/sdat1 has metadata type 1.x for which auto-assembly is disabled
# ./mdadm --version
mdadm - v3.2.3 - 23rd December 2011
# cat /etc/mdadm/mdadm.conf
AUTO -all






However, in 3.2.3 there is another bug, or else I no longer understand how
AUTO works:

# hostname perftest
# hostname
perftest
# cat /etc/mdadm/mdadm.conf
HOMEHOST <system>
AUTO +homehost -all
# ./mdadm -Iv /dev/sdat1
mdadm: /dev/sdat1 has metadata type 1.x for which auto-assembly is disabled
# ./mdadm --version
mdadm - v3.2.3 - 23rd December 2011


??
Admittedly, perftest is not the original hostname of this machine, but that
shouldn't matter (does mdadm go reading /etc/hostname directly?)...
The result is the same if I write the mdadm.conf file like this:

HOMEHOST perftest
AUTO +homehost -all


If instead I create the file like this:

# cat /etc/mdadm/mdadm.conf
HOMEHOST <system>
AUTO +1.x homehost -all
# hostname
perftest
# ./mdadm -Iv /dev/sdat1
mdadm: /dev/sdat1 attached to /dev/md/sr50d12p1n1, not enough to start (1).
# ./mdadm --version
mdadm - v3.2.3 - 23rd December 2011


Now it works, BUT it works *too much*, look:

# hostname foo
# hostname
foo
# ./mdadm -Iv /dev/sdat1
mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough to start (1).
# cat /etc/mdadm/mdadm.conf
HOMEHOST <system>
AUTO +1.x homehost -all
# ./mdadm --version
mdadm - v3.2.3 - 23rd December 2011


The behaviour is the same if I write the mdadm.conf file with an explicit
HOMEHOST name:
# hostname
foo
# cat /etc/mdadm/mdadm.conf
HOMEHOST foo
AUTO +1.x homehost -all
# ./mdadm -Iv /dev/sdat1
mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough to start (1).
# ./mdadm --version
mdadm - v3.2.3 - 23rd December 2011



This does not seem like correct behaviour to me.

If it is, could you explain how I should write the mdadm.conf file so
that mdadm autoassembles *all* arrays for this host (matching
`hostname` == array-hostname in 1.x) and never autoassembles arrays with
a different hostname?
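
A minimal sketch of the rule in question, assuming the AUTO words are tried
in order with the first match winning and that auto-assembly is allowed when
nothing matches, as the mdadm.conf(5) man page describes. This is a model of
the documented rule, not mdadm's code. The function names are hypothetical,
the "host:name" split of a 1.x array name is an assumption, and only the bare
word "homehost" is modeled: how mdadm parses "+homehost" is left out.

```python
def array_homehost(array_name):
    """Return the host part of a 1.x array name such as 'perftest:sr50d12p1n1'.

    Assumption: the name is stored as 'host:name'; without ':' there is no host part.
    """
    host, sep, _ = array_name.partition(":")
    return host if sep else None


def auto_allows(auto_words, metadata, is_homehost):
    """Model of the documented AUTO rule: first matching word wins.

    auto_words  -- words after AUTO, e.g. ['+1.x', 'homehost', '-all']
    metadata    -- metadata type found on the device, e.g. '1.x'
    is_homehost -- True if the array is recorded as belonging to this host
    """
    for word in auto_words:
        if word == "homehost":
            if is_homehost:
                return True
            continue  # foreign array: keep looking at later words
        sign, name = word[0], word[1:]
        if sign not in "+-" or name == "homehost":
            # Signed homehost and unsigned metadata names are not modeled here.
            raise ValueError("not modeled: %r" % word)
        if name in ("all", metadata):
            return sign == "+"
    return True  # documented default when no word matches


# Replay the cases from this message for the array perftest:sr50d12p1n1.
for hostname in ("perftest", "foo"):
    local = array_homehost("perftest:sr50d12p1n1") == hostname
    print(hostname,
          auto_allows(["+1.x", "homehost", "-all"], "1.x", local),
          auto_allows(["homehost", "-all"], "1.x", local))
```

Under this model, "+1.x" placed first matches every 1.x array before
"homehost" is ever consulted, which is consistent with the foreign perftest
array being assembled on host foo above. Whether 3.2.3 follows the model for
a bare "homehost" before "-all" is not confirmed here.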

Note that I'm *not* using 0.90 metadata anywhere, so no special case is
needed for that metadata version.


I'm not sure whether 3.1.4 had the "correct" behaviour... Yesterday it seemed
to me that it did, but today I can't make it work the way I intended
anymore.





>
>> I have just regressed to mdadm 3.1.4 to confirm that it worked back
>> then, and yes, I confirm that 3.1.4 was not doing any action upon:
>> # mdadm -I /dev/sdr8
>> -->  nothing done
>> when the line in config was:
>> "AUTO -all"
>> or even
>> "AUTO +homehost -all"
>> which is the line I am normally using.
>>
>>
>> This is a problem in our fairly large system with 80+ HDDs and many
>> partitions which I am testing now which is full of every kind of arrays....
>> I am normally using : "AUTO +homehost -all"  to prevent assembling a
>> bagzillion of arrays at boot, also because doing that gives race
>> conditions at boot and drops me to initramfs shell (see below next bug).
>>
>>
>>
>>
>>
>> Another problem with 3.2.2:
>>
>> At boot, this is from a serial dump:
>>
>> udevd[218]: symlink '../../sdx13'
>> '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists
>> udevd[189]: symlink '../../sdb1'
>> '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists
>>
>> And sdb1 is not correctly inserted into array /dev/md0 which hence
>> starts degraded and so I am dropped into an initramfs shell.
>> This looks like a race condition... I don't know if this is fault of
>> udev, udev rules or mdadm...
>> This is with mdadm 3.2.2 and kernel 3.0.13 (called 3.0.0-15-server by
>> Ubuntu) on Ubuntu oneiric 11.10
>> Having also the above bug of nonworking AUTO line, this problem happens
>> a lot with 80+ disks and lots of partitions. If the auto line worked, I
>> would have postponed most of the assembly's at a very late stage in the
>> boot process, maybe after a significant "sleep".
>>
>>
>> Actually this race condition could be an ubuntu udev script bug :
>>
>> Here are the ubuntu udev rules files I could find, related to mdadm or
>> containing "by-partlabel":
> It does look like a udev thing more than an mdadm thing.
>
> What do
>     /dev/blkid -o udev -p /dev/sdb1
> and
>     /dev/blkid -o udev -p /dev/sdx12
>
> report?

Unfortunately I rebooted in the meantime, and now sdb1 is assembled.

I am pretty sure sdb1 is the same device as in the previous boot, so here
it goes:


# blkid -o udev -p /dev/sdb1
ID_FS_UUID=d6557fd5-0233-0ca1-8882-200cec91b3a3
ID_FS_UUID_ENC=d6557fd5-0233-0ca1-8882-200cec91b3a3
ID_FS_UUID_SUB=0ffdf74a-36f9-7a7a-9dbe-653bb37bdc8a
ID_FS_UUID_SUB_ENC=0ffdf74a-36f9-7a7a-9dbe-653bb37bdc8a
ID_FS_LABEL=hardstorage1:grubarr
ID_FS_LABEL_ENC=hardstorage1:grubarr
ID_FS_VERSION=1.0
ID_FS_TYPE=linux_raid_member
ID_FS_USAGE=raid
ID_PART_ENTRY_SCHEME=gpt
ID_PART_ENTRY_NAME=Linux\x20RAID
ID_PART_ENTRY_UUID=31c747e8-826f-48a3-ace0-c8063d489810
ID_PART_ENTRY_TYPE=a19d880f-05fc-4d3b-a006-743f0f84911e
ID_PART_ENTRY_NUMBER=1


Regarding sdx13 (I suppose sdx12 was a typo): I can't guarantee it's the
same device as in the previous boot, because it's in the SAS-expander
path... However, it will be something similar anyway:

# blkid -o udev -p /dev/sdx13
ID_FS_UUID=527dd3b2-decf-4278-cb92-e47bcea21a39
ID_FS_UUID_ENC=527dd3b2-decf-4278-cb92-e47bcea21a39
ID_FS_UUID_SUB=c1751a32-0ef6-ff30-04ad-16322edfe9b1
ID_FS_UUID_SUB_ENC=c1751a32-0ef6-ff30-04ad-16322edfe9b1
ID_FS_LABEL=perftest:sr50d12p7n6
ID_FS_LABEL_ENC=perftest:sr50d12p7n6
ID_FS_VERSION=1.0
ID_FS_TYPE=linux_raid_member
ID_FS_USAGE=raid
ID_PART_ENTRY_SCHEME=gpt
ID_PART_ENTRY_NAME=Linux\x20RAID
ID_PART_ENTRY_UUID=7a355609-793e-442f-b668-4168d2474f89
ID_PART_ENTRY_TYPE=a19d880f-05fc-4d3b-a006-743f0f84911e
ID_PART_ENTRY_NUMBER=13


OK, now I understand: I have hundreds of partitions, all with the same
ID_PART_ENTRY_NAME=Linux\x20RAID
and I am actually surprised to see only 2 clashes reported in the serial
console dump.
I confirm that once the system boots, only the last identically named
symlink survives (obviously):
---------
# ll /dev/disk/by-partlabel/
total 0
drwxr-xr-x 2 root root  60 Feb  7 16:54 ./
drwxr-xr-x 8 root root 160 Feb  7 10:59 ../
lrwxrwxrwx 1 root root  12 Feb  7 16:54 Linux\x20RAID -> ../../sdas16
---------
Strangely, though, udev reported only 2 clashes.
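
The clash can be checked offline from the blkid output. Here is a minimal
sketch, assuming the symlink path is simply /dev/disk/by-partlabel/ followed
by ID_PART_ENTRY_NAME, which matches the paths in the udevd messages quoted
earlier. The function names are hypothetical and the two inputs are
abbreviated copies of the blkid output for sdb1 and sdx13 shown above.

```python
from collections import defaultdict


def parse_blkid_udev(text):
    """Parse `blkid -o udev` style KEY=VALUE lines into a dict."""
    props = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:
            props[key] = value
    return props


def partlabel_clashes(devices):
    """Map each by-partlabel path claimed by more than one device to its claimants.

    devices -- mapping of device name -> dict of blkid properties
    """
    claims = defaultdict(list)
    for dev, props in devices.items():
        name = props.get("ID_PART_ENTRY_NAME")
        if name:
            claims["/dev/disk/by-partlabel/" + name].append(dev)
    return {path: sorted(devs) for path, devs in claims.items() if len(devs) > 1}


# Raw strings keep the literal backslash in "Linux\x20RAID".
SDB1 = r"""ID_FS_LABEL=hardstorage1:grubarr
ID_PART_ENTRY_NAME=Linux\x20RAID
ID_PART_ENTRY_NUMBER=1"""

SDX13 = r"""ID_FS_LABEL=perftest:sr50d12p7n6
ID_PART_ENTRY_NAME=Linux\x20RAID
ID_PART_ENTRY_NUMBER=13"""

devices = {
    "sdb1": parse_blkid_udev(SDB1),
    "sdx13": parse_blkid_udev(SDX13),
}

for path, devs in partlabel_clashes(devices).items():
    print(path, "<-", ", ".join(devs))
```

Both partitions claim the same path, so whichever symlink is created last
wins. With every partition carrying the same GPT name, the by-partlabel
directory can only ever hold one entry.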

It is also interesting that sdb1 was the only partition that failed to
assemble among the 8 basic raid1 arrays I have at boot. I know these
arrays really well: I checked at the last boot and confirmed that all the
other 15 partitions of sd[ab][12345678] were present and correctly
assembled in pairs making /dev/md[01234567]. Only sdb1 was missing, the
same partition that reported the clash... that's a bit too much for a
coincidence.

What do you think?

Thank you
A.
