* Some md/mdadm bugs
@ 2012-02-02 19:08 Asdo
  2012-02-02 21:17 ` NeilBrown
  0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2012-02-02 19:08 UTC (permalink / raw)
  To: linux-raid

Hello list

I removed sda from the system and I confirmed /dev/sda did not exist any
more.
After some time an I/O was issued to the array and sda6 was failed by MD
in /dev/md5:

md5 : active raid1 sdb6[2] sda6[0](F)
      10485688 blocks super 1.0 [2/1] [_U]
      bitmap: 1/160 pages [4KB], 32KB chunk

At this point I tried:

mdadm /dev/md5 --remove detached
--> no effect !
mdadm /dev/md5 --remove failed
--> no effect !
mdadm /dev/md5 --remove /dev/sda6
--> mdadm: cannot find /dev/sda6: No such file or directory (!!!)
mdadm /dev/md5 --remove sda6
--> finally worked ! (I don't know how I had the idea to actually try
this...)


Then here is another array:

md1 : active raid1 sda2[0] sdb2[2]
      10485688 blocks super 1.0 [2/2] [UU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

This one did not even realize that sda was removed from the system long ago.
Apparently only when an I/O is issued, mdadm realizes the drive is not
there anymore.
I am wondering (and this would be very serious) what happens if a new
drive is inserted and it takes the /dev/sda identifier!? Would MD start
writing or do any operation THERE!?

There is another problem...
I tried to make MD realize that the drive is detached:

mdadm /dev/md1 --fail detached
--> no effect !
however:
ls /dev/sda2
--> ls: cannot access /dev/sda2: No such file or directory
so "detached" also seems broken...


And here goes also a feature request:

if a device is detached from the system, (echo 1 > device/delete or
removing via hardware hot-swap + AHCI) MD should detect this situation
and mark the device (and all its partitions) as failed in all arrays, or
even remove the device completely from the RAID.
In my case I have verified that MD did not realize the device was
removed from the system, and only much later when an I/O was issued to
the disk, it would mark the device as failed in the RAID.

After the above is implemented, it could be an idea to actually allow a
new disk to take the place of a failed disk automatically if that would
be a "re-add" (probably the same failed disk is being reinserted by the
operator) and this even if the array is running, and especially if there
is a bitmap.
Now it doesn't happen:
When I reinserted the disk, udev triggered the --incremental, to
reinsert the device, but mdadm refused to do anything because the old
slot was still occupied with a failed+detached device. I manually
removed the device from the raid then I ran --incremental, but mdadm
still refused to re-add the device to the RAID because the array was
running. I think that if it is a re-add, and especially if the bitmap is
active, I can't think of a situation in which the user would *not* want
to do an incremental re-add even if the array is running.

Thank you
Asdo

^ permalink raw reply [flat|nested] 13+ messages in thread
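A member that has already vanished from /dev can also be failed and removed
through md's sysfs interface, which addresses members by kernel name rather
than by device node. The following is only a sketch built from the names in
the report above, and it assumes the usual per-device state attribute under
/sys/block/<md>/md/dev-<name>/:

  echo faulty > /sys/block/md1/md/dev-sda2/state    # mark the vanished member failed
  echo remove > /sys/block/md1/md/dev-sda2/state    # then drop it from the array

For md5, where sda6 is already marked (F), only the "remove" write should be
needed; the end result is the same as "mdadm /dev/md5 --remove sda6".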
* Re: Some md/mdadm bugs
  2012-02-02 19:08 Some md/mdadm bugs Asdo
@ 2012-02-02 21:17 ` NeilBrown
  2012-02-02 22:58   ` Asdo
  0 siblings, 1 reply; 13+ messages in thread
From: NeilBrown @ 2012-02-02 21:17 UTC (permalink / raw)
  To: Asdo; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4492 bytes --]

On Thu, 02 Feb 2012 20:08:53 +0100 Asdo <asdo@shiftmail.org> wrote:

> Hello list
>
> I removed sda from the system and I confirmed /dev/sda did not exist any
> more.
> After some time an I/O was issued to the array and sda6 was failed by MD
> in /dev/md5:
>
> md5 : active raid1 sdb6[2] sda6[0](F)
>       10485688 blocks super 1.0 [2/1] [_U]
>       bitmap: 1/160 pages [4KB], 32KB chunk
>
> At this point I tried:
>
> mdadm /dev/md5 --remove detached
> --> no effect !
> mdadm /dev/md5 --remove failed
> --> no effect !

What version of mdadm?  (mdadm --version).
These stopped working at one stage and were fixed in 3.1.5.

> mdadm /dev/md5 --remove /dev/sda6
> --> mdadm: cannot find /dev/sda6: No such file or directory (!!!)
> mdadm /dev/md5 --remove sda6
> --> finally worked ! (I don't know how I had the idea to actually try
> this...)

Well done.

>
> Then here is another array:
>
> md1 : active raid1 sda2[0] sdb2[2]
>       10485688 blocks super 1.0 [2/2] [UU]
>       bitmap: 0/1 pages [0KB], 65536KB chunk
>
> This one did not even realize that sda was removed from the system long ago.

Nobody told it.

> Apparently only when an I/O is issued, mdadm realizes the drive is not
> there anymore.

Only when there is IO, or someone tells it.

> I am wondering (and this would be very serious) what happens if a new
> drive is inserted and it takes the /dev/sda identifier!? Would MD start
> writing or do any operation THERE!?

Wouldn't happen.  As long as md holds onto the shell of the old sda nothing
else will get the name 'sda'.

>
> There is another problem...
> I tried to make MD realize that the drive is detached:
>
> mdadm /dev/md1 --fail detached
> --> no effect !
> however:
> ls /dev/sda2
> --> ls: cannot access /dev/sda2: No such file or directory
> so "detached" also seems broken...

Before 3.1.5 it was.  If you are using a newer mdadm I'll need to look
into it.

>
> And here goes also a feature request:
>
> if a device is detached from the system, (echo 1 > device/delete or
> removing via hardware hot-swap + AHCI) MD should detect this situation
> and mark the device (and all its partitions) as failed in all arrays, or
> even remove the device completely from the RAID.

This needs to be done via a udev rule.
That is why --remove understands names like "sda6" (no /dev).

When a device is removed, udev processes the remove notification.
The rule

   ACTION=="remove", RUN+="/sbin/mdadm -If $name"

in /etc/udev/rules.d/something.rules

will make that happen.

> In my case I have verified that MD did not realize the device was
> removed from the system, and only much later when an I/O was issued to
> the disk, it would mark the device as failed in the RAID.
>
> After the above is implemented, it could be an idea to actually allow a
> new disk to take the place of a failed disk automatically if that would
> be a "re-add" (probably the same failed disk is being reinserted by the
> operator) and this even if the array is running, and especially if there
> is a bitmap.

It should do that, providing you have a udev rule like:

   ACTION=="add", RUN+="/sbin/mdadm -I $tempnode"

You can even get it to add other devices as spares with e.g.

   policy action=force-spare

though you almost certainly don't want that general a policy.  You would
want to restrict that to certain ports (device paths).

> Now it doesn't happen:
> When I reinserted the disk, udev triggered the --incremental, to
> reinsert the device, but mdadm refused to do anything because the old
> slot was still occupied with a failed+detached device. I manually
> removed the device from the raid then I ran --incremental, but mdadm
> still refused to re-add the device to the RAID because the array was
> running. I think that if it is a re-add, and especially if the bitmap is
> active, I can't think of a situation in which the user would *not* want
> to do an incremental re-add even if the array is running.

Hmmm.. that doesn't seem right.  What version of mdadm are you running?
Maybe a newer one would get this right.

Thanks for the reports.

NeilBrown

>
> Thank you
> Asdo
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
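Taken together, the two rules described above could live in one rules file;
the file name here is arbitrary and the POLICY path= value is only a
placeholder that would have to match the real controller ports:

  # /etc/udev/rules.d/65-md-hotplug.rules  (illustrative sketch)
  ACTION=="remove", RUN+="/sbin/mdadm -If $name"
  ACTION=="add",    RUN+="/sbin/mdadm -I $tempnode"

and, only if automatic spare assignment on trusted ports is wanted, a policy
entry in mdadm.conf along these lines:

  POLICY domain=hotplugbay path=pci-0000:00:1f.2-ata-* action=force-spare

so that force-spare is applied only to devices that appear on those device
paths.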
* Re: Some md/mdadm bugs
  2012-02-02 21:17 ` NeilBrown
@ 2012-02-02 22:58   ` Asdo
  2012-02-06 16:59     ` Joel
  2012-02-06 17:07     ` Asdo
  0 siblings, 2 replies; 13+ messages in thread
From: Asdo @ 2012-02-02 22:58 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hello Neil
thanks for the reply

version is:
mdadm - v3.1.4 - 31st August 2010
so it's indeed before 3.1.5
That's what is in Ubuntu's latest stable, 11.10; they are lagging behind

I'll break the quotes to add a few comments --->

On 02/02/12 22:17, NeilBrown wrote:
> .....
>> I am wondering (and this would be very serious) what happens if a new
>> drive is inserted and it takes the /dev/sda identifier!? Would MD start
>> writing or do any operation THERE!?
> Wouldn't happen.  As long as md holds onto the shell of the old sda nothing
> else will get the name 'sda'.

Great! Indeed this was what I *suspected*, based on the fact that newly
added drives got higher identifiers.
It's good to hear it from a safe source though.

>> And here goes also a feature request:
>>
>> if a device is detached from the system, (echo 1 > device/delete or
>> removing via hardware hot-swap + AHCI) MD should detect this situation
>> and mark the device (and all its partitions) as failed in all arrays, or
>> even remove the device completely from the RAID.
> This needs to be done via a udev rule.
> That is why --remove understands names like "sda6" (no /dev).
>
> When a device is removed, udev processes the remove notification.
> The rule
>
>    ACTION=="remove", RUN+="/sbin/mdadm -If $name"
>
> in /etc/udev/rules.d/something.rules
>
> will make that happen.

Oh great!
Will use that.
--incremental --fail !  I would never have thought of combining those.

>
>> In my case I have verified that MD did not realize the device was
>> removed from the system, and only much later when an I/O was issued to
>> the disk, it would mark the device as failed in the RAID.
>>
>> After the above is implemented, it could be an idea to actually allow a
>> new disk to take the place of a failed disk automatically if that would
>> be a "re-add" (probably the same failed disk is being reinserted by the
>> operator) and this even if the array is running, and especially if there
>> is a bitmap.
> It should do that, providing you have a udev rule like:
>    ACTION=="add", RUN+="/sbin/mdadm -I $tempnode"

I think I have this rule.
But it doesn't work even via the command line if the array is running, as I
wrote below --->

> You can even get it to add other devices as spares with e.g.
>    policy action=force-spare
>
> though you almost certainly don't want that general a policy.  You would
> want to restrict that to certain ports (device paths).

sure, I understand

>> Now it doesn't happen:
>> When I reinserted the disk, udev triggered the --incremental, to
>> reinsert the device, but mdadm refused to do anything because the old
>> slot was still occupied with a failed+detached device. I manually
>> removed the device from the raid then I ran --incremental, but mdadm
>> still refused to re-add the device to the RAID because the array was
>> running. I think that if it is a re-add, and especially if the bitmap is
>> active, I can't think of a situation in which the user would *not* want
>> to do an incremental re-add even if the array is running.
> Hmmm.. that doesn't seem right.  What version of mdadm are you running?

3.1.4

> Maybe a newer one would get this right.

I need to try...
I think I need that.

> Thanks for the reports.

thank you for your reply.

Asdo

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Some md/mdadm bugs
  2012-02-02 22:58 ` Asdo
@ 2012-02-06 16:59   ` Joel
  2012-02-06 18:47     ` Asdo
  0 siblings, 1 reply; 13+ messages in thread
From: Joel @ 2012-02-06 16:59 UTC (permalink / raw)
  To: linux-raid

Asdo <asdo <at> shiftmail.org> writes:
> Neil said:
> > ACTION=="remove", RUN+="/sbin/mdadm -If $name"
> >
> > in /etc/udev/rules.d/something.rules
> >
> > will make that happen.
>
> Oh great!
>
> Will use that.
>
> --incremental --fail !  I would never have thought of combining those.

I don't think the -If is --incremental --fail.  It is --incremental --force.
Doesn't incremental automagically add a device if it is new and remove a device
if it is old?

Joel

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Some md/mdadm bugs
  2012-02-06 16:59 ` Joel
@ 2012-02-06 18:47   ` Asdo
  2012-02-06 18:50     ` Joel
  0 siblings, 1 reply; 13+ messages in thread
From: Asdo @ 2012-02-06 18:47 UTC (permalink / raw)
  To: Joel; +Cc: linux-raid

On 02/06/12 17:59, Joel wrote:
> Asdo <asdo <at> shiftmail.org> writes:
>
>> Neil said:
>>> ACTION=="remove", RUN+="/sbin/mdadm -If $name"
>>>
>>> in /etc/udev/rules.d/something.rules
>>>
>>> will make that happen.
>> Oh great!
>>
>> Will use that.
>>
>> --incremental --fail !  I would never have thought of combining those.
> I don't think the -If is --incremental --fail.  It is --incremental --force.
> Doesn't incremental automagically add a device if it is new and remove a device
> if it is old?
>
No, it is really --incremental --fail :
it behaves like --incremental --fail, while --incremental --force is an
illegal combination for mdadm (I just tried)

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Some md/mdadm bugs
  2012-02-06 18:47 ` Asdo
@ 2012-02-06 18:50   ` Joel
  0 siblings, 0 replies; 13+ messages in thread
From: Joel @ 2012-02-06 18:50 UTC (permalink / raw)
  To: linux-raid

Asdo <asdo <at> shiftmail.org> writes:
> No, it is really --incremental --fail :
> it behaves like --incremental --fail, while --incremental --force is an
> illegal combination for mdadm (I just tried)

Shoot!  You are absolutely right.  My manpage reading skills must need
serious refreshment!

^ permalink raw reply [flat|nested] 13+ messages in thread
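For reference, the short option spells out as the incremental-mode fail
documented in the man page; with the same kernel-name convention used
earlier in the thread:

  mdadm --incremental --fail sda6

is the long form of "mdadm -If sda6", while combining --incremental with
--force is rejected, as tested above.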
* Re: Some md/mdadm bugs
  2012-02-02 22:58 ` Asdo
  2012-02-06 16:59   ` Joel
@ 2012-02-06 17:07   ` Asdo
  2012-02-06 18:47     ` Asdo
  2012-02-06 22:20     ` NeilBrown
  1 sibling, 2 replies; 13+ messages in thread
From: Asdo @ 2012-02-06 17:07 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On 02/02/12 23:58, Asdo wrote:
>
>>> Now it doesn't happen:
>>> When I reinserted the disk, udev triggered the --incremental, to
>>> reinsert the device, but mdadm refused to do anything because the old
>>> slot was still occupied with a failed+detached device. I manually
>>> removed the device from the raid then I ran --incremental, but mdadm
>>> still refused to re-add the device to the RAID because the array was
>>> running. I think that if it is a re-add, and especially if the
>>> bitmap is
>>> active, I can't think of a situation in which the user would *not* want
>>> to do an incremental re-add even if the array is running.
>> Hmmm.. that doesn't seem right.  What version of mdadm are you running?
>
> 3.1.4
>
>> Maybe a newer one would get this right.
> I need to try...
> I think I need that.

Hi Neil,

Still some problems on mdadm 3.2.2 (from Ubuntu Precise) apparently:

Problem #1:

# mdadm -If /dev/sda4
mdadm: incremental removal requires a kernel device name, not a file:
/dev/sda4

however this works:

# mdadm -If sda4
mdadm: set sda4 faulty in md3
mdadm: hot removed sda4 from md3

Is this by design?

Would your udev rule
ACTION=="remove", RUN+="/sbin/mdadm -If $name"
trigger the first or the second kind of invocation?


Problem #2:

by reinserting sda, it became sdax, and the array is still running like
this:

md3 : active raid1 sdb4[2]
      10485688 blocks super 1.0 [2/1] [_U]
      bitmap: 0/160 pages [0KB], 32KB chunk

please note the bitmap is active

so now I'm trying auto hot-add:

# mdadm -I /dev/sdax4
mdadm: not adding /dev/sdax4 to active array (without --run) /dev/md3

still the old problem I mentioned with 3.1.4.

Trying more ways: (even with the "--run" which is suggested)

# mdadm --run -I /dev/sdax4
mdadm: -I would set mdadm mode to "incremental", but it is already set
to "misc".

# mdadm -I --run /dev/sdax4
mdadm: failed to add /dev/sdax4 to /dev/md3: Invalid argument.

# mdadm -I --run sdax4
mdadm: stat failed for sdax4: No such file or directory.

# mdadm -I sdax4
mdadm: stat failed for sdax4: No such file or directory.

This feature not working is a problem because if one extracts one disk by
mistake, and then reinserts it, even with bitmaps active, he needs to do a
lot of manual work to re-add it to the arrays (potentially even error-prone,
if he mistakes the partition numbers)...

Thank you
A.

^ permalink raw reply [flat|nested] 13+ messages in thread
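Until the incremental path handles this case, the manual recovery for a
mistakenly pulled and reinserted disk is an explicit re-add per array; a
sketch using the device names from the report above (sdax4 being the
reinserted partition, and assuming the failed slot has already been cleared
as in Problem #1):

  # mdadm /dev/md3 --re-add /dev/sdax4

With a write-intent bitmap present the re-add should only resync the chunks
recorded as dirty since the member disappeared.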
* Re: Some md/mdadm bugs 2012-02-06 17:07 ` Asdo @ 2012-02-06 18:47 ` Asdo 2012-02-06 22:31 ` NeilBrown 2012-02-06 22:20 ` NeilBrown 1 sibling, 1 reply; 13+ messages in thread From: Asdo @ 2012-02-06 18:47 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid One or two more bug(s) in 3.2.2 (note: my latest mail I am replying to is still valid) AUTO line in mdadm.conf does not appear to work any longer in 3.2.2 compared to mdadm 3.1.4 Now this line "AUTO -all" still autoassembles every array. There are many arrays not declared in my mdadm.conf, and which are not for this host (hostname is different) but mdadm still autoassembles everything, e.g.: # mdadm -I /dev/sdr8 mdadm: /dev/sdr8 attached to /dev/md/perftest:r0d24, not enough to start (1). (note: "perftest" is even not the hostname) I have just regressed to mdadm 3.1.4 to confirm that it worked back then, and yes, I confirm that 3.1.4 was not doing any action upon: # mdadm -I /dev/sdr8 --> nothing done when the line in config was: "AUTO -all" or even "AUTO +homehost -all" which is the line I am normally using. This is a problem in our fairly large system with 80+ HDDs and many partitions which I am testing now which is full of every kind of arrays.... I am normally using : "AUTO +homehost -all" to prevent assembling a bagzillion of arrays at boot, also because doing that gives race conditions at boot and drops me to initramfs shell (see below next bug). Another problem with 3.2.2: At boot, this is from a serial dump: udevd[218]: symlink '../../sdx13' '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists udevd[189]: symlink '../../sdb1' '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists And sdb1 is not correctly inserted into array /dev/md0 which hence starts degraded and so I am dropped into an initramfs shell. This looks like a race condition... I don't know if this is fault of udev, udev rules or mdadm... This is with mdadm 3.2.2 and kernel 3.0.13 (called 3.0.0-15-server by Ubuntu) on Ubuntu oneiric 11.10 Having also the above bug of nonworking AUTO line, this problem happens a lot with 80+ disks and lots of partitions. If the auto line worked, I would have postponed most of the assembly's at a very late stage in the boot process, maybe after a significant "sleep". Actually this race condition could be an ubuntu udev script bug : Here are the ubuntu udev rules files I could find, related to mdadm or containing "by-partlabel": ------------------------------------------------ 65-mdadm-blkid.rules: # This file causes Linux RAID (md) block devices to be checked for further # filesystems if the array is active. See udev(8) for syntax. # # Based on Suse's udev rule file for md SUBSYSTEM!="block", GOTO="mdadm_end" KERNEL!="md[0-9]*", GOTO="mdadm_end" ACTION!="add|change", GOTO="mdadm_end" # container devices have a metadata version of e.g. 
'external:ddf' and # never leave state 'inactive' ATTR{md/metadata_version}=="external:[A-Za-z]*", ATTR{md/array_state}=="inactive", GOTO="md_ignore_state" ENV{DEVTYPE}=="partition", GOTO="md_ignore_state" TEST!="md/array_state", GOTO="mdadm_end" ATTR{md/array_state}=="|clear|inactive", GOTO="mdadm_end" LABEL="md_ignore_state" # Obtain array information IMPORT{program}="/sbin/mdadm --detail --export $tempnode" ENV{DEVTYPE}=="disk", ENV{MD_NAME}=="?*", SYMLINK+="disk/by-id/md-name-$env{MD_NAME}", OPTIONS+="string_escape=replace" ENV{DEVTYPE}=="disk", ENV{MD_UUID}=="?*", SYMLINK+="disk/by-id/md-uuid-$env{MD_UUID}" ENV{DEVTYPE}=="disk", ENV{MD_DEVNAME}=="?*", SYMLINK+="md/$env{MD_DEVNAME}" ENV{DEVTYPE}=="partition", ENV{MD_NAME}=="?*", SYMLINK+="disk/by-id/md-name-$env{MD_NAME}-part%n", OPTIONS+="string_escape=replace" ENV{DEVTYPE}=="partition", ENV{MD_UUID}=="?*", SYMLINK+="disk/by-id/md-uuid-$env{MD_UUID}-part%n" ENV{DEVTYPE}=="partition", ENV{MD_DEVNAME}=="*[^0-9]", SYMLINK+="md/$env{MD_DEVNAME}%n" ENV{DEVTYPE}=="partition", ENV{MD_DEVNAME}=="*[0-9]", SYMLINK+="md/$env{MD_DEVNAME}p%n" # by-uuid and by-label symlinks IMPORT{program}="/sbin/blkid -o udev -p $tempnode" OPTIONS+="link_priority=100" OPTIONS+="watch" ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", \ SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}" ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", \ SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}" LABEL="mdadm_end" ------------------------------------------------ 85-mdadm.rules: # This file causes block devices with Linux RAID (mdadm) signatures to # automatically cause mdadm to be run. # See udev(8) for syntax SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \ RUN+="/sbin/mdadm --incremental $env{DEVNAME}" ------------------------------------------------ part of 60-persistent-storage.rules: # do not edit this file, it will be overwritten on update # persistent storage links: /dev/disk/{by-id,by-uuid,by-label,by-path} # scheme based on "Linux persistent device names", 2004, Hannes Reinecke <hare@suse.de> # forward scsi device event to corresponding block device ACTION=="change", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", TEST=="block", ATTR{block/*/uevent}="change" ACTION=="remove", GOTO="persistent_storage_end" # enable in-kernel media-presence polling ACTION=="add", SUBSYSTEM=="module", KERNEL=="block", ATTR{parameters/events_dfl_poll_msecs}=="0", ATTR{parameters/events_dfl_poll_msecs}="2000" SUBSYSTEM!="block", GOTO="persistent_storage_end" # skip rules for inappropriate block devices KERNEL=="fd*|mtd*|nbd*|gnbd*|btibm*|dm-*|md*", GOTO="persistent_storage_end" # ignore partitions that span the entire disk TEST=="whole_disk", GOTO="persistent_storage_end" # for partitions import parent information ENV{DEVTYPE}=="partition", IMPORT{parent}="ID_*" # virtio-blk KERNEL=="vd*[!0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}" KERNEL=="vd*[0-9]", ATTRS{serial}=="?*", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/virtio-$env{ID_SERIAL}-part%n" # ATA devices with their own "ata" kernel subsystem KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="ata", IMPORT{program}="ata_id --export $tempnode" # ATA devices using the "scsi" subsystem KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{vendor}=="ATA", IMPORT{program}="ata_id --export $tempnode" # ATA/ATAPI devices (SPC-3 or later) using the "scsi" subsystem KERNEL=="sd*[!0-9]|sr*", 
ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="scsi", ATTRS{type}=="5", ATTRS{scsi_level}=="[6-9]*", IMPORT{program}="ata_id --export $tempnode" # Run ata_id on non-removable USB Mass Storage (SATA/PATA disks in enclosures) KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", ATTR{removable}=="0", SUBSYSTEMS=="usb", IMPORT{program}="ata_id --export $tempnode" # Otherwise fall back to using usb_id for USB devices KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", SUBSYSTEMS=="usb", IMPORT{program}="usb_id --export %p" # scsi devices KERNEL=="sd*[!0-9]|sr*", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="scsi" KERNEL=="cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}!="?*", IMPORT{program}="scsi_id --export --whitelisted -d $tempnode", ENV{ID_BUS}="cciss" KERNEL=="sd*|sr*|cciss*", ENV{DEVTYPE}=="disk", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}" KERNEL=="sd*|cciss*", ENV{DEVTYPE}=="partition", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/$env{ID_BUS}-$env{ID_SERIAL}-part%n" # firewire KERNEL=="sd*[!0-9]|sr*", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}" KERNEL=="sd*[0-9]", ATTRS{ieee1394_id}=="?*", SYMLINK+="disk/by-id/ieee1394-$attr{ieee1394_id}-part%n" # scsi compat links for ATA devices KERNEL=="sd*[!0-9]", ENV{ID_BUS}=="ata", PROGRAM="scsi_id --whitelisted --replace-whitespace -p0x80 -d$tempnode", RESULT=="?*", ENV{ID_SCSI_COMPAT}="$result", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}" KERNEL=="sd*[0-9]", ENV{ID_SCSI_COMPAT}=="?*", SYMLINK+="disk/by-id/scsi-$env{ID_SCSI_COMPAT}-part%n" KERNEL=="mmcblk[0-9]", SUBSYSTEMS=="mmc", ATTRS{name}=="?*", ATTRS{serial}=="?*", ENV{ID_NAME}="$attr{name}", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/mmc-$env{ID_NAME}_$env{ID_SERIAL}" KERNEL=="mmcblk[0-9]p[0-9]", ENV{ID_NAME}=="?*", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/mmc-$env{ID_NAME}_$env{ID_SERIAL}-part%n" KERNEL=="mspblk[0-9]", SUBSYSTEMS=="memstick", ATTRS{name}=="?*", ATTRS{serial}=="?*", ENV{ID_NAME}="$attr{name}", ENV{ID_SERIAL}="$attr{serial}", SYMLINK+="disk/by-id/memstick-$env{ID_NAME}_$env{ID_SERIAL}" KERNEL=="mspblk[0-9]p[0-9]", ENV{ID_NAME}=="?*", ENV{ID_SERIAL}=="?*", SYMLINK+="disk/by-id/memstick-$env{ID_NAME}_$env{ID_SERIAL}-part%n" # by-path (parent device path) ENV{DEVTYPE}=="disk", ENV{ID_PATH}=="", DEVPATH!="*/virtual/*", IMPORT{program}="path_id %p" ENV{DEVTYPE}=="disk", ENV{ID_PATH}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH}" ENV{DEVTYPE}=="partition", ENV{ID_PATH}=="?*", SYMLINK+="disk/by-path/$env{ID_PATH}-part%n" # skip unpartitioned removable media devices from drivers which do not send "change" events ENV{DEVTYPE}=="disk", KERNEL!="sd*|sr*", ATTR{removable}=="1", GOTO="persistent_storage_end" # probe filesystem metadata of optical drives which have a media inserted KERNEL=="sr*", ENV{DISK_EJECT_REQUEST}!="?*", ENV{ID_CDROM_MEDIA_TRACK_COUNT_DATA}=="?*", ENV{ID_CDROM_MEDIA_SESSION_LAST_OFFSET}=="?*", IMPORT{program}="/sbin/blkid -o udev -p -u noraid -O $env{ID_CDROM_MEDIA_SESSION_LAST_OFFSET} $tempnode" # single-session CDs do not have ID_CDROM_MEDIA_SESSION_LAST_OFFSET KERNEL=="sr*", ENV{DISK_EJECT_REQUEST}!="?*", ENV{ID_CDROM_MEDIA_TRACK_COUNT_DATA}=="?*", ENV{ID_CDROM_MEDIA_SESSION_LAST_OFFSET}=="", IMPORT{program}="/sbin/blkid -o udev -p -u noraid $tempnode" # probe filesystem metadata of disks KERNEL!="sr*", IMPORT{program}="/sbin/blkid -o udev -p $tempnode" # watch metadata changes by tools closing the device after writing KERNEL!="sr*", OPTIONS+="watch" # 
by-label/by-uuid links (filesystem metadata) ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}" ENV{ID_FS_USAGE}=="filesystem|other", ENV{ID_FS_LABEL_ENC}=="?*", SYMLINK+="disk/by-label/$env{ID_FS_LABEL_ENC}" # by-id (World Wide Name) ENV{DEVTYPE}=="disk", ENV{ID_WWN_WITH_EXTENSION}=="?*", SYMLINK+="disk/by-id/wwn-$env{ID_WWN_WITH_EXTENSION}" ENV{DEVTYPE}=="partition", ENV{ID_WWN_WITH_EXTENSION}=="?*", SYMLINK+="disk/by-id/wwn-$env{ID_WWN_WITH_EXTENSION}-part%n" # by-partlabel/by-partuuid links (partition metadata) ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}" ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_NAME}=="?*", SYMLINK+="disk/by-partlabel/$env{ID_PART_ENTRY_NAME}" LABEL="persistent_storage_end" ------------------------------------------------ Do you think this is an ubuntu udev rule bug or a mdadm bug? Thank you A. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Some md/mdadm bugs 2012-02-06 18:47 ` Asdo @ 2012-02-06 22:31 ` NeilBrown 2012-02-07 17:13 ` Asdo 0 siblings, 1 reply; 13+ messages in thread From: NeilBrown @ 2012-02-06 22:31 UTC (permalink / raw) To: Asdo; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 2989 bytes --] On Mon, 06 Feb 2012 19:47:38 +0100 Asdo <asdo@shiftmail.org> wrote: > One or two more bug(s) in 3.2.2 > (note: my latest mail I am replying to is still valid) > > AUTO line in mdadm.conf does not appear to work any longer in 3.2.2 > compared to mdadm 3.1.4 > Now this line > > "AUTO -all" > > still autoassembles every array. > There are many arrays not declared in my mdadm.conf, and which are not > for this host (hostname is different) > but mdadm still autoassembles everything, e.g.: > > # mdadm -I /dev/sdr8 > mdadm: /dev/sdr8 attached to /dev/md/perftest:r0d24, not enough to start > (1). > > (note: "perftest" is even not the hostname) Odd.. it works for me: # cat /etc/mdadm.conf AUTO -all # mdadm -Iv /dev/sda mdadm: /dev/sda has metadata type 1.x for which auto-assembly is disabled # mdadm -V mdadm - v3.2.2 - 17th June 2011 # Can you show the complete output of the same commands (with sdr8 in place of sda of course :-) > > I have just regressed to mdadm 3.1.4 to confirm that it worked back > then, and yes, I confirm that 3.1.4 was not doing any action upon: > # mdadm -I /dev/sdr8 > --> nothing done > when the line in config was: > "AUTO -all" > or even > "AUTO +homehost -all" > which is the line I am normally using. > > > This is a problem in our fairly large system with 80+ HDDs and many > partitions which I am testing now which is full of every kind of arrays.... > I am normally using : "AUTO +homehost -all" to prevent assembling a > bagzillion of arrays at boot, also because doing that gives race > conditions at boot and drops me to initramfs shell (see below next bug). > > > > > > Another problem with 3.2.2: > > At boot, this is from a serial dump: > > udevd[218]: symlink '../../sdx13' > '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists > udevd[189]: symlink '../../sdb1' > '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists > > And sdb1 is not correctly inserted into array /dev/md0 which hence > starts degraded and so I am dropped into an initramfs shell. > This looks like a race condition... I don't know if this is fault of > udev, udev rules or mdadm... > This is with mdadm 3.2.2 and kernel 3.0.13 (called 3.0.0-15-server by > Ubuntu) on Ubuntu oneiric 11.10 > Having also the above bug of nonworking AUTO line, this problem happens > a lot with 80+ disks and lots of partitions. If the auto line worked, I > would have postponed most of the assembly's at a very late stage in the > boot process, maybe after a significant "sleep". > > > Actually this race condition could be an ubuntu udev script bug : > > Here are the ubuntu udev rules files I could find, related to mdadm or > containing "by-partlabel": It does look like a udev thing more than an mdadm thing. What do /dev/blkid -o udev -p /dev/sdb1 and /dev/blkid -o udev -p /dev/sdx12 report? NeilBrown [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Some md/mdadm bugs 2012-02-06 22:31 ` NeilBrown @ 2012-02-07 17:13 ` Asdo 2012-02-09 0:55 ` NeilBrown 0 siblings, 1 reply; 13+ messages in thread From: Asdo @ 2012-02-07 17:13 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid On 02/06/12 23:31, NeilBrown wrote: > On Mon, 06 Feb 2012 19:47:38 +0100 Asdo<asdo@shiftmail.org> wrote: > >> One or two more bug(s) in 3.2.2 >> (note: my latest mail I am replying to is still valid) >> >> AUTO line in mdadm.conf does not appear to work any longer in 3.2.2 >> compared to mdadm 3.1.4 >> Now this line >> >> "AUTO -all" >> >> still autoassembles every array. >> There are many arrays not declared in my mdadm.conf, and which are not >> for this host (hostname is different) >> but mdadm still autoassembles everything, e.g.: >> >> # mdadm -I /dev/sdr8 >> mdadm: /dev/sdr8 attached to /dev/md/perftest:r0d24, not enough to start >> (1). >> >> (note: "perftest" is even not the hostname) > Odd.. it works for me: > > # cat /etc/mdadm.conf > AUTO -all > # mdadm -Iv /dev/sda > mdadm: /dev/sda has metadata type 1.x for which auto-assembly is disabled > # mdadm -V > mdadm - v3.2.2 - 17th June 2011 > # > > Can you show the complete output of the same commands (with sdr8 in place of sda of course :-) I confirm the bug exists in 3.2.2 I compiled from source 3.2.2 from your git to make sure ("git checkout mdadm-3.2.2" and then "make") # ./mdadm -Iv /dev/sdat1 mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough to start (1). # ./mdadm --version mdadm - v3.2.2 - 17th June 2011 # cat /etc/mdadm/mdadm.conf AUTO -all however the good news is that the bug is gone in 3.2.3 (still from your git) # ./mdadm -Iv /dev/sdat1 mdadm: /dev/sdat1 has metadata type 1.x for which auto-assembly is disabled # ./mdadm --version mdadm - v3.2.3 - 23rd December 2011 # cat /etc/mdadm/mdadm.conf AUTO -all However in 3.2.3 there is another bug, or else I don't understand how AUTO works anymore: # hostname perftest # hostname perftest # cat /etc/mdadm/mdadm.conf HOMEHOST <system> AUTO +homehost -all # ./mdadm -Iv /dev/sdat1 mdadm: /dev/sdat1 has metadata type 1.x for which auto-assembly is disabled # ./mdadm --version mdadm - v3.2.3 - 23rd December 2011 ?? Admittedly perftest is not the original hostname for this machine but it shouldn't matter (does it go reading /etc/hostname directly?)... Same result is if I make the mdadm.conf file like this HOMEHOST perftest AUTO +homehost -all Else, If I create the file like this: # cat /etc/mdadm/mdadm.conf HOMEHOST <system> AUTO +1.x homehost -all # hostname perftest # ./mdadm -Iv /dev/sdat1 mdadm: /dev/sdat1 attached to /dev/md/sr50d12p1n1, not enough to start (1). # ./mdadm --version mdadm - v3.2.3 - 23rd December 2011 Now it works, BUT it works *too much*, look: # hostname foo # hostname foo # ./mdadm -Iv /dev/sdat1 mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough to start (1). # cat /etc/mdadm/mdadm.conf HOMEHOST <system> AUTO +1.x homehost -all # ./mdadm --version mdadm - v3.2.3 - 23rd December 2011 Same behaviour is if I make the mdadm.conf file with an explicit HOMEHOST name: # hostname foo # cat /etc/mdadm/mdadm.conf HOMEHOST foo AUTO +1.x homehost -all # ./mdadm -Iv /dev/sdat1 mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough to start (1). # ./mdadm --version mdadm - v3.2.3 - 23rd December 2011 It does not seem correct behaviour to me. 
If it is, could you explain how I should create the mdadm.conf file in order for mdadm to autoassemble *all* arrays for this host (matching `hostname` == array-hostname in 1.x) and never autoassemble arrays with different hostname? Note I'm *not* using 0.90 metadata anywhere, so no special case is needed for that metadata version I'm not sure if 3.1.4 had the "correct" behaviour... Yesterday it seemed to me it had, but today I can't seem to make it work anymore like I intended. > >> I have just regressed to mdadm 3.1.4 to confirm that it worked back >> then, and yes, I confirm that 3.1.4 was not doing any action upon: >> # mdadm -I /dev/sdr8 >> --> nothing done >> when the line in config was: >> "AUTO -all" >> or even >> "AUTO +homehost -all" >> which is the line I am normally using. >> >> >> This is a problem in our fairly large system with 80+ HDDs and many >> partitions which I am testing now which is full of every kind of arrays.... >> I am normally using : "AUTO +homehost -all" to prevent assembling a >> bagzillion of arrays at boot, also because doing that gives race >> conditions at boot and drops me to initramfs shell (see below next bug). >> >> >> >> >> >> Another problem with 3.2.2: >> >> At boot, this is from a serial dump: >> >> udevd[218]: symlink '../../sdx13' >> '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists >> udevd[189]: symlink '../../sdb1' >> '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists >> >> And sdb1 is not correctly inserted into array /dev/md0 which hence >> starts degraded and so I am dropped into an initramfs shell. >> This looks like a race condition... I don't know if this is fault of >> udev, udev rules or mdadm... >> This is with mdadm 3.2.2 and kernel 3.0.13 (called 3.0.0-15-server by >> Ubuntu) on Ubuntu oneiric 11.10 >> Having also the above bug of nonworking AUTO line, this problem happens >> a lot with 80+ disks and lots of partitions. If the auto line worked, I >> would have postponed most of the assembly's at a very late stage in the >> boot process, maybe after a significant "sleep". >> >> >> Actually this race condition could be an ubuntu udev script bug : >> >> Here are the ubuntu udev rules files I could find, related to mdadm or >> containing "by-partlabel": > It does look like a udev thing more than an mdadm thing. > > What do > /dev/blkid -o udev -p /dev/sdb1 > and > /dev/blkid -o udev -p /dev/sdx12 > > report? Unfortunately I rebooted in the meanwhile. Now sdb1 is assembled. I am pretty sure sdb1 is really the same device of the old boot so here it goes: # blkid -o udev -p /dev/sdb1 ID_FS_UUID=d6557fd5-0233-0ca1-8882-200cec91b3a3 ID_FS_UUID_ENC=d6557fd5-0233-0ca1-8882-200cec91b3a3 ID_FS_UUID_SUB=0ffdf74a-36f9-7a7a-9dbe-653bb37bdc8a ID_FS_UUID_SUB_ENC=0ffdf74a-36f9-7a7a-9dbe-653bb37bdc8a ID_FS_LABEL=hardstorage1:grubarr ID_FS_LABEL_ENC=hardstorage1:grubarr ID_FS_VERSION=1.0 ID_FS_TYPE=linux_raid_member ID_FS_USAGE=raid ID_PART_ENTRY_SCHEME=gpt ID_PART_ENTRY_NAME=Linux\x20RAID ID_PART_ENTRY_UUID=31c747e8-826f-48a3-ace0-c8063d489810 ID_PART_ENTRY_TYPE=a19d880f-05fc-4d3b-a006-743f0f84911e ID_PART_ENTRY_NUMBER=1 regarding sdx13 (I suppose sdx12 was a typo) I don't guarantee it's the same device as in the previous boot, because it's in the SAS-expanders path... 
However it will be something similar anyway # blkid -o udev -p /dev/sdx13 ID_FS_UUID=527dd3b2-decf-4278-cb92-e47bcea21a39 ID_FS_UUID_ENC=527dd3b2-decf-4278-cb92-e47bcea21a39 ID_FS_UUID_SUB=c1751a32-0ef6-ff30-04ad-16322edfe9b1 ID_FS_UUID_SUB_ENC=c1751a32-0ef6-ff30-04ad-16322edfe9b1 ID_FS_LABEL=perftest:sr50d12p7n6 ID_FS_LABEL_ENC=perftest:sr50d12p7n6 ID_FS_VERSION=1.0 ID_FS_TYPE=linux_raid_member ID_FS_USAGE=raid ID_PART_ENTRY_SCHEME=gpt ID_PART_ENTRY_NAME=Linux\x20RAID ID_PART_ENTRY_UUID=7a355609-793e-442f-b668-4168d2474f89 ID_PART_ENTRY_TYPE=a19d880f-05fc-4d3b-a006-743f0f84911e ID_PART_ENTRY_NUMBER=13 Ok now I understand that I have hundreds of partitions, all with the same ID_PART_ENTRY_NAME=Linux\x20RAID and I am actually surprised to see only 2 clashes reported in the serial console dump. I confirm that once the system boots, only the last identically-named symlink survives (obviously) --------- # ll /dev/disk/by-partlabel/ total 0 drwxr-xr-x 2 root root 60 Feb 7 16:54 ./ drwxr-xr-x 8 root root 160 Feb 7 10:59 ../ lrwxrwxrwx 1 root root 12 Feb 7 16:54 Linux\x20RAID -> ../../sdas16 --------- But strangely there were only 2 clashes reported by udev It it also interesting that sdb1 was the only partition which failed to assemble among the 8 basic raid1 arrays I have at boot (which I know really well and I checked at last boot and confirmed all other 15 partitions sd[ab][12345678] were present and correctly assembled in couples making /dev/md[01234567]) only sdb1 was missing, the same partition that reported the clash... that's a bit too much for a coincidence. What do you think? Thank you A. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Some md/mdadm bugs 2012-02-07 17:13 ` Asdo @ 2012-02-09 0:55 ` NeilBrown 0 siblings, 0 replies; 13+ messages in thread From: NeilBrown @ 2012-02-09 0:55 UTC (permalink / raw) To: Asdo; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 10050 bytes --] On Tue, 07 Feb 2012 18:13:05 +0100 Asdo <asdo@shiftmail.org> wrote: > On 02/06/12 23:31, NeilBrown wrote: > > On Mon, 06 Feb 2012 19:47:38 +0100 Asdo<asdo@shiftmail.org> wrote: > > > >> One or two more bug(s) in 3.2.2 > >> (note: my latest mail I am replying to is still valid) > >> > >> AUTO line in mdadm.conf does not appear to work any longer in 3.2.2 > >> compared to mdadm 3.1.4 > >> Now this line > >> > >> "AUTO -all" > >> > >> still autoassembles every array. > >> There are many arrays not declared in my mdadm.conf, and which are not > >> for this host (hostname is different) > >> but mdadm still autoassembles everything, e.g.: > >> > >> # mdadm -I /dev/sdr8 > >> mdadm: /dev/sdr8 attached to /dev/md/perftest:r0d24, not enough to start > >> (1). > >> > >> (note: "perftest" is even not the hostname) > > Odd.. it works for me: > > > > # cat /etc/mdadm.conf > > AUTO -all > > # mdadm -Iv /dev/sda > > mdadm: /dev/sda has metadata type 1.x for which auto-assembly is disabled > > # mdadm -V > > mdadm - v3.2.2 - 17th June 2011 > > # > > > > Can you show the complete output of the same commands (with sdr8 in place of sda of course :-) > > I confirm the bug exists in 3.2.2 > I compiled from source 3.2.2 from your git to make sure > > ("git checkout mdadm-3.2.2" and then "make") Hmm - you are right. I must have been testing a half-baked intermediate. > > # ./mdadm -Iv /dev/sdat1 > mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough > to start (1). > # ./mdadm --version > mdadm - v3.2.2 - 17th June 2011 > # cat /etc/mdadm/mdadm.conf > AUTO -all > > > however the good news is that the bug is gone in 3.2.3 (still from your git) > > # ./mdadm -Iv /dev/sdat1 > mdadm: /dev/sdat1 has metadata type 1.x for which auto-assembly is disabled > # ./mdadm --version > mdadm - v3.2.3 - 23rd December 2011 > # cat /etc/mdadm/mdadm.conf > AUTO -all > Oh good, I must have fixed it. > > > > > > However in 3.2.3 there is another bug, or else I don't understand how > AUTO works anymore: > > # hostname perftest > # hostname > perftest > # cat /etc/mdadm/mdadm.conf > HOMEHOST <system> > AUTO +homehost -all This should be AUTO homehost -all 'homehost' is not the name of a metadata type, it is a directive like 'yes' or 'no'. So no '+' is wanted. That said, there is a bug in there (fix just pushed out) but the above AUTO line works correctly. > # ./mdadm -Iv /dev/sdat1 > mdadm: /dev/sdat1 has metadata type 1.x for which auto-assembly is disabled > # ./mdadm --version > mdadm - v3.2.3 - 23rd December 2011 > > > ?? > Admittedly perftest is not the original hostname for this machine but it > shouldn't matter (does it go reading /etc/hostname directly?)... > Same result is if I make the mdadm.conf file like this > > HOMEHOST perftest > AUTO +homehost -all > > > Else, If I create the file like this: > > # cat /etc/mdadm/mdadm.conf > HOMEHOST <system> > AUTO +1.x homehost -all You removed the '+' from the homehost which is good, but added the "+1.x" which is not what you want - as I think you know. > # hostname > perftest > # ./mdadm -Iv /dev/sdat1 > mdadm: /dev/sdat1 attached to /dev/md/sr50d12p1n1, not enough to start (1). 
> # ./mdadm --version > mdadm - v3.2.3 - 23rd December 2011 > > > Now it works, BUT it works *too much*, look: > > # hostname foo > # hostname > foo > # ./mdadm -Iv /dev/sdat1 > mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough > to start (1). > # cat /etc/mdadm/mdadm.conf > HOMEHOST <system> > AUTO +1.x homehost -all > # ./mdadm --version > mdadm - v3.2.3 - 23rd December 2011 > > > Same behaviour is if I make the mdadm.conf file with an explicit > HOMEHOST name: > # hostname > foo > # cat /etc/mdadm/mdadm.conf > HOMEHOST foo > AUTO +1.x homehost -all > # ./mdadm -Iv /dev/sdat1 > mdadm: /dev/sdat1 attached to /dev/md/perftest:sr50d12p1n1, not enough > to start (1). > # ./mdadm --version > mdadm - v3.2.3 - 23rd December 2011 > > > > It does not seem correct behaviour to me. > > If it is, could you explain how I should create the mdadm.conf file in > order for mdadm to autoassemble *all* arrays for this host (matching > `hostname` == array-hostname in 1.x) and never autoassemble arrays with > different hostname? > > Note I'm *not* using 0.90 metadata anywhere, so no special case is > needed for that metadata version > > > I'm not sure if 3.1.4 had the "correct" behaviour... Yesterday it seemed > to me it had, but today I can't seem to make it work anymore like I > intended. > > > > > > > > >> I have just regressed to mdadm 3.1.4 to confirm that it worked back > >> then, and yes, I confirm that 3.1.4 was not doing any action upon: > >> # mdadm -I /dev/sdr8 > >> --> nothing done > >> when the line in config was: > >> "AUTO -all" > >> or even > >> "AUTO +homehost -all" > >> which is the line I am normally using. > >> > >> > >> This is a problem in our fairly large system with 80+ HDDs and many > >> partitions which I am testing now which is full of every kind of arrays.... > >> I am normally using : "AUTO +homehost -all" to prevent assembling a > >> bagzillion of arrays at boot, also because doing that gives race > >> conditions at boot and drops me to initramfs shell (see below next bug). > >> > >> > >> > >> > >> > >> Another problem with 3.2.2: > >> > >> At boot, this is from a serial dump: > >> > >> udevd[218]: symlink '../../sdx13' > >> '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists > >> udevd[189]: symlink '../../sdb1' > >> '/dev/disk/by-partlabel/Linux\x20RAID.udev-tmp' failed: File exists > >> > >> And sdb1 is not correctly inserted into array /dev/md0 which hence > >> starts degraded and so I am dropped into an initramfs shell. > >> This looks like a race condition... I don't know if this is fault of > >> udev, udev rules or mdadm... > >> This is with mdadm 3.2.2 and kernel 3.0.13 (called 3.0.0-15-server by > >> Ubuntu) on Ubuntu oneiric 11.10 > >> Having also the above bug of nonworking AUTO line, this problem happens > >> a lot with 80+ disks and lots of partitions. If the auto line worked, I > >> would have postponed most of the assembly's at a very late stage in the > >> boot process, maybe after a significant "sleep". > >> > >> > >> Actually this race condition could be an ubuntu udev script bug : > >> > >> Here are the ubuntu udev rules files I could find, related to mdadm or > >> containing "by-partlabel": > > It does look like a udev thing more than an mdadm thing. > > > > What do > > /dev/blkid -o udev -p /dev/sdb1 > > and > > /dev/blkid -o udev -p /dev/sdx12 > > > > report? > > Unfortunately I rebooted in the meanwhile. > Now sdb1 is assembled. 
> > I am pretty sure sdb1 is really the same device of the old boot so here > it goes: > > > # blkid -o udev -p /dev/sdb1 > ID_FS_UUID=d6557fd5-0233-0ca1-8882-200cec91b3a3 > ID_FS_UUID_ENC=d6557fd5-0233-0ca1-8882-200cec91b3a3 > ID_FS_UUID_SUB=0ffdf74a-36f9-7a7a-9dbe-653bb37bdc8a > ID_FS_UUID_SUB_ENC=0ffdf74a-36f9-7a7a-9dbe-653bb37bdc8a > ID_FS_LABEL=hardstorage1:grubarr > ID_FS_LABEL_ENC=hardstorage1:grubarr > ID_FS_VERSION=1.0 > ID_FS_TYPE=linux_raid_member > ID_FS_USAGE=raid > ID_PART_ENTRY_SCHEME=gpt > ID_PART_ENTRY_NAME=Linux\x20RAID > ID_PART_ENTRY_UUID=31c747e8-826f-48a3-ace0-c8063d489810 > ID_PART_ENTRY_TYPE=a19d880f-05fc-4d3b-a006-743f0f84911e > ID_PART_ENTRY_NUMBER=1 The "ID_PART_ENTRY_SCHEME=gpt" is causing the disk/by-partuuid link to be created and as you presumably have the same label on the other device (being the other half of a RAID1) the udev rules files will make the same symlink in both. So this is definitely a bug in the udev rules files. They should probably ignore ID_PART_ENTRY_SCHEME if ID_FS_USAGE=="raid". > > > regarding sdx13 (I suppose sdx12 was a typo) I don't guarantee it's the > same device as in the previous boot, because it's in the SAS-expanders > path... > However it will be something similar anyway > > # blkid -o udev -p /dev/sdx13 > ID_FS_UUID=527dd3b2-decf-4278-cb92-e47bcea21a39 > ID_FS_UUID_ENC=527dd3b2-decf-4278-cb92-e47bcea21a39 > ID_FS_UUID_SUB=c1751a32-0ef6-ff30-04ad-16322edfe9b1 > ID_FS_UUID_SUB_ENC=c1751a32-0ef6-ff30-04ad-16322edfe9b1 > ID_FS_LABEL=perftest:sr50d12p7n6 > ID_FS_LABEL_ENC=perftest:sr50d12p7n6 > ID_FS_VERSION=1.0 > ID_FS_TYPE=linux_raid_member > ID_FS_USAGE=raid > ID_PART_ENTRY_SCHEME=gpt > ID_PART_ENTRY_NAME=Linux\x20RAID > ID_PART_ENTRY_UUID=7a355609-793e-442f-b668-4168d2474f89 > ID_PART_ENTRY_TYPE=a19d880f-05fc-4d3b-a006-743f0f84911e > ID_PART_ENTRY_NUMBER=13 > > > Ok now I understand that I have hundreds of partitions, all with the same > ID_PART_ENTRY_NAME=Linux\x20RAID > and I am actually surprised to see only 2 clashes reported in the serial > console dump. > I confirm that once the system boots, only the last identically-named > symlink survives (obviously) > --------- > # ll /dev/disk/by-partlabel/ > total 0 > drwxr-xr-x 2 root root 60 Feb 7 16:54 ./ > drwxr-xr-x 8 root root 160 Feb 7 10:59 ../ > lrwxrwxrwx 1 root root 12 Feb 7 16:54 Linux\x20RAID -> ../../sdas16 > --------- > But strangely there were only 2 clashes reported by udev > > It it also interesting that sdb1 was the only partition which failed to > assemble among the 8 basic raid1 arrays I have at boot (which I know > really well and I checked at last boot and confirmed all other 15 > partitions sd[ab][12345678] were present and correctly assembled in > couples making /dev/md[01234567]) only sdb1 was missing, the same > partition that reported the clash... that's a bit too much for a > coincidence. > > What do you think? Do the other partitions have the ID_PART_ENTRY_SCHEME=gpt setting? NeilBrown > > Thank you > A. [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 828 bytes --] ^ permalink raw reply [flat|nested] 13+ messages in thread
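Following that suggestion, a guard in the 60-persistent-storage.rules
fragment quoted earlier might look like the sketch below (not the actual
Ubuntu fix; it simply skips the name-based link for RAID members, whose
shared GPT name "Linux RAID" is what makes the by-partlabel symlinks
collide, while leaving the unique by-partuuid link alone):

  ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_UUID}=="?*", SYMLINK+="disk/by-partuuid/$env{ID_PART_ENTRY_UUID}"
  ENV{ID_PART_ENTRY_SCHEME}=="gpt", ENV{ID_PART_ENTRY_NAME}=="?*", ENV{ID_FS_USAGE}!="raid", SYMLINK+="disk/by-partlabel/$env{ID_PART_ENTRY_NAME}"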
* Re: Some md/mdadm bugs 2012-02-06 17:07 ` Asdo 2012-02-06 18:47 ` Asdo @ 2012-02-06 22:20 ` NeilBrown 2012-02-07 17:47 ` Asdo 1 sibling, 1 reply; 13+ messages in thread From: NeilBrown @ 2012-02-06 22:20 UTC (permalink / raw) To: Asdo; +Cc: linux-raid [-- Attachment #1: Type: text/plain, Size: 3934 bytes --] On Mon, 06 Feb 2012 18:07:38 +0100 Asdo <asdo@shiftmail.org> wrote: > On 02/02/12 23:58, Asdo wrote: > > > >>> Now it doesn't happen: > >>> When I reinserted the disk, udev triggered the --incremental, to > >>> reinsert the device, but mdadm refused to do anything because the old > >>> slot was still occupied with a failed+detached device. I manually > >>> removed the device from the raid then I ran --incremental, but mdadm > >>> still refused to re-add the device to the RAID because the array was > >>> running. I think that if it is a re-add, and especially if the > >>> bitmap is > >>> active, I can't think of a situation in which the user would *not* want > >>> to do an incremental re-add even if the array is running. > >> Hmmm.. that doesn't seem right. What version of mdadm are you running? > > > > 3.1.4 > > > >> Maybe a newer one would get this right. > > I need to try... > > I think I need that. > > Hi Neil, > > Still some problems on mdadm 3.2.2 (from Ubuntu Precise) apparently: > > Problem #1: > > # mdadm -If /dev/sda4 > mdadm: incremental removal requires a kernel device name, not a file: > /dev/sda4 > > however this works: > > # mdadm -If sda4 > mdadm: set sda4 faulty in md3 > mdadm: hot removed sda4 from md3 > > Is this by design? Yes. > Would your udev rule > ACTION=="remove", RUN+="/sbin/mdadm -If $name" > trigger the first or the second kind of invocation? Yes. > > > Problem #2: > > by reinserting sda, it became sdax, and the array is still running like > this: > > md3 : active raid1 sdb4[2] > 10485688 blocks super 1.0 [2/1] [_U] > bitmap: 0/160 pages [0KB], 32KB chunk > > please note the bitmap is active True, but there is nothing in it (0 pages). That implies that no bits are set. I guess that is possible if nothing has been written to the array since the other device was removed. > > so now I'm trying auto hot-add: > > # mdadm -I /dev/sdax4 > mdadm: not adding /dev/sdax4 to active array (without --run) /dev/md3 > > still the old problem I mentioned with 3.1.4. I need to see -E and -X output on both drives to be able to see what is happening here. Also the content of /etc/mdadm.conf might be relevant. If you could supply that info I might be able to explain what is happening. > Trying more ways: (even with the "--run" which is suggested) > > # mdadm --run -I /dev/sdax4 > mdadm: -I would set mdadm mode to "incremental", but it is already set > to "misc". > > # mdadm -I --run /dev/sdax4 > mdadm: failed to add /dev/sdax4 to /dev/md3: Invalid argument. > Hmm... I'm able to reproduce something like this. Following patch seems to fix it, but I need to check the code more thoroughly to be sure. Note that this will *not* fix the "not adding ... not active array" problem. 
NeilBrown

diff --git a/Incremental.c b/Incremental.c
index 60175af..2be0d05 100644
--- a/Incremental.c
+++ b/Incremental.c
@@ -415,19 +415,19 @@ int Incremental(char *devname, int verbose, int runstop,
 			goto out_unlock;
 		}
 	}
-	info2.disk.major = major(stb.st_rdev);
-	info2.disk.minor = minor(stb.st_rdev);
+	info.disk.major = major(stb.st_rdev);
+	info.disk.minor = minor(stb.st_rdev);
 	/* add disk needs to know about containers */
 	if (st->ss->external)
 		sra->array.level = LEVEL_CONTAINER;
-	err = add_disk(mdfd, st, sra, &info2);
+	err = add_disk(mdfd, st, sra, &info);
 	if (err < 0 && errno == EBUSY) {
 		/* could be another device present with the same
 		 * disk.number. Find and reject any such
 		 */
 		find_reject(mdfd, st, sra, info.disk.number,
 			    info.events, verbose, chosen_name);
-		err = add_disk(mdfd, st, sra, &info2);
+		err = add_disk(mdfd, st, sra, &info);
 	}
 	if (err < 0) {
 		fprintf(stderr, Name ": failed to add %s to %s: %s.\n",

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

^ permalink raw reply related [flat|nested] 13+ messages in thread
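For testing, the hunk above can be applied to a checkout of the mdadm source
tree and the binary rebuilt in place; the patch file name below is arbitrary:

  cd mdadm
  patch -p1 < incremental-add.patch
  make
  ./mdadm --version

which matches the locally built ./mdadm used in the test results that follow.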
* Re: Some md/mdadm bugs 2012-02-06 22:20 ` NeilBrown @ 2012-02-07 17:47 ` Asdo 0 siblings, 0 replies; 13+ messages in thread From: Asdo @ 2012-02-07 17:47 UTC (permalink / raw) To: NeilBrown; +Cc: linux-raid On 02/06/12 23:20, NeilBrown wrote: >> >> Problem #2: >> >> by reinserting sda, it became sdax, and the array is still running like >> this: >> >> md3 : active raid1 sdb4[2] >> 10485688 blocks super 1.0 [2/1] [_U] >> bitmap: 0/160 pages [0KB], 32KB chunk >> >> please note the bitmap is active > True, but there is nothing in it (0 pages). That implies that no bits are > set. I guess that is possible if nothing has been written to the array since > the other device was removed. Almost certain: the array is not really in use (no lvm, not mounted) even if running >> so now I'm trying auto hot-add: >> >> # mdadm -I /dev/sdax4 >> mdadm: not adding /dev/sdax4 to active array (without --run) /dev/md3 >> >> still the old problem I mentioned with 3.1.4. > I need to see -E and -X output on both drives to be able to see what is > happening here. Also the content of /etc/mdadm.conf might be relevant. > If you could supply that info I might be able to explain what is happening. Please note the names changed since yesterday, because of hot-swap tests and reboots: now it's sda4 and sdb4 md3 : active raid1 sdb4[2] 10485688 blocks super 1.0 [2/1] [_U] bitmap: 0/160 pages [0KB], 32KB chunk # ./mdadm -E /dev/sda4 /dev/sda4: Magic : a92b4efc Version : 1.0 Feature Map : 0x1 Array UUID : 8da28111:cdb69fa9:8d607b48:78fb102d Name : hardstorage1:sys2boot Creation Time : Mon Mar 21 16:13:46 2011 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 20971376 (10.00 GiB 10.74 GB) Array Size : 20971376 (10.00 GiB 10.74 GB) Super Offset : 20971504 sectors State : clean Device UUID : c470ba58:897d9cb5:4054c89a:d41608d3 Internal Bitmap : -81 sectors from superblock Update Time : Tue Feb 7 17:25:16 2012 Checksum : a4deb673 - correct Events : 106 Device Role : Active device 0 Array State : AA ('A' == active, '.' == missing) # ./mdadm -X /dev/sda4 Filename : /dev/sda4 Magic : 6d746962 Version : 4 UUID : 8da28111:cdb69fa9:8d607b48:78fb102d Events : 106 Events Cleared : 61 State : OK Chunksize : 32 KB Daemon : 5s flush period Write Mode : Normal Sync Size : 10485688 (10.00 GiB 10.74 GB) Bitmap : 327678 bits (chunks), 0 dirty (0.0%) # ./mdadm -E /dev/sdb4 /dev/sdb4: Magic : a92b4efc Version : 1.0 Feature Map : 0x1 Array UUID : 8da28111:cdb69fa9:8d607b48:78fb102d Name : hardstorage1:sys2boot Creation Time : Mon Mar 21 16:13:46 2011 Raid Level : raid1 Raid Devices : 2 Avail Dev Size : 20971376 (10.00 GiB 10.74 GB) Array Size : 20971376 (10.00 GiB 10.74 GB) Super Offset : 20971504 sectors State : clean Device UUID : 0c978768:dccaa84d:4cbe07ee:501f863e Internal Bitmap : -81 sectors from superblock Update Time : Tue Feb 7 17:29:06 2012 Checksum : b769d7e - correct Events : 108 Device Role : Active device 1 Array State : .A ('A' == active, '.' == missing) # ./mdadm -X /dev/sdb4 Filename : /dev/sdb4 Magic : 6d746962 Version : 4 UUID : 8da28111:cdb69fa9:8d607b48:78fb102d Events : 108 Events Cleared : 61 State : OK Chunksize : 32 KB Daemon : 5s flush period Write Mode : Normal Sync Size : 10485688 (10.00 GiB 10.74 GB) Bitmap : 327678 bits (chunks), 0 dirty (0.0%) # cat /etc/mdadm/mdadm.conf AUTO +1.x (I made it simple :-D ) >> Trying more ways: (even with the "--run" which is suggested) >> >> # mdadm --run -I /dev/sdax4 >> mdadm: -I would set mdadm mode to "incremental", but it is already set >> to "misc". 
>>
>> # mdadm -I --run /dev/sdax4
>> mdadm: failed to add /dev/sdax4 to /dev/md3: Invalid argument.
>>
> Hmm... I'm able to reproduce something like this.
>
> Following patch seems to fix it, but I need to check the code more
> thoroughly to be sure.

Congrats, it really seems to fix it at least for 3.2.3:

before (with 3.2.3 from your git):

# ./mdadm -I /dev/sda4
mdadm: not adding /dev/sda4 to active array (without --run) /dev/md3
# ./mdadm -I --run /dev/sda4
mdadm: failed to add /dev/sda4 to /dev/md3: Invalid argument.

3.2.3 + your patch:

# ./mdadm -I /dev/sda4
mdadm: not adding /dev/sda4 to active array (without --run) /dev/md3
# ./mdadm -I --run /dev/sda4
mdadm: /dev/sda4 attached to /dev/md3 which is already active.

> Note that this will *not* fix the "not adding ... not active array" problem.

it's not a:         "not adding ... to not active array..."
but instead it's a: "not adding ... to *active* array..."

However, yes, I think the behaviour without --run should be different
than it is now.

Thanks for your help
A.

^ permalink raw reply [flat|nested] 13+ messages in thread