From: Albert Pauw <albert.pauw@gmail.com>
To: NeilBrown <neilb@suse.de>
Cc: linux-raid@vger.kernel.org
Subject: Re: More ddf container woes
Date: Fri, 11 Mar 2011 12:50:16 +0100
Message-ID: <4D7A0C78.2080402@gmail.com>
In-Reply-To: <4D788D2D.80706@gmail.com>
More experiments with the same setup:
On 03/10/11 09:34 AM, Albert Pauw wrote:
> Hi Neil,
>
> I found some more trouble with the ddf code, separate from the stuff I
> mentioned before (which is still present in the version I used below).
>
> Here's what I did and found:
>
> Note: Updated mdadm from the git repository up to and including the
> commit "Manage: be more careful about --add attempts."
>
> Used six disks, sdb - sdg out of which I created a 5-disk container,
> leaving one disk unused for the moment:
>
> mdadm -C /dev/md127 -l container -e ddf -n 5 /dev/sd[b-f]
>
> Created two RAID sets in this container:
>
> mdadm -C /dev/md0 -l 1 -n 2 /dev/md127
> mdadm -C /dev/md1 -l 5 -n 3 /dev/md127
>
> Note: At this moment, only one mdmon is running (mdmon md127)
>
> After finishing creating both RAID sets, I fail two disks, one in each
> RAID set:
>
> mdadm -f /dev/md0 /dev/sdb
>
> mdadm -f /dev/md1 /dev/sdd
>
> The first failed disk (sdb) is automatically removed from /dev/md0,
> but oddly enough the disk stays marked as "active/Online" in the
> "mdadm -E /dev/md127" output. The second failed disk (sdd) gets marked
> [F] in the RAID 5 array, but is NOT removed. Only when I run
>
> mdmon --all
>
> is the failed disk removed from /dev/md1; this second failed disk IS
> marked "Failed" in the "mdadm -E" output.
>
> Note: Checking the RAID arrays with "mdadm -D", both are marked as
> "clean, degraded".
I now failed the disks in reverse order, first in the RAID 5 set (md1),
then in the RAID 1 set (md0), and the behaviour was different.
This time both disks stay marked failed [F] in their subarrays (md0 and
md1). "mdadm -E /dev/md127" still shows all disks as "active/Online",
so the container isn't told about the failed disks. Only after a
"mdmon --all" are both failed disks removed from their respective
arrays; "mdadm -E /dev/md127" then shows both disks as failed, so the
container now knows about them.
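
(In case it helps to reproduce this: the reverse-order test boils down
to roughly the commands below, using the same disk names as in the
first test; on another setup the devices will of course differ.)

mdadm -f /dev/md1 /dev/sdd    # fail a disk in the RAID 5 set first
mdadm -f /dev/md0 /dev/sdb    # then fail a disk in the RAID 1 set
mdadm -E /dev/md127           # still shows all disks active/Online
mdmon --all
mdadm -E /dev/md127           # now shows both failed disks as Failed
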
When I don't run "mdmon --all" and try to add a spare disk, the add
fails with the message "mdadm: add failed for /dev/sdg: mdmon not
running". The rest of the behaviour stays the same: adding a clean new
disk to the container makes both subarrays go into recovery with this
new disk, md1 first, and after that finishes md0 gets resynced (with
the same disk!).
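
(The spare was added with a plain container add, roughly like this;
/dev/sdg is the disk I left out of the container at the start.)

mdadm --add /dev/md127 /dev/sdg
cat /proc/mdstat              # sdg appears in both md0 and md1; md1 rebuilds first, then md0
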
When I fail two disks of the RAID 5 set (md1), so that the whole
subarray is failed, and then add a spare disk to the container, only
md0 (the RAID 1 set) picks it up; md1 doesn't get rebuilt (and that's
how it should be).
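
(Here the RAID 5 set was taken out completely by failing two of its
members; which disks belong to md1 depends on how the container laid
them out, so the names below are just from my setup.)

mdadm -f /dev/md1 /dev/sdd
mdadm -f /dev/md1 /dev/sde      # second md1 member, whichever that is on your layout
mdadm --add /dev/md127 /dev/sdg
cat /proc/mdstat                # only md0 recruits the spare; md1 stays failed
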
>
> I now add a new, empty, clean disk (/dev/sdg) to the container, after
> which md1 (the RAID 5 set) immediately starts to rebuild.
> The RAID 1 set (md0), however, is set to "resync=DELAYED", which is
> very odd, because I only added one disk.
>
> Looking at the output of /proc/mdstat I see that disk sdg (the new
> spare) is actually added to both RAID arrays, and after the rebuild
> of md1 finishes, the other RAID set (md0) is also rebuilt, using the
> SAME spare disk (sdg).
>
>
> Albert
>
To sum it up, there are two problems here:

- A failed disk in a subarray isn't always automatically removed and
marked "Failed" in the container, although in some cases it is (see
above). Only after a manual "mdmon --all" does this take place.

- When two subarrays have failed disks (both degraded but still
operational) and I add a spare disk to the container, both pick up the
spare disk for replacement. They don't do this in parallel but in
sequence, yet they nevertheless use the same single disk. A short
reproduction sketch follows below.
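
(For completeness, a minimal sequence that shows the second problem on
my setup; all device names are specific to my six-disk test box.)

mdadm -C /dev/md127 -l container -e ddf -n 5 /dev/sd[b-f]
mdadm -C /dev/md0 -l 1 -n 2 /dev/md127
mdadm -C /dev/md1 -l 5 -n 3 /dev/md127
mdadm -f /dev/md0 /dev/sdb      # degrade the RAID 1 set
mdadm -f /dev/md1 /dev/sdd      # degrade the RAID 5 set
mdmon --all                     # make sure the container sees the failures
mdadm --add /dev/md127 /dev/sdg # add a single spare to the container
cat /proc/mdstat                # sdg is used by both md0 and md1, one after the other
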
Albert