All of lore.kernel.org
 help / color / mirror / Atom feed
From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Question about mdadm commit d6508f0cfb60edf07b36f1532eae4d9cddf7178b "be more careful about add attempts"
Date: Mon, 31 Oct 2011 10:16:49 +1100	[thread overview]
Message-ID: <20111031101649.657a1ab3@notabene.brown> (raw)
In-Reply-To: <CAGRgLy7biNy4Tj9wXhBosjC4QNy7dXC2aoimu7ezGqdL82Axxg@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5774 bytes --]

On Thu, 27 Oct 2011 11:10:54 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:

> Hello Neil,
> it makes perfect sense not to turn a device into a spare inadvertently.
> 
> However, with mdadm 3.1.4 under gdb, I tried the following:
> - I had a raid6 with 4 drives (sda/b/c/d), their "desc_nr" in the
> kernel were respectively (according to GET_DISK_INFO): 0,1,2,3.
> - I failed the two last drives (c & d) via mdadm and removed them from the array
> - I wiped the superblock on drive d.
> - I added the drive d back to the array
> So now the array had the following setup:
> sda: disc_nr=0, raid_disk=0
> sdb: disc_nr=1, raid_disk=1
> sdd: disc_nr=4, raid_disk=2
> So sdd was added to the array into slot 2, and received disc_nr=4
> 
> - Now I asked to re-add drive sdc back to array. In gdb I followed the
> re-add flow, to the place where it fills the mdu_disk_info_t structure
> from the superblock read from sdc. It put there the following content:
> disc.major = ...
> disc.minor = ...
> disc.number = 2
> disc.raid_disk = 2 (because previously this drive was in slot 2)
> disc.state = ...
> 
> Now in gdb I changed disc.number to 4 (to match the desc_nr of sdd).
> And then issued ADD_NEW_DISK. It succeeded, and the sdc drive received
> disc_nr=2 (while it was asking for 4). Of course, it could not have
> received the same raid_disk, because this raid_disk was already
> occupied by sdd. So it was added as a spare.
> 
> But you are saying:
> > If a device already exists with the same disk.number, a re-add cannot
> > succeed, so mdadm doesn't even try.
> while in my case it succeeded (while it actually did "add" and not "re-add").

We seem to be using word differently.
If I ask mdadm to do a "re-add" and it does an "add", then I consider that to
be "failure", however you seem to consider it to be a "success".

That seems to be the source of confusion.


> 
> That's why I was thinking it makes more sense to check disc.raid_disk
> and not disc.number in this check. Since disc.number is not the
> drive's role within the array (right?), it is simply a position of the
> drive in the list of all drives.
> So if you check raid_disk, and see that this raid slot is already
> occupied, then, naturally, the drive will be converted to spare, which
> we want to avoid.

It may well be appropriate to check raid_disk as well, yes.  It isn't so easy
though which is maybe why I didn't.

With 0.90 metadata, the disc.number is the same as disc.raid_disk for active
devices.  That might be another reason for the code being the way that it is.

Thanks,
NeilBrown



> 
> And the enough_fd() check protects us from adding a spare to a failed
> array (like you mentioned to me previously).
> 
> What am I missing?
> 
> Thanks,
>   Alex.
> 
> 
> 
> 
> 
> 
> 
> On Wed, Oct 26, 2011 at 11:51 PM, NeilBrown <neilb@suse.de> wrote:
> > On Wed, 26 Oct 2011 19:02:37 +0200 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> >> Greetings everybody,
> >> I have a question about the following code in Manage.c:Manage_subdevs()
> >>
> >> disc.number = mdi.disk.number;
> >> if (ioctl(fd, GET_DISK_INFO, &disc) != 0
> >>     || disc.major != 0 || disc.minor != 0
> >>     || !enough_fd(fd))
> >>     goto skip_re_add;
> >>
> >> I do not underatand why the checks: disc.major != 0 || disc.minor != 0
> >> are required. This basically means that the kernel already has an
> >> rdev->desc_nr equal to disc.number. But why fail the re-add procedure?
> >>
> >> Let's say that enough_fd() returns true, and we go ahead an issue
> >> ioctl(ADD_NEW_DISK). In this case, according to the kernel code in
> >> add_new_disk(), it will not even look at info->number. It will
> >> initialize rdev->desc_nr to -1, and will allocate a free desc_nr for
> >> the rdev later.
> >>
> >> Doing this with mdadm 3.1.4, where this check is not present, actually
> >> succeeds. I understand that this code was added for cases when
> >> enough_fd() returns false, which sounds perfectly fine to protect
> >> from.
> >>
> >> I was thinking that this code should actually check something like:
> >> if (ioctl(fd, GET_DISK_INFO, &disc) != 0
> >>     || disk.raid_disk != mdi.disk.raid_disk
> >>     || !enough_fd(fd))
> >>     goto skip_re_add;
> >>
> >> That is to check that the slot that was being occupied by the drive we
> >> are trying to add, is already occupied by a different drive (need also
> >> to cover cases of raid_disk <0, raid_disk >= raid_disks etc...) and
> >> not the desc_nr, which does not have any persistent meaning.
> >>
> >> Perhaps there are some compatibility issues with old kernels? Or
> >> special considerations for ... containers? non-persistent arrays?
> >
> > The point of this code is to make --re-add fail unless mdadm is certain that
> > the kernel will accept the re-add, rather than turn the device into a spare.
> >
> > If a device already exists with the same disk.number, a re-add cannot
> > succeed, so mdadm doesn't even try.
> >
> > When you say in 3.1.4 it "actually succeeds" - what succeeds?  Does it re-add
> > the device to the array, or does it turn the device into a spare?
> > I particularly do not want --re-add to turn a device into a spare because
> > people sometimes use it in cases where it cannot work, their device gets
> > turned into a spare, and they lose information that could have been used to
> > reconstruct the array.
> >
> > That that make sense?
> >
> > NeilBrown
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 828 bytes --]

  reply	other threads:[~2011-10-30 23:16 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-10-26 17:02 Question about mdadm commit d6508f0cfb60edf07b36f1532eae4d9cddf7178b "be more careful about add attempts" Alexander Lyakas
2011-10-26 21:51 ` NeilBrown
2011-10-27  9:10   ` Alexander Lyakas
2011-10-30 23:16     ` NeilBrown [this message]
2011-10-31  8:57       ` Alexander Lyakas
2011-10-31  9:19         ` NeilBrown
2011-11-01 16:26           ` Alexander Lyakas
2011-11-01 22:52             ` NeilBrown
2011-11-08 16:23               ` Alexander Lyakas
2011-11-08 23:41                 ` NeilBrown
2011-11-17 11:13                   ` Alexander Lyakas
2011-11-21  2:44                     ` NeilBrown
2011-11-22  8:45                       ` Alexander Lyakas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111031101649.657a1ab3@notabene.brown \
    --to=neilb@suse.de \
    --cc=alex.bolshoy@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.