From: NeilBrown <neilb@suse.de>
To: Alexander Lyakas <alex.bolshoy@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: MD devnode still present after 'remove' udev event, and mdadm reports 'does not appear to be active'
Date: Sun, 25 Sep 2011 20:15:10 +1000
Message-ID: <20110925201510.24e0f468@notabene.brown>
In-Reply-To: <CAGRgLy7fKegKw6j-36o0uchTF35F4hT8NvujW4ghb5av-onCGQ@mail.gmail.com>
On Fri, 23 Sep 2011 22:24:08 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
wrote:
> Thank you, Neil, for answering.
> I'm not sure that I understand all of this, because my knowledge of
> Linux user-kernel interaction is, unfortunately, not sufficient. In
> the future, I hope to know more.
> For example, I don't understand, how opening a "/dev/mdXX" can create
> a device in the kernel, if the devnode "/dev/mdXX" does not exist. In
> that case, I actually fail to open it with ENOENT.
/dev/mdXX is a "device special file". It is not the device itself.
You can think of it like a symbolic link.
The "real" name for the device is something like "block device with major 9
and minor XX". That thing can exist quite independently of whether
the /dev/mdXX node exists, just as a file may or may not exist
independently of whether some symlink to it exists.
When the device (block,9,XX) appears, udev is told and it should create
the node in /dev. When the device disappears, udev is told and it should
remove the /dev entry. But there can be races, and other things might
sometimes add or remove /dev entries (though they shouldn't). So the
existence of a node in /dev isn't a guarantee that the device behind it
really exists.
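To make the distinction concrete, here is a minimal Python sketch (it is not
something mdadm does itself). It assumes sysfs is mounted at /sys and that the
kernel exposes /sys/dev/block/MAJOR:MINOR entries; the function names
devnode_numbers and kernel_has_block_device are made up for illustration.

```python
import os
import stat


def devnode_numbers(path):
    """Return (major, minor) if path is a block special file, else None.

    Only stat() is used, so the node is never opened.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    if not stat.S_ISBLK(st.st_mode):
        return None
    return os.major(st.st_rdev), os.minor(st.st_rdev)


def kernel_has_block_device(major, minor, sysfs_root="/sys"):
    """Check whether the kernel currently knows block device major:minor.

    Assumption: sysfs lists block devices under <sysfs_root>/dev/block/.
    """
    entry = os.path.join(sysfs_root, "dev", "block", f"{major}:{minor}")
    return os.path.exists(entry)
```

Calling devnode_numbers("/dev/md0") tells you what the node in /dev points at,
and kernel_has_block_device(9, 0) tells you whether the thing it points at is
really there. Either can be true without the other, which is the race
described above.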
>
> But what I did is actually similar to what you advised:
> - if I fail to open the devnode with ENOENT, I know (?) that the
> device does not exist
> - otherwise, I do GET_ARRAY_INFO
> - if it returns ok, then I go ahead and do GET_DISK_INFOs to get the
> disks information
> - otherwise if it returns ENODEV, I close the fd and then I read /proc/mdstat
> - if the md is there, then I know it's inactive array (and I have to
> --stop it and reassemble or do incremental assembly)
> - if the md is not there, then I know that it really does not exist
> (this is the case when md deletion happened but the devnode did not
> disappear yet)
>
> Does it sound right? It passes stress testing pretty well.
Yes, that sounds right.
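For the record, that sequence could be sketched in Python roughly as follows.
This is only an illustration under stated assumptions: the numeric value of
GET_ARRAY_INFO assumes the common Linux ioctl encoding (as on x86 and ARM) and
a 72-byte mdu_array_info_t, and the helper names probe, mdstat_names and
classify are hypothetical.

```python
import errno
import fcntl
import os

# Assumption: _IOR(9, 0x11, 72 bytes) with the common Linux ioctl encoding.
GET_ARRAY_INFO = 0x80480911
ARRAY_INFO_SIZE = 72


def mdstat_names(text):
    """Return the array names that appear in /proc/mdstat-style text."""
    names = set()
    for line in text.splitlines():
        head, sep, _ = line.partition(" : ")
        if sep and head.startswith("md"):
            names.add(head)
    return names


def probe(devnode):
    """Open the node and try GET_ARRAY_INFO; return (stage, errno)."""
    try:
        fd = os.open(devnode, os.O_RDONLY)
    except OSError as e:
        return ("open", e.errno)
    try:
        fcntl.ioctl(fd, GET_ARRAY_INFO, bytearray(ARRAY_INFO_SIZE))
        return ("ok", 0)
    except OSError as e:
        return ("ioctl", e.errno)
    finally:
        os.close(fd)  # closed before the caller reads /proc/mdstat


def classify(stage, err, name, mdstat_text):
    """Map a probe result plus /proc/mdstat text to a coarse state."""
    if stage == "open" and err == errno.ENOENT:
        return "absent"
    if stage == "ok":
        return "active"
    if stage == "ioctl" and err == errno.ENODEV:
        return "inactive" if name in mdstat_names(mdstat_text) else "absent"
    return "unknown"
```

A caller would run probe("/dev/md0"), then read /proc/mdstat, then pass both
to classify together with the name "md0". Anything that does not match the
cases in your list comes back as "unknown" rather than being guessed at.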
>
> By the way, I understand that /proc/mdstat can be only of 4K size...so
> if I have many arrays, I should probably switch to look at
> /sys/block....
Correct.
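A sysfs-based listing might look something like this Python sketch. It assumes
each md array shows up as /sys/block/<name>/md/ with an array_state attribute;
the function name list_md_arrays and the sysfs_root parameter are made up for
illustration.

```python
import os


def list_md_arrays(sysfs_root="/sys"):
    """Return {name: array_state} for every block device with an md/ dir."""
    arrays = {}
    block_dir = os.path.join(sysfs_root, "block")
    for name in sorted(os.listdir(block_dir)):
        state_file = os.path.join(block_dir, name, "md", "array_state")
        try:
            with open(state_file) as f:
                arrays[name] = f.read().strip()
        except OSError:
            # Not an md device, or it vanished while we were looking.
            continue
    return arrays
```

Each array is read from its own small file, so the result does not depend on
how many arrays there are.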
NeilBrown
>
> Thanks,
> Alex.
>
> On Wed, Sep 21, 2011 at 8:03 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > On Tue, 13 Sep 2011 11:49:12 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > wrote:
> >
> > > Hello Neil,
> > > I am sorry for opening this again, but I am convinced now that I don't
> > > understand what's going on:)
> > >
> > > Basically, I see that GET_ARRAY_INFO can also return ENODEV in case
> > > the device in the kernel exists, but "we are not initialized yet":
> > > /* if we are not initialised yet, only ADD_NEW_DISK, STOP_ARRAY,
> > > * RUN_ARRAY, and GET_ and SET_BITMAP_FILE are allowed */
> > > if ((!mddev->raid_disks && !mddev->external)
> > > && cmd != ADD_NEW_DISK && cmd != STOP_ARRAY
> > > && cmd != RUN_ARRAY && cmd != SET_BITMAP_FILE
> > > && cmd != GET_BITMAP_FILE) {
> > > err = -ENODEV;
> > > goto abort_unlock;
> > >
> > > I thought that ENODEV means that the device in the kernel does not
> > > exist, although I am not yet familiar enough with the kernel sources
> > > to verify that.
> > >
> > > Basically, I just wanted to know whether there is a reliable way to
> > > determine whether the kernel MD device exists or not. (Obviously,
> > > successfully opening a devnode from user space is not enough.)
> > >
> > > Thanks,
> > > Alex.
> >
> > What exactly do you mean by "the kernel MD device exists" ??
> >
> > When you open a device-special-file for an md device (major == 9) it
> > automatically creates an inactive array. You can then fill in the details
> > and activate it, or explicitly deactivate it. If you do that it will
> > disappear.
> >
> > Opening the devnode is enough to check that the device exists, because it
> > creates the device and then you know that it exists.
> > If you want to know if it already exists - whether inactive or not - look
> > in /proc/mdstat or /sys/block/md*.
> > If you want to know if it already exists and is active, look in /proc/mdstat,
> > or open the device and use GET_ARRAY_INFO, or look in /sys/block/md*
> > at the device size, or maybe at /sys/block/mdXX/md/raid_disks.
> >
> > It depends on why you are asking.
> >
> > NeilBrown
> >
> > > On Tue, Aug 30, 2011 at 12:25 AM, NeilBrown <neilb@suse.de> wrote:
> > > > On Mon, 29 Aug 2011 20:17:34 +0300 Alexander Lyakas <alex.bolshoy@gmail.com>
> > > > wrote:
> > > >
> > > >> Greetings everybody,
> > > >>
> > > >> I issue
> > > >> mdadm --stop /dev/md0
> > > >> and I want to reliably determine that the MD devnode (/dev/md0) is gone.
> > > >> So I look for the udev 'remove' event for that devnode.
> > > >> However, in some cases even after I see the udev event, I issue
> > > >> mdadm --detail /dev/md0
> > > >> and I get:
> > > >> mdadm: md device /dev/md0 does not appear to be active
> > > >>
> > > >> According to Detail.c, this means that mdadm can successfully do
> > > >> open("/dev/md0") and receive a valid fd.
> > > >> But later, when issuing ioctl(fd, GET_ARRAY_INFO) it receives ENODEV
> > > >> from the kernel.
> > > >>
> > > >> Can somebody suggest an explanation for this behavior? Is there a
> > > >> reliable way to know when a MD devnode is gone?
> > > >
> > > > Running "udevadm settle" after stopping /dev/md0 is most likely to work.
> > > >
> > > > I suspect that udev removes the node *after* you see the 'remove' event.
> > > > Sometimes so soon after that you don't see the lag - sometimes a bit later.
> > > >
> > > > NeilBrown
> > > >
> > > >>
> > > >> Thanks,
> > > >> Alex.
> > > >> --
> > > >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > > >> the body of a message to majordomo@vger.kernel.org
> > > >> More majordomo info at http://vger.kernel.org/majordomo-info.html
> > > >
> > > >
> >