From: NeilBrown <neilb@suse.de>
To: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: Francis Moreau <francis.moro@gmail.com>,
	linux-raid <linux-raid@vger.kernel.org>,
	sebastian.riemer@profitbricks.com
Subject: Re: /sys/block/md126 still exists even after stopping the array
Date: Mon, 29 Sep 2014 14:19:02 +1000
Message-ID: <20140929141902.5038b2a3@notabene.brown>
In-Reply-To: <54254CBD.5080704@intel.com>


On Fri, 26 Sep 2014 13:23:41 +0200 Artur Paszkiewicz
<artur.paszkiewicz@intel.com> wrote:

> On 09/26/2014 12:44 PM, NeilBrown wrote:
> > On Fri, 26 Sep 2014 12:23:27 +0200 Francis Moreau <francis.moro@gmail.com>
> > wrote:
> > 
> >> Hello Neil,
> >>
> >> On 09/26/2014 02:33 AM, NeilBrown wrote:
> >>> On Thu, 25 Sep 2014 18:12:07 +0200 Francis Moreau <francis.moro@gmail.com>
> >>> wrote:
> >> [...]
> >>>> I tried to find out what could have opened the md device by using fuser,
> >>>> but fuser reports no users.
> >>>
> >>> It is probably a transient open/close.
> >>>
> >>
> >> If it's open/close, wouldn't the 'close' part make the device disappear?
> > 
> > No. It's ... complicated.
> > 
> >>
> >>>>
> >>>> I took a look at the udev rules, which are the ones shipped with mdadm
> >>>> 3.3.2, but nothing keeps the device open during the remove event.
> >>>>
> >>>> Could you give me some hints here to debug this ?
> >>>
> >>> Modify md_open in drivers/md/md.c to add
> >>>    printk("Opened by %s\n", current->comm);
> >>>
> >>> and build a new kernel.  That will tell you the name of the process which
> >>> opened the device.
> >>>
> >>
> >> I did that. I also added a trace in md_release(), but strangely no trace
> >> was output from there.
> > 
> > Without seeing your patch I can't guess what is happening, but I am *certain*
> > that md_release() would get called provided md_open didn't return an error.
> > 
> > It might be helpful to print out the pid and the md device number too
> >  task_tgid_vnr(current)
> > will give you the pid.
> >   mdname(mddev)
> > gives the name of the device.
> > 
> > Probably there is a 'change' event happening just before the 'remove' event,
> > and udev runs "mdadm" on the 'change' event, and that ends up happening after
> > the device has been removed.
> > 
> > Is this really a problem?  Can't you just ignore it and pretend it isn't
> > there?
> > 
> > NeilBrown
> > 
> >>
> >> Here's the details of what I did:
> >>
> >> --- %< ---
> >> [root@localhost ~]# cat /proc/mdstat
> >> Personalities : [raid1]
> >> md125 : active raid1 vdc1[1] vdb1[0]
> >>       65472 blocks super 1.0 [2/2] [UU]
> >>
> >> md126 : active raid1 vdc2[1] vdb2[0]
> >>       209536 blocks super 1.2 [2/2] [UU]
> >>
> >> md127 : active raid1 vdb3[0] vdc3[1]
> >>       1819584 blocks super 1.2 [2/2] [UU]
> >>
> >> unused devices: <none>
> >>
> >> [root@localhost ~]# mdadm --stop --scan
> >>
> >> [root@localhost ~]# dmesg | grep md_
> >> [    1.474207] md_open(): opened by mdadm
> >> [    1.475316] md_open(): opened by mdadm
> >> [    1.492880] md_open(): opened by mdadm
> >> [    1.493201] md_open(): opened by mdadm
> >> [    1.494690] md_open(): opened by mdadm
> >> [    1.499369] md_open(): opened by mdadm
> >> [    1.533566] md_open(): opened by mdadm
> >> [    1.533697] md_open(): opened by mdadm
> >> [    1.554419] md_open(): opened by mdadm
> >> [    1.574451] md_open(): opened by mdadm
> >> [    1.574666] md_open(): opened by mdadm
> >> [    1.574877] md_open(): opened by mdadm
> >> [    1.576822] md_open(): opened by systemd-udevd
> >> [    1.576895] md_open(): opened by systemd-udevd
> >> [    1.577029] md_open(): opened by systemd-udevd
> >> [    1.581850] md_open(): opened by mdadm
> >> [    1.584054] md_open(): opened by systemd-udevd
> >> [    1.584770] md_open(): opened by mdadm
> >> [    1.585175] md_open(): opened by mdadm
> >> [    1.586328] md_open(): opened by systemd-udevd
> >> [    1.586933] md_open(): opened by systemd-udevd
> >> [    1.651265] md_open(): opened by mdadm
> >> [    1.651320] md_open(): opened by mdadm
> >> [    1.651364] md_open(): opened by mdadm
> >> [    1.651437] md_open(): opened by mdadm
> >> [    1.652376] md_open(): opened by mdadm
> >> [    1.652452] md_open(): opened by mdadm
> >> [   33.486704] md_open(): opened by mdadm
> >> [   33.489259] md_open(): opened by mdadm
> >> [   33.491000] md_open(): opened by mdadm
> >> [   33.491767] md_open(): opened by systemd-udevd
> >> [   33.692255] md_open(): opened by mdadm
> >> [   33.692288] md_open(): opened by mdadm
> >> [   33.692606] md_open(): opened by mdadm
> >> [   33.692858] md_open(): opened by mdadm
> >> [   33.692942] md_open(): opened by mdadm
> >> [   33.693237] md_open(): opened by mdadm
> >> [   33.694254] md_open(): opened by mdadm
> >> [   33.694275] md_open(): opened by mdadm
> >> [   33.694373] md_open(): opened by mdadm
> >> [   33.695558] md_open(): opened by mdadm
> >> [   33.695679] md_open(): opened by mdadm
> >> [   33.695855] md_open(): opened by mdadm
> >> [   33.695894] md_open(): opened by mdadm
> >>
> >> [root@localhost ~]# ls /dev/md125
> >> /dev/md125
> >>
> >> [root@localhost ~]# fuser /dev/md125
> >>
> >> [root@localhost ~]# ps aux | grep "mdadm\|systemd-udevd"
> >> root       366  0.0  0.1  38172  1696 ?        Ss   06:04   0:00 /usr/lib/systemd/systemd-udevd
> >> root       465  0.0  0.0   4964   924 ?        Ss   06:04   0:00 /sbin/mdadm --monitor --scan --daemonise --syslog --pid-file=/run/mdadm/mdadm.pid
> >>
> >> [root@localhost ~]# ls -l /proc/366/fd/
> >> total 0
> >> lrwx------ 1 root root 64 Sep 26 06:04 0 -> /dev/null
> >> lrwx------ 1 root root 64 Sep 26 06:04 1 -> /dev/null
> >> lrwx------ 1 root root 64 Sep 26 06:04 10 -> socket:[8665]
> >> lr-x------ 1 root root 64 Sep 26 06:04 11 -> /etc/udev/hwdb.bin
> >> lrwx------ 1 root root 64 Sep 26 06:04 12 -> anon_inode:[eventpoll]
> >> lrwx------ 1 root root 64 Sep 26 06:04 2 -> /dev/null
> >> lrwx------ 1 root root 64 Sep 26 06:04 3 -> socket:[8144]
> >> lrwx------ 1 root root 64 Sep 26 06:04 4 -> socket:[8103]
> >> lrwx------ 1 root root 64 Sep 26 06:04 5 -> socket:[8660]
> >> lrwx------ 1 root root 64 Sep 26 06:04 6 -> /run/udev/queue.bin
> >> lr-x------ 1 root root 64 Sep 26 06:04 7 -> anon_inode:inotify
> >> lrwx------ 1 root root 64 Sep 26 06:04 8 -> anon_inode:[signalfd]
> >> lrwx------ 1 root root 64 Sep 26 06:04 9 -> socket:[8664]
> >>
> >> [root@localhost ~]# ls -l /proc/465/fd/
> >> total 0
> >> lrwx------ 1 root root 64 Sep 26 06:04 0 -> /dev/null
> >> lrwx------ 1 root root 64 Sep 26 06:04 1 -> /dev/null
> >> lrwx------ 1 root root 64 Sep 26 06:04 2 -> /dev/null
> >> lr-x------ 1 root root 64 Sep 26 06:06 4 -> /proc/mdstat
> >> lrwx------ 1 root root 64 Sep 26 06:06 5 -> socket:[10038]
> >>
> >> [root@localhost ~]# cat /proc/mdstat
> >> Personalities : [raid1]
> >> unused devices: <none>
> >>
> >> [root@localhost ~]# ls /sys/block/md125/md/
> >> array_size  array_state  bitmap/  chunk_size  component_size  layout
> >> level  max_read_errors  metadata_version  new_dev  raid_disks
> >> reshape_direction  reshape_position  resync_start  safe_mode_delay
> >>
> >> --- >% ---
> >>
> >> So in my understanding, only mdadm and udevd are opening the MD devices
> >> and mdadm was the last to open the device. For some unknown reason,
> >> md_release() is never called.
> >>
> >> This happens with:
> >>
> >>  - kernel 3.14.19
> >>  - mdadm 3.3.2
> >>  - systemd 208
> >>
> >> Can you see something wrong here ?
> >>
> >> Thanks.
> >> --
> 
> Hi,
> 
> I have also been debugging this issue and I came up with this
> fix/workaround. It works for me. Can you take a look at this?
> 
> Thanks,
> Artur
> 
> From c547e39789cde93d4a7ea1d3f845d61b82e4f0ed Mon Sep 17 00:00:00 2001
> From: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> Date: Fri, 26 Sep 2014 12:20:46 +0200
> Subject: [PATCH] md: avoid creating new devices for stopped arrays in
>  md_open()
> 
> When an array is about to be destroyed, set mddev->gendisk->private_data
> to NULL, as it is no longer needed, and check it in md_open(). If
> bdev->bd_disk->private_data is NULL, the array is stopped, so return
> -ENODEV.
> 
> Signed-off-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
> ---
>  drivers/md/md.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 1294238..7109d48 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -449,6 +449,7 @@ static void mddev_put(struct mddev *mddev)
>                 bs = mddev->bio_set;
>                 mddev->bio_set = NULL;
>                 if (mddev->gendisk) {
> +                       mddev->gendisk->private_data = NULL;
>                         /* We did a probe so need to clean up.  Call
>                          * queue_work inside the spinlock so that
>                          * flush_workqueue() after mddev_find will
> @@ -6693,9 +6694,14 @@ static int md_open(struct block_device *bdev, fmode_t mode)
>          * Succeed if we can lock the mddev, which confirms that
>          * it isn't being stopped right now.
>          */
> -       struct mddev *mddev = mddev_find(bdev->bd_dev);
> +       struct mddev *mddev;
>         int err;
> 
> +       if (!bdev->bd_disk->private_data)
> +               return -ENODEV;
> +
> +       mddev = mddev_find(bdev->bd_dev);
> +
>         if (!mddev)
>                 return -ENODEV;
> 

Thanks, but I don't think this is a complete fix.
It creates a small window after an array is stopped during which an attempt
to open the device will fail.  Once mddev_delayed_delete() completes, the
device can be opened again.
So it might occasionally fix the symptom, but it is very dependent on timing
and won't always work.
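
To make the window concrete, here is a rough userspace toy model of the
sequence described above. It is not kernel code and does not reproduce the
real md locking or workqueue behaviour; every toy_* name is made up for this
sketch, and the "re-create on open" branch is only an assumption standing in
for the probe that allocates a new device when the node is opened.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a gendisk; NOT the kernel structure. */
struct toy_disk {
	void *private_data;	/* NULL once the array is stopped (as in the patch) */
	int exists;		/* 1 while the old disk is still registered */
};

/* Models the patched md_open(): fail only while the stopped disk lingers. */
static int toy_open(struct toy_disk *d, void *fresh_mddev)
{
	if (!d->exists) {
		/* Old disk is gone: the open probes and creates a new device. */
		d->exists = 1;
		d->private_data = fresh_mddev;
		return 0;
	}
	if (!d->private_data)
		return -ENODEV;	/* inside the window opened by the patch */
	return 0;
}

/* Models the last mddev_put() with the patch applied. */
static void toy_stop(struct toy_disk *d)
{
	d->private_data = NULL;
}

/* Models mddev_delayed_delete() finishing: the old disk disappears. */
static void toy_delayed_delete(struct toy_disk *d)
{
	d->exists = 0;
}

/* Walks through the three phases; returns 0 if the model behaves as described. */
int toy_demo(void)
{
	int mddev_a = 0, mddev_b = 0;	/* stand-ins for two mddev instances */
	struct toy_disk d = { .private_data = &mddev_a, .exists = 1 };

	assert(toy_open(&d, &mddev_b) == 0);		/* running array: open works */
	toy_stop(&d);
	assert(toy_open(&d, &mddev_b) == -ENODEV);	/* in the window: open fails */
	toy_delayed_delete(&d);
	assert(toy_open(&d, &mddev_b) == 0);		/* window closed: open works */
	assert(d.exists && d.private_data == &mddev_b);	/* a new device is back */
	return 0;
}
```

In this model a late open only fails if it lands between toy_stop() and
toy_delayed_delete(); any open arriving after that brings the device back,
which is why the outcome depends on timing.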

NeilBrown

