linux-raid.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Zhilong Liu <zlliu@suse.com>
To: Jes Sorensen <jes.sorensen@gmail.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: [mdadm PATCH] Create: move STOP_ARRAY to abort_locked
Date: Wed, 31 May 2017 18:25:54 +0800	[thread overview]
Message-ID: <92404b1c-fa39-05a1-db09-edd34624fddf@suse.com> (raw)
In-Reply-To: <8275d868-ad51-209b-e46f-9d7f1c60d1a3@gmail.com>



On 05/09/2017 01:54 AM, Jes Sorensen wrote:
> On 05/07/2017 09:50 PM, Zhilong Liu wrote:
>>
>>
>> On 05/05/2017 11:31 AM, Liu Zhilong wrote:
>>>
>>>
>>> On 05/04/2017 10:54 PM, Jes Sorensen wrote:
>>>> On 05/04/2017 08:20 AM, Zhilong Liu wrote:
>>>>> Hi Jes,
>>>>>
>>>>> apply for review, this is a bug I ever encountered.
>>>>
>>>> Zhilong,
>>>>
>>>> Under what circumstances do you see this?
>>>>
>>>
>>> Issued the command:
>>> linux-g0sr:/home/test # ./mdadm -CR /dev/md0 -l1 -n2 -b internal
>>> /dev/loop[0-1] --size 63
>>> ... ... ...
>>> mdadm: Given bitmap chunk size not supported.
>>> linux-g0sr:/home/test # ls /dev/md0
>>> /dev/md0
>>> linux-g0sr:/home/test # ls /sys/block/md0/md/
>>> array_size   bitmap      component_size  level metadata_version
>>> raid_disks         reshape_position safe_mode_delay
>>> array_state  chunk_size  layout          max_read_errors
>>> new_dev           reshape_direction  resync_start
>>>
>>> create_mddev() writes the devnm to
>>> /sys/module/md_mod/parameter/new_array,
>>> then in md.c, module_param_call() called the 'set' function of
>>> add_named_array(),
>>> md_alloc() init_and_add the kobject for devm, finally the devnm device
>>> has created
>>> and sysfs has registered after create_mddev executed successfully.
>>> Thus it's better
>>> to STOP_ARRAY in any case after create_mddev() invoked.
>>>
>>
>> this patch depends on the kernel commit:
>> 039b7225e6e9 ("md: allow creation of mdNNN arrays via
>> md_mod/parameters/new_array")
>> Neil's patch has set "mddev->hold_active = UNTIL_STOP", thus the
>> STOP_ARRAY can work
>> well on this situation.
>
> OK now I am confused - are you saying this change will only work after 
> Neil's kernel patch has been applied? That would be no good, mdadm 
> needs to work on older kernels too.
>

Sorry for the late reply for this.

Currently, the creating array method for later kernel(newer than v4.11), 
it can avoid the race problem
via to set 'create_on_open' as N and write the mddev -> 
/sys/module/md_mod/parameters/new_array,
then mdadm can send the ioctl to stop_array directly if creating array 
fail, it won't happen the race.
refer to commit: 78b6350dcaad ("md: support disabling of create-on-open 
semantics.")

And for older kernel, it's difficult to avoid the problem of the race 
when use 'change', 'add' and 'remove'
udev rules in a short period of time via to ioctl commands.
For example, this case tested on my latest Tumbleweed(20170524) with 
latest mdadm source.

Steps:
First part:
-  open one terminal to monitor the udev:
Terminal 1: # udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

-  type the command in another terminal:
Terminal 2: # ./mdadm -CR /dev/md1 --bitmap=internal -l1 -n2 /dev/loop1 
/dev/loop2 --size 63

    add_internal_bitmap received the abnormal chunk_size( < 64k), and 
return a failure.
    but it left the partially created array and didn't cleanup. For this 
test, the udev monitor prints:

Terminal 1: # udevadm monitor
... ...
KERNEL[146.077168] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[146.077211] add      /devices/virtual/block/md1 (block)
UDEV  [146.078112] add      /devices/virtual/bdi/9:1 (bdi)
UDEV  [146.084163] add      /devices/virtual/block/md1 (block)

Terminal 2: # ./mdadm -S /dev/md1

Terminal 1: # udevadm monitor
KERNEL[153.276209] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[153.276317] remove   /devices/virtual/block/md1 (block)
UDEV  [153.277034] remove   /devices/virtual/bdi/9:1 (bdi)
UDEV  [153.280801] remove   /devices/virtual/block/md1 (block)

Second part:
add the ioctl(stop_array) and compiled to monitor the udevs.
# git diff
diff --git a/Create.c b/Create.c
index 239545f..21568ca 100644
--- a/Create.c
+++ b/Create.c
@@ -1065,7 +1065,9 @@ int Create(struct supertype *st, char *mddev,
         map_remove(&map, fd2devnm(mdfd));
         map_unlock(&map);

-       if (mdfd >= 0)
+       if (mdfd >= 0) {
+               ioctl(mdfd, STOP_ARRAY, NULL);
                 close(mdfd);
+       }
         return 1;
  }

Terminal 2: # ./mdadm -CR /dev/md1 --bitmap=internal -l1 -n2 /dev/loop1 
/dev/loop2 --size 63

Terminal 1: # udevadm monitor
... ...
KERNEL[171.964597] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[171.965346] add      /devices/virtual/block/md1 (block)
UDEV  [171.965565] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[171.984195] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[171.984555] remove   /devices/virtual/block/md1 (block)
UDEV  [171.984590] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[172.004504] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[172.004890] add      /devices/virtual/block/md1 (block)
UDEV  [172.004923] add      /devices/virtual/bdi/9:1 (bdi)
UDEV  [172.005787] add      /devices/virtual/block/md1 (block)
UDEV  [172.009648] remove   /devices/virtual/block/md1 (block)
UDEV  [172.013232] add      /devices/virtual/block/md1 (block)

So shall give udev a moment to process 'add' events?

mdadm can work as expect when adding usleep(100 * 1000) before perform 
ioctl(STOP_ARRAY).
this is the udev monitor result tested by the following sample.

Terminal 1: # udevadm monitor
KERNEL[5476.780692] add      /devices/virtual/bdi/9:1 (bdi)
KERNEL[5476.780976] add      /devices/virtual/block/md1 (block)
UDEV  [5476.782355] add      /devices/virtual/bdi/9:1 (bdi)
UDEV  [5476.786056] add      /devices/virtual/block/md1 (block)
KERNEL[5476.896255] remove   /devices/virtual/bdi/9:1 (bdi)
KERNEL[5476.896367] remove   /devices/virtual/block/md1 (block)
UDEV  [5476.896895] remove   /devices/virtual/bdi/9:1 (bdi)
UDEV  [5476.900752] remove   /devices/virtual/block/md1 (block)


Such as like this:

diff --git a/Create.c b/Create.c
index 239545f..a07ace8 100644
--- a/Create.c
+++ b/Create.c
@@ -902,10 +902,8 @@ int Create(struct supertype *st, char *mddev,
                                         remove_partitions(fd);
                                 if (st->ss->add_to_super(st, &inf->disk,
                                                          fd, dv->devname,
- dv->data_offset)) {
-                                       ioctl(mdfd, STOP_ARRAY, NULL);
+ dv->data_offset))
                                         goto abort_locked;
-                               }
                                 st->ss->getinfo_super(st, inf, NULL);
                                 safe_mode_delay = inf->safe_mode_delay;

@@ -1006,7 +1004,6 @@ int Create(struct supertype *st, char *mddev,
                         sysfs_set_safemode(&info, safe_mode_delay);
                         if (err) {
                                 pr_err("failed to activate array.\n");
-                               ioctl(mdfd, STOP_ARRAY, NULL);
                                 goto abort;
                         }
                 } else if (c->readonly &&
@@ -1016,7 +1013,6 @@ int Create(struct supertype *st, char *mddev,
                                           "array_state", "readonly") < 0) {
                                 pr_err("Failed to start array: %s\n",
                                        strerror(errno));
-                               ioctl(mdfd, STOP_ARRAY, NULL);
                                 goto abort;
                         }
                 } else {
@@ -1028,7 +1024,6 @@ int Create(struct supertype *st, char *mddev,
                                 if (info.array.chunk_size & 
(info.array.chunk_size-1)) {
                                         cont_err("Problem may be that 
chunk size is not a power of 2\n");
                                 }
-                               ioctl(mdfd, STOP_ARRAY, NULL);
                                 goto abort;
                         }
                         /* if start_ro module parameter is set, array is
@@ -1065,7 +1060,11 @@ int Create(struct supertype *st, char *mddev,
         map_remove(&map, fd2devnm(mdfd));
         map_unlock(&map);

-       if (mdfd >= 0)
+       if (mdfd >= 0) {
+               /* Give udev a moment to finish 'add' events. */
+               usleep(100*1000);
+               ioctl(mdfd, STOP_ARRAY, NULL);
                 close(mdfd);
+       }
         return 1;
  }


Thanks,
-Zhilong

> Jes
>
>
>


      parent reply	other threads:[~2017-05-31 10:25 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-26  7:03 [mdadm PATCH] Create: move STOP_ARRAY to abort_locked Zhilong Liu
2017-05-04 12:20 ` Zhilong Liu
2017-05-04 14:54   ` Jes Sorensen
2017-05-04 17:42     ` Zhilong
2017-05-05  3:31     ` Liu Zhilong
2017-05-08  1:50       ` Zhilong Liu
2017-05-08 17:54         ` Jes Sorensen
2017-05-11 13:01           ` Zhilong Liu
2017-05-31 10:25           ` Zhilong Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=92404b1c-fa39-05a1-db09-edd34624fddf@suse.com \
    --to=zlliu@suse.com \
    --cc=jes.sorensen@gmail.com \
    --cc=linux-raid@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).