Re: cluster-md mddev->in_sync & mddev->safemode_delay may have bug

Linux RAID subsystem development

From: "heming.zhao@suse.com" <heming.zhao@suse.com>
To: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>,
	linux-raid@vger.kernel.org
Cc: neilb@suse.com
Subject: Re: cluster-md mddev->in_sync & mddev->safemode_delay may have bug
Date: Thu, 16 Jul 2020 02:40:01 +0800
Message-ID: <57e70970-814b-3a55-35cc-b1415a301895@suse.com>
In-Reply-To: <91c60c65-11c4-35e7-41d2-77a1febc3249@cloud.ionos.com>

Hello Guoqing,

Thank you for your kind reply and review comments. I will resend the patch later.

Do you know who takes care of cluster-md on this mailing list?
I would like them to shed a little light on this for me.

On 7/16/20 2:17 AM, Guoqing Jiang wrote:
> On 7/15/20 5:48 AM, heming.zhao@suse.com wrote:
>> Hello List,
>>
>>
>> @Neil  @Guoqing,
>> Would you have time to take a look at this bug?
> 
> I am not focusing on it right now, and you need to CC me if you want my attention.
> 
>> This mail replaces my previous mail: "commit 480523feae581 may introduce a bug".
>> The previous mail had some unclear descriptions; I have sorted them out and am resending here.
>>
>> This bug was reported by a SUSE customer.
>>
>> In a cluster-md environment, after the steps below, "mdadm -D /dev/md0" shows "State: active" all the time.
>> ```
>> # mdadm -S --scan
>> # mdadm --zero-superblock /dev/sd{a,b}
>> # mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sda /dev/sdb
>>
>> # mdadm -D /dev/md0
>> /dev/md0:
>>            Version : 1.2
>>      Creation Time : Mon Jul  6 12:02:23 2020
>>         Raid Level : raid1
>>         Array Size : 64512 (63.00 MiB 66.06 MB)
>>      Used Dev Size : 64512 (63.00 MiB 66.06 MB)
>>       Raid Devices : 2
>>      Total Devices : 2
>>        Persistence : Superblock is persistent
>>
>>      Intent Bitmap : Internal
>>
>>        Update Time : Mon Jul  6 12:02:24 2020
>>              State : active <==== this line
>>     Active Devices : 2
>>    Working Devices : 2
>>     Failed Devices : 0
>>      Spare Devices : 0
>>
>> Consistency Policy : bitmap
>>
>>               Name : lp-clustermd1:0  (local to host lp-clustermd1)
>>       Cluster Name : hacluster
>>               UUID : 38ae5052:560c7d36:bb221e15:7437f460
>>             Events : 18
>>
>>     Number   Major   Minor   RaidDevice State
>>        0       8        0        0      active sync   /dev/sda
>>        1       8       16        1      active sync   /dev/sdb
>> ```
>>
>> With commit 480523feae581 (author: Neil Brown), try_set_sync is never true, so mddev->in_sync is always 0.
>>
>> The simplest fix is to bypass the try_set_sync check when the array is clustered.
>> ```
>>  void md_check_recovery(struct mddev *mddev)
>>  {
>>     ... ...
>>         if (mddev_is_clustered(mddev)) {
>>             struct md_rdev *rdev;
>>             /* kick the device if another node issued a
>>              * remove disk.
>>              */
>>             rdev_for_each(rdev, mddev) {
>>                 if (test_and_clear_bit(ClusterRemove, &rdev->flags) &&
>>                         rdev->raid_disk < 0)
>>                     md_kick_rdev_from_array(rdev);
>>             }
>> +           try_set_sync = 1;
>>         }
>>     ... ...
>>  }
>> ```
>> This fix means commit 480523feae581 has no effect in a clustered environment.
>> I would like to know what impact the above fix has.
>> Or is there another solution for this issue?
>>
>>
>> --------
>> And for the mddev->safemode_delay issue:
>>
>> There is another bug when an array changes its bitmap from internal to clustered.
>> /sys/block/mdX/md/safe_mode_delay keeps its original value after the bitmap type changes.
>> In safe_delay_store(), the code forbids setting mddev->safemode_delay when the array is clustered.
>> So in a cluster-md environment, the expected safemode_delay value should be 0.
>>
>> reproduction steps:
>> ```
>> # mdadm --zero-superblock /dev/sd{b,c,d}
>> # mdadm -C /dev/md0 -b internal -e 1.2 -n 2 -l mirror /dev/sdb /dev/sdc
>> # cat /sys/block/md0/md/safe_mode_delay
>> 0.204
>> # mdadm -G /dev/md0 -b none
>> # mdadm --grow /dev/md0 --bitmap=clustered
>> # cat /sys/block/md0/md/safe_mode_delay
>> 0.204  <== doesn't change, should be ZERO for cluster-md
>> ```
> 
> I saw that you have sent a patch, which is good. I suggest you improve the header
> with your analysis above instead of having only the reproduction steps in the header.
> 
> Thanks,
> Guoqing
>
