linux-ext4.vger.kernel.org archive mirror
From: yebin <yebin10@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: <tytso@mit.edu>, <adilger.kernel@dilger.ca>,
	<linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -next v2 2/6] ext4: introduce last_check_time record previous check time
Date: Wed, 13 Oct 2021 20:33:45 +0800
Message-ID: <6166D229.80809@huawei.com>
In-Reply-To: <20211013093847.GB19200@quack2.suse.cz>



On 2021/10/13 17:38, Jan Kara wrote:
> On Tue 12-10-21 19:46:24, yebin wrote:
>> On 2021/10/12 16:47, Jan Kara wrote:
>>> On Fri 08-10-21 10:38:31, yebin wrote:
>>>> On 2021/10/8 9:56, yebin wrote:
>>>>> On 2021/10/7 20:31, Jan Kara wrote:
>>>>>> On Sat 11-09-21 17:00:55, Ye Bin wrote:
>>>>>>> kmmpd:
>>>>>>> ...
>>>>>>>        diff = jiffies - last_update_time;
>>>>>>>        if (diff > mmp_check_interval * HZ) {
>>>>>>> ...
>>>>>>> As "mmp_check_interval = 2 * mmp_update_interval", 'diff' is always less
>>>>>>> than 'mmp_check_interval', so detection is never triggered.
>>>>>>> Introduce last_check_time to record the previous check time.
>>>>>>>
>>>>>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>>>>> I think the check is there only for the case where write_mmp_block() +
>>>>>> sleep took longer than mmp_check_interval. I agree that should rarely
>>>>>> happen, but on a really busy system it is possible, and in that case we
>>>>>> would miss updating the mmp block for too long, so another node could
>>>>>> have started using the filesystem. I actually don't see a reason why
>>>>>> kmmpd should be checking the block every mmp_check_interval as you do -
>>>>>> mmp_check_interval is just for ext4_multi_mount_protect() to know how
>>>>>> long it should wait before considering the mmp block stale... Am I
>>>>>> missing something?
>>>>>>
>>>>>>                                   Honza
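To make the timing argument above concrete, here is a simplified model of kmmpd's overrun check. It is a hypothetical sketch, not the code in fs/ext4/mmp.c: time is counted in abstract ticks instead of jiffies, and `kmmpd_would_recheck()` and its `extra_delay` parameter (standing in for scheduling latency or oversleeping) are invented for illustration. It assumes that, after writing the block, kmmpd sleeps only for whatever remains of the update interval.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the overrun test in kmmpd(), in abstract ticks.
 * write_ticks: time spent in write_mmp_block().
 * extra_delay: scheduling latency or oversleep on top of the requested sleep.
 * Returns true if kmmpd would re-read the MMP block in this iteration.
 */
static bool kmmpd_would_recheck(unsigned long write_ticks,
                                unsigned long extra_delay,
                                unsigned long update_interval,
                                unsigned long check_interval)
{
    unsigned long diff = write_ticks;

    assert(update_interval > 0);

    /* kmmpd only sleeps for what is left of the update interval */
    if (diff < update_interval)
        diff = update_interval;
    diff += extra_delay;

    /* the MMP block is re-read only if this iteration overran */
    return diff > check_interval;
}
```

With update_interval = 5 and check_interval = 10, a normal iteration ends with diff = 5 and no re-read. Only a slow write or a long delay that pushes diff past 10 triggers the check, which matches reading it as a guard against an abnormal case.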
>>>>> I'm sorry, I didn't understand the detection mechanism here before; now I
>>>>> do. As you said, it only protects against an abnormal case, so there is
>>>>> really no problem.
>>>>>
>>>> Yeah, I tested with the following steps:
>>>> hostA                        hostB
>>>>   mount
>>>>     ext4_multi_mount_protect -> seq == EXT4_MMP_SEQ_CLEAN
>>>>       delay 5s after label "skip" so that hostB also sees EXT4_MMP_SEQ_CLEAN
>>>>                              mount
>>>>                              ext4_multi_mount_protect -> seq == EXT4_MMP_SEQ_CLEAN
>>>>                                run kmmpd
>>>>   run kmmpd
>>>>
>>>> In this situation kmmpd does not detect the conflict.
>>>> In ext4_multi_mount_protect() we first write the mmp data, wait for
>>>> 'wait_time' seconds (HZ * wait_time jiffies), and then read the mmp
>>>> data back to check it. If 'wait_time' is zero, the check passes most
>>>> of the time.
>>> But how can wait_time be zero? As far as I'm reading the code, wait_time
>>> must be at least EXT4_MMP_MIN_CHECK_INTERVAL...
>>>
>>> 								Honza
>>   int ext4_multi_mount_protect(struct super_block *sb,
>>                                       ext4_fsblk_t mmp_block)
>>   {
>>           struct ext4_super_block *es = EXT4_SB(sb)->s_es;
>>           struct buffer_head *bh = NULL;
>>           struct mmp_struct *mmp = NULL;
>>           u32 seq;
>>           unsigned int mmp_check_interval =
>>                   le16_to_cpu(es->s_mmp_update_interval);
>>           unsigned int wait_time = 0;     /* --> wait_time starts at zero */
>>           int retval;
>>
>>           if (mmp_block < le32_to_cpu(es->s_first_data_block) ||
>>               mmp_block >= ext4_blocks_count(es)) {
>>                   ext4_warning(sb, "Invalid MMP block in superblock");
>>                   goto failed;
>>           }
>>
>>           retval = read_mmp_block(sb, &bh, mmp_block);
>>           if (retval)
>>                   goto failed;
>>
>>           mmp = (struct mmp_struct *)(bh->b_data);
>>
>>           if (mmp_check_interval < EXT4_MMP_MIN_CHECK_INTERVAL)
>>                   mmp_check_interval = EXT4_MMP_MIN_CHECK_INTERVAL;
>>
>>           /*
>>            * If check_interval in MMP block is larger, use that instead of
>>            * update_interval from the superblock.
>>            */
>>           if (le16_to_cpu(mmp->mmp_check_interval) > mmp_check_interval)
>>                   mmp_check_interval = le16_to_cpu(mmp->mmp_check_interval);
>>
>>           seq = le32_to_cpu(mmp->mmp_seq);
>>           seq = le32_to_cpu(mmp->mmp_seq);
>>           /*
>>            * --> If hostA and hostB mount the same block device at the same
>>            * time, both may read 'seq' as EXT4_MMP_SEQ_CLEAN.
>>            */
>>           if (seq == EXT4_MMP_SEQ_CLEAN)
>>                   goto skip;
> Oh, I see. Thanks for the explanation.
>
>> ...
>> skip:
>>          /*
>>           * write a new random sequence number.
>>           */
>>          seq = mmp_new_seq();
>>          mmp->mmp_seq = cpu_to_le32(seq);
>>
>>          retval = write_mmp_block(sb, bh);
>>          if (retval)
>>                  goto failed;
>>
>>          /*
>>           * wait for MMP interval and check mmp_seq.
>>           */
>>          /* --> If seq was EXT4_MMP_SEQ_CLEAN, wait_time is still zero here. */
>>          if (schedule_timeout_interruptible(HZ * wait_time) != 0) {
>>                  ext4_warning(sb, "MMP startup interrupted, failing mount");
>>                  goto failed;
>>          }
>>
>>          /*
>>           * --> We may read back exactly the data we just wrote, so the
>>           * conflict cannot be detected here.
>>           */
>>          retval = read_mmp_block(sb, &bh, mmp_block);
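A minimal sketch of the wait computation described by the annotations above. It is a hypothetical helper (`mmp_wait_time()`), not kernel code, and it rests on assumptions to be checked against fs/ext4/ext4.h and fs/ext4/mmp.c: EXT4_MMP_SEQ_CLEAN is taken as 0xFF4D4D50, EXT4_MMP_MIN_CHECK_INTERVAL as 5, and the non-clean path is assumed to wait min(2 * check + 1, check + 60) seconds. The FSCK sequence and error paths are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; compare with fs/ext4/ext4.h. */
#define MMP_SEQ_CLEAN          0xFF4D4D50U  /* EXT4_MMP_SEQ_CLEAN */
#define MMP_MIN_CHECK_INTERVAL 5U           /* EXT4_MMP_MIN_CHECK_INTERVAL */

/*
 * Hypothetical helper: seconds ext4_multi_mount_protect() waits between
 * writing its new seq and re-reading the block. FSCK/error paths omitted.
 */
static unsigned int mmp_wait_time(uint32_t seq_on_disk,
                                  unsigned int sb_update_interval,
                                  unsigned int block_check_interval)
{
    unsigned int check = sb_update_interval;

    /* same clamping as the quoted code above */
    if (check < MMP_MIN_CHECK_INTERVAL)
        check = MMP_MIN_CHECK_INTERVAL;
    if (block_check_interval > check)
        check = block_check_interval;
    assert(check >= MMP_MIN_CHECK_INTERVAL);

    /* "goto skip": wait_time keeps its initial value of 0 */
    if (seq_on_disk == MMP_SEQ_CLEAN)
        return 0;

    /* assumed formula for the non-clean path */
    return (2 * check + 1 < check + 60) ? 2 * check + 1 : check + 60;
}
```

The point of the sketch is the early return: however large the check interval, a block that reads as clean yields no wait at all, so the write and the re-read happen back to back.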
> OK, I see. So the race in ext4_multi_mount_protect() goes like this:
>
> hostA				hostB
>
> read_mmp_block()		read_mmp_block()
> - sees EXT4_MMP_SEQ_CLEAN	- sees EXT4_MMP_SEQ_CLEAN
> write_mmp_block()
> wait_time == 0 -> no wait
> read_mmp_block()
>    - all OK, mount
> 				write_mmp_block()
> 				wait_time == 0 -> no wait
> 				read_mmp_block()
> 				  - all OK, mount
Yes, that's what I mean.
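The interleaving in the diagram above can be replayed with a toy model. Everything here is hypothetical: a single `disk_seq` variable stands in for the on-disk mmp_seq, the constants 0x1234 and 0x5678 stand in for mmp_new_seq(), and 0xFF4D4D50 is the assumed value of EXT4_MMP_SEQ_CLEAN. The last step models the suggestion of handing the written seq to kmmpd, which the current code does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MMP_SEQ_CLEAN 0xFF4D4D50U  /* assumed value of EXT4_MMP_SEQ_CLEAN */

/* Toy stand-in for the on-disk mmp_seq that both hosts share. */
static uint32_t disk_seq;

struct host {
    uint32_t written;  /* seq this host wrote after "skip:" */
    bool mounted;
};

/* Clean-path steps for a host that already saw MMP_SEQ_CLEAN:
 * write a new seq, skip the wait (wait_time == 0), re-read. */
static void write_and_recheck(struct host *h, uint32_t new_seq)
{
    h->written = new_seq;                   /* mmp_new_seq() */
    disk_seq = new_seq;                     /* write_mmp_block() */
    h->mounted = (disk_seq == h->written);  /* read_mmp_block() + compare */
}

/* Replays the interleaving; returns how many hosts mount. *a_detects tells
 * whether hostA's kmmpd would see the conflict if it knew the seq it wrote. */
static int replay_race(bool *a_detects)
{
    struct host a = {0}, b = {0};

    disk_seq = MMP_SEQ_CLEAN;
    uint32_t a_seen = disk_seq;     /* hostA: read_mmp_block() */
    uint32_t b_seen = disk_seq;     /* hostB: read_mmp_block() */
    assert(a_seen == MMP_SEQ_CLEAN && b_seen == MMP_SEQ_CLEAN);

    write_and_recheck(&a, 0x1234);  /* hostA: all OK, mount */
    write_and_recheck(&b, 0x5678);  /* hostB: all OK, mount */

    /* Hypothetical: kmmpd on hostA compares the disk with a.written. */
    *a_detects = (disk_seq != a.written);
    return (int)a.mounted + (int)b.mounted;
}
```

replay_race() reports two mounted hosts. The hypothetical kmmpd comparison on hostA does flag the conflict, but only after both hosts have already mounted, which is exactly the window where both are using the fs.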
>
> Do I get it right? Actually, if we passed the seq we wrote in
> ext4_multi_mount_protect() to kmmpd (probably in sb), kmmpd would
> notice the conflict on its first invocation, but that would still be a bit
> late because there would be a time window where hostA and hostB are
> both using the fs.
>
> We could reduce the likelihood of this race by always waiting in
> ext4_multi_mount_protect() between write & read, but I guess that is
> undesirable as it would slow down all clean mounts. Ted?
>
> 								Honza
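The idea of always waiting between the write and the re-read can be explored with a small timeline model. It is a hypothetical sketch in abstract ticks: `struct mount_attempt`, `passes_reread()` and `count_mounted()` are invented names, write times are assumed distinct, and a host is assumed to pass if no other host wrote the block after its own write and no later than its re-read.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeline entry for one mounting host (abstract ticks). */
struct mount_attempt {
    unsigned int write_t;  /* when it writes its new seq after "skip:" */
    unsigned int wait;     /* wait before the re-read; 0 on the clean path today */
};

/* A host passes iff nobody else wrote in (write_t, write_t + wait]. */
static bool passes_reread(const struct mount_attempt *h, int n, int self)
{
    unsigned int reread_t = h[self].write_t + h[self].wait;

    for (int i = 0; i < n; i++) {
        if (i == self)
            continue;
        assert(h[i].write_t != h[self].write_t);  /* distinct write times */
        if (h[i].write_t > h[self].write_t && h[i].write_t <= reread_t)
            return false;  /* our seq was overwritten before we looked */
    }
    return true;
}

static int count_mounted(const struct mount_attempt *h, int n)
{
    int mounted = 0;

    for (int i = 0; i < n; i++)
        mounted += passes_reread(h, n, i);
    return mounted;
}
```

With wait = 0, writes at ticks 1 and 2 let both hosts mount; forcing wait = 11 makes hostA see hostB's sequence number and fail. A host that read the clean block before tick 1 but stalls before writing (say, until tick 20) still slips through, so the extra wait lowers the likelihood of the race rather than removing it.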



Thread overview: 24+ messages
2021-09-11  9:00 [PATCH -next v2 0/6] Fix some issues about mmp Ye Bin
2021-09-11  9:00 ` [PATCH -next v2 1/6] ext4: init seq with random value in kmmpd Ye Bin
2021-10-07 12:26   ` Jan Kara
2021-10-08  1:50     ` yebin
2021-09-11  9:00 ` [PATCH -next v2 2/6] ext4: introduce last_check_time record previous check time Ye Bin
2021-10-07 12:31   ` Jan Kara
2021-10-08  1:56     ` yebin
2021-10-08  2:38       ` yebin
2021-10-12  8:47         ` Jan Kara
2021-10-12 11:46           ` yebin
2021-10-13  9:38             ` Jan Kara
2021-10-13 12:33               ` yebin [this message]
2021-10-13 21:41               ` Theodore Ts'o
2021-10-15  3:21                 ` Andreas Dilger
2021-10-15  3:21                 ` Andreas Dilger
2021-09-11  9:00 ` [PATCH -next v2 3/6] ext4: compare to local seq and nodename when check conflict Ye Bin
2021-10-07 12:36   ` Jan Kara
2021-09-11  9:00 ` [PATCH -next v2 4/6] ext4: avoid to re-read mmp check data get from page cache Ye Bin
2021-10-07 12:44   ` Jan Kara
2021-10-08  3:52     ` yebin
2021-09-11  9:00 ` [PATCH -next v2 5/6] ext4: avoid to double free s_mmp_bh Ye Bin
2021-09-11  9:00 ` [PATCH -next v2 6/6] ext4: fix possible store wrong check interval value in disk when umount Ye Bin
2021-10-07 13:12   ` Jan Kara
2021-10-08  3:49     ` yebin
