From: yebin <yebin10@huawei.com>
To: Jan Kara <jack@suse.cz>
Cc: <tytso@mit.edu>, <adilger.kernel@dilger.ca>,
<linux-ext4@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH -next v2 2/6] ext4: introduce last_check_time record previous check time
Date: Wed, 13 Oct 2021 20:33:45 +0800
Message-ID: <6166D229.80809@huawei.com>
In-Reply-To: <20211013093847.GB19200@quack2.suse.cz>
On 2021/10/13 17:38, Jan Kara wrote:
> On Tue 12-10-21 19:46:24, yebin wrote:
>> On 2021/10/12 16:47, Jan Kara wrote:
>>> On Fri 08-10-21 10:38:31, yebin wrote:
>>>> On 2021/10/8 9:56, yebin wrote:
>>>>> On 2021/10/7 20:31, Jan Kara wrote:
>>>>>> On Sat 11-09-21 17:00:55, Ye Bin wrote:
>>>>>>> kmmpd:
>>>>>>> ...
>>>>>>> diff = jiffies - last_update_time;
>>>>>>> if (diff > mmp_check_interval * HZ) {
>>>>>>> ...
>>>>>>> As "mmp_check_interval = 2 * mmp_update_interval", 'diff' is always less
>>>>>>> than 'mmp_update_interval', so the detection is never triggered.
>>>>>>> Introduce last_check_time to record the previous check time.
>>>>>>>
>>>>>>> Signed-off-by: Ye Bin <yebin10@huawei.com>
>>>>>> I think the check is there only for the case where write_mmp_block() +
>>>>>> sleep took longer than mmp_check_interval. I agree that should rarely
>>>>>> happen but on a really busy system it is possible and in that case we
>>>>>> would miss updating mmp block for too long and so another node could
>>>>>> have started using the filesystem. I actually don't see a reason why
>>>>>> kmmpd should be checking the block each mmp_check_interval as you do -
>>>>>> mmp_check_interval is just for ext4_multi_mount_protect() to know how
>>>>>> long it should wait before considering mmp block stale... Am I missing
>>>>>> something?
>>>>>>
>>>>>> Honza
>>>>> I'm sorry, I didn't understand the detection mechanism here before. Now
>>>>> I understand it. As you said, it's just protection against an abnormal
>>>>> case. There's really no problem.
>>>>>
>>>> Yeah, I ran a test with the following steps:
>>>>
>>>> hostA                                   hostB
>>>> mount
>>>>   ext4_multi_mount_protect -> seq == EXT4_MMP_SEQ_CLEAN
>>>>   delay 5s after label "skip" so hostB will see seq is
>>>>   EXT4_MMP_SEQ_CLEAN
>>>>                                         mount
>>>>                                           ext4_multi_mount_protect -> seq == EXT4_MMP_SEQ_CLEAN
>>>> run kmmpd                               run kmmpd
>>>>
>>>> Actually, in this situation kmmpd will not detect the conflict.
>>>> In ext4_multi_mount_protect() we write the mmp data first, wait
>>>> 'wait_time' seconds (HZ * wait_time jiffies), then read the mmp data
>>>> back and check it. If 'wait_time' is zero, the check passes most of
>>>> the time.
>>> But how can wait_time be zero? As far as I'm reading the code, wait_time
>>> must be at least EXT4_MMP_MIN_CHECK_INTERVAL...
>>>
>>> Honza
>> int ext4_multi_mount_protect(struct super_block *sb,
>> 			     ext4_fsblk_t mmp_block)
>> {
>> 	struct ext4_super_block *es = EXT4_SB(sb)->s_es;
>> 	struct buffer_head *bh = NULL;
>> 	struct mmp_struct *mmp = NULL;
>> 	u32 seq;
>> 	unsigned int mmp_check_interval =
>> 		le16_to_cpu(es->s_mmp_update_interval);
>> 	unsigned int wait_time = 0;
>> 		--> wait_time starts out as zero
>> 	int retval;
>>
>> 	if (mmp_block < le32_to_cpu(es->s_first_data_block) ||
>> 	    mmp_block >= ext4_blocks_count(es)) {
>> 		ext4_warning(sb, "Invalid MMP block in superblock");
>> 		goto failed;
>> 	}
>>
>> 	retval = read_mmp_block(sb, &bh, mmp_block);
>> 	if (retval)
>> 		goto failed;
>>
>> 	mmp = (struct mmp_struct *)(bh->b_data);
>>
>> 	if (mmp_check_interval < EXT4_MMP_MIN_CHECK_INTERVAL)
>> 		mmp_check_interval = EXT4_MMP_MIN_CHECK_INTERVAL;
>>
>> 	/*
>> 	 * If check_interval in MMP block is larger, use that instead of
>> 	 * update_interval from the superblock.
>> 	 */
>> 	if (le16_to_cpu(mmp->mmp_check_interval) > mmp_check_interval)
>> 		mmp_check_interval = le16_to_cpu(mmp->mmp_check_interval);
>>
>> 	seq = le32_to_cpu(mmp->mmp_seq);
>> 	if (seq == EXT4_MMP_SEQ_CLEAN)
>> 		--> If hostA and hostB mount the same block device at the
>> 		--> same time, both may read 'seq' as EXT4_MMP_SEQ_CLEAN.
>> 		goto skip;
> Oh, I see. Thanks for explanation.
>
>> ...
>> skip:
>> 	/*
>> 	 * write a new random sequence number.
>> 	 */
>> 	seq = mmp_new_seq();
>> 	mmp->mmp_seq = cpu_to_le32(seq);
>>
>> 	retval = write_mmp_block(sb, bh);
>> 	if (retval)
>> 		goto failed;
>>
>> 	/*
>> 	 * wait for MMP interval and check mmp_seq.
>> 	 */
>> 	if (schedule_timeout_interruptible(HZ * wait_time) != 0) {
>> 		--> If seq was EXT4_MMP_SEQ_CLEAN, wait_time is still zero.
>> 		ext4_warning(sb, "MMP startup interrupted, failing mount");
>> 		goto failed;
>> 	}
>>
>> 	retval = read_mmp_block(sb, &bh, mmp_block);
>> 		--> We may read back the same data we just wrote, so we
>> 		--> can't detect the conflict here.
> OK, I see. So the race in ext4_multi_mount_protect() goes like:
>
> hostA                                   hostB
>
> read_mmp_block()                        read_mmp_block()
>  - sees EXT4_MMP_SEQ_CLEAN               - sees EXT4_MMP_SEQ_CLEAN
> write_mmp_block()
> wait_time == 0 -> no wait
> read_mmp_block()
>  - all OK, mount
>                                         write_mmp_block()
>                                         wait_time == 0 -> no wait
>                                         read_mmp_block()
>                                          - all OK, mount
Yes, that's what I mean.
>
> Do I get it right? Actually, if we passed seq we wrote in
> ext4_multi_mount_protect() to kmmpd (probably in sb), then kmmpd would
> notice the conflict on its first invocation but still that would be a bit
> late because there would be a time window where hostA and hostB would be
> both using the fs.
>
> We could reduce the likelihood of this race by always waiting in
> ext4_multi_mount_protect() between write & read but I guess that is
> undesirable as it would slow down all clean mounts. Ted?
>
> Honza