From: Chao Yu <chao@kernel.org>
To: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net
Subject: Re: [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write
Date: Thu, 6 Jun 2024 18:31:21 +0800 [thread overview]
Message-ID: <4ac4f5cf-7748-4ee4-8d68-8842d8bf1753@kernel.org> (raw)
In-Reply-To: <efae597c-d334-498b-9050-1a21bf40e21d@kernel.org>
On 2024/5/15 14:38, Chao Yu wrote:
> On 2024/5/15 12:42, Jaegeuk Kim wrote:
>> On 05/15, Chao Yu wrote:
>>> On 2024/5/15 0:09, Jaegeuk Kim wrote:
>>>> On 05/10, Chao Yu wrote:
>>>>> If lfs mode is on, buffered read may race w/ OPU dio write as below,
>>>>> it may cause buffered read hits unwritten data unexpectly, and for
>>>>> dio read, the race condition exists as well.
>>>>>
>>>>> Thread A Thread B
>>>>> - f2fs_file_write_iter
>>>>> - f2fs_dio_write_iter
>>>>> - __iomap_dio_rw
>>>>> - f2fs_iomap_begin
>>>>> - f2fs_map_blocks
>>>>> - __allocate_data_block
>>>>> - allocated blkaddr #x
>>>>> - iomap_dio_submit_bio
>>>>> - f2fs_file_read_iter
>>>>> - filemap_read
>>>>> - f2fs_read_data_folio
>>>>> - f2fs_mpage_readpages
>>>>> - f2fs_map_blocks
>>>>> : get blkaddr #x
>>>>> - f2fs_submit_read_bio
>>>>> IRQ
>>>>> - f2fs_read_end_io
>>>>> : read IO on blkaddr #x complete
>>>>> IRQ
>>>>> - iomap_dio_bio_end_io
>>>>> : direct write IO on blkaddr #x complete
>>>>>
>>>>> This patch introduces a new per-inode i_opu_rwsem lock to avoid
>>>>> such race condition.
>>>>
>>>> Wasn't this supposed to be managed by user-land?
>>>
>>> Actually, the test case is:
>>>
>>> 1. mount w/ lfs mode
>>> 2. touch file;
>>> 3. initialize file w/ 4k zeroed data; fsync;
>>> 4. continue triggering dio write 4k zeroed data to file;
>>> 5. and meanwhile, continue triggering buf/dio 4k read in file,
>>> use md5sum to verify the 4k data;
>>>
>>> It expects data is all zero, however it turned out it's not.
>>
>> Can we check outstanding write bios instead of abusing locks?
Jaegeuk, seems it can solve partial race cases, not all of them.
Do you suggest to use this compromised solution?
Thanks,
>
> I didn't figure out a way to solve this w/o lock, due to:
> - write bios can be issued after outstanding write bios check condition,
> result in the race.
> - once read() detects that there are outstanding write bios, we need to
> delay read flow rather than fail it, right? It looks using a lock is more
> proper here?
>
> Any suggestion?
>
> Thanks,
>
>>
>>>
>>> Thanks,
>>>
>>>>
>>>>>
>>>>> Fixes: f847c699cff3 ("f2fs: allow out-place-update for direct IO in LFS mode")
>>>>> Signed-off-by: Chao Yu <chao@kernel.org>
>>>>> ---
>>>>> v2:
>>>>> - fix to cover dio read path w/ i_opu_rwsem as well.
>>>>> fs/f2fs/f2fs.h | 1 +
>>>>> fs/f2fs/file.c | 28 ++++++++++++++++++++++++++--
>>>>> fs/f2fs/super.c | 1 +
>>>>> 3 files changed, 28 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>>>> index 30058e16a5d0..91cf4b3d6bc6 100644
>>>>> --- a/fs/f2fs/f2fs.h
>>>>> +++ b/fs/f2fs/f2fs.h
>>>>> @@ -847,6 +847,7 @@ struct f2fs_inode_info {
>>>>> /* avoid racing between foreground op and gc */
>>>>> struct f2fs_rwsem i_gc_rwsem[2];
>>>>> struct f2fs_rwsem i_xattr_sem; /* avoid racing between reading and changing EAs */
>>>>> + struct f2fs_rwsem i_opu_rwsem; /* avoid racing between buf read and opu dio write */
>>>>>
>>>>> int i_extra_isize; /* size of extra space located in i_addr */
>>>>> kprojid_t i_projid; /* id for project quota */
>>>>> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
>>>>> index 72ce1a522fb2..4ec260af321f 100644
>>>>> --- a/fs/f2fs/file.c
>>>>> +++ b/fs/f2fs/file.c
>>>>> @@ -4445,6 +4445,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>>>> const loff_t pos = iocb->ki_pos;
>>>>> const size_t count = iov_iter_count(to);
>>>>> struct iomap_dio *dio;
>>>>> + bool do_opu = f2fs_lfs_mode(sbi);
>>>>> ssize_t ret;
>>>>>
>>>>> if (count == 0)
>>>>> @@ -4457,8 +4458,14 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>>>> ret = -EAGAIN;
>>>>> goto out;
>>>>> }
>>>>> + if (do_opu && !f2fs_down_read_trylock(&fi->i_opu_rwsem)) {
>>>>> + f2fs_up_read(&fi->i_gc_rwsem[READ]);
>>>>> + ret = -EAGAIN;
>>>>> + goto out;
>>>>> + }
>>>>> } else {
>>>>> f2fs_down_read(&fi->i_gc_rwsem[READ]);
>>>>> + f2fs_down_read(&fi->i_opu_rwsem);
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -4477,6 +4484,7 @@ static ssize_t f2fs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>>>> ret = iomap_dio_complete(dio);
>>>>> }
>>>>>
>>>>> + f2fs_up_read(&fi->i_opu_rwsem);
>>>>> f2fs_up_read(&fi->i_gc_rwsem[READ]);
>>>>>
>>>>> file_accessed(file);
>>>>> @@ -4523,7 +4531,13 @@ static ssize_t f2fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to)
>>>>> if (f2fs_should_use_dio(inode, iocb, to)) {
>>>>> ret = f2fs_dio_read_iter(iocb, to);
>>>>> } else {
>>>>> + bool do_opu = f2fs_lfs_mode(F2FS_I_SB(inode));
>>>>> +
>>>>> + if (do_opu)
>>>>> + f2fs_down_read(&F2FS_I(inode)->i_opu_rwsem);
>>>>> ret = filemap_read(iocb, to, 0);
>>>>> + if (do_opu)
>>>>> + f2fs_up_read(&F2FS_I(inode)->i_opu_rwsem);
>>>>> if (ret > 0)
>>>>> f2fs_update_iostat(F2FS_I_SB(inode), inode,
>>>>> APP_BUFFERED_READ_IO, ret);
>>>>> @@ -4748,14 +4762,22 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
>>>>> ret = -EAGAIN;
>>>>> goto out;
>>>>> }
>>>>> + if (do_opu && !f2fs_down_write_trylock(&fi->i_opu_rwsem)) {
>>>>> + f2fs_up_read(&fi->i_gc_rwsem[READ]);
>>>>> + f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
>>>>> + ret = -EAGAIN;
>>>>> + goto out;
>>>>> + }
>>>>> } else {
>>>>> ret = f2fs_convert_inline_inode(inode);
>>>>> if (ret)
>>>>> goto out;
>>>>>
>>>>> f2fs_down_read(&fi->i_gc_rwsem[WRITE]);
>>>>> - if (do_opu)
>>>>> + if (do_opu) {
>>>>> f2fs_down_read(&fi->i_gc_rwsem[READ]);
>>>>> + f2fs_down_write(&fi->i_opu_rwsem);
>>>>> + }
>>>>> }
>>>>>
>>>>> /*
>>>>> @@ -4779,8 +4801,10 @@ static ssize_t f2fs_dio_write_iter(struct kiocb *iocb, struct iov_iter *from,
>>>>> ret = iomap_dio_complete(dio);
>>>>> }
>>>>>
>>>>> - if (do_opu)
>>>>> + if (do_opu) {
>>>>> + f2fs_up_write(&fi->i_opu_rwsem);
>>>>> f2fs_up_read(&fi->i_gc_rwsem[READ]);
>>>>> + }
>>>>> f2fs_up_read(&fi->i_gc_rwsem[WRITE]);
>>>>>
>>>>> if (ret < 0)
>>>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>>>> index daf2c4dbe150..b4ed3b094366 100644
>>>>> --- a/fs/f2fs/super.c
>>>>> +++ b/fs/f2fs/super.c
>>>>> @@ -1428,6 +1428,7 @@ static struct inode *f2fs_alloc_inode(struct super_block *sb)
>>>>> init_f2fs_rwsem(&fi->i_gc_rwsem[READ]);
>>>>> init_f2fs_rwsem(&fi->i_gc_rwsem[WRITE]);
>>>>> init_f2fs_rwsem(&fi->i_xattr_sem);
>>>>> + init_f2fs_rwsem(&fi->i_opu_rwsem);
>>>>>
>>>>> /* Will be used by directory only */
>>>>> fi->i_dir_level = F2FS_SB(sb)->dir_level;
>>>>> --
>>>>> 2.40.1
>
>
> _______________________________________________
> Linux-f2fs-devel mailing list
> Linux-f2fs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
_______________________________________________
Linux-f2fs-devel mailing list
Linux-f2fs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/linux-f2fs-devel
next prev parent reply other threads:[~2024-06-06 10:31 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-05-10 2:39 [f2fs-dev] [PATCH v2] f2fs: fix to avoid racing in between read and OPU dio write Chao Yu
2024-05-14 16:09 ` Jaegeuk Kim
2024-05-15 1:42 ` Chao Yu
2024-05-15 4:42 ` Jaegeuk Kim
2024-05-15 6:38 ` Chao Yu
2024-06-06 10:31 ` Chao Yu [this message]
2024-05-15 8:32 ` Wu Bo via Linux-f2fs-devel
2024-06-06 10:25 ` Chao Yu
2024-05-15 8:40 ` Markus Elfring via Linux-f2fs-devel
2024-05-17 8:15 ` kernel test robot
-- strict thread matches above, loose matches on Subject: below --
2024-06-25 14:25 Chao Yu
2024-06-26 2:01 ` Zhiguo Niu
2024-06-26 14:52 ` Chao Yu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4ac4f5cf-7748-4ee4-8d68-8842d8bf1753@kernel.org \
--to=chao@kernel.org \
--cc=jaegeuk@kernel.org \
--cc=linux-f2fs-devel@lists.sourceforge.net \
--cc=linux-kernel@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).