From: Baokun Li <libaokun1@huawei.com>
To: <linux-mm@kvack.org>, <linux-ext4@vger.kernel.org>
Cc: <tytso@mit.edu>, <adilger.kernel@dilger.ca>, <jack@suse.cz>,
<willy@infradead.org>, <akpm@linux-foundation.org>,
<ritesh.list@gmail.com>, <linux-kernel@vger.kernel.org>,
<yi.zhang@huawei.com>, <yangerkun@huawei.com>,
<yukuai3@huawei.com>, <libaokun1@huawei.com>
Subject: [PATCH -RFC 1/2] mm: avoid data corruption when extending DIO write race with buffered read
Date: Sat, 2 Dec 2023 17:14:31 +0800 [thread overview]
Message-ID: <20231202091432.8349-2-libaokun1@huawei.com> (raw)
In-Reply-To: <20231202091432.8349-1-libaokun1@huawei.com>
When DIO write and buffered read are performed on the same file on two
CPUs, the following race may occur:
cpu1 cpu2
Direct write 1024 from 4096 | Buffered read 8192 from 0
-----------------------------|----------------------------
... ...
ext4_file_write_iter
ext4_dio_write_iter
iomap_dio_rw
...
ext4_file_read_iter
generic_file_read_iter
filemap_read
filemap_get_pages
...
ext4_mpage_readpages
ext4_readpage_limit(inode)
i_size_read(inode) // 4096
ext4_dio_write_end_io
i_size_write(inode, 5120)
i_size_read(inode) // 5120
1. read alloc 8192
0 8192
|-------------------|-------------------|
2. read form disk (i_size 4096)
0 filled data 4096 filled zero 8192
|-------------------|-------------------|
3. copyout (i_size 5120)
0 copyout to uset buffer 5120 8192
|------------------------|--------------|
|~~~~|
Inconsistent data
In the above race, because of the change of inode_size, the actual data
read from the disk is only 4096 bytes, but copied to the user's buffer
5120 bytes, including 1024 bytes of zero-filled tail page, which results
in the data read by the user is not consistent with the data on the disk.
To solve this problem completely, we should take the lesser of the number
of bytes actually read or the inode_size and use that as the final read
size. The problem here is that we don't know how many bytes of valid data
filemap_get_pages() reads, or how many bytes of valid data are in a page,
so we have to rely on inode_size to determine the range of valid data.
So we read the inode_size before and after filemap_get_pages(), and take
the smaller of the two as the size of the copyout to reduce the
probability of the above issue being triggered.
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
mm/filemap.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 71f00539ac00..47c1729afbb4 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2587,7 +2587,8 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
if ((iocb->ki_flags & IOCB_WAITQ) && already_read)
iocb->ki_flags |= IOCB_NOWAIT;
- if (unlikely(iocb->ki_pos >= i_size_read(inode)))
+ isize = i_size_read(inode);
+ if (unlikely(iocb->ki_pos >= isize))
break;
error = filemap_get_pages(iocb, iter->count, &fbatch, false);
@@ -2602,7 +2603,7 @@ ssize_t filemap_read(struct kiocb *iocb, struct iov_iter *iter,
* part of the page is not copied back to userspace (unless
* another truncate extends the file - this is desired though).
*/
- isize = i_size_read(inode);
+ isize = min_t(loff_t, isize, i_size_read(inode));
if (unlikely(iocb->ki_pos >= isize))
goto put_folios;
end_offset = min_t(loff_t, isize, iocb->ki_pos + iter->count);
--
2.31.1
next prev parent reply other threads:[~2023-12-02 9:10 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-02 9:14 [PATCH -RFC 0/2] mm/ext4: avoid data corruption when extending DIO write race with buffered read Baokun Li
2023-12-02 9:14 ` Baokun Li [this message]
2023-12-02 9:14 ` [PATCH -RFC 2/2] ext4: " Baokun Li
2023-12-04 12:11 ` [PATCH -RFC 0/2] mm/ext4: " Jan Kara
2023-12-04 13:50 ` Baokun Li
2023-12-04 14:41 ` Jan Kara
2023-12-05 12:50 ` Baokun Li
2023-12-06 19:37 ` Jan Kara
2023-12-07 3:01 ` Baokun Li
2023-12-07 14:15 ` Baokun Li
2023-12-11 17:49 ` Jan Kara
2023-12-12 2:15 ` Baokun Li
2023-12-12 4:36 ` Matthew Wilcox
2023-12-12 14:25 ` Jan Kara
2023-12-05 4:17 ` Theodore Ts'o
2023-12-05 13:19 ` Baokun Li
2023-12-06 21:55 ` Theodore Ts'o
2023-12-07 6:41 ` Baokun Li
2023-12-06 8:35 ` Dave Chinner
2023-12-06 9:02 ` Christoph Hellwig
2023-12-06 10:34 ` Dave Chinner
2023-12-06 12:20 ` Christoph Hellwig
2023-12-06 11:57 ` Baokun Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231202091432.8349-2-libaokun1@huawei.com \
--to=libaokun1@huawei.com \
--cc=adilger.kernel@dilger.ca \
--cc=akpm@linux-foundation.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=ritesh.list@gmail.com \
--cc=tytso@mit.edu \
--cc=willy@infradead.org \
--cc=yangerkun@huawei.com \
--cc=yi.zhang@huawei.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox