From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
yi.zhang@huawei.com, yi.zhang@huaweicloud.com,
chengzhihao1@huawei.com, yukuai3@huawei.com,
yangerkun@huawei.com
Subject: [PATCH v4 01/10] ext4: remove writable userspace mappings before truncating page cache
Date: Mon, 16 Dec 2024 09:39:06 +0800
Message-ID: <20241216013915.3392419-2-yi.zhang@huaweicloud.com>
In-Reply-To: <20241216013915.3392419-1-yi.zhang@huaweicloud.com>
From: Zhang Yi <yi.zhang@huawei.com>

When zeroing a range of folios on a filesystem whose block size is
smaller than the page size, the file's mapped blocks within a single
page are marked as unwritten. We must remove writable userspace
mappings to ensure that ext4_page_mkwrite() is called on subsequent
write access to these partial folios. Otherwise, data written through
subsequent mmap writes may not be saved to disk.

$mkfs.ext4 -b 1024 /dev/vdb
$mount /dev/vdb /mnt
$xfs_io -t -f -c "pwrite -S 0x58 0 4096" -c "mmap -rw 0 4096" \
-c "mwrite -S 0x5a 2048 2048" -c "fzero 2048 2048" \
-c "mwrite -S 0x59 2048 2048" -c "close" /mnt/foo
$od -Ax -t x1z /mnt/foo
000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
*
000800 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59 59
*
001000
$umount /mnt && mount /dev/vdb /mnt
$od -Ax -t x1z /mnt/foo
000000 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58 58
*
000800 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
001000

After the remount, the 0x59 data written through the mapping after the
fzero has been lost, and the range reads back as zeros.

Fix this by introducing ext4_truncate_page_cache_block_range() to remove
writable userspace mappings when truncating a partial folio range.
Additionally, move the journalled data mode handling and
truncate_pagecache_range() into this function, allowing it to serve as a
common helper that correctly manages the page cache in preparation for
block range manipulations.

Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
fs/ext4/ext4.h | 2 ++
fs/ext4/extents.c | 19 ++++-----------
fs/ext4/inode.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 69 insertions(+), 14 deletions(-)
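The range checks in the new helper are easier to follow with concrete numbers. The following is a small userspace C model, written for this note and not part of the patch. It shows which page-cache pages the non-journalled path of ext4_truncate_page_cache_block_range() would pass to folio_mkclean(). The model assumes one page per folio and power-of-two sizes. rup(), rdown(), range_covers_full_block() and pages_to_mkclean() are hypothetical names that stand in for the kernel's round_up()/round_down() and the logic in ext4_truncate_folio().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins for the kernel's round_up() and
 * round_down(); like those macros, they require a power-of-two alignment.
 */
static int64_t rup(int64_t x, int64_t a)
{
	assert(a > 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

static int64_t rdown(int64_t x, int64_t a)
{
	assert(a > 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/*
 * Mirrors the early return in ext4_truncate_folio(): the folio is only
 * write-protected if [start, end) contains at least one complete block.
 */
static bool range_covers_full_block(int64_t start, int64_t end, int64_t bs)
{
	return rup(start, bs) < rdown(end, bs);
}

/*
 * Models the non-journalled path of ext4_truncate_page_cache_block_range():
 * stores in idx[] the page indexes that would be passed to folio_mkclean()
 * and returns how many there are (0, 1 or 2). Assumes one page per folio.
 */
static int pages_to_mkclean(int64_t start, int64_t end, int64_t bs,
			    int64_t page, int64_t i_size, int64_t idx[2])
{
	int64_t start_boundary, head_end, tail_start;
	int n = 0;

	if (bs >= page || start >= i_size)
		return 0;

	start_boundary = rup(start, page);
	head_end = start_boundary < end ? start_boundary : end;

	/* Head: the page containing start (empty if start is page aligned). */
	if (range_covers_full_block(start, head_end, bs))
		idx[n++] = start / page;

	/* Tail: the page containing end, if the range crosses a page boundary. */
	if (end > start_boundary) {
		tail_start = rdown(end, page);
		if (range_covers_full_block(tail_start, end, bs))
			idx[n++] = tail_start / page;
	}
	return n;
}
```

Consider the block-aligned range [2048, 4096) from the fzero step of the reproducer, with 1024-byte blocks, 4096-byte pages and an i_size of 4096. The model reports a single page, index 0. A range such as [1536, 2560) contains no complete block, so the model reports no pages.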
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 74f2071189b2..8843929b46ce 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3016,6 +3016,8 @@ extern int ext4_inode_attach_jinode(struct inode *inode);
 extern int ext4_can_truncate(struct inode *inode);
 extern int ext4_truncate(struct inode *);
 extern int ext4_break_layouts(struct inode *);
+extern int ext4_truncate_page_cache_block_range(struct inode *inode,
+						loff_t start, loff_t end);
 extern int ext4_punch_hole(struct file *file, loff_t offset, loff_t length);
 extern void ext4_set_inode_flags(struct inode *, bool init);
 extern int ext4_alloc_da_blocks(struct inode *inode);
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index a07a98a4b97a..8dc6b4271b15 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4667,22 +4667,13 @@ static long ext4_zero_range(struct file *file, loff_t offset,
 			goto out_mutex;
 		}
 
-		/*
-		 * For journalled data we need to write (and checkpoint) pages
-		 * before discarding page cache to avoid inconsitent data on
-		 * disk in case of crash before zeroing trans is committed.
-		 */
-		if (ext4_should_journal_data(inode)) {
-			ret = filemap_write_and_wait_range(mapping, start,
-							   end - 1);
-			if (ret) {
-				filemap_invalidate_unlock(mapping);
-				goto out_mutex;
-			}
+		/* Now release the pages and zero block aligned part of pages */
+		ret = ext4_truncate_page_cache_block_range(inode, start, end);
+		if (ret) {
+			filemap_invalidate_unlock(mapping);
+			goto out_mutex;
 		}
 
-		/* Now release the pages and zero block aligned part of pages */
-		truncate_pagecache_range(inode, start, end - 1);
 		inode_set_mtime_to_ts(inode, inode_set_ctime_current(inode));
 
 		ret = ext4_alloc_file_blocks(file, lblk, max_blocks, new_size,
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 89aade6f45f6..c68a8b841148 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -31,6 +31,7 @@
 #include <linux/writeback.h>
 #include <linux/pagevec.h>
 #include <linux/mpage.h>
+#include <linux/rmap.h>
 #include <linux/namei.h>
 #include <linux/uio.h>
 #include <linux/bio.h>
@@ -3902,6 +3903,67 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 	return ret;
 }
 
+static inline void ext4_truncate_folio(struct inode *inode,
+				       loff_t start, loff_t end)
+{
+	unsigned long blocksize = i_blocksize(inode);
+	struct folio *folio;
+
+	/* Nothing to be done if no complete block needs to be truncated. */
+	if (round_up(start, blocksize) >= round_down(end, blocksize))
+		return;
+
+	folio = filemap_lock_folio(inode->i_mapping, start >> PAGE_SHIFT);
+	if (IS_ERR(folio))
+		return;
+
+	if (folio_mkclean(folio))
+		folio_mark_dirty(folio);
+	folio_unlock(folio);
+	folio_put(folio);
+}
+
+int ext4_truncate_page_cache_block_range(struct inode *inode,
+					 loff_t start, loff_t end)
+{
+	unsigned long blocksize = i_blocksize(inode);
+	int ret;
+
+	/*
+	 * For journalled data we need to write (and checkpoint) pages
+	 * before discarding page cache to avoid inconsistent data on disk
+	 * in case of a crash before the block freeing or unwritten
+	 * conversion transaction is committed.
+	 */
+	if (ext4_should_journal_data(inode)) {
+		ret = filemap_write_and_wait_range(inode->i_mapping, start,
+						   end - 1);
+		if (ret)
+			return ret;
+		goto truncate_pagecache;
+	}
+
+	/*
+	 * If the block size is less than the page size, the file's mapped
+	 * blocks within one page could be freed or converted to unwritten.
+	 * Therefore, writable userspace mappings must be removed so that
+	 * ext4_page_mkwrite() is called on subsequent write access to
+	 * these partial folios.
+	 */
+	if (blocksize < PAGE_SIZE && start < inode->i_size) {
+		loff_t start_boundary = round_up(start, PAGE_SIZE);
+
+		ext4_truncate_folio(inode, start, min(start_boundary, end));
+		if (end > start_boundary)
+			ext4_truncate_folio(inode,
+					    round_down(end, PAGE_SIZE), end);
+	}
+
+truncate_pagecache:
+	truncate_pagecache_range(inode, start, end - 1);
+	return 0;
+}
+
 static void ext4_wait_dax_page(struct inode *inode)
 {
 	filemap_invalidate_unlock(inode->i_mapping);
--
2.46.1
Thread overview: 35+ messages
2024-12-16 1:39 [PATCH v4 00/10] ext4: clean up and refactor fallocate Zhang Yi
2024-12-16 1:39 ` Zhang Yi [this message]
2024-12-16 15:00 ` [PATCH v4 01/10] ext4: remove writable userspace mappings before truncating page cache Jan Kara
2024-12-17 7:05 ` Zhang Yi
2024-12-16 15:15 ` Matthew Wilcox
2024-12-17 7:38 ` Zhang Yi
2024-12-18 9:56 ` Ojaswin Mujoo
2024-12-18 13:02 ` Zhang Yi
2024-12-19 7:19 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 02/10] ext4: don't explicit update times in ext4_fallocate() Zhang Yi
2024-12-18 9:58 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 03/10] ext4: don't write back data before punch hole in nojournal mode Zhang Yi
2024-12-16 15:02 ` Jan Kara
2024-12-17 14:31 ` Ojaswin Mujoo
2024-12-17 14:50 ` Ojaswin Mujoo
2024-12-18 7:10 ` Zhang Yi
2024-12-18 10:13 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 04/10] ext4: refactor ext4_punch_hole() Zhang Yi
2024-12-16 15:07 ` Jan Kara
2024-12-18 10:17 ` Ojaswin Mujoo
2024-12-18 13:13 ` Zhang Yi
2024-12-19 7:11 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 05/10] ext4: refactor ext4_zero_range() Zhang Yi
2024-12-16 15:24 ` Jan Kara
2024-12-19 7:12 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 06/10] ext4: refactor ext4_collapse_range() Zhang Yi
2024-12-18 10:18 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 07/10] ext4: refactor ext4_insert_range() Zhang Yi
2024-12-18 10:18 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 08/10] ext4: factor out ext4_do_fallocate() Zhang Yi
2024-12-18 10:18 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 09/10] ext4: move out inode_lock into ext4_fallocate() Zhang Yi
2024-12-18 10:19 ` Ojaswin Mujoo
2024-12-16 1:39 ` [PATCH v4 10/10] ext4: move out common parts " Zhang Yi
2024-12-18 10:20 ` Ojaswin Mujoo