From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	ritesh.list@gmail.com, yi.zhang@huawei.com,
	yi.zhang@huaweicloud.com, chengzhihao1@huawei.com,
	yukuai3@huawei.com
Subject: [PATCH v3 02/10] ext4: check the extent status again before inserting delalloc block
Date: Wed, 8 May 2024 14:12:12 +0800
Message-ID: <20240508061220.967970-3-yi.zhang@huaweicloud.com>
In-Reply-To: <20240508061220.967970-1-yi.zhang@huaweicloud.com>
From: Zhang Yi <yi.zhang@huawei.com>

ext4_da_map_blocks() first looks up an entry in the extent status tree
(without holding i_data_sem) and then looks up the on-disk extent
mapping (with i_data_sem held in read mode).

If it finds a hole in the extent status tree, or finds no entry at all,
it then takes i_data_sem in write mode to add a da entry to the extent
status tree. This can race with the page mkwrite and fallocate paths.

Note that this is safe between:
1. the ext4 buffered write path and ext4_page_mkwrite(), because of the
   folio lock;
2. the ext4 buffered write path and ext4 fallocate, because of the
   inode lock.

But ext4_page_mkwrite() can race with the ext4 fallocate path:

ext4_page_mkwrite()             ext4_fallocate()
 block_page_mkwrite()
  ext4_da_map_blocks()
   //find hole in extent status tree
                                ext4_alloc_file_blocks()
                                 ext4_map_blocks()
                                  //allocate block and unwritten extent
   ext4_insert_delayed_block()
    ext4_da_reserve_space()
     //reserve one more block
    ext4_es_insert_delayed_block()
     //drop unwritten extent and add delayed extent by mistake

The delalloc extent is then wrong until writeback, and the extra
reserved block can never be released, which triggers the warning below:

 EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!

Fix this by looking up the extent status tree again while i_data_sem is
held in write mode. If it still can't find any entry, we insert a new
da entry into the extent status tree.
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
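Not part of the commit: a minimal user-space sketch of the re-check
pattern described above, in case it helps review. It assumes a pthread
rwlock in place of i_data_sem and a flat array in place of the extent
status tree; every toy_* name is hypothetical and none exists in the
kernel. In the kernel the first lookup is done without i_data_sem; the
sketch takes a read lock there only to stay free of data races in user
space. It also leaves out the on-disk query that the patch performs via
ext4_map_query_blocks() when no entry is found.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define TOY_NBLOCKS 16

/* Toy stand-ins; none of these names exist in the kernel. */
enum toy_state { TOY_HOLE, TOY_DELAYED, TOY_UNWRITTEN };

struct toy_inode {
	pthread_rwlock_t data_sem;      /* plays the role of i_data_sem */
	enum toy_state es[TOY_NBLOCKS]; /* plays the role of the extent status tree */
	unsigned int reserved;          /* plays the role of i_reserved_data_blocks */
};

static void toy_inode_init(struct toy_inode *ino)
{
	pthread_rwlock_init(&ino->data_sem, NULL);
	for (int i = 0; i < TOY_NBLOCKS; i++)
		ino->es[i] = TOY_HOLE;
	ino->reserved = 0;
}

/* First, optimistic lookup. The lock is dropped before the caller acts. */
static enum toy_state toy_lookup(struct toy_inode *ino, unsigned int blk)
{
	enum toy_state st;

	assert(blk < TOY_NBLOCKS);
	pthread_rwlock_rdlock(&ino->data_sem);
	st = ino->es[blk];
	pthread_rwlock_unlock(&ino->data_sem);
	return st;
}

/* fallocate-like path: turn a hole into an unwritten extent. */
static void toy_fallocate(struct toy_inode *ino, unsigned int blk)
{
	assert(blk < TOY_NBLOCKS);
	pthread_rwlock_wrlock(&ino->data_sem);
	if (ino->es[blk] == TOY_HOLE)
		ino->es[blk] = TOY_UNWRITTEN;
	pthread_rwlock_unlock(&ino->data_sem);
}

/*
 * Insert a delayed entry after an earlier lookup saw a hole.
 * With recheck == true the state is looked up again under the write
 * lock; with recheck == false the stale lookup result is trusted.
 * Returns true if a delayed entry was inserted.
 */
static bool toy_insert_delayed(struct toy_inode *ino, unsigned int blk,
			       bool recheck)
{
	bool inserted = false;

	assert(blk < TOY_NBLOCKS);
	pthread_rwlock_wrlock(&ino->data_sem);
	if (!recheck || ino->es[blk] == TOY_HOLE) {
		ino->reserved++;            /* reserve one block */
		ino->es[blk] = TOY_DELAYED; /* overwrites whatever was there */
		inserted = true;
	}
	pthread_rwlock_unlock(&ino->data_sem);
	return inserted;
}
```

Replaying the interleaving from the commit message in a single thread
(toy_lookup(), then toy_fallocate(), then toy_insert_delayed()) leaves
reserved at 0 and the block unwritten when recheck is true. With
recheck false, the unwritten state is overwritten and reserved ends up
at 1, which mirrors the leaked reservation reported in the warning.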
fs/ext4/inode.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6a41172c06e1..6114ca79f464 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1737,6 +1737,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		if (ext4_es_is_hole(&es))
 			goto add_delayed;
 
+found:
 		/*
 		 * Delayed extent could be allocated by fallocate.
 		 * So we need to check it.
@@ -1781,6 +1782,26 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
 add_delayed:
 	down_write(&EXT4_I(inode)->i_data_sem);
+	/*
+	 * Page fault path (ext4_page_mkwrite does not take i_rwsem)
+	 * and fallocate path (no folio lock) can race. Make sure we
+	 * lookup the extent status tree here again while i_data_sem
+	 * is held in write mode, before inserting a new da entry in
+	 * the extent status tree.
+	 */
+	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+		if (!ext4_es_is_hole(&es)) {
+			up_write(&EXT4_I(inode)->i_data_sem);
+			goto found;
+		}
+	} else if (!ext4_has_inline_data(inode)) {
+		retval = ext4_map_query_blocks(NULL, inode, map);
+		if (retval) {
+			up_write(&EXT4_I(inode)->i_data_sem);
+			return retval;
+		}
+	}
+
 	retval = ext4_insert_delayed_block(inode, map->m_lblk);
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (retval)
--
2.39.2