From: Zhang Yi <yi.zhang@huaweicloud.com>
To: linux-ext4@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	tytso@mit.edu, adilger.kernel@dilger.ca, jack@suse.cz,
	ritesh.list@gmail.com, yi.zhang@huawei.com,
	yi.zhang@huaweicloud.com, chengzhihao1@huawei.com,
	yukuai3@huawei.com
Subject: [PATCH v5 02/10] ext4: check the extent status again before inserting delalloc block
Date: Fri, 17 May 2024 20:39:57 +0800
Message-ID: <20240517124005.347221-3-yi.zhang@huaweicloud.com>
In-Reply-To: <20240517124005.347221-1-yi.zhang@huaweicloud.com>
From: Zhang Yi <yi.zhang@huawei.com>
ext4_da_map_blocks() first looks up an extent entry in the extent status
tree (without i_data_sem) and then looks up an on-disk extent mapping
(with i_data_sem held in read mode).
If it finds a hole in the extent status tree, or if it cannot find any
entry at all, it then takes i_data_sem in write mode to add a da entry
to the extent status tree. This can race with the page mkwrite and
fallocate paths.
Note that this is ok between
1. the ext4 buffered-write path vs. ext4_page_mkwrite(), because of the
   folio lock
2. the ext4 buffered-write path vs. ext4 fallocate, because of the inode
   lock.
But ext4_page_mkwrite() can race with the ext4 fallocate path:
ext4_page_mkwrite()             ext4_fallocate()
 block_page_mkwrite()
  ext4_da_map_blocks()
   //find hole in extent status tree
                                 ext4_alloc_file_blocks()
                                  ext4_map_blocks()
                                   //allocate block and unwritten extent
   ext4_insert_delayed_block()
    ext4_da_reserve_space()
     //reserve one more block
    ext4_es_insert_delayed_block()
     //drop unwritten extent and add delayed extent by mistake
Then the delalloc extent stays wrong until writeback, the extra reserved
block can never be released, and it triggers the warning below:
EXT4-fs (pmem2): Inode 13 (00000000bbbd4d23): i_reserved_data_blocks(1) not cleared!
Fix the problem by looking up the extent status tree again while
i_data_sem is held in write mode. Only if it still can't find any entry
do we insert a new da entry into the extent status tree.
Cc: stable@vger.kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 6a41172c06e1..6114ca79f464 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1737,6 +1737,7 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 		if (ext4_es_is_hole(&es))
 			goto add_delayed;
 
+found:
 		/*
 		 * Delayed extent could be allocated by fallocate.
 		 * So we need to check it.
@@ -1781,6 +1782,26 @@ static int ext4_da_map_blocks(struct inode *inode, sector_t iblock,
 
 add_delayed:
 	down_write(&EXT4_I(inode)->i_data_sem);
+	/*
+	 * Page fault path (ext4_page_mkwrite does not take i_rwsem)
+	 * and fallocate path (no folio lock) can race. Make sure we
+	 * lookup the extent status tree here again while i_data_sem
+	 * is held in write mode, before inserting a new da entry in
+	 * the extent status tree.
+	 */
+	if (ext4_es_lookup_extent(inode, iblock, NULL, &es)) {
+		if (!ext4_es_is_hole(&es)) {
+			up_write(&EXT4_I(inode)->i_data_sem);
+			goto found;
+		}
+	} else if (!ext4_has_inline_data(inode)) {
+		retval = ext4_map_query_blocks(NULL, inode, map);
+		if (retval) {
+			up_write(&EXT4_I(inode)->i_data_sem);
+			return retval;
+		}
+	}
+
 	retval = ext4_insert_delayed_block(inode, map->m_lblk);
 	up_write(&EXT4_I(inode)->i_data_sem);
 	if (retval)
--
2.39.2