From: Jan Kara <jack@suse.cz>
To: Ted Tso <tytso@mit.edu>
Cc: <linux-ext4@vger.kernel.org>, Jan Kara <jack@suse.cz>
Subject: [PATCH 05/13] ext4: Commit transaction before writing back pages in data=journal mode
Date: Wed, 29 Mar 2023 17:49:36 +0200
Message-ID: <20230329154950.19720-5-jack@suse.cz>
In-Reply-To: <20230329125740.4127-1-jack@suse.cz>

When journalling data we currently just walk over pages, journal those
that are marked for delayed dirtying (only pinned pages dirtied behind
our back these days) and checkpoint other dirty pages. Because some
pages may be part of the running transaction, the result is that after
filemap_write_and_wait() we are not guaranteed that the pages are stable
on disk. Thus places that want to flush current pagecache content need
to jump through hoops to make sure journalled data is not lost. This is
manageable in cases completely controlled by ext4 (such as extent
shifting operations or inode eviction) but it gets ugly for stuff like
fsverity. Furthermore it is rather error prone as people often do not
realize journalled data needs special handling.
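To make the problem concrete, here is a minimal toy model in C of the pre-patch data=journal writeback pass. It is an illustration under simplifying assumptions, not ext4 code: enum page_state, old_writeback_pass() and unstable_pages() are hypothetical names, and real ext4 tracks this state in page flags and jbd2 buffer heads. Writeback never commits the running transaction, so pages sitting in it stay unstable however many times the pass runs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified page states. Real ext4 keeps this
 * information in page flags and jbd2 buffer heads, not in an enum.
 */
enum page_state {
	PG_CLEAN,		/* data stable in its final location */
	PG_DIRTY_CKPT,		/* committed to the journal, needs checkpoint */
	PG_DELAYED_DIRTY,	/* pinned page dirtied behind our back */
	PG_RUNNING_TXN,		/* journalled into the running transaction */
};

/* One pre-patch writeback pass: no transaction commit beforehand. */
static void old_writeback_pass(enum page_state *pages, size_t n)
{
	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (pages[i]) {
		case PG_DELAYED_DIRTY:
			/* Journalled, but only into the running transaction. */
			pages[i] = PG_RUNNING_TXN;
			break;
		case PG_DIRTY_CKPT:
			/* Checkpoint: write the data to its final location. */
			pages[i] = PG_CLEAN;
			break;
		default:
			/* Clean, or stuck until the transaction commits. */
			break;
		}
	}
}

/* Count pages that are not yet safe to drop from the page cache. */
static size_t unstable_pages(const enum page_state *pages, size_t n)
{
	size_t cnt = 0;

	for (size_t i = 0; i < n; i++)
		if (pages[i] != PG_CLEAN)
			cnt++;
	return cnt;
}
```

With one page of each kind, a single pass leaves two pages unstable (the one already in the running transaction and the delayed-dirty one the pass itself journalled), and running the pass again changes nothing.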

So change ext4_writepages() to commit the transaction with the inode's
data before going through the writeback loop in WB_SYNC_ALL mode. As a
result filemap_write_and_wait() now really gets pages to stable storage
and makes pagecache pages safe to reclaim. Consequently we can remove
the special handling of journalled data from several places in
follow-up patches.
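A minimal toy model in C of the proposed ordering (commit first, then run the writeback loop) might look like this. It is a hypothetical sketch: commit_running_txn() is an assumed stand-in for the ext4_fc_commit() call on i_datasync_tid that the patch adds, SYNC_NONE/SYNC_ALL stand in for WB_SYNC_NONE/WB_SYNC_ALL, and the page states are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified page states (not real ext4 flags). */
enum page_state { PG_CLEAN, PG_DIRTY_CKPT, PG_DELAYED_DIRTY, PG_RUNNING_TXN };
enum sync_mode { SYNC_NONE, SYNC_ALL };	/* stand-ins for WB_SYNC_* */

/* Stand-in for committing the running transaction: its pages become
 * committed and only need checkpointing to their final location. */
static void commit_running_txn(enum page_state *pages, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pages[i] == PG_RUNNING_TXN)
			pages[i] = PG_DIRTY_CKPT;
}

static void writeback_pass(enum page_state *pages, size_t n,
			   enum sync_mode mode)
{
	assert(pages != NULL || n == 0);
	/* Commit only for data-integrity writeback, before the loop. */
	if (mode == SYNC_ALL)
		commit_running_txn(pages, n);

	for (size_t i = 0; i < n; i++) {
		switch (pages[i]) {
		case PG_DIRTY_CKPT:
			pages[i] = PG_CLEAN;		/* checkpoint */
			break;
		case PG_DELAYED_DIRTY:
			pages[i] = PG_RUNNING_TXN;	/* journal it */
			break;
		default:
			break;
		}
	}
}
```

In this sketch a page that was already in the running transaction becomes clean after one WB_SYNC_ALL pass, while a delayed-dirty page found during the loop is still only in the running transaction when the pass returns; the patch below handles that case by running ext4_do_writepages() a second time.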

Note that this will make fsync(2) for journalled data more expensive as
we will end up not only committing the transaction we need but also
checkpointing the data (which we may have previously skipped if the data
was part of the running transaction). If we really cared, we would need
to introduce a special VFS function for writing out & invalidating the
page cache for a range, use the ->launder_page callback to perform
checkpointing, and use it from all the places that need this
functionality. But at this point I'm not convinced the complexity is
worth it.

Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/ext4/inode.c | 26 ++++++++++++++++++++++++--
1 file changed, 24 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 85299c90b0f7..3ab2d56b6840 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1562,6 +1562,7 @@ struct mpage_da_data {
 	struct ext4_io_submit io_submit;	/* IO submission data */
 	unsigned int do_map:1;
 	unsigned int scanned_until_end:1;
+	unsigned int journalled_more_data:1;
 };
 
 static void mpage_release_unused_pages(struct mpage_da_data *mpd,
@@ -2539,6 +2540,7 @@ static int mpage_prepare_extent_to_map(struct mpage_da_data *mpd)
 						mpd, &folio->page);
 					if (err < 0)
 						goto out;
+					mpd->journalled_more_data = 1;
 				}
 				mpage_page_done(mpd, &folio->page);
 			} else {
@@ -2628,10 +2630,23 @@ static int ext4_do_writepages(struct mpage_da_data *mpd)
 
 	/*
 	 * data=journal mode does not do delalloc so we just need to writeout /
-	 * journal already mapped buffers
+	 * journal already mapped buffers. On the other hand we need to commit
+	 * transaction to make data stable. We expect all the data to be
+	 * already in the journal (the only exception are DMA pinned pages
+	 * dirtied behind our back) so we commit transaction here and run the
+	 * writeback loop to checkpoint them. The checkpointing is not actually
+	 * necessary to make data persistent *but* quite a few places (extent
+	 * shifting operations, fsverity, ...) depend on being able to drop
+	 * pagecache pages after calling filemap_write_and_wait() and for that
+	 * checkpointing needs to happen.
 	 */
-	if (ext4_should_journal_data(inode))
+	if (ext4_should_journal_data(inode)) {
 		mpd->can_map = 0;
+		if (wbc->sync_mode == WB_SYNC_ALL)
+			ext4_fc_commit(sbi->s_journal,
+				       EXT4_I(inode)->i_datasync_tid);
+	}
+	mpd->journalled_more_data = 0;
 
 	if (ext4_should_dioread_nolock(inode)) {
 		/*
@@ -2812,6 +2827,13 @@ static int ext4_writepages(struct address_space *mapping,
 
 	percpu_down_read(&EXT4_SB(sb)->s_writepages_rwsem);
 	ret = ext4_do_writepages(&mpd);
+	/*
+	 * For data=journal writeback we could have come across pages marked
+	 * for delayed dirtying (PageChecked) which were just added to the
+	 * running transaction. Try once more to get them to stable storage.
+	 */
+	if (!ret && mpd.journalled_more_data)
+		ret = ext4_do_writepages(&mpd);
 	percpu_up_read(&EXT4_SB(sb)->s_writepages_rwsem);
 
 	return ret;
--
2.35.3
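The retry added to ext4_writepages() can be illustrated with a small user-space model. This is a hypothetical sketch: struct wb_model, do_writepages() and writepages() only mirror the shape of ext4_do_writepages() and ext4_writepages(), commit_running_txn() is an assumed stand-in for ext4_fc_commit(), and the page states simplify what ext4 and jbd2 actually track. In WB_SYNC_ALL mode the second pass commits the data journalled by the first pass and then checkpoints it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; state names are illustrative only. */
enum page_state { PG_CLEAN, PG_DIRTY_CKPT, PG_DELAYED_DIRTY, PG_RUNNING_TXN };
enum sync_mode { SYNC_NONE, SYNC_ALL };	/* stand-ins for WB_SYNC_* */

struct wb_model {
	enum page_state *pages;
	size_t nr_pages;
	enum sync_mode mode;
	bool journalled_more_data;	/* mirrors mpd->journalled_more_data */
};

/* Stand-in for ext4_fc_commit(): running transaction -> committed. */
static void commit_running_txn(struct wb_model *wb)
{
	for (size_t i = 0; i < wb->nr_pages; i++)
		if (wb->pages[i] == PG_RUNNING_TXN)
			wb->pages[i] = PG_DIRTY_CKPT;
}

/* Mirrors the data=journal path of ext4_do_writepages() after the patch. */
static int do_writepages(struct wb_model *wb)
{
	if (wb->mode == SYNC_ALL)
		commit_running_txn(wb);
	wb->journalled_more_data = false;

	for (size_t i = 0; i < wb->nr_pages; i++) {
		switch (wb->pages[i]) {
		case PG_DIRTY_CKPT:
			wb->pages[i] = PG_CLEAN;	/* checkpoint */
			break;
		case PG_DELAYED_DIRTY:
			wb->pages[i] = PG_RUNNING_TXN;	/* journal it */
			wb->journalled_more_data = true;
			break;
		default:
			break;
		}
	}
	return 0;
}

/* Mirrors ext4_writepages(): one extra pass if new data was journalled. */
static int writepages(struct wb_model *wb)
{
	int ret;

	assert(wb->pages != NULL || wb->nr_pages == 0);
	ret = do_writepages(wb);
	if (!ret && wb->journalled_more_data)
		ret = do_writepages(wb);
	return ret;
}
```

With SYNC_ALL, a mix of committed, delayed-dirty and running-transaction pages ends up fully clean after writepages(). With SYNC_NONE nothing is committed, so pages in the running transaction stay unstable, which matches the patch committing only in WB_SYNC_ALL mode.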