From: Jan Kara <jack@suse.cz>
To: <linux-fsdevel@vger.kernel.org>
Cc: <linux-block@vger.kernel.org>,
Christian Brauner <brauner@kernel.org>,
Al Viro <viro@ZenIV.linux.org.uk>, <linux-ext4@vger.kernel.org>,
Ted Tso <tytso@mit.edu>,
"Tigran A. Aivazian" <aivazian.tigran@gmail.com>,
David Sterba <dsterba@suse.com>,
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>,
Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
David Hildenbrand <david@kernel.org>,
linux-mm@kvack.org, linux-aio@kvack.org,
Benjamin LaHaise <bcrl@kvack.org>, Jan Kara <jack@suse.cz>
Subject: [PATCH 33/42] fs: Provide functions for handling mapping_metadata_bhs directly
Date: Thu, 26 Mar 2026 10:54:27 +0100
Message-ID: <20260326095354.16340-75-jack@suse.cz>
In-Reply-To: <20260326082428.31660-1-jack@suse.cz>
As part of the transition toward moving mapping_metadata_bhs to the
fs-private part of the inode, provide functions that operate on this list
directly instead of going through the inode / mapping. The existing
inode- and mapping-based functions remain as inline wrappers.
Signed-off-by: Jan Kara <jack@suse.cz>
---
fs/buffer.c | 110 +++++++++++++++++-------------------
include/linux/buffer_head.h | 44 ++++++++++++---
2 files changed, 87 insertions(+), 67 deletions(-)
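
For illustration, the shape of the new interface can be modelled in
userspace. The sketch below is only a model under stated assumptions: the
list helpers and struct layouts are simplified stand-ins written for this
example, the spinlock is omitted, the detach logic in mmb_invalidate() is a
guess at what __remove_assoc_queue() does, and examplefs_inode_info and
mmb_model_demo() are hypothetical names that do not appear in the patch. It
shows the list-based functions taking a mapping_metadata_bhs pointer, with
the old inode-based entry points reduced to thin wrappers.

```c
/* Userspace model only: stand-in types, NOT the kernel definitions. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static inline void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }
static inline bool list_empty(const struct list_head *h) { return h->next == h; }
static inline void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);
}
static inline void list_move_tail(struct list_head *e, struct list_head *h)
{
	list_del_init(e);	/* unlink from wherever it is (or from itself) */
	e->prev = h->prev;
	e->next = h;
	h->prev->next = e;
	h->prev = e;
}

struct address_space;
struct mapping_metadata_bhs {
	struct list_head list;		/* the kernel struct also has a lock */
	struct address_space *mapping;
};
struct address_space { struct mapping_metadata_bhs i_metadata_bhs; };
struct inode { struct address_space i_data; };
struct buffer_head {
	bool dirty;			/* stand-in for the dirty state bit */
	struct list_head b_assoc_buffers;
	struct mapping_metadata_bhs *b_mmb;
};

static inline void mmb_init(struct mapping_metadata_bhs *mmb,
			    struct address_space *mapping)
{
	INIT_LIST_HEAD(&mmb->list);
	mmb->mapping = mapping;
}

static inline bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
{
	return !list_empty(&mmb->list);
}

static inline void mmb_mark_buffer_dirty(struct buffer_head *bh,
					 struct mapping_metadata_bhs *mmb)
{
	bh->dirty = true;
	if (!bh->b_mmb) {		/* only queue buffers not yet on a list */
		list_move_tail(&bh->b_assoc_buffers, &mmb->list);
		bh->b_mmb = mmb;
	}
}

static inline void mmb_invalidate(struct mapping_metadata_bhs *mmb)
{
	while (!list_empty(&mmb->list)) {
		/* Assumed detach step: unlink the bh and clear its back-pointer. */
		struct buffer_head *bh = (struct buffer_head *)
			((char *)mmb->list.next -
			 offsetof(struct buffer_head, b_assoc_buffers));
		list_del_init(&bh->b_assoc_buffers);
		bh->b_mmb = NULL;
	}
}

/* Old entry points become wrappers around the list embedded in the mapping. */
static inline void mark_buffer_dirty_inode(struct buffer_head *bh,
					   struct inode *inode)
{
	mmb_mark_buffer_dirty(bh, &inode->i_data.i_metadata_bhs);
}
static inline void invalidate_inode_buffers(struct inode *inode)
{
	mmb_invalidate(&inode->i_data.i_metadata_bhs);
}

/* Hypothetical filesystem keeping the list in its private inode part. */
struct examplefs_inode_info {
	struct mapping_metadata_bhs i_metadata_bhs;
	struct inode vfs_inode;
};

static inline int mmb_model_demo(void)
{
	struct examplefs_inode_info ei;
	struct buffer_head bh = { .dirty = false, .b_mmb = NULL };

	INIT_LIST_HEAD(&bh.b_assoc_buffers);
	mmb_init(&ei.i_metadata_bhs, &ei.vfs_inode.i_data);

	mmb_mark_buffer_dirty(&bh, &ei.i_metadata_bhs);
	assert(bh.dirty && bh.b_mmb == &ei.i_metadata_bhs);
	assert(mmb_has_buffers(&ei.i_metadata_bhs));

	mmb_invalidate(&ei.i_metadata_bhs);
	assert(!mmb_has_buffers(&ei.i_metadata_bhs) && bh.b_mmb == NULL);
	return 0;
}
```

In this model a filesystem that embeds the list in its own inode structure
calls the mmb_*() functions on that list, while unconverted callers keep
using the wrappers, which pass the list embedded in the mapping.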
diff --git a/fs/buffer.c b/fs/buffer.c
index b0436481d0f1..cbed175f418b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -467,31 +467,25 @@ EXPORT_SYMBOL(mark_buffer_async_write);
* a successful fsync(). For example, ext2 indirect blocks need to be
* written back and waited upon before fsync() returns.
*
- * The functions mark_buffer_dirty_inode(), fsync_inode_buffers(),
- * mmb_has_buffers() and invalidate_inode_buffers() are provided for the
- * management of a list of dependent buffers in mapping_metadata_bhs struct.
+ * The functions mmb_mark_buffer_dirty(), mmb_sync(), mmb_has_buffers()
+ * and mmb_invalidate() are provided for the management of a list of dependent
+ * buffers in mapping_metadata_bhs struct.
*
* The locking is a little subtle: The list of buffer heads is protected by
* the lock in mapping_metadata_bhs so functions coming from bdev mapping
* (such as try_to_free_buffers()) need to safely get to mapping_metadata_bhs
* using RCU, grab the lock, verify we didn't race with somebody detaching the
* bh / moving it to different inode and only then proceeding.
- *
- * FIXME: mark_buffer_dirty_inode() is a data-plane operation. It should
- * take an address_space, not an inode. And it should be called
- * mark_buffer_dirty_fsync() to clearly define why those buffers are being
- * queued up.
- *
- * FIXME: mark_buffer_dirty_inode() doesn't need to add the buffer to the
- * list if it is already on a list. Because if the buffer is on a list,
- * it *must* already be on the right one. If not, the filesystem is being
- * silly. This will save a ton of locking. But first we have to ensure
- * that buffers are taken *off* the old inode's list when they are freed
- * (presumably in truncate). That requires careful auditing of all
- * filesystems (do it inside bforget()). It could also be done by bringing
- * b_inode back.
*/
+void mmb_init(struct mapping_metadata_bhs *mmb, struct address_space *mapping)
+{
+ spin_lock_init(&mmb->lock);
+ INIT_LIST_HEAD(&mmb->list);
+ mmb->mapping = mapping;
+}
+EXPORT_SYMBOL(mmb_init);
+
static void __remove_assoc_queue(struct mapping_metadata_bhs *mmb,
struct buffer_head *bh)
{
@@ -533,12 +527,12 @@ bool mmb_has_buffers(struct mapping_metadata_bhs *mmb)
EXPORT_SYMBOL_GPL(mmb_has_buffers);
/**
- * sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
- * @mapping: the mapping which wants those buffers written
+ * mmb_sync - write out & wait upon all buffers in a list
+ * @mmb: the list of buffers to write
*
- * Starts I/O against the buffers at mapping->i_metadata_bhs and waits upon
- * that I/O. Basically, this is a convenience function for fsync(). @mapping
- * is a file or directory which needs those buffers to be written for a
+ * Starts I/O against the buffers in the given list and waits upon
+ * that I/O. Basically, this is a convenience function for fsync(). @mmb is
+ * for a file or directory which needs those buffers to be written for a
* successful fsync().
*
* We have conflicting pressures: we want to make sure that all
@@ -553,9 +547,8 @@ EXPORT_SYMBOL_GPL(mmb_has_buffers);
* buffer stays on our list until IO completes (at which point it can be
* reaped).
*/
-int sync_mapping_buffers(struct address_space *mapping)
+int mmb_sync(struct mapping_metadata_bhs *mmb)
{
- struct mapping_metadata_bhs *mmb = &mapping->i_metadata_bhs;
struct buffer_head *bh;
int err = 0;
struct blk_plug plug;
@@ -626,33 +619,35 @@ int sync_mapping_buffers(struct address_space *mapping)
spin_unlock(&mmb->lock);
return err;
}
-EXPORT_SYMBOL(sync_mapping_buffers);
+EXPORT_SYMBOL(mmb_sync);
/**
- * generic_buffers_fsync_noflush - generic buffer fsync implementation
- * for simple filesystems with no inode lock
+ * mmb_fsync_noflush - fsync implementation for simple filesystems with
+ * metadata buffers list
*
* @file: file to synchronize
+ * @mmb: list of metadata bhs to flush
* @start: start offset in bytes
* @end: end offset in bytes (inclusive)
* @datasync: only synchronize essential metadata if true
*
- * This is a generic implementation of the fsync method for simple
- * filesystems which track all non-inode metadata in the buffers list
- * hanging off the address_space structure.
+ * This is an implementation of the fsync method for simple filesystems which
+ * track all non-inode metadata in the buffers list hanging off the @mmb
+ * structure.
*/
-int generic_buffers_fsync_noflush(struct file *file, loff_t start, loff_t end,
- bool datasync)
+int mmb_fsync_noflush(struct file *file, struct mapping_metadata_bhs *mmb,
+ loff_t start, loff_t end, bool datasync)
{
struct inode *inode = file->f_mapping->host;
int err;
- int ret;
+ int ret = 0;
err = file_write_and_wait_range(file, start, end);
if (err)
return err;
- ret = sync_mapping_buffers(inode->i_mapping);
+ if (mmb)
+ ret = mmb_sync(mmb);
if (!(inode_state_read_once(inode) & I_DIRTY_ALL))
goto out;
if (datasync && !(inode_state_read_once(inode) & I_DIRTY_DATASYNC))
@@ -669,34 +664,35 @@ int generic_buffers_fsync_noflush(struct file *file, loff_t start, loff_t end,
ret = err;
return ret;
}
-EXPORT_SYMBOL(generic_buffers_fsync_noflush);
+EXPORT_SYMBOL(mmb_fsync_noflush);
/**
- * generic_buffers_fsync - generic buffer fsync implementation
- * for simple filesystems with no inode lock
+ * mmb_fsync - fsync implementation for simple filesystems with metadata
+ * buffers list
*
* @file: file to synchronize
+ * @mmb: list of metadata bhs to flush
* @start: start offset in bytes
* @end: end offset in bytes (inclusive)
* @datasync: only synchronize essential metadata if true
*
- * This is a generic implementation of the fsync method for simple
- * filesystems which track all non-inode metadata in the buffers list
- * hanging off the address_space structure. This also makes sure that
- * a device cache flush operation is called at the end.
+ * This is an implementation of the fsync method for simple filesystems which
+ * track all non-inode metadata in the buffers list hanging off the @mmb
+ * structure. This also makes sure that a device cache flush operation is
+ * called at the end.
*/
-int generic_buffers_fsync(struct file *file, loff_t start, loff_t end,
- bool datasync)
+int mmb_fsync(struct file *file, struct mapping_metadata_bhs *mmb,
+ loff_t start, loff_t end, bool datasync)
{
struct inode *inode = file->f_mapping->host;
int ret;
- ret = generic_buffers_fsync_noflush(file, start, end, datasync);
+ ret = mmb_fsync_noflush(file, mmb, start, end, datasync);
if (!ret)
ret = blkdev_issue_flush(inode->i_sb->s_bdev);
return ret;
}
-EXPORT_SYMBOL(generic_buffers_fsync);
+EXPORT_SYMBOL(mmb_fsync);
/*
* Called when we've recently written block `bblock', and it is known that
@@ -717,20 +713,18 @@ void write_boundary_block(struct block_device *bdev,
}
}
-void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode)
+void mmb_mark_buffer_dirty(struct buffer_head *bh,
+ struct mapping_metadata_bhs *mmb)
{
- struct address_space *mapping = inode->i_mapping;
-
mark_buffer_dirty(bh);
if (!bh->b_mmb) {
- spin_lock(&mapping->i_metadata_bhs.lock);
- list_move_tail(&bh->b_assoc_buffers,
- &mapping->i_metadata_bhs.list);
- bh->b_mmb = &mapping->i_metadata_bhs;
- spin_unlock(&mapping->i_metadata_bhs.lock);
+ spin_lock(&mmb->lock);
+ list_move_tail(&bh->b_assoc_buffers, &mmb->list);
+ bh->b_mmb = mmb;
+ spin_unlock(&mmb->lock);
}
}
-EXPORT_SYMBOL(mark_buffer_dirty_inode);
+EXPORT_SYMBOL(mmb_mark_buffer_dirty);
/**
* block_dirty_folio - Mark a folio as dirty.
@@ -797,14 +791,12 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio)
EXPORT_SYMBOL(block_dirty_folio);
/*
- * Invalidate any and all dirty buffers on a given inode. We are
+ * Invalidate any and all dirty buffers on a given buffers list. We are
* probably unmounting the fs, but that doesn't mean we have already
* done a sync(). Just drop the buffers from the inode list.
*/
-void invalidate_inode_buffers(struct inode *inode)
+void mmb_invalidate(struct mapping_metadata_bhs *mmb)
{
- struct mapping_metadata_bhs *mmb = &inode->i_data.i_metadata_bhs;
-
if (mmb_has_buffers(mmb)) {
spin_lock(&mmb->lock);
while (!list_empty(&mmb->list))
@@ -812,7 +804,7 @@ void invalidate_inode_buffers(struct inode *inode)
spin_unlock(&mmb->lock);
}
}
-EXPORT_SYMBOL(invalidate_inode_buffers);
+EXPORT_SYMBOL(mmb_invalidate);
/*
* Create the appropriate buffers when given a folio for data area and
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 44094fd476f5..e207dcca7a25 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -205,12 +205,30 @@ struct buffer_head *create_empty_buffers(struct folio *folio,
void end_buffer_read_sync(struct buffer_head *bh, int uptodate);
void end_buffer_write_sync(struct buffer_head *bh, int uptodate);
-/* Things to do with buffers at mapping->private_list */
-void mark_buffer_dirty_inode(struct buffer_head *bh, struct inode *inode);
-int generic_buffers_fsync_noflush(struct file *file, loff_t start, loff_t end,
- bool datasync);
-int generic_buffers_fsync(struct file *file, loff_t start, loff_t end,
- bool datasync);
+/* Things to do with metadata buffers list */
+void mmb_mark_buffer_dirty(struct buffer_head *bh, struct mapping_metadata_bhs *mmb);
+static inline void mark_buffer_dirty_inode(struct buffer_head *bh,
+ struct inode *inode)
+{
+ mmb_mark_buffer_dirty(bh, &inode->i_data.i_metadata_bhs);
+}
+int mmb_fsync_noflush(struct file *file, struct mapping_metadata_bhs *mmb,
+ loff_t start, loff_t end, bool datasync);
+static inline int generic_buffers_fsync_noflush(struct file *file,
+ loff_t start, loff_t end,
+ bool datasync)
+{
+ return mmb_fsync_noflush(file, &file->f_mapping->i_metadata_bhs,
+ start, end, datasync);
+}
+int mmb_fsync(struct file *file, struct mapping_metadata_bhs *mmb,
+ loff_t start, loff_t end, bool datasync);
+static inline int generic_buffers_fsync(struct file *file,
+ loff_t start, loff_t end, bool datasync)
+{
+ return mmb_fsync(file, &file->f_mapping->i_metadata_bhs,
+ start, end, datasync);
+}
void clean_bdev_aliases(struct block_device *bdev, sector_t block,
sector_t len);
static inline void clean_bdev_bh_alias(struct buffer_head *bh)
@@ -515,9 +533,18 @@ bool block_dirty_folio(struct address_space *mapping, struct folio *folio);
void buffer_init(void);
bool try_to_free_buffers(struct folio *folio);
+void mmb_init(struct mapping_metadata_bhs *mmb, struct address_space *mapping);
bool mmb_has_buffers(struct mapping_metadata_bhs *mmb);
-void invalidate_inode_buffers(struct inode *inode);
-int sync_mapping_buffers(struct address_space *mapping);
+void mmb_invalidate(struct mapping_metadata_bhs *mmb);
+int mmb_sync(struct mapping_metadata_bhs *mmb);
+static inline void invalidate_inode_buffers(struct inode *inode)
+{
+ mmb_invalidate(&inode->i_data.i_metadata_bhs);
+}
+static inline int sync_mapping_buffers(struct address_space *mapping)
+{
+ return mmb_sync(&mapping->i_metadata_bhs);
+}
void invalidate_bh_lrus(void);
void invalidate_bh_lrus_cpu(void);
bool has_bh_in_lru(int cpu, void *dummy);
@@ -527,6 +554,7 @@ extern int buffer_heads_over_limit;
static inline void buffer_init(void) {}
static inline bool try_to_free_buffers(struct folio *folio) { return true; }
+static inline int mmb_sync(struct mapping_metadata_bhs *mmb) { return 0; }
static inline void invalidate_inode_buffers(struct inode *inode) {}
static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; }
static inline void invalidate_bh_lrus(void) {}
--
2.51.0
Thread overview: 44+ messages
2026-03-26 9:53 [PATCH v3 0/42] fs: Move metadata bh tracking from address_space Jan Kara
2026-03-26 9:53 ` [PATCH 01/42] ext4: Use inode_has_buffers() Jan Kara
2026-03-26 9:53 ` [PATCH 02/42] gfs2: Don't zero i_private_data Jan Kara
2026-03-26 9:53 ` [PATCH 03/42] ntfs3: Drop pointless sync_mapping_buffers() and invalidate_inode_buffers() calls Jan Kara
2026-03-26 9:53 ` [PATCH 04/42] ocfs2: Drop pointless sync_mapping_buffers() calls Jan Kara
2026-03-26 9:53 ` [PATCH 05/42] bdev: Drop pointless invalidate_inode_buffers() call Jan Kara
2026-03-26 9:54 ` [PATCH 06/42] ufs: Drop pointless invalidate_mapping_buffers() call Jan Kara
2026-03-26 9:54 ` [PATCH 07/42] exfat: Drop pointless invalidate_inode_buffers() call Jan Kara
2026-03-26 9:54 ` [PATCH 08/42] fs: Remove inode lock from __generic_file_fsync() Jan Kara
2026-03-26 9:54 ` [PATCH 09/42] udf: Switch to generic_buffers_fsync() Jan Kara
2026-03-26 9:54 ` [PATCH 10/42] minix: " Jan Kara
2026-03-26 9:54 ` [PATCH 11/42] bfs: " Jan Kara
2026-03-26 9:54 ` [PATCH 12/42] fat: Switch to generic_buffers_fsync_noflush() Jan Kara
2026-03-26 9:54 ` [PATCH 13/42] fs: Drop sync_mapping_buffers() from __generic_file_fsync() Jan Kara
2026-03-26 9:54 ` [PATCH 14/42] fs: Rename generic_file_fsync() to simple_fsync() Jan Kara
2026-03-26 9:54 ` [PATCH 15/42] fat: Sync and invalidate metadata buffers from fat_evict_inode() Jan Kara
2026-03-26 9:54 ` [PATCH 16/42] udf: Sync and invalidate metadata buffers from udf_evict_inode() Jan Kara
2026-03-26 9:54 ` [PATCH 17/42] minix: Sync and invalidate metadata buffers from minix_evict_inode() Jan Kara
2026-03-26 9:54 ` [PATCH 18/42] ext2: Sync and invalidate metadata buffers from ext2_evict_inode() Jan Kara
2026-03-26 9:54 ` [PATCH 19/42] ext4: Sync and invalidate metadata buffers from ext4_evict_inode() Jan Kara
2026-03-26 9:54 ` [PATCH 20/42] bfs: Sync and invalidate metadata buffers from bfs_evict_inode() Jan Kara
2026-03-26 9:54 ` [PATCH 21/42] affs: Sync and invalidate metadata buffers from affs_evict_inode() Jan Kara
2026-03-26 9:54 ` [PATCH 22/42] fs: Ignore inode metadata buffers in inode_lru_isolate() Jan Kara
2026-03-26 9:54 ` [PATCH 23/42] fs: Stop using i_private_data for metadata bh tracking Jan Kara
2026-03-26 9:54 ` [PATCH 24/42] hugetlbfs: Stop using i_private_data Jan Kara
2026-03-26 9:54 ` [PATCH 25/42] aio: Stop using i_private_data and i_private_lock Jan Kara
2026-03-26 9:54 ` [PATCH 26/42] fs: Remove i_private_data Jan Kara
2026-03-26 9:54 ` [PATCH 27/42] kvm: Use private inode list instead of i_private_list Jan Kara
2026-03-26 9:54 ` [PATCH 28/42] fs: Drop osync_buffers_list() Jan Kara
2026-03-26 9:54 ` [PATCH 29/42] fs: Fold fsync_buffers_list() into sync_mapping_buffers() Jan Kara
2026-03-26 9:54 ` [PATCH 30/42] fs: Move metadata bhs tracking to a separate struct Jan Kara
2026-03-26 9:54 ` [PATCH 31/42] fs: Make bhs point to mapping_metadata_bhs Jan Kara
2026-03-26 9:54 ` [PATCH 32/42] fs: Switch inode_has_buffers() to take mapping_metadata_bhs Jan Kara
2026-03-26 9:54 ` Jan Kara [this message]
2026-03-26 9:54 ` [PATCH 34/42] ext2: Track metadata bhs in fs-private inode part Jan Kara
2026-03-26 9:54 ` [PATCH 35/42] affs: " Jan Kara
2026-03-26 9:54 ` [PATCH 36/42] bfs: " Jan Kara
2026-03-26 9:54 ` [PATCH 37/42] fat: " Jan Kara
2026-03-26 9:54 ` [PATCH 38/42] udf: " Jan Kara
2026-03-26 9:54 ` [PATCH 39/42] minix: " Jan Kara
2026-03-26 9:54 ` [PATCH 40/42] ext4: " Jan Kara
2026-03-26 9:54 ` [PATCH 41/42] fs: Drop mapping_metadata_bhs from address space Jan Kara
2026-03-26 9:54 ` [PATCH 42/42] fs: Drop i_private_list from address_space Jan Kara
2026-03-26 14:06 ` [PATCH v3 0/42] fs: Move metadata bh tracking " Christian Brauner