From: Ojaswin Mujoo <ojaswin@linux.ibm.com>
To: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org
Cc: djwong@kernel.org, john.g.garry@oracle.com, willy@infradead.org,
	hch@lst.de, ritesh.list@gmail.com, jack@suse.cz,
	Luis Chamberlain <mcgrof@kernel.org>,
	dgc@kernel.org, tytso@mit.edu, p.raghav@samsung.com,
	andres@anarazel.de, linux-kernel@vger.kernel.org
Subject: [RFC 2/3] iomap: Enable stable writes for RWF_WRITETHROUGH inodes
Date: Mon,  9 Mar 2026 23:04:32 +0530
Message-ID: <3704b81046b11f8b8da0367c7c8ad8767f42e5df.1773076216.git.ojaswin@linux.ibm.com>
In-Reply-To: <cover.1773076216.git.ojaswin@linux.ibm.com>

Currently, RWF_WRITETHROUGH writes wait for writeback to complete
on a folio before performing the writethrough. This serializes
writethrough writes with each other and with the writeback path.
However, it is also desirable to have similar guarantees between
RWF_WRITETHROUGH and non-writethrough writes.

Hence, ensure stable writes are enabled on an inode's mapping as
long as a writethrough write is ongoing. This way, all paths will
wait for RWF_WRITETHROUGH to complete on a folio before proceeding.

To track in-flight writethrough writes, we use an atomic counter in
inode->i_mapping (struct address_space). This struct was chosen because
(i) writethrough is an operation on the folio and (ii) we don't want to
add bloat to struct inode.

Suggested-by: Dave Chinner <dgc@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
---
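
The begin/end pairing described in the commit message can be modeled in
user space roughly as follows. This is only a sketch under stated
assumptions: struct model_mapping and the model_* helpers are
hypothetical stand-ins (they are not kernel APIs), the boolean models
the mapping's stable-writes flag, and the integer models i_wt_count.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct address_space; not the kernel type. */
struct model_mapping {
	atomic_bool stable_writes;	/* models the stable-writes flag */
	atomic_int wt_count;		/* models i_wt_count */
};

static void model_mapping_init(struct model_mapping *m)
{
	atomic_init(&m->stable_writes, false);
	atomic_init(&m->wt_count, 0);
}

/* Set the flag first, then bump the count, mirroring the patch's order. */
static void model_writethrough_begin(struct model_mapping *m)
{
	atomic_store(&m->stable_writes, true);
	atomic_fetch_add(&m->wt_count, 1);
}

/* Clear the flag only when the last in-flight writethrough finishes. */
static void model_writethrough_end(struct model_mapping *m)
{
	/* atomic_fetch_sub() returns the value held before the decrement. */
	int old = atomic_fetch_sub(&m->wt_count, 1);

	assert(old > 0);	/* every end must be paired with a begin */
	if (old == 1)
		atomic_store(&m->stable_writes, false);
}

static bool model_stable(struct model_mapping *m)
{
	return atomic_load(&m->stable_writes);
}
```

With two overlapping requests, the flag stays set after the first
model_writethrough_end() and is cleared only by the second. The model is
exercised single-threaded here and says nothing about how a begin racing
with the final end behaves.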
 fs/inode.c             |  1 +
 fs/iomap/buffered-io.c | 35 +++++++++++++++++++++++++++++++++--
 fs/iomap/direct-io.c   |  2 ++
 include/linux/fs.h     |  2 ++
 include/linux/iomap.h  |  2 ++
 5 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/fs/inode.c b/fs/inode.c
index cc12b68e021b..5b779c112ff8 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -280,6 +280,7 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
 	mapping->flags = 0;
 	mapping->wb_err = 0;
 	atomic_set(&mapping->i_mmap_writable, 0);
+	atomic_set(&mapping->i_wt_count, 0);
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
 	atomic_set(&mapping->nr_thps, 0);
 #endif
diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
index ab169daa1126..9d4d459af1a0 100644
--- a/fs/iomap/buffered-io.c
+++ b/fs/iomap/buffered-io.c
@@ -1150,11 +1150,41 @@ static bool iomap_writethrough_checks(struct kiocb *iocb, size_t off, loff_t len
 	return true;
 }
 
+/**
+ * inode_writethrough_begin - signal start of a RWF_WRITETHROUGH request
+ * @inode: inode the writethrough happens on
+ *
+ * This is called when we are about to start a writethrough on an inode.
+ * If it is the first writethrough, set the mapping as stable to ensure
+ * other folio operations wait for writeback to finish.
+ *
+ * To avoid a race, just set the mapping stable first and then increment
+ * writethrough count, so that the stable writes are enforced as soon as
+ * writethrough count becomes non zero.
+ */
+inline void inode_writethrough_begin(struct inode *inode)
+{
+	mapping_set_stable_writes(inode->i_mapping);
+	atomic_inc(&inode->i_mapping->i_wt_count);
+}
+
+/**
+ * inode_writethrough_end - signal finish of a RWF_WRITETHROUGH request
+ * @inode: inode the writethrough I/O happened on
+ *
+ * This is called once we've finished processing a writethrough request.
+ */
+inline void inode_writethrough_end(struct inode *inode)
+{
+	if (atomic_dec_and_test(&inode->i_mapping->i_wt_count))
+		mapping_clear_stable_writes(inode->i_mapping);
+}
+
 /*
  * With writethrough, we might potentially be writing through a partial
  * folio hence we don't clear the dirty bit (yet)
  */
-static void folio_prepare_writethrough(struct folio *folio)
+static void folio_prepare_writethrough(struct inode *inode, struct folio *folio)
 {
 	if (folio_test_writeback(folio))
 		folio_wait_writeback(folio);
@@ -1167,6 +1197,7 @@ static void folio_prepare_writethrough(struct folio *folio)
 		/* Refer folio_clear_dirty_for_io() for why this is needed */
 		folio_mark_dirty(folio);
 
+	inode_writethrough_begin(inode);
 }
 
 /**
@@ -1203,7 +1234,7 @@ static int iomap_writethrough_begin(struct kiocb *iocb, struct folio *folio,
 	bool fully_written;
 	u64 zero = 0;
 
-	folio_prepare_writethrough(folio);
+	folio_prepare_writethrough(iter->inode, folio);
 
 	wt_ctx->bvec = kmalloc(sizeof(struct bio_vec), GFP_KERNEL | GFP_NOFS);
 	if (!wt_ctx->bvec)
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index f4d8ff08a83a..12680d97d765 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -140,6 +140,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
 		kiocb_invalidate_post_direct_write(iocb, dio->size);
 
 	inode_dio_end(file_inode(iocb->ki_filp));
+	if (dio->flags & IOMAP_DIO_BUF_WRITETHROUGH)
+		inode_writethrough_end(file_inode(iocb->ki_filp));
 
 	if (ret > 0) {
 		iocb->ki_pos += ret;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ca291957140e..6b7491fdd51a 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -456,6 +456,7 @@ extern const struct address_space_operations empty_aops;
  *   memory mappings.
  * @gfp_mask: Memory allocation flags to use for allocating pages.
  * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
+ * @i_wt_count: Number of RWF_WRITETHROUGH writes ongoing in mapping.
  * @nr_thps: Number of THPs in the pagecache (non-shmem only).
  * @i_mmap: Tree of private and shared mappings.
  * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
@@ -474,6 +475,7 @@ struct address_space {
 	struct rw_semaphore	invalidate_lock;
 	gfp_t			gfp_mask;
 	atomic_t		i_mmap_writable;
+	atomic_t		i_wt_count;
 #ifdef CONFIG_READ_ONLY_THP_FOR_FS
 	/* number of thp, only for non-shmem files */
 	atomic_t		nr_thps;
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index b96574bb2918..6d08b966ceaf 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -630,6 +630,8 @@ struct iomap_writethrough_ops {
 ssize_t iomap_file_writethrough_write(struct kiocb *iocb, struct iov_iter *i,
 				      const struct iomap_writethrough_ops *wt_ops,
 				      void *private);
+inline void inode_writethrough_begin(struct inode *inode);
+inline void inode_writethrough_end(struct inode *inode);
 
 #ifdef CONFIG_SWAP
 struct file;
-- 
2.52.0


Thread overview: 13+ messages
2026-03-09 17:34 [RFC 0/3] Add buffered write-through support to iomap & xfs Ojaswin Mujoo
2026-03-09 17:34 ` [RFC 1/3] iomap: Support buffered RWF_WRITETHROUGH via async dio backend Ojaswin Mujoo
2026-03-10  6:48   ` Dave Chinner
2026-03-11 10:35     ` Ojaswin Mujoo
2026-03-11 12:05       ` Dave Chinner
2026-03-13  7:43         ` Ojaswin Mujoo
2026-03-11  6:32   ` kernel test robot
2026-03-12  4:59   ` kernel test robot
2026-03-09 17:34 ` Ojaswin Mujoo [this message]
2026-03-10  3:57   ` [RFC 2/3] iomap: Enable stable writes for RWF_WRITETHROUGH inodes Darrick J. Wong
2026-03-10  5:25     ` Ritesh Harjani
2026-03-11  6:27       ` Ojaswin Mujoo
2026-03-09 17:34 ` [RFC 3/3] xfs: Add RWF_WRITETHROUGH support to xfs Ojaswin Mujoo
