From: "Darrick J. Wong" <djwong@kernel.org>
To: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Cc: linux-xfs@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	john.g.garry@oracle.com, willy@infradead.org, hch@lst.de,
	ritesh.list@gmail.com, jack@suse.cz,
	Luis Chamberlain <mcgrof@kernel.org>,
	dgc@kernel.org, tytso@mit.edu, p.raghav@samsung.com,
	andres@anarazel.de, linux-kernel@vger.kernel.org
Subject: Re: [RFC 2/3] iomap: Enable stable writes for RWF_WRITETHROUGH inodes
Date: Mon, 9 Mar 2026 20:57:19 -0700
Message-ID: <20260310035719.GI1105363@frogsfrogsfrogs>
In-Reply-To: <3704b81046b11f8b8da0367c7c8ad8767f42e5df.1773076216.git.ojaswin@linux.ibm.com>

On Mon, Mar 09, 2026 at 11:04:32PM +0530, Ojaswin Mujoo wrote:
> Currently, RWF_WRITETHROUGH writes wait for writeback to complete
> on a folio before performing the writethrough. This serializes
> writethrough writes with each other and with the writeback path. However,
> it is also desirable to have similar guarantees between RWF_WRITETHROUGH
> and non-writethrough writes.
> 
> Hence, ensure stable writes are enabled on an inode's mapping as
> long as a writethrough write is ongoing. This way, all paths will
> wait for RWF_WRITETHROUGH to complete on a folio before proceeding.
> 
> To track in-flight writethrough writes, we use an atomic counter in
> inode->i_mapping. This struct was chosen because (i) writethrough is an
> operation on the folio and (ii) we don't want to add bloat to struct
> inode.

What if we just set it whenever someone successfully initiates a
RWF_WRITETHROUGH write?  Then we wouldn't need all this atomic counter
machinery.
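A minimal userspace sketch of that first idea, assuming the stable-writes state can be modelled as a single sticky boolean. The struct and function names are hypothetical stand-ins, not kernel API: the bit is set once a RWF_WRITETHROUGH write has been successfully initiated and is never cleared again, so there is no begin/end pairing to get wrong.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace stand-in for struct address_space. In the kernel
 * the stable-writes state is a bit in mapping->flags; here it is modelled
 * as an atomic bool.
 */
struct wt_mapping_sticky {
	atomic_bool stable_writes;
};

/*
 * Hypothetical hook, called once a RWF_WRITETHROUGH write has been
 * successfully initiated. The bit is sticky: nothing clears it again, so no
 * in-flight counter is needed.
 */
static void sticky_writethrough_started(struct wt_mapping_sticky *m)
{
	atomic_store(&m->stable_writes, true);
}

/* What other write paths would consult before modifying a folio under writeback. */
static bool sticky_stable_writes(struct wt_mapping_sticky *m)
{
	return atomic_load(&m->stable_writes);
}
```

The trade-off of the sticky approach is that, once set, every later write to that inode waits for writeback on the folio it touches, even after the last writethrough has completed.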

Also: What if some filesystem (not xfs, obviously) finds a need to
change the stable writes bit while there might be writethrough writes in
progress?  It's a little awkward to have a flag /and/ a counter; why not
change mapping_{set,clear}_stable_writes to inc and dec the counter and
base the test off that?
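One way to read that second suggestion, again as a hypothetical userspace model rather than the real mapping_{set,clear}_stable_writes() implementation: the counter itself becomes the stable-writes state. A filesystem that wants stable writes and each in-flight writethrough both take a reference, and the test is simply "count > 0".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model: the reference count *is* the stable-writes
 * state, so there is no separate flag that can drift out of sync with it.
 */
struct wt_mapping_counted {
	atomic_int stable_writes_count;
};

/* Models a counting mapping_set_stable_writes(): take one reference. */
static void counted_set_stable_writes(struct wt_mapping_counted *m)
{
	atomic_fetch_add(&m->stable_writes_count, 1);
}

/* Models a counting mapping_clear_stable_writes(): drop one reference. */
static void counted_clear_stable_writes(struct wt_mapping_counted *m)
{
	int old = atomic_fetch_sub(&m->stable_writes_count, 1);

	assert(old > 0);	/* catches an unbalanced clear */
	(void)old;		/* silence unused-variable warning under NDEBUG */
}

/* Models mapping_stable_writes(): stable while anyone holds a reference. */
static bool counted_stable_writes(struct wt_mapping_counted *m)
{
	return atomic_load(&m->stable_writes_count) > 0;
}
```

Because there is no separate flag, this also avoids a window in the patch as posted: with set-then-increment in inode_writethrough_begin() and dec-and-test-then-clear in inode_writethrough_end(), an end that drops the count to zero can clear the flag just after a concurrent begin has set it, leaving the new writethrough running without stable writes. The same end path would also clear a bit that the filesystem had set for its own reasons.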

--D

> Suggested-by: Dave Chinner <dgc@kernel.org>
> Suggested-by: Jan Kara <jack@suse.cz>
> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
> ---
>  fs/inode.c             |  1 +
>  fs/iomap/buffered-io.c | 35 +++++++++++++++++++++++++++++++++--
>  fs/iomap/direct-io.c   |  2 ++
>  include/linux/fs.h     |  2 ++
>  include/linux/iomap.h  |  2 ++
>  5 files changed, 40 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index cc12b68e021b..5b779c112ff8 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -280,6 +280,7 @@ int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp
>  	mapping->flags = 0;
>  	mapping->wb_err = 0;
>  	atomic_set(&mapping->i_mmap_writable, 0);
> +	atomic_set(&mapping->i_wt_count, 0);
>  #ifdef CONFIG_READ_ONLY_THP_FOR_FS
>  	atomic_set(&mapping->nr_thps, 0);
>  #endif
> diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c
> index ab169daa1126..9d4d459af1a0 100644
> --- a/fs/iomap/buffered-io.c
> +++ b/fs/iomap/buffered-io.c
> @@ -1150,11 +1150,41 @@ static bool iomap_writethrough_checks(struct kiocb *iocb, size_t off, loff_t len
>  	return true;
>  }
>  
> +/**
> + * inode_writethrough_begin - signal start of a RWF_WRITETHROUGH request
> + * @inode: inode the writethrough happens on
> + *
> + * This is called when we are about to start a writethrough on an inode.
> + * If it is the first writethrough, set the mapping as stable to ensure
> + * other folio operations wait for writeback to finish.
> + *
> + * To avoid a race, just set the mapping stable first and then increment
> + * writethrough count, so that the stable writes are enforced as soon as
> + * writethrough count becomes non zero.
> + */
> +inline void inode_writethrough_begin(struct inode *inode)
> +{
> +	mapping_set_stable_writes(inode->i_mapping);
> +	atomic_inc(&inode->i_mapping->i_wt_count);
> +}
> +
> +/**
> + * inode_writethrough_end - signal finish of a RWF_WRITETHROUGH request
> + * @inode: inode the writethrough I/O happened on
> + *
> + * This is called once we've finished processing a writethrough request
> + */
> +inline void inode_writethrough_end(struct inode *inode)
> +{
> +	if (atomic_dec_and_test(&inode->i_mapping->i_wt_count))
> +		mapping_clear_stable_writes(inode->i_mapping);
> +}
> +
>  /*
>   * With writethrough, we might potentially be writing through a partial
>   * folio hence we don't clear the dirty bit (yet)
>   */
> -static void folio_prepare_writethrough(struct folio *folio)
> +static void folio_prepare_writethrough(struct inode *inode, struct folio *folio)
>  {
>  	if (folio_test_writeback(folio))
>  		folio_wait_writeback(folio);
> @@ -1167,6 +1197,7 @@ static void folio_prepare_writethrough(struct folio *folio)
>  		/* Refer folio_clear_dirty_for_io() for why this is needed */
>  		folio_mark_dirty(folio);
>  
> +	inode_writethrough_begin(inode);
>  }
>  
>  /**
> @@ -1203,7 +1234,7 @@ static int iomap_writethrough_begin(struct kiocb *iocb, struct folio *folio,
>  	bool fully_written;
>  	u64 zero = 0;
>  
> -	folio_prepare_writethrough(folio);
> +	folio_prepare_writethrough(iter->inode, folio);
>  
>  	wt_ctx->bvec = kmalloc(sizeof(struct bio_vec), GFP_KERNEL | GFP_NOFS);
>  	if (!wt_ctx->bvec)
> diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
> index f4d8ff08a83a..12680d97d765 100644
> --- a/fs/iomap/direct-io.c
> +++ b/fs/iomap/direct-io.c
> @@ -140,6 +140,8 @@ ssize_t iomap_dio_complete(struct iomap_dio *dio)
>  		kiocb_invalidate_post_direct_write(iocb, dio->size);
>  
>  	inode_dio_end(file_inode(iocb->ki_filp));
> +	if (dio->flags & IOMAP_DIO_BUF_WRITETHROUGH)
> +		inode_writethrough_end(file_inode(iocb->ki_filp));
>  
>  	if (ret > 0) {
>  		iocb->ki_pos += ret;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ca291957140e..6b7491fdd51a 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -456,6 +456,7 @@ extern const struct address_space_operations empty_aops;
>   *   memory mappings.
>   * @gfp_mask: Memory allocation flags to use for allocating pages.
>   * @i_mmap_writable: Number of VM_SHARED, VM_MAYWRITE mappings.
> + * @i_wt_count: Number of RWF_WRITETHROUGH writes ongoing in mapping.
>   * @nr_thps: Number of THPs in the pagecache (non-shmem only).
>   * @i_mmap: Tree of private and shared mappings.
>   * @i_mmap_rwsem: Protects @i_mmap and @i_mmap_writable.
> @@ -474,6 +475,7 @@ struct address_space {
>  	struct rw_semaphore	invalidate_lock;
>  	gfp_t			gfp_mask;
>  	atomic_t		i_mmap_writable;
> +	atomic_t		i_wt_count;
>  #ifdef CONFIG_READ_ONLY_THP_FOR_FS
>  	/* number of thp, only for non-shmem files */
>  	atomic_t		nr_thps;
> diff --git a/include/linux/iomap.h b/include/linux/iomap.h
> index b96574bb2918..6d08b966ceaf 100644
> --- a/include/linux/iomap.h
> +++ b/include/linux/iomap.h
> @@ -630,6 +630,8 @@ struct iomap_writethrough_ops {
>  ssize_t iomap_file_writethrough_write(struct kiocb *iocb, struct iov_iter *i,
>  				      const struct iomap_writethrough_ops *wt_ops,
>  				      void *private);
> +inline void inode_writethrough_begin(struct inode *inode);
> +inline void inode_writethrough_end(struct inode *inode);
>  
>  #ifdef CONFIG_SWAP
>  struct file;
> -- 
> 2.52.0
> 
> 

Thread overview: 13+ messages
2026-03-09 17:34 [RFC 0/3] Add buffered write-through support to iomap & xfs Ojaswin Mujoo
2026-03-09 17:34 ` [RFC 1/3] iomap: Support buffered RWF_WRITETHROUGH via async dio backend Ojaswin Mujoo
2026-03-10  6:48   ` Dave Chinner
2026-03-11 10:35     ` Ojaswin Mujoo
2026-03-11 12:05       ` Dave Chinner
2026-03-13  7:43         ` Ojaswin Mujoo
2026-03-11  6:32   ` kernel test robot
2026-03-12  4:59   ` kernel test robot
2026-03-09 17:34 ` [RFC 2/3] iomap: Enable stable writes for RWF_WRITETHROUGH inodes Ojaswin Mujoo
2026-03-10  3:57   ` Darrick J. Wong [this message]
2026-03-10  5:25     ` Ritesh Harjani
2026-03-11  6:27       ` Ojaswin Mujoo
2026-03-09 17:34 ` [RFC 3/3] xfs: Add RWF_WRITETHROUGH support to xfs Ojaswin Mujoo