linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling
@ 2025-04-21 10:50 陈涛涛 Taotao Chen
  2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca,
	akpm@linux-foundation.org, willy@infradead.org
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	陈涛涛 Taotao Chen

From: Taotao Chen <chentaotao@didiglobal.com>

This series implements proper handling of IOCB_DONTCACHE flag
in ext4 filesystem.

Taotao Chen (3):
  mm/filemap: initialize fsdata with iocb->ki_flags
  ext4: implement IOCB_DONTCACHE handling in write operations
  ext4: support FOP_DONTCACHE flag

 fs/ext4/file.c  |  2 +-
 fs/ext4/inode.c | 21 +++++++++++++++++----
 mm/filemap.c    |  6 +++++-
 3 files changed, 23 insertions(+), 6 deletions(-)

-- 
2.34.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
  2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
  2025-05-14  3:51   ` Theodore Ts'o
  2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
  2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
  2 siblings, 1 reply; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca,
	akpm@linux-foundation.org, willy@infradead.org
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	陈涛涛 Taotao Chen

From: Taotao Chen <chentaotao@didiglobal.com>

Initialize fsdata with &iocb->ki_flags to allow filesystems to check
iocb flags like IOCB_DONTCACHE during ->write_begin().

Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
 mm/filemap.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/filemap.c b/mm/filemap.c
index 7b90cbeb4a1a..9174b6310f0b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
 		size_t offset;		/* Offset into folio */
 		size_t bytes;		/* Bytes to write to folio */
 		size_t copied;		/* Bytes copied from user */
-		void *fsdata = NULL;
+		/*
+		 * Initialize fsdata with iocb flags pointer so that filesystems
+		 * can check ki_flags (like IOCB_DONTCACHE) in write operations.
+		 */
+		void *fsdata = &iocb->ki_flags;
 
 		bytes = iov_iter_count(i);
 retry:
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations
  2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
  2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
  2025-05-14 11:52   ` Theodore Ts'o
  2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
  2 siblings, 1 reply; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca,
	akpm@linux-foundation.org, willy@infradead.org
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	陈涛涛 Taotao Chen

From: Taotao Chen <chentaotao@didiglobal.com>

Implement IOCB_DONTCACHE flag handling in ext4 write paths:
1. Check IOCB_DONTCACHE flag passed via fsdata
2. Propagate FGP_DONTCACHE to page cache operations

Existing write behavior remains unchanged when IOCB_DONTCACHE is not set.

Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
 fs/ext4/inode.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a6..787dd152a47e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 {
 	struct inode *inode = mapping->host;
 	int ret, needed_blocks;
+	int iocb_flag;
 	handle_t *handle;
 	int retries = 0;
 	struct folio *folio;
 	pgoff_t index;
+	fgf_t fgp = FGP_WRITEBEGIN;
 	unsigned from, to;
 
 	ret = ext4_emergency_state(inode->i_sb);
 	if (unlikely(ret))
 		return ret;
 
+	iocb_flag = (int)(uintptr_t)(*fsdata);
+	if (iocb_flag & IOCB_DONTCACHE)
+		fgp |= FGP_DONTCACHE;
+
 	trace_ext4_write_begin(inode, pos, len);
 	/*
 	 * Reserve one block more for addition to orphan list in case
@@ -1184,7 +1190,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
 	 * the folio (if needed) without using GFP_NOFS.
 	 */
 retry_grab:
-	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+	folio = __filemap_get_folio(mapping, index, fgp,
 					mapping_gfp_mask(mapping));
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
@@ -2917,6 +2923,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 			       struct folio **foliop, void **fsdata)
 {
 	int ret, retries = 0;
+	int iocb_flag;
+	fgf_t fgp = FGP_WRITEBEGIN;
 	struct folio *folio;
 	pgoff_t index;
 	struct inode *inode = mapping->host;
@@ -2928,10 +2936,15 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	index = pos >> PAGE_SHIFT;
 
 	if (ext4_nonda_switch(inode->i_sb) || ext4_verity_in_progress(inode)) {
+		ret = ext4_write_begin(file, mapping, pos, len, foliop, fsdata);
 		*fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
-		return ext4_write_begin(file, mapping, pos,
-					len, foliop, fsdata);
+		return ret;
 	}
+
+	iocb_flag = (int)(uintptr_t)(*fsdata);
+	if (iocb_flag & IOCB_DONTCACHE)
+		fgp |= FGP_DONTCACHE;
+
 	*fsdata = (void *)0;
 	trace_ext4_da_write_begin(inode, pos, len);
 
@@ -2945,7 +2958,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
 	}
 
 retry:
-	folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+	folio = __filemap_get_folio(mapping, index, fgp,
 			mapping_gfp_mask(mapping));
 	if (IS_ERR(folio))
 		return PTR_ERR(folio);
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/3] ext4: support FOP_DONTCACHE flag
  2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
  2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
  2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
  2 siblings, 0 replies; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
  To: tytso@mit.edu, adilger.kernel@dilger.ca,
	akpm@linux-foundation.org, willy@infradead.org
  Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	陈涛涛 Taotao Chen

From: Taotao Chen <chentaotao@didiglobal.com>

After the core logic for handling IOCB_DONTCACHE was implemented
in the previous patch, we now formally enable the feature by
adding FOP_DONTCACHE to ext4's file operations flags.

Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
 fs/ext4/file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index beb078ee4811..c514903d85c7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -977,7 +977,7 @@ const struct file_operations ext4_file_operations = {
 	.splice_write	= iter_file_splice_write,
 	.fallocate	= ext4_fallocate,
 	.fop_flags	= FOP_MMAP_SYNC | FOP_BUFFER_RASYNC |
-			  FOP_DIO_PARALLEL_WRITE,
+			  FOP_DIO_PARALLEL_WRITE | FOP_DONTCACHE,
 };
 
 const struct inode_operations ext4_file_inode_operations = {
-- 
2.34.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
  2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
@ 2025-05-14  3:51   ` Theodore Ts'o
  2025-05-14 13:38     ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14  3:51 UTC (permalink / raw)
  To: 陈涛涛 Taotao Chen
  Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org,
	willy@infradead.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7b90cbeb4a1a..9174b6310f0b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
>  		size_t offset;		/* Offset into folio */
>  		size_t bytes;		/* Bytes to write to folio */
>  		size_t copied;		/* Bytes copied from user */
> -		void *fsdata = NULL;
> +		/*
> +		 * Initialize fsdata with iocb flags pointer so that filesystems
> +		 * can check ki_flags (like IOCB_DONTCACHE) in write operations.
> +		 */
> +		void *fsdata = &iocb->ki_flags;

Unfortunately, this is't safe.  There may very well be code
paths which depend on fsdata being initialized to NULL before
calling write_begin().

In fact in the patch 2/3 in this series, ext4_write_end() will get
confused in the non-delayed allocation case, since ext4_write_begin()
doesn't force fsdata to be not be &iocb->ki_flags, and this will cause
ext4_write_end() to potentially get confused and do the wrong thing.

I understand that it would be a lot more inconvenient change the
function signature of write_begin() to pass through iocb->ki_fags via
a new parameter.  But I think that probably is the best way to go.

Cheers,

					- Ted

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations
  2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
@ 2025-05-14 11:52   ` Theodore Ts'o
  0 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 11:52 UTC (permalink / raw)
  To: 陈涛涛 Taotao Chen
  Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org,
	willy@infradead.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote:
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 94c7d2d828a6..787dd152a47e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
>  {
>  	struct inode *inode = mapping->host;
>  	int ret, needed_blocks;
> +	int iocb_flag;
>  	handle_t *handle;
>  	int retries = 0;
>  	struct folio *folio;
>  	pgoff_t index;
> +	fgf_t fgp = FGP_WRITEBEGIN;
>  	unsigned from, to;
>  
>  	ret = ext4_emergency_state(inode->i_sb);
>  	if (unlikely(ret))
>  		return ret;
>  
> +	iocb_flag = (int)(uintptr_t)(*fsdata);
> +	if (iocb_flag & IOCB_DONTCACHE)
> +		fgp |= FGP_DONTCACHE;
> +

See my comment against the first patch in this series.  It *should* be
possible to solve the problem just for ext4 by adding this line here:

	*fsdata = (void *)0;

The problem is that it's super-fragile, since how *fsdata gets used
changes at different points in time, so it makes code review and
maintenance more difficult.  (As evidenced by the fact that you missed
this; this is not a criticism on your programming ability, but rather
for the design choise of overloading the use of *fsdata.  This is a
trap that someone else might fall into when doing future code
changes.)

And of course, the question is whether PATCH 1/3 could potentially
break other file systems.  We would need audit all of the other
*_write_begin() functions, and then document this for the sake of
future file system developers that might want to change their
write_begin() function.

This is why my preference would be to add an extra flags paramter to
write_begin(), but that is going to be a lot more work.

Cheers,

						- Ted

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
  2025-05-14  3:51   ` Theodore Ts'o
@ 2025-05-14 13:38     ` Matthew Wilcox
  2025-05-14 14:23       ` Theodore Ts'o
  2025-05-14 15:06       ` Christoph Hellwig
  0 siblings, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2025-05-14 13:38 UTC (permalink / raw)
  To: Theodore Ts'o
  Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca,
	akpm@linux-foundation.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> I understand that it would be a lot more inconvenient change the
> function signature of write_begin() to pass through iocb->ki_fags via
> a new parameter.  But I think that probably is the best way to go.

I'd suggest that passing in iocb rather than file is the way to go.
Most callers of ->write_begin already pass NULL as the first argument so
would not need to change.  i915/gem passes a non-NULL file, but it only
operates on shmem and shmem does not use the file argument, so they can
pass NULL instead.  fs/buffer.c simply passes through the file passed
to write_begin and can be changed to pass through the iocb passed in.
exfat_extend_valid_size() has an iocb in its caller and can pass in the
iocb instead.  generic_perform_write() has an iocb.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
  2025-05-14 13:38     ` Matthew Wilcox
@ 2025-05-14 14:23       ` Theodore Ts'o
  2025-05-14 15:06       ` Christoph Hellwig
  1 sibling, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 14:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca,
	akpm@linux-foundation.org, linux-ext4@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> > I understand that it would be a lot more inconvenient change the
> > function signature of write_begin() to pass through iocb->ki_fags via
> > a new parameter.  But I think that probably is the best way to go.
> 
> I'd suggest that passing in iocb rather than file is the way to go.
> Most callers of ->write_begin already pass NULL as the first argument so
> would not need to change.  i915/gem passes a non-NULL file, but it only
> operates on shmem and shmem does not use the file argument, so they can
> pass NULL instead.  fs/buffer.c simply passes through the file passed
> to write_begin and can be changed to pass through the iocb passed in.
> exfat_extend_valid_size() has an iocb in its caller and can pass in the
> iocb instead.  generic_perform_write() has an iocb.

Mmm, nice!  I agree, that's probably the way to go.

There might be some callers if write_begin() that might require some
restructing because they don't have an iocb.  For example,
shmem_pwrite() in drivers/gpu/i915/gem/i915_gem_shmem.c came up when I
did a quick grep.

     	      	     	    	     	     - Ted
					     

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
  2025-05-14 13:38     ` Matthew Wilcox
  2025-05-14 14:23       ` Theodore Ts'o
@ 2025-05-14 15:06       ` Christoph Hellwig
  2025-05-18 14:01         ` Taotao Chen
  1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-05-14 15:06 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Theodore Ts'o, 陈涛涛 Taotao Chen,
	adilger.kernel@dilger.ca, akpm@linux-foundation.org,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org

On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> > I understand that it would be a lot more inconvenient change the
> > function signature of write_begin() to pass through iocb->ki_fags via
> > a new parameter.  But I think that probably is the best way to go.
> 
> I'd suggest that passing in iocb rather than file is the way to go.
> Most callers of ->write_begin already pass NULL as the first argument so
> would not need to change.  i915/gem passes a non-NULL file, but it only
> operates on shmem and shmem does not use the file argument, so they can
> pass NULL instead.

i915/gem needs to stop using write_begin/end and just do an iter_write.
Someone who has the hardware and/or CI setup needs to figure out if
vfs_write and kernel_write is fine, or this is magic enough to skip
checks and protection and go straight to callig ->iter_write.

> fs/buffer.c simply passes through the file passed
> to write_begin and can be changed to pass through the iocb passed in.
> exfat_extend_valid_size() has an iocb in its caller and can pass in the
> iocb instead.  generic_perform_write() has an iocb.

And we really need to stop theading this through the address_space
ops because it's not a generic method but a callback for the file
system..


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
  2025-05-14 15:06       ` Christoph Hellwig
@ 2025-05-18 14:01         ` Taotao Chen
  0 siblings, 0 replies; 10+ messages in thread
From: Taotao Chen @ 2025-05-18 14:01 UTC (permalink / raw)
  To: hch, tytso, willy
  Cc: adilger.kernel, akpm, chentaotao, linux-ext4, linux-fsdevel,
	linux-kernel, linux-mm

Hi Christoph, Matthew, Ted

Thanks for the suggestions.

Replacing file with iocb in write_begin(), updating call sites,
and adjusting i915/gem usage makes a lot of sense. I’ll send a
v2 to reflect this change.

Thanks,  
Taotao


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-05-18 14:07 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
2025-05-14  3:51   ` Theodore Ts'o
2025-05-14 13:38     ` Matthew Wilcox
2025-05-14 14:23       ` Theodore Ts'o
2025-05-14 15:06       ` Christoph Hellwig
2025-05-18 14:01         ` Taotao Chen
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
2025-05-14 11:52   ` Theodore Ts'o
2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).