* [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling
@ 2025-04-21 10:50 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
This series implements proper handling of IOCB_DONTCACHE flag
in ext4 filesystem.
Taotao Chen (3):
mm/filemap: initialize fsdata with iocb->ki_flags
ext4: implement IOCB_DONTCACHE handling in write operations
ext4: support FOP_DONTCACHE flag
fs/ext4/file.c | 2 +-
fs/ext4/inode.c | 21 +++++++++++++++++----
mm/filemap.c | 6 +++++-
3 files changed, 23 insertions(+), 6 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
2025-05-14 3:51 ` Theodore Ts'o
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
2 siblings, 1 reply; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
Initialize fsdata with &iocb->ki_flags to allow filesystems to check
iocb flags like IOCB_DONTCACHE during ->write_begin().
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
mm/filemap.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 7b90cbeb4a1a..9174b6310f0b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
size_t offset; /* Offset into folio */
size_t bytes; /* Bytes to write to folio */
size_t copied; /* Bytes copied from user */
- void *fsdata = NULL;
+ /*
+ * Initialize fsdata with iocb flags pointer so that filesystems
+ * can check ki_flags (like IOCB_DONTCACHE) in write operations.
+ */
+ void *fsdata = &iocb->ki_flags;
bytes = iov_iter_count(i);
retry:
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
2025-05-14 11:52 ` Theodore Ts'o
2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
2 siblings, 1 reply; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
Implement IOCB_DONTCACHE flag handling in ext4 write paths:
1. Check IOCB_DONTCACHE flag passed via fsdata
2. Propagate FGP_DONTCACHE to page cache operations
Existing write behavior remains unchanged when IOCB_DONTCACHE is not set.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
fs/ext4/inode.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a6..787dd152a47e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
{
struct inode *inode = mapping->host;
int ret, needed_blocks;
+ int iocb_flag;
handle_t *handle;
int retries = 0;
struct folio *folio;
pgoff_t index;
+ fgf_t fgp = FGP_WRITEBEGIN;
unsigned from, to;
ret = ext4_emergency_state(inode->i_sb);
if (unlikely(ret))
return ret;
+ iocb_flag = (int)(uintptr_t)(*fsdata);
+ if (iocb_flag & IOCB_DONTCACHE)
+ fgp |= FGP_DONTCACHE;
+
trace_ext4_write_begin(inode, pos, len);
/*
* Reserve one block more for addition to orphan list in case
@@ -1184,7 +1190,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
* the folio (if needed) without using GFP_NOFS.
*/
retry_grab:
- folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+ folio = __filemap_get_folio(mapping, index, fgp,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
@@ -2917,6 +2923,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
struct folio **foliop, void **fsdata)
{
int ret, retries = 0;
+ int iocb_flag;
+ fgf_t fgp = FGP_WRITEBEGIN;
struct folio *folio;
pgoff_t index;
struct inode *inode = mapping->host;
@@ -2928,10 +2936,15 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
index = pos >> PAGE_SHIFT;
if (ext4_nonda_switch(inode->i_sb) || ext4_verity_in_progress(inode)) {
+ ret = ext4_write_begin(file, mapping, pos, len, foliop, fsdata);
*fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
- return ext4_write_begin(file, mapping, pos,
- len, foliop, fsdata);
+ return ret;
}
+
+ iocb_flag = (int)(uintptr_t)(*fsdata);
+ if (iocb_flag & IOCB_DONTCACHE)
+ fgp |= FGP_DONTCACHE;
+
*fsdata = (void *)0;
trace_ext4_da_write_begin(inode, pos, len);
@@ -2945,7 +2958,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
}
retry:
- folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+ folio = __filemap_get_folio(mapping, index, fgp,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] ext4: support FOP_DONTCACHE flag
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
2 siblings, 0 replies; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
After the core logic for handling IOCB_DONTCACHE was implemented
in the previous patch, we now formally enable the feature by
adding FOP_DONTCACHE to ext4's file operations flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
fs/ext4/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index beb078ee4811..c514903d85c7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -977,7 +977,7 @@ const struct file_operations ext4_file_operations = {
.splice_write = iter_file_splice_write,
.fallocate = ext4_fallocate,
.fop_flags = FOP_MMAP_SYNC | FOP_BUFFER_RASYNC |
- FOP_DIO_PARALLEL_WRITE,
+ FOP_DIO_PARALLEL_WRITE | FOP_DONTCACHE,
};
const struct inode_operations ext4_file_inode_operations = {
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
@ 2025-05-14 3:51 ` Theodore Ts'o
2025-05-14 13:38 ` Matthew Wilcox
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 3:51 UTC (permalink / raw)
To: 陈涛涛 Taotao Chen
Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org,
willy@infradead.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7b90cbeb4a1a..9174b6310f0b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> size_t offset; /* Offset into folio */
> size_t bytes; /* Bytes to write to folio */
> size_t copied; /* Bytes copied from user */
> - void *fsdata = NULL;
> + /*
> + * Initialize fsdata with iocb flags pointer so that filesystems
> + * can check ki_flags (like IOCB_DONTCACHE) in write operations.
> + */
> + void *fsdata = &iocb->ki_flags;
Unfortunately, this is't safe. There may very well be code
paths which depend on fsdata being initialized to NULL before
calling write_begin().
In fact in the patch 2/3 in this series, ext4_write_end() will get
confused in the non-delayed allocation case, since ext4_write_begin()
doesn't force fsdata to be not be &iocb->ki_flags, and this will cause
ext4_write_end() to potentially get confused and do the wrong thing.
I understand that it would be a lot more inconvenient change the
function signature of write_begin() to pass through iocb->ki_fags via
a new parameter. But I think that probably is the best way to go.
Cheers,
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
@ 2025-05-14 11:52 ` Theodore Ts'o
0 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 11:52 UTC (permalink / raw)
To: 陈涛涛 Taotao Chen
Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org,
willy@infradead.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote:
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 94c7d2d828a6..787dd152a47e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
> {
> struct inode *inode = mapping->host;
> int ret, needed_blocks;
> + int iocb_flag;
> handle_t *handle;
> int retries = 0;
> struct folio *folio;
> pgoff_t index;
> + fgf_t fgp = FGP_WRITEBEGIN;
> unsigned from, to;
>
> ret = ext4_emergency_state(inode->i_sb);
> if (unlikely(ret))
> return ret;
>
> + iocb_flag = (int)(uintptr_t)(*fsdata);
> + if (iocb_flag & IOCB_DONTCACHE)
> + fgp |= FGP_DONTCACHE;
> +
See my comment against the first patch in this series. It *should* be
possible to solve the problem just for ext4 by adding this line here:
*fsdata = (void *)0;
The problem is that it's super-fragile, since how *fsdata gets used
changes at different points in time, so it makes code review and
maintenance more difficult. (As evidenced by the fact that you missed
this; this is not a criticism on your programming ability, but rather
for the design choise of overloading the use of *fsdata. This is a
trap that someone else might fall into when doing future code
changes.)
And of course, the question is whether PATCH 1/3 could potentially
break other file systems. We would need audit all of the other
*_write_begin() functions, and then document this for the sake of
future file system developers that might want to change their
write_begin() function.
This is why my preference would be to add an extra flags paramter to
write_begin(), but that is going to be a lot more work.
Cheers,
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 3:51 ` Theodore Ts'o
@ 2025-05-14 13:38 ` Matthew Wilcox
2025-05-14 14:23 ` Theodore Ts'o
2025-05-14 15:06 ` Christoph Hellwig
0 siblings, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2025-05-14 13:38 UTC (permalink / raw)
To: Theodore Ts'o
Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> I understand that it would be a lot more inconvenient change the
> function signature of write_begin() to pass through iocb->ki_fags via
> a new parameter. But I think that probably is the best way to go.
I'd suggest that passing in iocb rather than file is the way to go.
Most callers of ->write_begin already pass NULL as the first argument so
would not need to change. i915/gem passes a non-NULL file, but it only
operates on shmem and shmem does not use the file argument, so they can
pass NULL instead. fs/buffer.c simply passes through the file passed
to write_begin and can be changed to pass through the iocb passed in.
exfat_extend_valid_size() has an iocb in its caller and can pass in the
iocb instead. generic_perform_write() has an iocb.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 13:38 ` Matthew Wilcox
@ 2025-05-14 14:23 ` Theodore Ts'o
2025-05-14 15:06 ` Christoph Hellwig
1 sibling, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 14:23 UTC (permalink / raw)
To: Matthew Wilcox
Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> > I understand that it would be a lot more inconvenient change the
> > function signature of write_begin() to pass through iocb->ki_fags via
> > a new parameter. But I think that probably is the best way to go.
>
> I'd suggest that passing in iocb rather than file is the way to go.
> Most callers of ->write_begin already pass NULL as the first argument so
> would not need to change. i915/gem passes a non-NULL file, but it only
> operates on shmem and shmem does not use the file argument, so they can
> pass NULL instead. fs/buffer.c simply passes through the file passed
> to write_begin and can be changed to pass through the iocb passed in.
> exfat_extend_valid_size() has an iocb in its caller and can pass in the
> iocb instead. generic_perform_write() has an iocb.
Mmm, nice! I agree, that's probably the way to go.
There might be some callers if write_begin() that might require some
restructing because they don't have an iocb. For example,
shmem_pwrite() in drivers/gpu/i915/gem/i915_gem_shmem.c came up when I
did a quick grep.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 13:38 ` Matthew Wilcox
2025-05-14 14:23 ` Theodore Ts'o
@ 2025-05-14 15:06 ` Christoph Hellwig
2025-05-18 14:01 ` Taotao Chen
1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-05-14 15:06 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Theodore Ts'o, 陈涛涛 Taotao Chen,
adilger.kernel@dilger.ca, akpm@linux-foundation.org,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> > I understand that it would be a lot more inconvenient change the
> > function signature of write_begin() to pass through iocb->ki_fags via
> > a new parameter. But I think that probably is the best way to go.
>
> I'd suggest that passing in iocb rather than file is the way to go.
> Most callers of ->write_begin already pass NULL as the first argument so
> would not need to change. i915/gem passes a non-NULL file, but it only
> operates on shmem and shmem does not use the file argument, so they can
> pass NULL instead.
i915/gem needs to stop using write_begin/end and just do an iter_write.
Someone who has the hardware and/or CI setup needs to figure out if
vfs_write and kernel_write is fine, or this is magic enough to skip
checks and protection and go straight to callig ->iter_write.
> fs/buffer.c simply passes through the file passed
> to write_begin and can be changed to pass through the iocb passed in.
> exfat_extend_valid_size() has an iocb in its caller and can pass in the
> iocb instead. generic_perform_write() has an iocb.
And we really need to stop theading this through the address_space
ops because it's not a generic method but a callback for the file
system..
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 15:06 ` Christoph Hellwig
@ 2025-05-18 14:01 ` Taotao Chen
0 siblings, 0 replies; 10+ messages in thread
From: Taotao Chen @ 2025-05-18 14:01 UTC (permalink / raw)
To: hch, tytso, willy
Cc: adilger.kernel, akpm, chentaotao, linux-ext4, linux-fsdevel,
linux-kernel, linux-mm
Hi Christoph, Matthew, Ted
Thanks for the suggestions.
Replacing file with iocb in write_begin(), updating call sites,
and adjusting i915/gem usage makes a lot of sense. I’ll send a
v2 to reflect this change.
Thanks,
Taotao
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-05-18 14:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
2025-05-14 3:51 ` Theodore Ts'o
2025-05-14 13:38 ` Matthew Wilcox
2025-05-14 14:23 ` Theodore Ts'o
2025-05-14 15:06 ` Christoph Hellwig
2025-05-18 14:01 ` Taotao Chen
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
2025-05-14 11:52 ` Theodore Ts'o
2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.