* [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling
@ 2025-04-21 10:50 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
This series implements proper handling of IOCB_DONTCACHE flag
in ext4 filesystem.
Taotao Chen (3):
mm/filemap: initialize fsdata with iocb->ki_flags
ext4: implement IOCB_DONTCACHE handling in write operations
ext4: support FOP_DONTCACHE flag
fs/ext4/file.c | 2 +-
fs/ext4/inode.c | 21 +++++++++++++++++----
mm/filemap.c | 6 +++++-
3 files changed, 23 insertions(+), 6 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
2025-05-14 3:51 ` Theodore Ts'o
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
2 siblings, 1 reply; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
Initialize fsdata with &iocb->ki_flags to allow filesystems to check
iocb flags like IOCB_DONTCACHE during ->write_begin().
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
mm/filemap.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 7b90cbeb4a1a..9174b6310f0b 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
size_t offset; /* Offset into folio */
size_t bytes; /* Bytes to write to folio */
size_t copied; /* Bytes copied from user */
- void *fsdata = NULL;
+ /*
+ * Initialize fsdata with iocb flags pointer so that filesystems
+ * can check ki_flags (like IOCB_DONTCACHE) in write operations.
+ */
+ void *fsdata = &iocb->ki_flags;
bytes = iov_iter_count(i);
retry:
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
2025-05-14 11:52 ` Theodore Ts'o
2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
2 siblings, 1 reply; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
Implement IOCB_DONTCACHE flag handling in ext4 write paths:
1. Check IOCB_DONTCACHE flag passed via fsdata
2. Propagate FGP_DONTCACHE to page cache operations
Existing write behavior remains unchanged when IOCB_DONTCACHE is not set.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
fs/ext4/inode.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 94c7d2d828a6..787dd152a47e 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
{
struct inode *inode = mapping->host;
int ret, needed_blocks;
+ int iocb_flag;
handle_t *handle;
int retries = 0;
struct folio *folio;
pgoff_t index;
+ fgf_t fgp = FGP_WRITEBEGIN;
unsigned from, to;
ret = ext4_emergency_state(inode->i_sb);
if (unlikely(ret))
return ret;
+ iocb_flag = (int)(uintptr_t)(*fsdata);
+ if (iocb_flag & IOCB_DONTCACHE)
+ fgp |= FGP_DONTCACHE;
+
trace_ext4_write_begin(inode, pos, len);
/*
* Reserve one block more for addition to orphan list in case
@@ -1184,7 +1190,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
* the folio (if needed) without using GFP_NOFS.
*/
retry_grab:
- folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+ folio = __filemap_get_folio(mapping, index, fgp,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
@@ -2917,6 +2923,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
struct folio **foliop, void **fsdata)
{
int ret, retries = 0;
+ int iocb_flag;
+ fgf_t fgp = FGP_WRITEBEGIN;
struct folio *folio;
pgoff_t index;
struct inode *inode = mapping->host;
@@ -2928,10 +2936,15 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
index = pos >> PAGE_SHIFT;
if (ext4_nonda_switch(inode->i_sb) || ext4_verity_in_progress(inode)) {
+ ret = ext4_write_begin(file, mapping, pos, len, foliop, fsdata);
*fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
- return ext4_write_begin(file, mapping, pos,
- len, foliop, fsdata);
+ return ret;
}
+
+ iocb_flag = (int)(uintptr_t)(*fsdata);
+ if (iocb_flag & IOCB_DONTCACHE)
+ fgp |= FGP_DONTCACHE;
+
*fsdata = (void *)0;
trace_ext4_da_write_begin(inode, pos, len);
@@ -2945,7 +2958,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
}
retry:
- folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN,
+ folio = __filemap_get_folio(mapping, index, fgp,
mapping_gfp_mask(mapping));
if (IS_ERR(folio))
return PTR_ERR(folio);
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 3/3] ext4: support FOP_DONTCACHE flag
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
@ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen
2 siblings, 0 replies; 10+ messages in thread
From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw)
To: tytso@mit.edu, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, willy@infradead.org
Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
陈涛涛 Taotao Chen
From: Taotao Chen <chentaotao@didiglobal.com>
After the core logic for handling IOCB_DONTCACHE was implemented
in the previous patch, we now formally enable the feature by
adding FOP_DONTCACHE to ext4's file operations flags.
Signed-off-by: Taotao Chen <chentaotao@didiglobal.com>
---
fs/ext4/file.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index beb078ee4811..c514903d85c7 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -977,7 +977,7 @@ const struct file_operations ext4_file_operations = {
.splice_write = iter_file_splice_write,
.fallocate = ext4_fallocate,
.fop_flags = FOP_MMAP_SYNC | FOP_BUFFER_RASYNC |
- FOP_DIO_PARALLEL_WRITE,
+ FOP_DIO_PARALLEL_WRITE | FOP_DONTCACHE,
};
const struct inode_operations ext4_file_inode_operations = {
--
2.34.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
@ 2025-05-14 3:51 ` Theodore Ts'o
2025-05-14 13:38 ` Matthew Wilcox
0 siblings, 1 reply; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 3:51 UTC (permalink / raw)
To: 陈涛涛 Taotao Chen
Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org,
willy@infradead.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 7b90cbeb4a1a..9174b6310f0b 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i)
> size_t offset; /* Offset into folio */
> size_t bytes; /* Bytes to write to folio */
> size_t copied; /* Bytes copied from user */
> - void *fsdata = NULL;
> + /*
> + * Initialize fsdata with iocb flags pointer so that filesystems
> + * can check ki_flags (like IOCB_DONTCACHE) in write operations.
> + */
> + void *fsdata = &iocb->ki_flags;
Unfortunately, this is't safe. There may very well be code
paths which depend on fsdata being initialized to NULL before
calling write_begin().
In fact in the patch 2/3 in this series, ext4_write_end() will get
confused in the non-delayed allocation case, since ext4_write_begin()
doesn't force fsdata to be not be &iocb->ki_flags, and this will cause
ext4_write_end() to potentially get confused and do the wrong thing.
I understand that it would be a lot more inconvenient change the
function signature of write_begin() to pass through iocb->ki_fags via
a new parameter. But I think that probably is the best way to go.
Cheers,
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
@ 2025-05-14 11:52 ` Theodore Ts'o
0 siblings, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 11:52 UTC (permalink / raw)
To: 陈涛涛 Taotao Chen
Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org,
willy@infradead.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote:
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 94c7d2d828a6..787dd152a47e 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping,
> {
> struct inode *inode = mapping->host;
> int ret, needed_blocks;
> + int iocb_flag;
> handle_t *handle;
> int retries = 0;
> struct folio *folio;
> pgoff_t index;
> + fgf_t fgp = FGP_WRITEBEGIN;
> unsigned from, to;
>
> ret = ext4_emergency_state(inode->i_sb);
> if (unlikely(ret))
> return ret;
>
> + iocb_flag = (int)(uintptr_t)(*fsdata);
> + if (iocb_flag & IOCB_DONTCACHE)
> + fgp |= FGP_DONTCACHE;
> +
See my comment against the first patch in this series. It *should* be
possible to solve the problem just for ext4 by adding this line here:
*fsdata = (void *)0;
The problem is that it's super-fragile, since how *fsdata gets used
changes at different points in time, so it makes code review and
maintenance more difficult. (As evidenced by the fact that you missed
this; this is not a criticism on your programming ability, but rather
for the design choise of overloading the use of *fsdata. This is a
trap that someone else might fall into when doing future code
changes.)
And of course, the question is whether PATCH 1/3 could potentially
break other file systems. We would need audit all of the other
*_write_begin() functions, and then document this for the sake of
future file system developers that might want to change their
write_begin() function.
This is why my preference would be to add an extra flags paramter to
write_begin(), but that is going to be a lot more work.
Cheers,
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 3:51 ` Theodore Ts'o
@ 2025-05-14 13:38 ` Matthew Wilcox
2025-05-14 14:23 ` Theodore Ts'o
2025-05-14 15:06 ` Christoph Hellwig
0 siblings, 2 replies; 10+ messages in thread
From: Matthew Wilcox @ 2025-05-14 13:38 UTC (permalink / raw)
To: Theodore Ts'o
Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> I understand that it would be a lot more inconvenient change the
> function signature of write_begin() to pass through iocb->ki_fags via
> a new parameter. But I think that probably is the best way to go.
I'd suggest that passing in iocb rather than file is the way to go.
Most callers of ->write_begin already pass NULL as the first argument so
would not need to change. i915/gem passes a non-NULL file, but it only
operates on shmem and shmem does not use the file argument, so they can
pass NULL instead. fs/buffer.c simply passes through the file passed
to write_begin and can be changed to pass through the iocb passed in.
exfat_extend_valid_size() has an iocb in its caller and can pass in the
iocb instead. generic_perform_write() has an iocb.
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 13:38 ` Matthew Wilcox
@ 2025-05-14 14:23 ` Theodore Ts'o
2025-05-14 15:06 ` Christoph Hellwig
1 sibling, 0 replies; 10+ messages in thread
From: Theodore Ts'o @ 2025-05-14 14:23 UTC (permalink / raw)
To: Matthew Wilcox
Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca,
akpm@linux-foundation.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org
On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> > I understand that it would be a lot more inconvenient change the
> > function signature of write_begin() to pass through iocb->ki_fags via
> > a new parameter. But I think that probably is the best way to go.
>
> I'd suggest that passing in iocb rather than file is the way to go.
> Most callers of ->write_begin already pass NULL as the first argument so
> would not need to change. i915/gem passes a non-NULL file, but it only
> operates on shmem and shmem does not use the file argument, so they can
> pass NULL instead. fs/buffer.c simply passes through the file passed
> to write_begin and can be changed to pass through the iocb passed in.
> exfat_extend_valid_size() has an iocb in its caller and can pass in the
> iocb instead. generic_perform_write() has an iocb.
Mmm, nice! I agree, that's probably the way to go.
There might be some callers if write_begin() that might require some
restructing because they don't have an iocb. For example,
shmem_pwrite() in drivers/gpu/i915/gem/i915_gem_shmem.c came up when I
did a quick grep.
- Ted
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 13:38 ` Matthew Wilcox
2025-05-14 14:23 ` Theodore Ts'o
@ 2025-05-14 15:06 ` Christoph Hellwig
2025-05-18 14:01 ` Taotao Chen
1 sibling, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2025-05-14 15:06 UTC (permalink / raw)
To: Matthew Wilcox
Cc: Theodore Ts'o, 陈涛涛 Taotao Chen,
adilger.kernel@dilger.ca, akpm@linux-foundation.org,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org
On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote:
> On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote:
> > I understand that it would be a lot more inconvenient change the
> > function signature of write_begin() to pass through iocb->ki_fags via
> > a new parameter. But I think that probably is the best way to go.
>
> I'd suggest that passing in iocb rather than file is the way to go.
> Most callers of ->write_begin already pass NULL as the first argument so
> would not need to change. i915/gem passes a non-NULL file, but it only
> operates on shmem and shmem does not use the file argument, so they can
> pass NULL instead.
i915/gem needs to stop using write_begin/end and just do an iter_write.
Someone who has the hardware and/or CI setup needs to figure out if
vfs_write and kernel_write is fine, or this is magic enough to skip
checks and protection and go straight to callig ->iter_write.
> fs/buffer.c simply passes through the file passed
> to write_begin and can be changed to pass through the iocb passed in.
> exfat_extend_valid_size() has an iocb in its caller and can pass in the
> iocb instead. generic_perform_write() has an iocb.
And we really need to stop theading this through the address_space
ops because it's not a generic method but a callback for the file
system..
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags
2025-05-14 15:06 ` Christoph Hellwig
@ 2025-05-18 14:01 ` Taotao Chen
0 siblings, 0 replies; 10+ messages in thread
From: Taotao Chen @ 2025-05-18 14:01 UTC (permalink / raw)
To: hch, tytso, willy
Cc: adilger.kernel, akpm, chentaotao, linux-ext4, linux-fsdevel,
linux-kernel, linux-mm
Hi Christoph, Matthew, Ted
Thanks for the suggestions.
Replacing file with iocb in write_begin(), updating call sites,
and adjusting i915/gem usage makes a lot of sense. I’ll send a
v2 to reflect this change.
Thanks,
Taotao
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-05-18 14:07 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen
2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen
2025-05-14 3:51 ` Theodore Ts'o
2025-05-14 13:38 ` Matthew Wilcox
2025-05-14 14:23 ` Theodore Ts'o
2025-05-14 15:06 ` Christoph Hellwig
2025-05-18 14:01 ` Taotao Chen
2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen
2025-05-14 11:52 ` Theodore Ts'o
2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).