* [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling @ 2025-04-21 10:50 陈涛涛 Taotao Chen 2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen ` (2 more replies) 0 siblings, 3 replies; 10+ messages in thread From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw) To: tytso@mit.edu, adilger.kernel@dilger.ca, akpm@linux-foundation.org, willy@infradead.org Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, 陈涛涛 Taotao Chen From: Taotao Chen <chentaotao@didiglobal.com> This series implements proper handling of IOCB_DONTCACHE flag in ext4 filesystem. Taotao Chen (3): mm/filemap: initialize fsdata with iocb->ki_flags ext4: implement IOCB_DONTCACHE handling in write operations ext4: support FOP_DONTCACHE flag fs/ext4/file.c | 2 +- fs/ext4/inode.c | 21 +++++++++++++++++---- mm/filemap.c | 6 +++++- 3 files changed, 23 insertions(+), 6 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen @ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen 2025-05-14 3:51 ` Theodore Ts'o 2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen 2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen 2 siblings, 1 reply; 10+ messages in thread From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw) To: tytso@mit.edu, adilger.kernel@dilger.ca, akpm@linux-foundation.org, willy@infradead.org Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, 陈涛涛 Taotao Chen From: Taotao Chen <chentaotao@didiglobal.com> Initialize fsdata with &iocb->ki_flags to allow filesystems to check iocb flags like IOCB_DONTCACHE during ->write_begin(). Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> --- mm/filemap.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/mm/filemap.c b/mm/filemap.c index 7b90cbeb4a1a..9174b6310f0b 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) size_t offset; /* Offset into folio */ size_t bytes; /* Bytes to write to folio */ size_t copied; /* Bytes copied from user */ - void *fsdata = NULL; + /* + * Initialize fsdata with iocb flags pointer so that filesystems + * can check ki_flags (like IOCB_DONTCACHE) in write operations. + */ + void *fsdata = &iocb->ki_flags; bytes = iov_iter_count(i); retry: -- 2.34.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen @ 2025-05-14 3:51 ` Theodore Ts'o 2025-05-14 13:38 ` Matthew Wilcox 0 siblings, 1 reply; 10+ messages in thread From: Theodore Ts'o @ 2025-05-14 3:51 UTC (permalink / raw) To: 陈涛涛 Taotao Chen Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org, willy@infradead.org, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote: > diff --git a/mm/filemap.c b/mm/filemap.c > index 7b90cbeb4a1a..9174b6310f0b 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -4087,7 +4087,11 @@ ssize_t generic_perform_write(struct kiocb *iocb, struct iov_iter *i) > size_t offset; /* Offset into folio */ > size_t bytes; /* Bytes to write to folio */ > size_t copied; /* Bytes copied from user */ > - void *fsdata = NULL; > + /* > + * Initialize fsdata with iocb flags pointer so that filesystems > + * can check ki_flags (like IOCB_DONTCACHE) in write operations. > + */ > + void *fsdata = &iocb->ki_flags; Unfortunately, this is't safe. There may very well be code paths which depend on fsdata being initialized to NULL before calling write_begin(). In fact in the patch 2/3 in this series, ext4_write_end() will get confused in the non-delayed allocation case, since ext4_write_begin() doesn't force fsdata to be not be &iocb->ki_flags, and this will cause ext4_write_end() to potentially get confused and do the wrong thing. I understand that it would be a lot more inconvenient change the function signature of write_begin() to pass through iocb->ki_fags via a new parameter. But I think that probably is the best way to go. Cheers, - Ted ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 2025-05-14 3:51 ` Theodore Ts'o @ 2025-05-14 13:38 ` Matthew Wilcox 2025-05-14 14:23 ` Theodore Ts'o 2025-05-14 15:06 ` Christoph Hellwig 0 siblings, 2 replies; 10+ messages in thread From: Matthew Wilcox @ 2025-05-14 13:38 UTC (permalink / raw) To: Theodore Ts'o Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca, akpm@linux-foundation.org, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote: > I understand that it would be a lot more inconvenient change the > function signature of write_begin() to pass through iocb->ki_fags via > a new parameter. But I think that probably is the best way to go. I'd suggest that passing in iocb rather than file is the way to go. Most callers of ->write_begin already pass NULL as the first argument so would not need to change. i915/gem passes a non-NULL file, but it only operates on shmem and shmem does not use the file argument, so they can pass NULL instead. fs/buffer.c simply passes through the file passed to write_begin and can be changed to pass through the iocb passed in. exfat_extend_valid_size() has an iocb in its caller and can pass in the iocb instead. generic_perform_write() has an iocb. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 2025-05-14 13:38 ` Matthew Wilcox @ 2025-05-14 14:23 ` Theodore Ts'o 2025-05-14 15:06 ` Christoph Hellwig 1 sibling, 0 replies; 10+ messages in thread From: Theodore Ts'o @ 2025-05-14 14:23 UTC (permalink / raw) To: Matthew Wilcox Cc: 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca, akpm@linux-foundation.org, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote: > On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote: > > I understand that it would be a lot more inconvenient change the > > function signature of write_begin() to pass through iocb->ki_fags via > > a new parameter. But I think that probably is the best way to go. > > I'd suggest that passing in iocb rather than file is the way to go. > Most callers of ->write_begin already pass NULL as the first argument so > would not need to change. i915/gem passes a non-NULL file, but it only > operates on shmem and shmem does not use the file argument, so they can > pass NULL instead. fs/buffer.c simply passes through the file passed > to write_begin and can be changed to pass through the iocb passed in. > exfat_extend_valid_size() has an iocb in its caller and can pass in the > iocb instead. generic_perform_write() has an iocb. Mmm, nice! I agree, that's probably the way to go. There might be some callers if write_begin() that might require some restructing because they don't have an iocb. For example, shmem_pwrite() in drivers/gpu/i915/gem/i915_gem_shmem.c came up when I did a quick grep. - Ted ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 2025-05-14 13:38 ` Matthew Wilcox 2025-05-14 14:23 ` Theodore Ts'o @ 2025-05-14 15:06 ` Christoph Hellwig 2025-05-18 14:01 ` Taotao Chen 1 sibling, 1 reply; 10+ messages in thread From: Christoph Hellwig @ 2025-05-14 15:06 UTC (permalink / raw) To: Matthew Wilcox Cc: Theodore Ts'o, 陈涛涛 Taotao Chen, adilger.kernel@dilger.ca, akpm@linux-foundation.org, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On Wed, May 14, 2025 at 02:38:05PM +0100, Matthew Wilcox wrote: > On Tue, May 13, 2025 at 11:51:25PM -0400, Theodore Ts'o wrote: > > I understand that it would be a lot more inconvenient change the > > function signature of write_begin() to pass through iocb->ki_fags via > > a new parameter. But I think that probably is the best way to go. > > I'd suggest that passing in iocb rather than file is the way to go. > Most callers of ->write_begin already pass NULL as the first argument so > would not need to change. i915/gem passes a non-NULL file, but it only > operates on shmem and shmem does not use the file argument, so they can > pass NULL instead. i915/gem needs to stop using write_begin/end and just do an iter_write. Someone who has the hardware and/or CI setup needs to figure out if vfs_write and kernel_write is fine, or this is magic enough to skip checks and protection and go straight to callig ->iter_write. > fs/buffer.c simply passes through the file passed > to write_begin and can be changed to pass through the iocb passed in. > exfat_extend_valid_size() has an iocb in its caller and can pass in the > iocb instead. generic_perform_write() has an iocb. And we really need to stop theading this through the address_space ops because it's not a generic method but a callback for the file system.. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 2025-05-14 15:06 ` Christoph Hellwig @ 2025-05-18 14:01 ` Taotao Chen 0 siblings, 0 replies; 10+ messages in thread From: Taotao Chen @ 2025-05-18 14:01 UTC (permalink / raw) To: hch, tytso, willy Cc: adilger.kernel, akpm, chentaotao, linux-ext4, linux-fsdevel, linux-kernel, linux-mm Hi Christoph, Matthew, Ted Thanks for the suggestions. Replacing file with iocb in write_begin(), updating call sites, and adjusting i915/gem usage makes a lot of sense. I’ll send a v2 to reflect this change. Thanks, Taotao ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen 2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen @ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen 2025-05-14 11:52 ` Theodore Ts'o 2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen 2 siblings, 1 reply; 10+ messages in thread From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw) To: tytso@mit.edu, adilger.kernel@dilger.ca, akpm@linux-foundation.org, willy@infradead.org Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, 陈涛涛 Taotao Chen From: Taotao Chen <chentaotao@didiglobal.com> Implement IOCB_DONTCACHE flag handling in ext4 write paths: 1. Check IOCB_DONTCACHE flag passed via fsdata 2. Propagate FGP_DONTCACHE to page cache operations Existing write behavior remains unchanged when IOCB_DONTCACHE is not set. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> --- fs/ext4/inode.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index 94c7d2d828a6..787dd152a47e 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, { struct inode *inode = mapping->host; int ret, needed_blocks; + int iocb_flag; handle_t *handle; int retries = 0; struct folio *folio; pgoff_t index; + fgf_t fgp = FGP_WRITEBEGIN; unsigned from, to; ret = ext4_emergency_state(inode->i_sb); if (unlikely(ret)) return ret; + iocb_flag = (int)(uintptr_t)(*fsdata); + if (iocb_flag & IOCB_DONTCACHE) + fgp |= FGP_DONTCACHE; + trace_ext4_write_begin(inode, pos, len); /* * Reserve one block more for addition to orphan list in case @@ -1184,7 +1190,7 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, * the folio (if needed) without using GFP_NOFS. */ retry_grab: - folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN, + folio = __filemap_get_folio(mapping, index, fgp, mapping_gfp_mask(mapping)); if (IS_ERR(folio)) return PTR_ERR(folio); @@ -2917,6 +2923,8 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, struct folio **foliop, void **fsdata) { int ret, retries = 0; + int iocb_flag; + fgf_t fgp = FGP_WRITEBEGIN; struct folio *folio; pgoff_t index; struct inode *inode = mapping->host; @@ -2928,10 +2936,15 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, index = pos >> PAGE_SHIFT; if (ext4_nonda_switch(inode->i_sb) || ext4_verity_in_progress(inode)) { + ret = ext4_write_begin(file, mapping, pos, len, foliop, fsdata); *fsdata = (void *)FALL_BACK_TO_NONDELALLOC; - return ext4_write_begin(file, mapping, pos, - len, foliop, fsdata); + return ret; } + + iocb_flag = (int)(uintptr_t)(*fsdata); + if (iocb_flag & IOCB_DONTCACHE) + fgp |= FGP_DONTCACHE; + *fsdata = (void *)0; trace_ext4_da_write_begin(inode, pos, len); @@ -2945,7 +2958,7 @@ static int ext4_da_write_begin(struct file *file, struct address_space *mapping, } retry: - folio = __filemap_get_folio(mapping, index, FGP_WRITEBEGIN, + folio = __filemap_get_folio(mapping, index, fgp, mapping_gfp_mask(mapping)); if (IS_ERR(folio)) return PTR_ERR(folio); -- 2.34.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen @ 2025-05-14 11:52 ` Theodore Ts'o 0 siblings, 0 replies; 10+ messages in thread From: Theodore Ts'o @ 2025-05-14 11:52 UTC (permalink / raw) To: 陈涛涛 Taotao Chen Cc: adilger.kernel@dilger.ca, akpm@linux-foundation.org, willy@infradead.org, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org On Mon, Apr 21, 2025 at 10:50:30AM +0000, 陈涛涛 Taotao Chen wrote: > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 94c7d2d828a6..787dd152a47e 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -1147,16 +1147,22 @@ static int ext4_write_begin(struct file *file, struct address_space *mapping, > { > struct inode *inode = mapping->host; > int ret, needed_blocks; > + int iocb_flag; > handle_t *handle; > int retries = 0; > struct folio *folio; > pgoff_t index; > + fgf_t fgp = FGP_WRITEBEGIN; > unsigned from, to; > > ret = ext4_emergency_state(inode->i_sb); > if (unlikely(ret)) > return ret; > > + iocb_flag = (int)(uintptr_t)(*fsdata); > + if (iocb_flag & IOCB_DONTCACHE) > + fgp |= FGP_DONTCACHE; > + See my comment against the first patch in this series. It *should* be possible to solve the problem just for ext4 by adding this line here: *fsdata = (void *)0; The problem is that it's super-fragile, since how *fsdata gets used changes at different points in time, so it makes code review and maintenance more difficult. (As evidenced by the fact that you missed this; this is not a criticism on your programming ability, but rather for the design choise of overloading the use of *fsdata. This is a trap that someone else might fall into when doing future code changes.) And of course, the question is whether PATCH 1/3 could potentially break other file systems. We would need audit all of the other *_write_begin() functions, and then document this for the sake of future file system developers that might want to change their write_begin() function. This is why my preference would be to add an extra flags paramter to write_begin(), but that is going to be a lot more work. Cheers, - Ted ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 3/3] ext4: support FOP_DONTCACHE flag 2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen 2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen 2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen @ 2025-04-21 10:50 ` 陈涛涛 Taotao Chen 2 siblings, 0 replies; 10+ messages in thread From: 陈涛涛 Taotao Chen @ 2025-04-21 10:50 UTC (permalink / raw) To: tytso@mit.edu, adilger.kernel@dilger.ca, akpm@linux-foundation.org, willy@infradead.org Cc: linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, 陈涛涛 Taotao Chen From: Taotao Chen <chentaotao@didiglobal.com> After the core logic for handling IOCB_DONTCACHE was implemented in the previous patch, we now formally enable the feature by adding FOP_DONTCACHE to ext4's file operations flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> --- fs/ext4/file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ext4/file.c b/fs/ext4/file.c index beb078ee4811..c514903d85c7 100644 --- a/fs/ext4/file.c +++ b/fs/ext4/file.c @@ -977,7 +977,7 @@ const struct file_operations ext4_file_operations = { .splice_write = iter_file_splice_write, .fallocate = ext4_fallocate, .fop_flags = FOP_MMAP_SYNC | FOP_BUFFER_RASYNC | - FOP_DIO_PARALLEL_WRITE, + FOP_DIO_PARALLEL_WRITE | FOP_DONTCACHE, }; const struct inode_operations ext4_file_inode_operations = { -- 2.34.1 ^ permalink raw reply related [flat|nested] 10+ messages in thread
end of thread, other threads:[~2025-05-18 14:07 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-21 10:50 [PATCH 0/3] ext4: Implement IOCB_DONTCACHE handling 陈涛涛 Taotao Chen 2025-04-21 10:50 ` [PATCH 1/3] mm/filemap: initialize fsdata with iocb->ki_flags 陈涛涛 Taotao Chen 2025-05-14 3:51 ` Theodore Ts'o 2025-05-14 13:38 ` Matthew Wilcox 2025-05-14 14:23 ` Theodore Ts'o 2025-05-14 15:06 ` Christoph Hellwig 2025-05-18 14:01 ` Taotao Chen 2025-04-21 10:50 ` [PATCH 2/3] ext4: implement IOCB_DONTCACHE handling in write operations 陈涛涛 Taotao Chen 2025-05-14 11:52 ` Theodore Ts'o 2025-04-21 10:50 ` [PATCH 3/3] ext4: support FOP_DONTCACHE flag 陈涛涛 Taotao Chen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).