* [PATCH] fuse: move page cache invalidation after AIO to workqueue
From: Bernd Schubert @ 2026-03-03 10:23 UTC
To: Miklos Szeredi
Cc: linux-fsdevel, linux-kernel, Cheng Ding, Jingbo Xu, Bernd Schubert

From: Cheng Ding <cding@ddn.com>

Invalidating the page cache in fuse_aio_complete() causes deadlock.
Call Trace:
 <TASK>
 __schedule+0x27c/0x6b0
 schedule+0x33/0x110
 io_schedule+0x46/0x80
 folio_wait_bit_common+0x136/0x330
 __folio_lock+0x17/0x30
 invalidate_inode_pages2_range+0x1d2/0x4f0
 fuse_aio_complete+0x258/0x270 [fuse]
 fuse_aio_complete_req+0x87/0xd0 [fuse]
 fuse_request_end+0x18e/0x200 [fuse]
 fuse_uring_req_end+0x87/0xd0 [fuse]
 fuse_uring_cmd+0x241/0xf20 [fuse]
 io_uring_cmd+0x9f/0x140
 io_issue_sqe+0x193/0x410
 io_submit_sqes+0x128/0x3e0
 __do_sys_io_uring_enter+0x2ea/0x490
 __x64_sys_io_uring_enter+0x22/0x40

Move the invalidate_inode_pages2_range() call to a workqueue worker
to avoid this issue. This approach is similar to
iomap_dio_bio_end_io().

(Minor edit by Bernd to avoid a merge conflict in Miklos' for-next
branch). The commit is based on that branch with the addition of
https://lore.kernel.org/r/20260111073701.6071-1-jefflexu@linux.alibaba.com)

Cc: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Cheng Ding <cding@ddn.com>
Signed-off-by: Bernd Schubert <bschubert@ddn.com>
---
 fs/fuse/file.c   | 39 +++++++++++++++++++++++++++++----------
 fs/fuse/fuse_i.h |  1 +
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 64282c68d1ec7e4616e51735c1c0e8f2ec29cfad..b16515e3b42d33795ad45cf1e374ffab674714f7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -23,6 +23,8 @@
 #include <linux/task_io_accounting_ops.h>
 #include <linux/iomap.h>
 
+int sb_init_dio_done_wq(struct super_block *sb);
+
 static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
 			  unsigned int open_flags, int opcode,
 			  struct fuse_open_out *outargp)
@@ -635,6 +637,19 @@ static ssize_t fuse_get_res_by_io(struct fuse_io_priv *io)
 	return io->bytes < 0 ? io->size : io->bytes;
 }
 
+static void fuse_aio_invalidate_worker(struct work_struct *work)
+{
+	struct fuse_io_priv *io = container_of(work, struct fuse_io_priv, work);
+	struct address_space *mapping = io->iocb->ki_filp->f_mapping;
+	ssize_t res = fuse_get_res_by_io(io);
+	pgoff_t start = io->offset >> PAGE_SHIFT;
+	pgoff_t end = (io->offset + res - 1) >> PAGE_SHIFT;
+
+	invalidate_inode_pages2_range(mapping, start, end);
+	io->iocb->ki_complete(io->iocb, res);
+	kref_put(&io->refcnt, fuse_io_release);
+}
+
 /*
  * In case of short read, the caller sets 'pos' to the position of
  * actual end of fuse request in IO request. Otherwise, if bytes_requested
@@ -667,28 +682,32 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
 	spin_unlock(&io->lock);
 
 	if (!left && !io->blocking) {
+		struct inode *inode = file_inode(io->iocb->ki_filp);
+		struct address_space *mapping = io->iocb->ki_filp->f_mapping;
 		ssize_t res = fuse_get_res_by_io(io);
 
 		if (res >= 0) {
-			struct inode *inode = file_inode(io->iocb->ki_filp);
 			struct fuse_conn *fc = get_fuse_conn(inode);
 			struct fuse_inode *fi = get_fuse_inode(inode);
-			struct address_space *mapping = io->iocb->ki_filp->f_mapping;
 
+			spin_lock(&fi->lock);
+			fi->attr_version = atomic64_inc_return(&fc->attr_version);
+			spin_unlock(&fi->lock);
+		}
+
+		if (io->write && res > 0 && mapping->nrpages) {
 			/*
 			 * As in generic_file_direct_write(), invalidate after the
 			 * write, to invalidate read-ahead cache that may have competed
 			 * with the write.
 			 */
-			if (io->write && res && mapping->nrpages) {
-				invalidate_inode_pages2_range(mapping,
-						io->offset >> PAGE_SHIFT,
-						(io->offset + res - 1) >> PAGE_SHIFT);
+			if (!inode->i_sb->s_dio_done_wq)
+				res = sb_init_dio_done_wq(inode->i_sb);
+			if (res >= 0) {
+				INIT_WORK(&io->work, fuse_aio_invalidate_worker);
+				queue_work(inode->i_sb->s_dio_done_wq, &io->work);
+				return;
 			}
-
-			spin_lock(&fi->lock);
-			fi->attr_version = atomic64_inc_return(&fc->attr_version);
-			spin_unlock(&fi->lock);
 		}
 
 		io->iocb->ki_complete(io->iocb, res);
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7f16049387d15e869db4be23a93605098588eda9..6e8c8cf6b2c82163acbfbd15c44b849898f945c1 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -377,6 +377,7 @@ union fuse_file_args {
 /** The request IO state (for asynchronous processing) */
 struct fuse_io_priv {
 	struct kref refcnt;
+	struct work_struct work;
 	int async;
 	spinlock_t lock;
 	unsigned reqs;

---
base-commit: c8724f58a948da8be255e407d4623feaf76fe7da
change-id: 20260303-async-dio-aio-cache-invalidation-9974bebd5869

Best regards,
--
Bernd Schubert <bschubert@ddn.com>
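
Editorial illustration (not part of the posted patch): the idea in the commit message, that the completion path only queues work and a separate worker does the potentially blocking invalidation, can be sketched in user-space C. Everything prefixed `sketch_` is hypothetical and stands in for kernel facilities (`INIT_WORK()`, `queue_work()`, `container_of()`); the page size of 4 KiB is an assumption, and the "worker" here is simply the caller draining a list rather than a real kernel thread.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct work_struct. */
struct sketch_work {
	void (*fn)(struct sketch_work *work);
	struct sketch_work *next;
};

/* Hypothetical stand-in for a workqueue: a singly linked FIFO. */
struct sketch_queue {
	struct sketch_work *head;
	struct sketch_work *tail;
};

struct sketch_io {
	struct sketch_work work;	/* embedded, like 'work' in struct fuse_io_priv */
	uint64_t offset;		/* byte offset of the write */
	uint64_t res;			/* bytes written, must be > 0 */
	uint64_t first_page;		/* filled in by the worker */
	uint64_t last_page;
	int completed;			/* stands in for ki_complete() having run */
};

static void sketch_queue_work(struct sketch_queue *q, struct sketch_work *w)
{
	w->next = NULL;
	if (q->tail)
		q->tail->next = w;
	else
		q->head = w;
	q->tail = w;
}

/* Runs later, in a context that may sleep. */
static void sketch_invalidate_worker(struct sketch_work *work)
{
	/* Recover the enclosing object, as container_of() does in the kernel. */
	struct sketch_io *io = (struct sketch_io *)
		((char *)work - offsetof(struct sketch_io, work));

	/* Inclusive page range covered by the write; needs res > 0. */
	io->first_page = io->offset >> SKETCH_PAGE_SHIFT;
	io->last_page = (io->offset + io->res - 1) >> SKETCH_PAGE_SHIFT;
	io->completed = 1;
}

/* Completion path: only queues the work, never blocks. */
static void sketch_aio_complete(struct sketch_queue *q, struct sketch_io *io)
{
	io->work.fn = sketch_invalidate_worker;
	sketch_queue_work(q, &io->work);
}

/* Drains the queue; the caller plays the role of the worker thread. */
static int sketch_run_pending(struct sketch_queue *q)
{
	int ran = 0;

	while (q->head) {
		struct sketch_work *w = q->head;

		q->head = w->next;
		if (!q->head)
			q->tail = NULL;
		w->fn(w);
		ran++;
	}
	return ran;
}
```

In this sketch nothing is invalidated or completed until `sketch_run_pending()` runs, which mirrors why the patch returns early from fuse_aio_complete() after queue_work(). The `res - 1` in the end index is also why the patch tightens the guard from `res` to `res > 0`.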

* Re: [PATCH] fuse: move page cache invalidation after AIO to workqueue
From: Jingbo Xu @ 2026-03-03 12:03 UTC
To: Bernd Schubert, Miklos Szeredi
Cc: linux-fsdevel, linux-kernel, Cheng Ding

On 3/3/26 6:23 PM, Bernd Schubert wrote:
> From: Cheng Ding <cding@ddn.com>
>
> Invalidating the page cache in fuse_aio_complete() causes deadlock.
> Call Trace:
>  <TASK>
>  __schedule+0x27c/0x6b0
>  schedule+0x33/0x110
>  io_schedule+0x46/0x80
>  folio_wait_bit_common+0x136/0x330
>  __folio_lock+0x17/0x30
>  invalidate_inode_pages2_range+0x1d2/0x4f0
>  fuse_aio_complete+0x258/0x270 [fuse]
>  fuse_aio_complete_req+0x87/0xd0 [fuse]
>  fuse_request_end+0x18e/0x200 [fuse]
>  fuse_uring_req_end+0x87/0xd0 [fuse]
>  fuse_uring_cmd+0x241/0xf20 [fuse]
>  io_uring_cmd+0x9f/0x140
>  io_issue_sqe+0x193/0x410
>  io_submit_sqes+0x128/0x3e0
>  __do_sys_io_uring_enter+0x2ea/0x490
>  __x64_sys_io_uring_enter+0x22/0x40
>
> Move the invalidate_inode_pages2_range() call to a workqueue worker
> to avoid this issue. This approach is similar to
> iomap_dio_bio_end_io().
>
> (Minor edit by Bernd to avoid a merge conflict in Miklos' for-next
> branch). The commit is based on that branch with the addition of
> https://lore.kernel.org/r/20260111073701.6071-1-jefflexu@linux.alibaba.com)

I think it would be better to completely drop my previous patch and
rework it from scratch, as that patch
(https://lore.kernel.org/r/20260111073701.6071-1-jefflexu@linux.alibaba.com)
is only in Miklos's branch and not merged to master yet.

After reverting my previous patch, I think it would be cleaner to do
the following:

"The page cache invalidation for FOPEN_DIRECT_IO write in
fuse_direct_io() is moved to fuse_direct_write_iter() (with any progress
in write), to keep consistent with generic_file_direct_write(). This
covers the scenarios of both synchronous FOPEN_DIRECT_IO write
(regardless of FUSE_ASYNC_DIO) and asynchronous FOPEN_DIRECT_IO write
without FUSE_ASYNC_DIO.

After that, only asynchronous direct write (for both FOPEN_DIRECT_IO and
non-FOPEN_DIRECT_IO) with FUSE_ASYNC_DIO is left."

```
@@ -1736,15 +1760,6 @@ ssize_t fuse_direct_io(struct fuse_io_priv *io, struct iov_iter *iter,
 	if (res > 0)
 		*ppos = pos;
 
-	if (res > 0 && write && fopen_direct_io) {
-		/*
-		 * As in generic_file_direct_write(), invalidate after
-		 * write, to invalidate read-ahead cache that may have
-		 * with the write.
-		 */
-		invalidate_inode_pages2_range(mapping, idx_from, idx_to);
-	}
-
 	return res > 0 ? res : err;
 }
 EXPORT_SYMBOL_GPL(fuse_direct_io);
@@ -1799,6 +1814,14 @@ static ssize_t fuse_direct_write_iter(struct kiocb *iocb, struct iov_iter *from)
 					     FUSE_DIO_WRITE);
 			fuse_write_update_attr(inode, iocb->ki_pos, res);
 		}
+
+		/*
+		 * As in generic_file_direct_write(), invalidate after
+		 * write, to invalidate read-ahead cache that may have
+		 * competed with the write.
+		 */
+		if (res > 0)
+			kiocb_invalidate_post_direct_write(iocb, res);
 	}
 	fuse_dio_unlock(iocb, exclusive);
```

>
> Cc: Jingbo Xu <jefflexu@linux.alibaba.com>
> Signed-off-by: Cheng Ding <cding@ddn.com>
> Signed-off-by: Bernd Schubert <bschubert@ddn.com>
> ---
>  fs/fuse/file.c   | 39 +++++++++++++++++++++++++++++----------
>  fs/fuse/fuse_i.h |  1 +
>  2 files changed, 30 insertions(+), 10 deletions(-)
>
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 64282c68d1ec7e4616e51735c1c0e8f2ec29cfad..b16515e3b42d33795ad45cf1e374ffab674714f7 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -23,6 +23,8 @@
>  #include <linux/task_io_accounting_ops.h>
>  #include <linux/iomap.h>
>
> +int sb_init_dio_done_wq(struct super_block *sb);
> +

#include "../internal.h" ?

>  static int fuse_send_open(struct fuse_mount *fm, u64 nodeid,
>  			  unsigned int open_flags, int opcode,
>  			  struct fuse_open_out *outargp)
> @@ -635,6 +637,19 @@ static ssize_t fuse_get_res_by_io(struct fuse_io_priv *io)
>  	return io->bytes < 0 ? io->size : io->bytes;
>  }
>
> +static void fuse_aio_invalidate_worker(struct work_struct *work)
> +{
> +	struct fuse_io_priv *io = container_of(work, struct fuse_io_priv, work);
> +	struct address_space *mapping = io->iocb->ki_filp->f_mapping;
> +	ssize_t res = fuse_get_res_by_io(io);
> +	pgoff_t start = io->offset >> PAGE_SHIFT;
> +	pgoff_t end = (io->offset + res - 1) >> PAGE_SHIFT;
> +
> +	invalidate_inode_pages2_range(mapping, start, end);
> +	io->iocb->ki_complete(io->iocb, res);
> +	kref_put(&io->refcnt, fuse_io_release);
> +}
> +
>  /*
>   * In case of short read, the caller sets 'pos' to the position of
>   * actual end of fuse request in IO request. Otherwise, if bytes_requested
> @@ -667,28 +682,32 @@ static void fuse_aio_complete(struct fuse_io_priv *io, int err, ssize_t pos)
>  	spin_unlock(&io->lock);
>
>  	if (!left && !io->blocking) {
> +		struct inode *inode = file_inode(io->iocb->ki_filp);
> +		struct address_space *mapping = io->iocb->ki_filp->f_mapping;
>  		ssize_t res = fuse_get_res_by_io(io);
>
>  		if (res >= 0) {
> -			struct inode *inode = file_inode(io->iocb->ki_filp);
>  			struct fuse_conn *fc = get_fuse_conn(inode);
>  			struct fuse_inode *fi = get_fuse_inode(inode);
> -			struct address_space *mapping = io->iocb->ki_filp->f_mapping;
>
> +			spin_lock(&fi->lock);
> +			fi->attr_version = atomic64_inc_return(&fc->attr_version);
> +			spin_unlock(&fi->lock);
> +		}
> +
> +		if (io->write && res > 0 && mapping->nrpages) {
>  			/*
>  			 * As in generic_file_direct_write(), invalidate after the
>  			 * write, to invalidate read-ahead cache that may have competed
>  			 * with the write.
>  			 */
> -			if (io->write && res && mapping->nrpages) {
> -				invalidate_inode_pages2_range(mapping,
> -						io->offset >> PAGE_SHIFT,
> -						(io->offset + res - 1) >> PAGE_SHIFT);
> +			if (!inode->i_sb->s_dio_done_wq)
> +				res = sb_init_dio_done_wq(inode->i_sb);

Better to call sb_init_dio_done_wq() from fuse_direct_IO(), and fail the
IO directly if sb_init_dio_done_wq() fails.

> +			if (res >= 0) {
> +				INIT_WORK(&io->work, fuse_aio_invalidate_worker);
> +				queue_work(inode->i_sb->s_dio_done_wq, &io->work);
> +				return;
>  			}

Otherwise, the page cache invalidation would be missed if the previous
sb_init_dio_done_wq() fails.

> -
> -			spin_lock(&fi->lock);
> -			fi->attr_version = atomic64_inc_return(&fc->attr_version);
> -			spin_unlock(&fi->lock);
>  		}
>
>  		io->iocb->ki_complete(io->iocb, res);
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 7f16049387d15e869db4be23a93605098588eda9..6e8c8cf6b2c82163acbfbd15c44b849898f945c1 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -377,6 +377,7 @@ union fuse_file_args {
>  /** The request IO state (for asynchronous processing) */
>  struct fuse_io_priv {
>  	struct kref refcnt;
> +	struct work_struct work;
>  	int async;
>  	spinlock_t lock;
>  	unsigned reqs;

--
Thanks,
Jingbo

* Re: [PATCH] fuse: move page cache invalidation after AIO to workqueue
From: Bernd Schubert @ 2026-03-03 12:37 UTC
To: Jingbo Xu, Bernd Schubert, Miklos Szeredi
Cc: linux-fsdevel, linux-kernel, Cheng Ding

On 3/3/26 13:03, Jingbo Xu wrote:
> On 3/3/26 6:23 PM, Bernd Schubert wrote:
>> From: Cheng Ding <cding@ddn.com>
>>
>> Invalidating the page cache in fuse_aio_complete() causes deadlock.
>> Call Trace:
>>  <TASK>
>>  __schedule+0x27c/0x6b0
>>  schedule+0x33/0x110
>>  io_schedule+0x46/0x80
>>  folio_wait_bit_common+0x136/0x330
>>  __folio_lock+0x17/0x30
>>  invalidate_inode_pages2_range+0x1d2/0x4f0
>>  fuse_aio_complete+0x258/0x270 [fuse]
>>  fuse_aio_complete_req+0x87/0xd0 [fuse]
>>  fuse_request_end+0x18e/0x200 [fuse]
>>  fuse_uring_req_end+0x87/0xd0 [fuse]
>>  fuse_uring_cmd+0x241/0xf20 [fuse]
>>  io_uring_cmd+0x9f/0x140
>>  io_issue_sqe+0x193/0x410
>>  io_submit_sqes+0x128/0x3e0
>>  __do_sys_io_uring_enter+0x2ea/0x490
>>  __x64_sys_io_uring_enter+0x22/0x40
>>
>> Move the invalidate_inode_pages2_range() call to a workqueue worker
>> to avoid this issue. This approach is similar to
>> iomap_dio_bio_end_io().
>>
>> (Minor edit by Bernd to avoid a merge conflict in Miklos' for-next
>> branch). The commit is based on that branch with the addition of
>> https://lore.kernel.org/r/20260111073701.6071-1-jefflexu@linux.alibaba.com)
>
> I think it would be better to completely drop my previous patch and
> rework it from scratch, as that patch
> (https://lore.kernel.org/r/20260111073701.6071-1-jefflexu@linux.alibaba.com)
> is only in Miklos's branch and not merged to master yet.
>
> After reverting my previous patch, I think it would be cleaner to do
> the following:
>
> "The page cache invalidation for FOPEN_DIRECT_IO write in
> fuse_direct_io() is moved to fuse_direct_write_iter() (with any progress
> in write), to keep consistent with generic_file_direct_write(). This
> covers the scenarios of both synchronous FOPEN_DIRECT_IO write
> (regardless of FUSE_ASYNC_DIO) and asynchronous FOPEN_DIRECT_IO write
> without FUSE_ASYNC_DIO.
>
> After that, only asynchronous direct write (for both FOPEN_DIRECT_IO and
> non-FOPEN_DIRECT_IO) with FUSE_ASYNC_DIO is left."

I think your suggestion moves in this direction:

https://lore.kernel.org/all/20230918150313.3845114-1-bschubert@ddn.com/

Thanks,
Bernd

* Re: [PATCH] fuse: move page cache invalidation after AIO to workqueue
From: Jingbo Xu @ 2026-03-03 14:16 UTC
To: Bernd Schubert, Bernd Schubert, Miklos Szeredi
Cc: linux-fsdevel, linux-kernel, Cheng Ding

On 3/3/26 8:37 PM, Bernd Schubert wrote:
> On 3/3/26 13:03, Jingbo Xu wrote:
>> On 3/3/26 6:23 PM, Bernd Schubert wrote:
>>> From: Cheng Ding <cding@ddn.com>
>>>
>>> Invalidating the page cache in fuse_aio_complete() causes deadlock.
>>> Call Trace:
>>>  <TASK>
>>>  __schedule+0x27c/0x6b0
>>>  schedule+0x33/0x110
>>>  io_schedule+0x46/0x80
>>>  folio_wait_bit_common+0x136/0x330
>>>  __folio_lock+0x17/0x30
>>>  invalidate_inode_pages2_range+0x1d2/0x4f0
>>>  fuse_aio_complete+0x258/0x270 [fuse]
>>>  fuse_aio_complete_req+0x87/0xd0 [fuse]
>>>  fuse_request_end+0x18e/0x200 [fuse]
>>>  fuse_uring_req_end+0x87/0xd0 [fuse]
>>>  fuse_uring_cmd+0x241/0xf20 [fuse]
>>>  io_uring_cmd+0x9f/0x140
>>>  io_issue_sqe+0x193/0x410
>>>  io_submit_sqes+0x128/0x3e0
>>>  __do_sys_io_uring_enter+0x2ea/0x490
>>>  __x64_sys_io_uring_enter+0x22/0x40
>>>
>>> Move the invalidate_inode_pages2_range() call to a workqueue worker
>>> to avoid this issue. This approach is similar to
>>> iomap_dio_bio_end_io().
>>>
>>> (Minor edit by Bernd to avoid a merge conflict in Miklos' for-next
>>> branch). The commit is based on that branch with the addition of
>>> https://lore.kernel.org/r/20260111073701.6071-1-jefflexu@linux.alibaba.com)
>>
>> I think it would be better to completely drop my previous patch and
>> rework it from scratch, as that patch
>> (https://lore.kernel.org/r/20260111073701.6071-1-jefflexu@linux.alibaba.com)
>> is only in Miklos's branch and not merged to master yet.
>>
>> After reverting my previous patch, I think it would be cleaner to do
>> the following:
>>
>> "The page cache invalidation for FOPEN_DIRECT_IO write in
>> fuse_direct_io() is moved to fuse_direct_write_iter() (with any progress
>> in write), to keep consistent with generic_file_direct_write(). This
>> covers the scenarios of both synchronous FOPEN_DIRECT_IO write
>> (regardless of FUSE_ASYNC_DIO) and asynchronous FOPEN_DIRECT_IO write
>> without FUSE_ASYNC_DIO.
>>
>> After that, only asynchronous direct write (for both FOPEN_DIRECT_IO and
>> non-FOPEN_DIRECT_IO) with FUSE_ASYNC_DIO is left."
>
> I think your suggestion moves in this direction:
>
> https://lore.kernel.org/all/20230918150313.3845114-1-bschubert@ddn.com/
>

Yes, it's similar in some ways, but it's still simple enough as a
short-term fix.

--
Thanks,
Jingbo

* Re: [PATCH] fuse: move page cache invalidation after AIO to workqueue
From: Jingbo Xu @ 2026-03-06 14:10 UTC
To: Cheng Ding, Bernd Schubert, Miklos Szeredi
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org

On 3/6/26 6:11 PM, Cheng Ding wrote:
>> After reverting my previous patch, I think it would be cleaner to do
>> the following:
>>
>> "The page cache invalidation for FOPEN_DIRECT_IO write in
>> fuse_direct_io() is moved to fuse_direct_write_iter() (with any progress
>> in write), to keep consistent with generic_file_direct_write(). This
>> covers the scenarios of both synchronous FOPEN_DIRECT_IO write
>> (regardless of FUSE_ASYNC_DIO) and asynchronous FOPEN_DIRECT_IO write
>> without FUSE_ASYNC_DIO.
>
> This suggestion sounds good to me.
>
>>> +int sb_init_dio_done_wq(struct super_block *sb);
>>> +
>>
>> #include "../internal.h" ?
>
> We prefer to keep FUSE independent from other parts of the kernel. This
> way, we can create DKMS packages for the FUSE kernel module.
>
>>> +			if (!inode->i_sb->s_dio_done_wq)
>>> +				res = sb_init_dio_done_wq(inode->i_sb);
>>
>> Better to call sb_init_dio_done_wq() from fuse_direct_IO(), and fail the
>> IO directly if sb_init_dio_done_wq() fails.
>>
>>> +			if (res >= 0) {
>>> +				INIT_WORK(&io->work, fuse_aio_invalidate_worker);
>>> +				queue_work(inode->i_sb->s_dio_done_wq, &io->work);
>>> +				return;
>>>  			}
>>
>> Otherwise, the page cache invalidation would be missed if the previous
>> sb_init_dio_done_wq() fails.
>
> If sb_init_dio_done_wq() fails, res contains the error code, which will
> be passed to iocb->ki_complete(). However, I can change this if you
> still prefer to do that in fuse_direct_IO().
>

Yes, I still prefer initializing sb->s_dio_done_wq in advance in
fuse_direct_IO(). Otherwise, even if you fail the direct IO when
sb_init_dio_done_wq() fails, the data has already been written to the
file.

--
Thanks,
Jingbo
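
Editorial illustration (not code from the thread): the ordering argument above can be sketched with a toy user-space model. All `sketch_` names are hypothetical; `sketch_init_wq()` only stands in for sb_init_dio_done_wq(), and the two write functions model "set up the workqueue at completion time" versus "set it up before issuing the write". The model assumes the only failure is the workqueue allocation.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct sketch_sb {
	bool wq_ready;		/* stands in for sb->s_dio_done_wq != NULL */
	bool wq_alloc_fails;	/* forces the allocation failure path */
};

struct sketch_file {
	size_t bytes_written;	/* data that reached the file */
	bool cache_stale;	/* page cache no longer matches the file */
};

/* Stands in for sb_init_dio_done_wq(): 0 on success, negative errno on failure. */
static int sketch_init_wq(struct sketch_sb *sb)
{
	if (sb->wq_ready)
		return 0;
	if (sb->wq_alloc_fails)
		return -ENOMEM;
	sb->wq_ready = true;
	return 0;
}

/* Late setup: the write has already happened when the setup is attempted. */
static long sketch_write_late_init(struct sketch_sb *sb,
				   struct sketch_file *f, size_t len)
{
	int err;

	f->bytes_written += len;
	f->cache_stale = true;
	err = sketch_init_wq(sb);
	if (err)
		return err;	/* error reported, but data is on the file */
	f->cache_stale = false;	/* worker invalidated the range */
	return (long)len;
}

/* Early setup: fail before any data is written. */
static long sketch_write_early_init(struct sketch_sb *sb,
				    struct sketch_file *f, size_t len)
{
	int err = sketch_init_wq(sb);

	if (err)
		return err;	/* nothing was written */
	f->bytes_written += len;
	f->cache_stale = false;	/* worker exists, so invalidation runs */
	return (long)len;
}
```

With a failing allocation, the late variant returns an error while the file has changed and the cache is stale; the early variant returns the same error with the file untouched, which is the behavior the reviewer asks for.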