* [PATCH] fuse: when copying a folio delay the mark dirty until the end @ 2026-03-16 15:16 Horst Birthelmer 2026-03-16 17:29 ` Joanne Koong 2026-03-26 6:35 ` kernel test robot 0 siblings, 2 replies; 14+ messages in thread From: Horst Birthelmer @ 2026-03-16 15:16 UTC (permalink / raw) To: Miklos Szeredi, Joanne Koong Cc: Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer From: Horst Birthelmer <hbirthelmer@ddn.com> Doing set_page_dirty_lock() for every page is inefficient for large folios. When copying a folio (and with large folios enabled, this can be many pages) we can delay the marking dirty and flush_dcache_page() until the whole folio is handled and do it once per folio instead of once per page. Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> --- Currently when doing a folio copy flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) are called for every page. We can do this at the end for the whole folio. --- fs/fuse/dev.c | 9 +++++++-- fs/fuse/fuse_dev_i.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 0b0241f47170d4640f0b8f3cae8be1f78944a456..ae96a48f898e883b4e96147f3b27398261c5e844 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -855,7 +855,7 @@ void fuse_copy_finish(struct fuse_copy_state *cs) buf->len = PAGE_SIZE - cs->len; cs->currbuf = NULL; } else if (cs->pg) { - if (cs->write) { + if (cs->write && !cs->copy_folio) { flush_dcache_page(cs->pg); set_page_dirty_lock(cs->pg); } @@ -1126,6 +1126,7 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, folio_zero_range(folio, 0, size); } + cs->copy_folio = true; while (count) { if (cs->write && cs->pipebufs && folio) { /* @@ -1167,8 +1168,12 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, } else offset += fuse_copy_do(cs, NULL, &count); } - if (folio && !cs->write) + if (folio) { flush_dcache_folio(folio); + if (cs->write) + folio_mark_dirty_lock(folio); + } + 
cs->copy_folio = false; return 0; } diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 134bf44aff0d39ae8d5d47cf1518efcf2f1cfc23..4a433d902266d573ad1c19adbdd573440e2a77b2 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -33,6 +33,7 @@ struct fuse_copy_state { unsigned int offset; bool write:1; bool move_folios:1; + bool copy_folio:1; bool is_uring:1; struct { unsigned int copied_sz; /* copied size into the user buffer */ --- base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c change-id: 20260316-mark-dirty-per-folio-be87b6b4bf56 Best regards, -- Horst Birthelmer <hbirthelmer@ddn.com> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 15:16 [PATCH] fuse: when copying a folio delay the mark dirty until the end Horst Birthelmer @ 2026-03-16 17:29 ` Joanne Koong 2026-03-16 20:02 ` Horst Birthelmer 2026-03-26 6:35 ` kernel test robot 1 sibling, 1 reply; 14+ messages in thread From: Joanne Koong @ 2026-03-16 17:29 UTC (permalink / raw) To: Horst Birthelmer Cc: Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 8:16 AM Horst Birthelmer <horst@birthelmer.com> wrote: > > From: Horst Birthelmer <hbirthelmer@ddn.com> > > Doing set_page_dirty_lock() for every page is inefficient > for large folios. > When copying a folio (and with large folios enabled, > this can be many pages) we can delay the marking dirty > and flush_dcache_page() until the whole folio is handled > and do it once per folio instead of once per page. > > Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> > --- > Currently when doing a folio copy > flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) > are called for every page. > > We can do this at the end for the whole folio. Hi Horst, I think these are two different entities. cs->pg is the page that corresponds to the userspace buffer / pipe while the (large) folio corresponds to the pages in the page cache. flush_dcache_folio(folio) and flush_dcache_page(cs->pg) are not interchangeable (I don't think it's likely either that the pages backing the userspace buffer/pipe are large folios). 
Thanks, Joanne > --- > fs/fuse/dev.c | 9 +++++++-- > fs/fuse/fuse_dev_i.h | 1 + > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 0b0241f47170d4640f0b8f3cae8be1f78944a456..ae96a48f898e883b4e96147f3b27398261c5e844 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -855,7 +855,7 @@ void fuse_copy_finish(struct fuse_copy_state *cs) > buf->len = PAGE_SIZE - cs->len; > cs->currbuf = NULL; > } else if (cs->pg) { > - if (cs->write) { > + if (cs->write && !cs->copy_folio) { > flush_dcache_page(cs->pg); > set_page_dirty_lock(cs->pg); > } > @@ -1126,6 +1126,7 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, > folio_zero_range(folio, 0, size); > } > > + cs->copy_folio = true; > while (count) { > if (cs->write && cs->pipebufs && folio) { > /* > @@ -1167,8 +1168,12 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, > } else > offset += fuse_copy_do(cs, NULL, &count); > } > - if (folio && !cs->write) > + if (folio) { > flush_dcache_folio(folio); > + if (cs->write) > + folio_mark_dirty_lock(folio); > + } > + cs->copy_folio = false; > return 0; > } > > diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h > index 134bf44aff0d39ae8d5d47cf1518efcf2f1cfc23..4a433d902266d573ad1c19adbdd573440e2a77b2 100644 > --- a/fs/fuse/fuse_dev_i.h > +++ b/fs/fuse/fuse_dev_i.h > @@ -33,6 +33,7 @@ struct fuse_copy_state { > unsigned int offset; > bool write:1; > bool move_folios:1; > + bool copy_folio:1; > bool is_uring:1; > struct { > unsigned int copied_sz; /* copied size into the user buffer */ > > --- > base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c > change-id: 20260316-mark-dirty-per-folio-be87b6b4bf56 > > Best regards, > -- > Horst Birthelmer <hbirthelmer@ddn.com> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 17:29 ` Joanne Koong @ 2026-03-16 20:02 ` Horst Birthelmer 2026-03-16 22:06 ` Joanne Koong 0 siblings, 1 reply; 14+ messages in thread From: Horst Birthelmer @ 2026-03-16 20:02 UTC (permalink / raw) To: Joanne Koong Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 10:29:52AM -0700, Joanne Koong wrote: > On Mon, Mar 16, 2026 at 8:16 AM Horst Birthelmer <horst@birthelmer.com> wrote: > > > > From: Horst Birthelmer <hbirthelmer@ddn.com> > > > > Doing set_page_dirty_lock() for every page is inefficient > > for large folios. > > When copying a folio (and with large folios enabled, > > this can be many pages) we can delay the marking dirty > > and flush_dcache_page() until the whole folio is handled > > and do it once per folio instead of once per page. > > > > Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> > > --- > > Currently when doing a folio copy > > flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) > > are called for every page. > > > > We can do this at the end for the whole folio. > > Hi Horst, > > I think these are two different entities. cs->pg is the page that > corresponds to the userspace buffer / pipe while the (large) folio > corresponds to the pages in the page cache. flush_dcache_folio(folio) > and flush_dcache_page(cs->pg) are not interchangeable (I don't think > it's likely either that the pages backing the userspace buffer/pipe > are large folios). > > Thanks, > Joanne Hi Joanne, I feel a bit embarrassed ... but you are completely right. I was interested in solving this case: fuse_uring_args_to_ring() or fuse_uring_args_to_ring_pages() fuse_copy_init(&cs, true, &iter) ← cs->write = TRUE fuse_copy_args(&cs, num_args, args->in_pages, ...) if (args->in_pages) fuse_copy_folios(cs, arg->size, 0) fuse_copy_folio(cs, &ap->folios[i], ...) 
when we have large folios But those are not the same. Thanks, Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 20:02 ` Horst Birthelmer @ 2026-03-16 22:06 ` Joanne Koong 2026-03-18 14:03 ` Horst Birthelmer 0 siblings, 1 reply; 14+ messages in thread From: Joanne Koong @ 2026-03-16 22:06 UTC (permalink / raw) To: Horst Birthelmer Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 1:02 PM Horst Birthelmer <horst@birthelmer.de> wrote: > > On Mon, Mar 16, 2026 at 10:29:52AM -0700, Joanne Koong wrote: > > On Mon, Mar 16, 2026 at 8:16 AM Horst Birthelmer <horst@birthelmer.com> wrote: > > > > > > From: Horst Birthelmer <hbirthelmer@ddn.com> > > > > > > Doing set_page_dirty_lock() for every page is inefficient > > > for large folios. > > > When copying a folio (and with large folios enabled, > > > this can be many pages) we can delay the marking dirty > > > and flush_dcache_page() until the whole folio is handled > > > and do it once per folio instead of once per page. > > > > > > Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> > > > --- > > > Currently when doing a folio copy > > > flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) > > > are called for every page. > > > > > > We can do this at the end for the whole folio. > > > > Hi Horst, > > > > I think these are two different entities. cs->pg is the page that > > corresponds to the userspace buffer / pipe while the (large) folio > > corresponds to the pages in the page cache. flush_dcache_folio(folio) > > and flush_dcache_page(cs->pg) are not interchangeable (I don't think > > it's likely either that the pages backing the userspace buffer/pipe > > are large folios). > > > > Thanks, > > Joanne > > Hi Joanne, > > I feel a bit embarassed ... but you are completely right. 
> I was interested in solving this case: > > fuse_uring_args_to_ring() or fuse_uring_args_to_ring_pages() > fuse_copy_init(&cs, true, &iter) ← cs->write = TRUE > fuse_copy_args(&cs, num_args, args->in_pages, ...) > if (args->in_pages) > fuse_copy_folios(cs, arg->size, 0) > fuse_copy_folio(cs, &ap->folios[i], ...) > > when we have large folios No worries, the naming doesn't make the distinction obvious at all. For copying out large folios right now, the copy is still page by page due to extracting 1 userspace buffer page at a time (eg the iov_iter_get_pages2(... PAGE_SIZE, 1, ...) call in fuse_copy_fill()). If we pass in a pages array, iov_iter_getpages2 is able to extract multiple pages at a time and save extra overhead with the GUP setup / irq save+restore / pagetable walk and the extra req->waitq locking/unlocking calls, but when I benchmarked it last year I didn't see any noticeable performance improvements from doing this. The extra complexity didn't seem worth it. For optimized copying, I think in the future high-performance servers will mostly just use fuse-over-iouring zero-copy. Thanks, Joanne > > But those are not the same. > > Thanks, > Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 22:06 ` Joanne Koong @ 2026-03-18 14:03 ` Horst Birthelmer 2026-03-18 21:19 ` Joanne Koong 0 siblings, 1 reply; 14+ messages in thread From: Horst Birthelmer @ 2026-03-18 14:03 UTC (permalink / raw) To: Joanne Koong Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 03:06:02PM -0700, Joanne Koong wrote: > On Mon, Mar 16, 2026 at 1:02 PM Horst Birthelmer <horst@birthelmer.de> wrote: > > > > > > > > Hi Horst, > > > > > > I think these are two different entities. cs->pg is the page that > > > corresponds to the userspace buffer / pipe while the (large) folio > > > corresponds to the pages in the page cache. flush_dcache_folio(folio) > > > and flush_dcache_page(cs->pg) are not interchangeable (I don't think > > > it's likely either that the pages backing the userspace buffer/pipe > > > are large folios). > > > > > > Thanks, > > > Joanne > > > > Hi Joanne, > > > > I feel a bit embarassed ... but you are completely right. > > I was interested in solving this case: > > > > fuse_uring_args_to_ring() or fuse_uring_args_to_ring_pages() > > fuse_copy_init(&cs, true, &iter) ← cs->write = TRUE > > fuse_copy_args(&cs, num_args, args->in_pages, ...) > > if (args->in_pages) > > fuse_copy_folios(cs, arg->size, 0) > > fuse_copy_folio(cs, &ap->folios[i], ...) > > > > when we have large folios > > No worries, the naming doesn't make the distinction obvious at all. > For copying out large folios right now, the copy is still page by page > due to extracting 1 userspace buffer page at a time (eg the > iov_iter_get_pages2(... PAGE_SIZE, 1, ...) call in fuse_copy_fill()). 
> If we pass in a pages array, iov_iter_getpages2 is able to extract > multiple pages at a time and save extra overhead with the GUP setup / > irq save+restore / pagetable walk and the extra req->waitq > locking/unlocking calls, but when I benchmarked it last year I didn't > see any noticeable performance improvements from doing this. The extra > complexity didn't seem worth it. For optimized copying, I think in the > future high-performance servers will mostly just use fuse-over-iouring > zero-copy. > > Thanks, > Joanne > Hi Joanne, I wonder, would something like this help for large folios? @@ -856,8 +856,11 @@ void fuse_copy_finish(struct fuse_copy_state *cs) cs->currbuf = NULL; } else if (cs->pg) { if (cs->write) { + struct folio *folio = page_folio(cs->pg); + flush_dcache_page(cs->pg); - set_page_dirty_lock(cs->pg); + if (!folio_test_dirty(folio)) + set_page_dirty_lock(cs->pg); } put_page(cs->pg); } Have you seen any problems with spin locks being way too costly while doing writes? That was actually why I started looking into this. Thanks, Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-18 14:03 ` Horst Birthelmer @ 2026-03-18 21:19 ` Joanne Koong 2026-03-18 21:52 ` Bernd Schubert 0 siblings, 1 reply; 14+ messages in thread From: Joanne Koong @ 2026-03-18 21:19 UTC (permalink / raw) To: Horst Birthelmer Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > Hi Joanne, > > I wonder, would something like this help for large folios? Hi Horst, I don't think it's likely that the pages backing the userspace buffer are large folios, so I think this may actually add extra overhead with the extra folio_test_dirty() check. From what I've seen, the main cost that dwarfs everything else for writes/reads is the actual IO, the context switches, and the memcpys. I think compared to these things, the set_page_dirty_lock() cost is negligible and pretty much undetectable. Thanks, Joanne > > @@ -856,8 +856,11 @@ void fuse_copy_finish(struct fuse_copy_state *cs) > cs->currbuf = NULL; > } else if (cs->pg) { > if (cs->write) { > + struct folio *folio = page_folio(cs->pg); > + > flush_dcache_page(cs->pg); > - set_page_dirty_lock(cs->pg); > + if (!folio_test_dirty(folio)) > + set_page_dirty_lock(cs->pg); > } > put_page(cs->pg); > } > > Do you have seen any problems with spin locks being way too costly while > doing writes? > > That was actually why I started looking into this. > > Thanks, > Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-18 21:19 ` Joanne Koong @ 2026-03-18 21:52 ` Bernd Schubert 2026-03-19 1:32 ` Joanne Koong 0 siblings, 1 reply; 14+ messages in thread From: Bernd Schubert @ 2026-03-18 21:52 UTC (permalink / raw) To: Joanne Koong Cc: Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer Hi Joanne, On 3/18/26 22:19, Joanne Koong wrote: > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: >> >> Hi Joanne, >> >> I wonder, would something like this help for large folios? > > Hi Horst, > > I don't think it's likely that the pages backing the userspace buffer > are large folios, so I think this may actually add extra overhead with > the extra folio_test_dirty() check. > > From what I've seen, the main cost that dwarfs everything else for > writes/reads is the actual IO, the context switches, and the memcpys. > I think compared to these things, the set_page_dirty_lock() cost is > negligible and pretty much undetectable. A little bit of background here. We see in cpu flame graphs that the spin lock taken in lock_request() and unlock_request() takes about the same amount of CPU time as the memcpy. Interestingly, only on Intel, but not AMD CPUs. Note that we are running with our custom page pinning, which just takes the pages from an array, so iov_iter_get_pages2() is not used. The reason for that unlock/lock is documented at the end of Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. Well we don't have that, so for now these checks are modified in our branches to avoid the lock. Although that is not upstreamable. The right solution here is to extract an array of pages and do that unlock/lock per pagevec. Next in the flame graph is that set_page_dirty_lock, which also takes as much CPU time as the memcpy. Again, Intel CPUs only. 
In combination with the above pagevec method, I think the right solution is to iterate over the pages, store the last folio, and then set dirty once per folio. Also, I disagree that the userspace buffers are unlikely to be large folios, see commit 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio with SPLICE_F_MOVE". Especially Horst persistently runs into it when doing xfstests with recent kernels. I think the issue first came up with 3.18ish. One can further enforce that by setting "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', which is what I did when I tested the above commit. And actually that points out that libfuse allocations should do the madvise. I'm going to do that during the next days, maybe tomorrow. Thanks, Bernd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-18 21:52 ` Bernd Schubert @ 2026-03-19 1:32 ` Joanne Koong 2026-03-19 4:27 ` Darrick J. Wong 2026-03-19 8:32 ` Horst Birthelmer 0 siblings, 2 replies; 14+ messages in thread From: Joanne Koong @ 2026-03-19 1:32 UTC (permalink / raw) To: Bernd Schubert Cc: Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > Hi Joanne, > > On 3/18/26 22:19, Joanne Koong wrote: > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > >> > >> Hi Joanne, > >> > >> I wonder, would something like this help for large folios? > > > > Hi Horst, > > > > I don't think it's likely that the pages backing the userspace buffer > > are large folios, so I think this may actually add extra overhead with > > the extra folio_test_dirty() check. > > > > From what I've seen, the main cost that dwarfs everything else for > > writes/reads is the actual IO, the context switches, and the memcpys. > > I think compared to these things, the set_page_dirty_lock() cost is > > negligible and pretty much undetectable. > > > a little bit background here. We see in cpu flame graphs that the spin > lock taken in unlock_request() and unlock_request() takes about the same > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > AMD CPUs. Note that we are running with out custom page pinning, which > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > The reason for that unlock/lock is documented at the end of > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. Well we > don't have that, so for now these checks are modified in our branches to > avoid the lock. Although that is not upstreamable. Right solution is > here to extract an array of pages and do that unlock/lock per pagevec. 
> > Next in the flame graph is setting that set_page_dirty_lock which also > takes as much CPU time as the memcpy. Again, Intel CPUs only. > In the combination with the above pagevec method, I think right solution > is to iterate over the pages, stores the last folio and then set to > dirty once per folio. Thanks for the background context. The intel vs amd difference is interesting. The approaches you mention sound reasonable. Are you able to share the flame graph or is this easily repro-able using fio on the passthrough_hp server? > Also, I disagree about that the userspace buffers are not likely large > folios, see commit > 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio) > with SPLICE_F_MOVE". Especially Horst persistently runs into it when > doing xfstests with recent kernels. I think the issue came up first time I think that's because xfstests uses /tmp for scratch space, so the "This is easily reproducible (on 6.19) with CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_ALWAYS=y CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_ALWAYS=y" triggers it but on production workloads I don't think it's likely that those source pages are backed by shmem/tmpfs or exist in the page cache already as a large folio as the server has no control over that. I also don't think most applications use splice, though maybe I'm wrong here. For non-splice, even if the user sets "/sys/kernel/mm/transparent_hugepage/enabled" to 'always' or in libfuse we do madvise on the buffer allocation for huge pages, that has a 2 MB granularity requirement which depends on the user system also having explicitly upped the max pages limit through the sysctl since the kernel fuse max pages limit is 256 (1 MB) by default. I don't think that is common on most servers. Thanks, Joanne > with 3.18ish. > > One can further enforce that by setting > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', what I did > when I tested the above commit. 
And actually that points out that > libfuse allocations should do the madvise. I'm going to do that during > the next days, maybe tomorrow. > > > Thanks, > Bernd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 1:32 ` Joanne Koong @ 2026-03-19 4:27 ` Darrick J. Wong 2026-03-20 17:24 ` Joanne Koong 2026-03-19 8:32 ` Horst Birthelmer 1 sibling, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2026-03-19 4:27 UTC (permalink / raw) To: Joanne Koong Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > Hi Joanne, > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > >> > > >> Hi Joanne, > > >> > > >> I wonder, would something like this help for large folios? > > > > > > Hi Horst, > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > are large folios, so I think this may actually add extra overhead with > > > the extra folio_test_dirty() check. > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > I think compared to these things, the set_page_dirty_lock() cost is > > > negligible and pretty much undetectable. > > > > > > a little bit background here. We see in cpu flame graphs that the spin > > lock taken in unlock_request() and unlock_request() takes about the same > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > AMD CPUs. Note that we are running with out custom page pinning, which > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > The reason for that unlock/lock is documented at the end of > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. Well we > > don't have that, so for now these checks are modified in our branches to > > avoid the lock. Although that is not upstreamable. 
Right solution is > > here to extract an array of pages and do that unlock/lock per pagevec. > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > In the combination with the above pagevec method, I think right solution > > is to iterate over the pages, stores the last folio and then set to > > dirty once per folio. > > Thanks for the background context. The intel vs amd difference is > interesting. The approaches you mention sound reasonable. Are you able > to share the flame graph or is this easily repro-able using fio on the > passthrough_hp server? > > > > Also, I disagree about that the userspace buffers are not likely large > > folios, see commit > > 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio) > > with SPLICE_F_MOVE". Especially Horst persistently runs into it when > > doing xfstests with recent kernels. I think the issue came up first time > > I think that's because xfstests uses /tmp for scratch space, so the > > "This is easily reproducible (on 6.19) with > CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_ALWAYS=y > CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_ALWAYS=y" > > triggers it but on production workloads I don't think it's likely that > those source pages are backed by shmem/tmpfs or exist in the page > cache already as a large folio as the server has no control over that. /me stumbles in-thread to note that xfs gets large folios for its files' pagecache fairly frequently now, especially as readahead ramps up. Ok back to the hell that is deploying ClownStrike through a Java program while Firefox repeatedly drives my laptop to OOM. --D > I also don't think most applications use splice, though maybe I'm > wrong here. 
> > For non-splice, even if the user sets > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always' or in > libfuse we do madvise on the buffer allocation for huge pages, that > has a 2 MB granularity requirement which depends on the user system > also having explicitly upped the max pages limit through the sysctl > since the kernel fuse max pages limit is 256 (1 MB) by default. I > don't think that is common on most servers. > > Thanks, > Joanne > > > with 3.18ish. > > > > One can further enforce that by setting > > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', what I did > > when I tested the above commit. And actually that points out that > > libfuse allocations should do the madvise. I'm going to do that during > > the next days, maybe tomorrow. > > > > > > Thanks, > > Bernd > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 4:27 ` Darrick J. Wong @ 2026-03-20 17:24 ` Joanne Koong 0 siblings, 0 replies; 14+ messages in thread From: Joanne Koong @ 2026-03-20 17:24 UTC (permalink / raw) To: Darrick J. Wong Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 9:27 PM Darrick J. Wong <djwong@kernel.org> wrote: > > On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > > > Hi Joanne, > > > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > > >> > > > >> Hi Joanne, > > > >> > > > >> I wonder, would something like this help for large folios? > > > > > > > > Hi Horst, > > > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > > are large folios, so I think this may actually add extra overhead with > > > > the extra folio_test_dirty() check. > > > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > > I think compared to these things, the set_page_dirty_lock() cost is > > > > negligible and pretty much undetectable. > > > > > > > > > a little bit background here. We see in cpu flame graphs that the spin > > > lock taken in unlock_request() and unlock_request() takes about the same > > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > > AMD CPUs. Note that we are running with out custom page pinning, which > > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > > > The reason for that unlock/lock is documented at the end of > > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. 
Well we > > > don't have that, so for now these checks are modified in our branches to > > > avoid the lock. Although that is not upstreamable. Right solution is > > > here to extract an array of pages and do that unlock/lock per pagevec. > > > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > > In the combination with the above pagevec method, I think right solution > > > is to iterate over the pages, stores the last folio and then set to > > > dirty once per folio. > > > > Thanks for the background context. The intel vs amd difference is > > interesting. The approaches you mention sound reasonable. Are you able > > to share the flame graph or is this easily repro-able using fio on the > > passthrough_hp server? > > > > > > > Also, I disagree about that the userspace buffers are not likely large > > > folios, see commit > > > 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio) > > > with SPLICE_F_MOVE". Especially Horst persistently runs into it when > > > doing xfstests with recent kernels. I think the issue came up first time > > > > I think that's because xfstests uses /tmp for scratch space, so the > > > > "This is easily reproducible (on 6.19) with > > CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_ALWAYS=y > > CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_ALWAYS=y" > > > > triggers it but on production workloads I don't think it's likely that > > those source pages are backed by shmem/tmpfs or exist in the page > > cache already as a large folio as the server has no control over that. > > /me stumbles in-thread to note that xfs gets large folios for its files' > pagecache fairly frequently now, especially as readahead ramps up. Oh nice, I didn't realize that. Though I wonder if the pages are backed by xfs/ext4/etc, it seems like any high-performance server would just use passthrough and skip splice altogether? 
Thanks, Joanne > > Ok back to the hell that is deploying ClownStrike through a Java program > while Firefox repeatedly drives my laptop to OOM. > > --D > > > I also don't think most applications use splice, though maybe I'm > > wrong here. > > > > For non-splice, even if the user sets > > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always' or in > > libfuse we do madvise on the buffer allocation for huge pages, that > > has a 2 MB granularity requirement which depends on the user system > > also having explicitly upped the max pages limit through the sysctl > > since the kernel fuse max pages limit is 256 (1 MB) by default. I > > don't think that is common on most servers. > > > > Thanks, > > Joanne > > > > > with 3.18ish. > > > > > > One can further enforce that by setting > > > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', what I did > > > when I tested the above commit. And actually that points out that > > > libfuse allocations should do the madvise. I'm going to do that during > > > the next days, maybe tomorrow. > > > > > > > > > Thanks, > > > Bernd > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 1:32 ` Joanne Koong 2026-03-19 4:27 ` Darrick J. Wong @ 2026-03-19 8:32 ` Horst Birthelmer 2026-03-20 17:18 ` Joanne Koong 1 sibling, 1 reply; 14+ messages in thread From: Horst Birthelmer @ 2026-03-19 8:32 UTC (permalink / raw) To: Joanne Koong Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > Hi Joanne, > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > >> > > >> Hi Joanne, > > >> > > >> I wonder, would something like this help for large folios? > > > > > > Hi Horst, > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > are large folios, so I think this may actually add extra overhead with > > > the extra folio_test_dirty() check. > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > I think compared to these things, the set_page_dirty_lock() cost is > > > negligible and pretty much undetectable. > > > > > > A little bit of background here. We see in CPU flame graphs that the spin > > lock taken in lock_request() and unlock_request() takes about the same > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > AMD CPUs. Note that we are running with our custom page pinning, which > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > The reason for that unlock/lock is documented at the end of > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system.
The right solution > > here is to extract an array of pages and do that unlock/lock per pagevec. > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > In combination with the above pagevec method, I think the right solution > > is to iterate over the pages, store the last folio and then set it > > dirty once per folio. > > Thanks for the background context. The Intel vs AMD difference is > interesting. The approaches you mention sound reasonable. Are you able > to share the flame graph or is this easily repro-able using fio on the > passthrough_hp server? > > Hi Joanne, I have tried to reproduce this with passthrough_hp and I never saw it. So my answer would be something like: I don't think so. This happens even with large folios disabled. I was just trying to solve it, since I figured it will be worse with large folios. Thanks, Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 8:32 ` Horst Birthelmer @ 2026-03-20 17:18 ` Joanne Koong 0 siblings, 0 replies; 14+ messages in thread From: Joanne Koong @ 2026-03-20 17:18 UTC (permalink / raw) To: Horst Birthelmer Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Thu, Mar 19, 2026 at 1:32 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > > > Hi Joanne, > > > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > > >> > > > >> Hi Joanne, > > > >> > > > >> I wonder, would something like this help for large folios? > > > > > > > > Hi Horst, > > > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > > are large folios, so I think this may actually add extra overhead with > > > > the extra folio_test_dirty() check. > > > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > > I think compared to these things, the set_page_dirty_lock() cost is > > > > negligible and pretty much undetectable. > > > > > > > > > A little bit of background here. We see in CPU flame graphs that the spin > > > lock taken in lock_request() and unlock_request() takes about the same > > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > > AMD CPUs. Note that we are running with our custom page pinning, which > > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > > > The reason for that unlock/lock is documented at the end of > > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system.
Well we > > > don't have that, so for now these checks are modified in our branches to > > > avoid the lock. Although that is not upstreamable. The right solution > > > here is to extract an array of pages and do that unlock/lock per pagevec. > > > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > > In combination with the above pagevec method, I think the right solution > > > is to iterate over the pages, store the last folio and then set it > > > dirty once per folio. > > > > Thanks for the background context. The Intel vs AMD difference is > > interesting. The approaches you mention sound reasonable. Are you able > > to share the flame graph or is this easily repro-able using fio on the > > passthrough_hp server? > > > > > Hi Joanne, > > I have tried to reproduce this with passthrough_hp and I never saw it. > So my answer would be something like: I don't think so. > > This happens even with large folios disabled. I was just trying to > solve it, since I figured it will be worse with large folios. Thanks for the context. I haven't encountered this bottleneck myself (yet) but if you are encountering it pretty regularly, I agree with you that it definitely seems worth addressing. Thanks, Joanne > > Thanks, > Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 15:16 [PATCH] fuse: when copying a folio delay the mark dirty until the end Horst Birthelmer 2026-03-16 17:29 ` Joanne Koong @ 2026-03-26 6:35 ` kernel test robot 2026-03-26 15:05 ` [LTP] " Cyril Hrubis 1 sibling, 1 reply; 14+ messages in thread From: kernel test robot @ 2026-03-26 6:35 UTC (permalink / raw) To: Horst Birthelmer Cc: oe-lkp, lkp, linux-fsdevel, ltp, Miklos Szeredi, Joanne Koong, Bernd Schubert, linux-kernel, Horst Birthelmer, oliver.sang Hello, kernel test robot noticed "RIP:fuse_iomap_writeback_range[fuse]" on: commit: 47f8dde97f35e32a1003d54e387273bcdf014ddf ("[PATCH] fuse: when copying a folio delay the mark dirty until the end") url: https://github.com/intel-lab-lkp/linux/commits/Horst-Birthelmer/fuse-when-copying-a-folio-delay-the-mark-dirty-until-the-end/20260316-234418 patch link: https://lore.kernel.org/all/20260316-mark-dirty-per-folio-v1-1-8dc39c94b7ce@ddn.com/ patch subject: [PATCH] fuse: when copying a folio delay the mark dirty until the end in testcase: ltp version: with following parameters: disk: 1HDD fs: ext4 test: fs-03 config: x86_64-rhel-9.4-ltp compiler: gcc-14 test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags

| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202603261451.d2a4cd46-lkp@intel.com

kern :warn : [ 623.963830] [ T243] ------------[ cut here ]------------
kern :warn : [ 623.964688] [ T243] WARNING: fs/fuse/file.c:2025 at fuse_iomap_writeback_range+0xeb3/0x17b0 [fuse], CPU#24: 9/243
kern :warn : [ 623.966501] [ T243] Modules linked in: exfat vfat fat xfs loop ext4 mbcache jbd2 dm_mod binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac_common nfit libnvdimm amdgpu x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_atihdmi coretemp snd_hda_codec_generic snd_hda_codec_hdmi btrfs amdxcp snd_soc_avs drm_panel_backlight_quirks sd_mod kvm_intel gpu_sched snd_soc_hda_codec sg libblake2b drm_buddy snd_hda_intel snd_hda_ext_core xor drm_ttm_helper zstd_compress ttm snd_hda_codec kvm drm_exec snd_hda_core drm_suballoc_helper raid6_pq snd_soc_core snd_intel_dspcfg drm_display_helper snd_intel_sdw_acpi irqbypass snd_hwdep snd_compress ghash_clmulni_intel cec snd_pcm rapl ahci drm_client_lib intel_cstate drm_kms_helper snd_timer libahci wmi_bmof mxm_wmi intel_wmi_thunderbolt nvme snd mei_me i2c_i801 video intel_uncore libata nvme_core wdat_wdt soundcore crc16 pcspkr i2c_smbus ioatdma mei dca wmi drm fuse nfnetlink
kern :warn : [ 623.977867] [ T243] CPU: 24 UID: 0 PID: 243 Comm: kworker/u144:9 Tainted: G S 7.0.0-rc4-00001-g47f8dde97f35 #1 PREEMPT(lazy)
kern :warn : [ 623.979856] [ T243] Tainted: [S]=CPU_OUT_OF_SPEC
kern :warn : [ 623.980727] [ T243] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
kern :warn : [ 623.982101] [ T243] Workqueue: writeback wb_workfn (flush-7:0-fuseblk)
kern :warn : [ 623.983158] [ T243] RIP: 0010:fuse_iomap_writeback_range (fs/fuse/file.c:2025 (discriminator 1) fs/fuse/file.c:2206 (discriminator 1)) fuse
kern :warn : [ 623.984255] [ T243] Code: 03 80 3c 02 00 0f 85 fc 03 00 00 48 8b 44 24 18 49 89 47 08 e9 0e f3 ff ff 0f 0b e9 dd f2 ff ff 48 8b 7c 24 20 e8 4d a0 08 c4 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02
All code
========
   0: 03 80 3c 02 00 0f       add    0xf00023c(%rax),%eax
   6: 85 fc                   test   %edi,%esp
   8: 03 00                   add    (%rax),%eax
   a: 00 48 8b                add    %cl,-0x75(%rax)
   d: 44 24 18                rex.R and $0x18,%al
  10: 49 89 47 08             mov    %rax,0x8(%r15)
  14: e9 0e f3 ff ff          jmp    0xfffffffffffff327
  19: 0f 0b                   ud2
  1b: e9 dd f2 ff ff          jmp    0xfffffffffffff2fd
  20: 48 8b 7c 24 20          mov    0x20(%rsp),%rdi
  25: e8 4d a0 08 c4          call   0xffffffffc408a077
  2a:* 0f 0b                  ud2    <-- trapping instruction
  2c: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  33: fc ff df
  36: 4c 89 ea                mov    %r13,%rdx
  39: 48 c1 ea 03             shr    $0x3,%rdx
  3d: 80                      .byte 0x80
  3e: 3c 02                   cmp    $0x2,%al

Code starting with the faulting instruction
===========================================
   0: 0f 0b                   ud2
   2: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
   9: fc ff df
   c: 4c 89 ea                mov    %r13,%rdx
   f: 48 c1 ea 03             shr    $0x3,%rdx
  13: 80                      .byte 0x80
  14: 3c 02                   cmp    $0x2,%al
kern :warn : [ 623.986908] [ T243] RSP: 0018:ffffc9000131f320 EFLAGS: 00010286
kern :warn : [ 623.987958] [ T243] RAX: ffff888167fd4ea8 RBX: 0000000000000000 RCX: 1ffff1102cffa9d5
kern :warn : [ 623.989181] [ T243] RDX: ffff888167fd4ea8 RSI: 0000000000000004 RDI: ffff888167fd4f28
kern :warn : [ 623.990332] [ T243] RBP: ffff888167fd4c00 R08: 0000000000000001 R09: fffff52000263e59
kern :warn : [ 623.991500] [ T243] R10: 0000000000000003 R11: 0000000000000038 R12: 0000000000000000
kern :warn : [ 623.992678] [ T243] R13: ffffc9000131f590 R14: ffffea000aff1380 R15: ffffc9000131f588
kern :warn : [ 623.993830] [ T243] FS: 0000000000000000(0000) GS:ffff88a09e48c000(0000) knlGS:0000000000000000
kern :warn : [ 623.995069] [ T243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 623.996117] [ T243] CR2: 00007f8e6b375000 CR3: 000000209ca72001 CR4: 00000000003726f0
kern :warn : [ 623.997263] [ T243] Call Trace:
kern :warn : [ 623.998054] [ T243] <TASK>
kern :warn : [ 623.998796] [ T243] iomap_writeback_folio (fs/iomap/buffered-io.c:1777 fs/iomap/buffered-io.c:1895)
kern :warn : [ 623.999721] [ T243] ? __pfx_iomap_writeback_folio (fs/iomap/buffered-io.c:1854)
kern :warn : [ 624.000685] [ T243] ? writeback_iter (mm/page-writeback.c:2513)
kern :warn : [ 624.001559] [ T243] iomap_writepages (fs/iomap/buffered-io.c:1959)
kern :warn : [ 624.002410] [ T243] ? __pfx_iomap_writepages (fs/iomap/buffered-io.c:1944)
kern :warn : [ 624.003325] [ T243] ? unwind_next_frame (include/linux/rcupdate.h:1193 arch/x86/kernel/unwind_orc.c:495)
kern :warn : [ 624.004213] [ T243] ? ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
kern :warn : [ 624.005068] [ T243] fuse_writepages (fs/fuse/file.c:2276) fuse
kern :warn : [ 624.005975] [ T243] ? __pfx_fuse_writepages (fs/fuse/file.c:2276) fuse
kern :warn : [ 624.006916] [ T243] ? update_sg_lb_stats (kernel/sched/fair.c:10481 (discriminator 2))
kern :warn : [ 624.007771] [ T243] ? __pfx__raw_spin_lock (kernel/locking/spinlock.c:153)
kern :warn : [ 624.008629] [ T243] do_writepages (mm/page-writeback.c:2558)
kern :warn : [ 624.009436] [ T243] __writeback_single_inode (fs/fs-writeback.c:1759)
kern :warn : [ 624.010308] [ T243] writeback_sb_inodes (fs/fs-writeback.c:2045)
kern :warn : [ 624.011156] [ T243] ? __pfx_writeback_sb_inodes (fs/fs-writeback.c:1946)
kern :warn : [ 624.012141] [ T243] ? __wb_calc_thresh (arch/x86/include/asm/atomic64_64.h:15 include/linux/atomic/atomic-arch-fallback.h:2583 include/linux/atomic/atomic-long.h:38 include/linux/atomic/atomic-instrumented.h:3189 mm/page-writeback.c:160 mm/page-writeback.c:912)
kern :warn : [ 624.012991] [ T243] ? __pfx_down_read_trylock (kernel/locking/rwsem.c:1575)
kern :warn : [ 624.013925] [ T243] ? __pfx_move_expired_inodes (fs/fs-writeback.c:1499)
kern :warn : [ 624.014835] [ T243] __writeback_inodes_wb (fs/fs-writeback.c:2119)
kern :warn : [ 624.015718] [ T243] wb_writeback (fs/fs-writeback.c:2229)
kern :warn : [ 624.016519] [ T243] ? __pfx_wb_writeback (fs/fs-writeback.c:2172)
kern :warn : [ 624.017340] [ T243] ? get_nr_dirty_inodes (fs/inode.c:95 (discriminator 1) fs/inode.c:103 (discriminator 1))
kern :warn : [ 624.018188] [ T243] wb_do_writeback (fs/fs-writeback.c:2387 (discriminator 1))
kern :warn : [ 624.018984] [ T243] ? set_worker_desc (kernel/workqueue.c:6209)
kern :warn : [ 624.019821] [ T243] ? __pfx_wb_do_writeback (fs/fs-writeback.c:2367)
kern :warn : [ 624.020682] [ T243] ? finish_task_switch+0x13b/0x6f0
kern :warn : [ 624.021574] [ T243] ? __switch_to (arch/x86/include/asm/cpufeature.h:101 arch/x86/kernel/process_64.c:377 arch/x86/kernel/process_64.c:665)
kern :warn : [ 624.022332] [ T243] wb_workfn (fs/fs-writeback.c:2414)
kern :warn : [ 624.023026] [ T243] process_one_work (arch/x86/include/asm/jump_label.h:37 include/trace/events/workqueue.h:110 kernel/workqueue.c:3281)
kern :warn : [ 624.023772] [ T243] ? assign_work (kernel/workqueue.c:1219)
kern :warn : [ 624.024470] [ T243] worker_thread (kernel/workqueue.c:3353 (discriminator 2) kernel/workqueue.c:3440 (discriminator 2))
kern :warn : [ 624.025199] [ T243] ? __pfx_worker_thread (kernel/workqueue.c:3386)
kern :warn : [ 624.025970] [ T243] kthread (kernel/kthread.c:436)
kern :warn : [ 624.026642] [ T243] ? recalc_sigpending (arch/x86/include/asm/bitops.h:75 include/asm-generic/bitops/instrumented-atomic.h:42 include/linux/thread_info.h:109 kernel/signal.c:181)
kern :warn : [ 624.027381] [ T243] ? __pfx_kthread (kernel/kthread.c:381)
kern :warn : [ 624.028081] [ T243] ret_from_fork (arch/x86/kernel/process.c:164)
kern :warn : [ 624.028770] [ T243] ? __pfx_ret_from_fork (arch/x86/kernel/process.c:153)
kern :warn : [ 624.029533] [ T243] ? switch_fpu (arch/x86/include/asm/bitops.h:202 (discriminator 1) arch/x86/include/asm/bitops.h:232 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 1) include/linux/thread_info.h:133 (discriminator 1) include/linux/sched.h:2064 (discriminator 1) arch/x86/include/asm/fpu/sched.h:34 (discriminator 1))
kern :warn : [ 624.030233] [ T243] ? __switch_to (arch/x86/include/asm/cpufeature.h:101 arch/x86/kernel/process_64.c:377 arch/x86/kernel/process_64.c:665)
kern :warn : [ 624.030868] [ T243] ? __pfx_kthread (kernel/kthread.c:381)
kern :warn : [ 624.031524] [ T243] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
kern :warn : [ 624.032186] [ T243] </TASK>
kern :warn : [ 624.032704] [ T243] ---[ end trace 0000000000000000 ]---

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260326/202603261451.d2a4cd46-lkp@intel.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [LTP] [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-26 6:35 ` kernel test robot @ 2026-03-26 15:05 ` Cyril Hrubis 0 siblings, 0 replies; 14+ messages in thread From: Cyril Hrubis @ 2026-03-26 15:05 UTC (permalink / raw) To: kernel test robot Cc: Horst Birthelmer, lkp, Miklos Szeredi, Bernd Schubert, linux-kernel, linux-fsdevel, Horst Birthelmer, ltp, oe-lkp, Joanne Koong Hi! > commit: 47f8dde97f35e32a1003d54e387273bcdf014ddf ("[PATCH] fuse: when copying a folio delay the mark dirty until the end") > url: https://github.com/intel-lab-lkp/linux/commits/Horst-Birthelmer/fuse-when-copying-a-folio-delay-the-mark-dirty-until-the-end/20260316-234418 > patch link: https://lore.kernel.org/all/20260316-mark-dirty-per-folio-v1-1-8dc39c94b7ce@ddn.com/ > patch subject: [PATCH] fuse: when copying a folio delay the mark dirty until the end > > in testcase: ltp > version: > with following parameters: > > disk: 1HDD > fs: ext4 > test: fs-03 Looks like the test that failed was fs_fill, which runs several threads that attempt to fill the filesystem (until they get ENOSPC), then delete the files and try again. And apparently we managed to get EIO when a file was close()d by one of the threads after a few minutes of runtime on NTFS mounted over FUSE. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-03-26 15:05 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-16 15:16 [PATCH] fuse: when copying a folio delay the mark dirty until the end Horst Birthelmer 2026-03-16 17:29 ` Joanne Koong 2026-03-16 20:02 ` Horst Birthelmer 2026-03-16 22:06 ` Joanne Koong 2026-03-18 14:03 ` Horst Birthelmer 2026-03-18 21:19 ` Joanne Koong 2026-03-18 21:52 ` Bernd Schubert 2026-03-19 1:32 ` Joanne Koong 2026-03-19 4:27 ` Darrick J. Wong 2026-03-20 17:24 ` Joanne Koong 2026-03-19 8:32 ` Horst Birthelmer 2026-03-20 17:18 ` Joanne Koong 2026-03-26 6:35 ` kernel test robot 2026-03-26 15:05 ` [LTP] " Cyril Hrubis
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox