* [PATCH] fuse: when copying a folio delay the mark dirty until the end @ 2026-03-16 15:16 Horst Birthelmer 2026-03-16 17:29 ` Joanne Koong 2026-03-26 6:35 ` kernel test robot 0 siblings, 2 replies; 14+ messages in thread From: Horst Birthelmer @ 2026-03-16 15:16 UTC (permalink / raw) To: Miklos Szeredi, Joanne Koong Cc: Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer From: Horst Birthelmer <hbirthelmer@ddn.com> Doing set_page_dirty_lock() for every page is inefficient for large folios. When copying a folio (and with large folios enabled, this can be many pages) we can delay the marking dirty and flush_dcache_page() until the whole folio is handled and do it once per folio instead of once per page. Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> --- Currently when doing a folio copy flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) are called for every page. We can do this at the end for the whole folio. --- fs/fuse/dev.c | 9 +++++++-- fs/fuse/fuse_dev_i.h | 1 + 2 files changed, 8 insertions(+), 2 deletions(-) diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c index 0b0241f47170d4640f0b8f3cae8be1f78944a456..ae96a48f898e883b4e96147f3b27398261c5e844 100644 --- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -855,7 +855,7 @@ void fuse_copy_finish(struct fuse_copy_state *cs) buf->len = PAGE_SIZE - cs->len; cs->currbuf = NULL; } else if (cs->pg) { - if (cs->write) { + if (cs->write && !cs->copy_folio) { flush_dcache_page(cs->pg); set_page_dirty_lock(cs->pg); } @@ -1126,6 +1126,7 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, folio_zero_range(folio, 0, size); } + cs->copy_folio = true; while (count) { if (cs->write && cs->pipebufs && folio) { /* @@ -1167,8 +1168,12 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, } else offset += fuse_copy_do(cs, NULL, &count); } - if (folio && !cs->write) + if (folio) { flush_dcache_folio(folio); + if (cs->write) + folio_mark_dirty_lock(folio); + } + 
cs->copy_folio = false; return 0; } diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h index 134bf44aff0d39ae8d5d47cf1518efcf2f1cfc23..4a433d902266d573ad1c19adbdd573440e2a77b2 100644 --- a/fs/fuse/fuse_dev_i.h +++ b/fs/fuse/fuse_dev_i.h @@ -33,6 +33,7 @@ struct fuse_copy_state { unsigned int offset; bool write:1; bool move_folios:1; + bool copy_folio:1; bool is_uring:1; struct { unsigned int copied_sz; /* copied size into the user buffer */ --- base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c change-id: 20260316-mark-dirty-per-folio-be87b6b4bf56 Best regards, -- Horst Birthelmer <hbirthelmer@ddn.com> ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 15:16 [PATCH] fuse: when copying a folio delay the mark dirty until the end Horst Birthelmer @ 2026-03-16 17:29 ` Joanne Koong 2026-03-16 20:02 ` Horst Birthelmer 2026-03-26 6:35 ` kernel test robot 1 sibling, 1 reply; 14+ messages in thread From: Joanne Koong @ 2026-03-16 17:29 UTC (permalink / raw) To: Horst Birthelmer Cc: Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 8:16 AM Horst Birthelmer <horst@birthelmer.com> wrote: > > From: Horst Birthelmer <hbirthelmer@ddn.com> > > Doing set_page_dirty_lock() for every page is inefficient > for large folios. > When copying a folio (and with large folios enabled, > this can be many pages) we can delay the marking dirty > and flush_dcache_page() until the whole folio is handled > and do it once per folio instead of once per page. > > Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> > --- > Currently when doing a folio copy > flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) > are called for every page. > > We can do this at the end for the whole folio. Hi Horst, I think these are two different entities. cs->pg is the page that corresponds to the userspace buffer / pipe while the (large) folio corresponds to the pages in the page cache. flush_dcache_folio(folio) and flush_dcache_page(cs->pg) are not interchangeable (I don't think it's likely either that the pages backing the userspace buffer/pipe are large folios). 
Thanks, Joanne > --- > fs/fuse/dev.c | 9 +++++++-- > fs/fuse/fuse_dev_i.h | 1 + > 2 files changed, 8 insertions(+), 2 deletions(-) > > diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c > index 0b0241f47170d4640f0b8f3cae8be1f78944a456..ae96a48f898e883b4e96147f3b27398261c5e844 100644 > --- a/fs/fuse/dev.c > +++ b/fs/fuse/dev.c > @@ -855,7 +855,7 @@ void fuse_copy_finish(struct fuse_copy_state *cs) > buf->len = PAGE_SIZE - cs->len; > cs->currbuf = NULL; > } else if (cs->pg) { > - if (cs->write) { > + if (cs->write && !cs->copy_folio) { > flush_dcache_page(cs->pg); > set_page_dirty_lock(cs->pg); > } > @@ -1126,6 +1126,7 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, > folio_zero_range(folio, 0, size); > } > > + cs->copy_folio = true; > while (count) { > if (cs->write && cs->pipebufs && folio) { > /* > @@ -1167,8 +1168,12 @@ static int fuse_copy_folio(struct fuse_copy_state *cs, struct folio **foliop, > } else > offset += fuse_copy_do(cs, NULL, &count); > } > - if (folio && !cs->write) > + if (folio) { > flush_dcache_folio(folio); > + if (cs->write) > + folio_mark_dirty_lock(folio); > + } > + cs->copy_folio = false; > return 0; > } > > diff --git a/fs/fuse/fuse_dev_i.h b/fs/fuse/fuse_dev_i.h > index 134bf44aff0d39ae8d5d47cf1518efcf2f1cfc23..4a433d902266d573ad1c19adbdd573440e2a77b2 100644 > --- a/fs/fuse/fuse_dev_i.h > +++ b/fs/fuse/fuse_dev_i.h > @@ -33,6 +33,7 @@ struct fuse_copy_state { > unsigned int offset; > bool write:1; > bool move_folios:1; > + bool copy_folio:1; > bool is_uring:1; > struct { > unsigned int copied_sz; /* copied size into the user buffer */ > > --- > base-commit: f338e77383789c0cae23ca3d48adcc5e9e137e3c > change-id: 20260316-mark-dirty-per-folio-be87b6b4bf56 > > Best regards, > -- > Horst Birthelmer <hbirthelmer@ddn.com> > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 17:29 ` Joanne Koong @ 2026-03-16 20:02 ` Horst Birthelmer 2026-03-16 22:06 ` Joanne Koong 0 siblings, 1 reply; 14+ messages in thread From: Horst Birthelmer @ 2026-03-16 20:02 UTC (permalink / raw) To: Joanne Koong Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 10:29:52AM -0700, Joanne Koong wrote: > On Mon, Mar 16, 2026 at 8:16 AM Horst Birthelmer <horst@birthelmer.com> wrote: > > > > From: Horst Birthelmer <hbirthelmer@ddn.com> > > > > Doing set_page_dirty_lock() for every page is inefficient > > for large folios. > > When copying a folio (and with large folios enabled, > > this can be many pages) we can delay the marking dirty > > and flush_dcache_page() until the whole folio is handled > > and do it once per folio instead of once per page. > > > > Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> > > --- > > Currently when doing a folio copy > > flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) > > are called for every page. > > > > We can do this at the end for the whole folio. > > Hi Horst, > > I think these are two different entities. cs->pg is the page that > corresponds to the userspace buffer / pipe while the (large) folio > corresponds to the pages in the page cache. flush_dcache_folio(folio) > and flush_dcache_page(cs->pg) are not interchangeable (I don't think > it's likely either that the pages backing the userspace buffer/pipe > are large folios). > > Thanks, > Joanne Hi Joanne, I feel a bit embarrassed ... but you are completely right. I was interested in solving this case: fuse_uring_args_to_ring() or fuse_uring_args_to_ring_pages() fuse_copy_init(&cs, true, &iter) ← cs->write = TRUE fuse_copy_args(&cs, num_args, args->in_pages, ...) if (args->in_pages) fuse_copy_folios(cs, arg->size, 0) fuse_copy_folio(cs, &ap->folios[i], ...) 
when we have large folios But those are not the same. Thanks, Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 20:02 ` Horst Birthelmer @ 2026-03-16 22:06 ` Joanne Koong 2026-03-18 14:03 ` Horst Birthelmer 0 siblings, 1 reply; 14+ messages in thread From: Joanne Koong @ 2026-03-16 22:06 UTC (permalink / raw) To: Horst Birthelmer Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 1:02 PM Horst Birthelmer <horst@birthelmer.de> wrote: > > On Mon, Mar 16, 2026 at 10:29:52AM -0700, Joanne Koong wrote: > > On Mon, Mar 16, 2026 at 8:16 AM Horst Birthelmer <horst@birthelmer.com> wrote: > > > > > > From: Horst Birthelmer <hbirthelmer@ddn.com> > > > > > > Doing set_page_dirty_lock() for every page is inefficient > > > for large folios. > > > When copying a folio (and with large folios enabled, > > > this can be many pages) we can delay the marking dirty > > > and flush_dcache_page() until the whole folio is handled > > > and do it once per folio instead of once per page. > > > > > > Signed-off-by: Horst Birthelmer <hbirthelmer@ddn.com> > > > --- > > > Currently when doing a folio copy > > > flush_dcache_page(cs->pg) and set_page_dirty_lock(cs->pg) > > > are called for every page. > > > > > > We can do this at the end for the whole folio. > > > > Hi Horst, > > > > I think these are two different entities. cs->pg is the page that > > corresponds to the userspace buffer / pipe while the (large) folio > > corresponds to the pages in the page cache. flush_dcache_folio(folio) > > and flush_dcache_page(cs->pg) are not interchangeable (I don't think > > it's likely either that the pages backing the userspace buffer/pipe > > are large folios). > > > > Thanks, > > Joanne > > Hi Joanne, > > I feel a bit embarassed ... but you are completely right. 
> I was interested in solving this case: > > fuse_uring_args_to_ring() or fuse_uring_args_to_ring_pages() > fuse_copy_init(&cs, true, &iter) ← cs->write = TRUE > fuse_copy_args(&cs, num_args, args->in_pages, ...) > if (args->in_pages) > fuse_copy_folios(cs, arg->size, 0) > fuse_copy_folio(cs, &ap->folios[i], ...) > > when we have large folios No worries, the naming doesn't make the distinction obvious at all. For copying out large folios right now, the copy is still page by page due to extracting 1 userspace buffer page at a time (eg the iov_iter_get_pages2(... PAGE_SIZE, 1, ...) call in fuse_copy_fill()). If we pass in a pages array, iov_iter_getpages2 is able to extract multiple pages at a time and save extra overhead with the GUP setup / irq save+restore / pagetable walk and the extra req->waitq locking/unlocking calls, but when I benchmarked it last year I didn't see any noticeable performance improvements from doing this. The extra complexity didn't seem worth it. For optimized copying, I think in the future high-performance servers will mostly just use fuse-over-iouring zero-copy. Thanks, Joanne > > But those are not the same. > > Thanks, > Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 22:06 ` Joanne Koong @ 2026-03-18 14:03 ` Horst Birthelmer 2026-03-18 21:19 ` Joanne Koong 0 siblings, 1 reply; 14+ messages in thread From: Horst Birthelmer @ 2026-03-18 14:03 UTC (permalink / raw) To: Joanne Koong Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Mon, Mar 16, 2026 at 03:06:02PM -0700, Joanne Koong wrote: > On Mon, Mar 16, 2026 at 1:02 PM Horst Birthelmer <horst@birthelmer.de> wrote: > > > > > > > > Hi Horst, > > > > > > I think these are two different entities. cs->pg is the page that > > > corresponds to the userspace buffer / pipe while the (large) folio > > > corresponds to the pages in the page cache. flush_dcache_folio(folio) > > > and flush_dcache_page(cs->pg) are not interchangeable (I don't think > > > it's likely either that the pages backing the userspace buffer/pipe > > > are large folios). > > > > > > Thanks, > > > Joanne > > > > Hi Joanne, > > > > I feel a bit embarassed ... but you are completely right. > > I was interested in solving this case: > > > > fuse_uring_args_to_ring() or fuse_uring_args_to_ring_pages() > > fuse_copy_init(&cs, true, &iter) ← cs->write = TRUE > > fuse_copy_args(&cs, num_args, args->in_pages, ...) > > if (args->in_pages) > > fuse_copy_folios(cs, arg->size, 0) > > fuse_copy_folio(cs, &ap->folios[i], ...) > > > > when we have large folios > > No worries, the naming doesn't make the distinction obvious at all. > For copying out large folios right now, the copy is still page by page > due to extracting 1 userspace buffer page at a time (eg the > iov_iter_get_pages2(... PAGE_SIZE, 1, ...) call in fuse_copy_fill()). 
> If we pass in a pages array, iov_iter_getpages2 is able to extract > multiple pages at a time and save extra overhead with the GUP setup / > irq save+restore / pagetable walk and the extra req->waitq > locking/unlocking calls, but when I benchmarked it last year I didn't > see any noticeable performance improvements from doing this. The extra > complexity didn't seem worth it. For optimized copying, I think in the > future high-performance servers will mostly just use fuse-over-iouring > zero-copy. > > Thanks, > Joanne > Hi Joanne, I wonder, would something like this help for large folios? @@ -856,8 +856,11 @@ void fuse_copy_finish(struct fuse_copy_state *cs) cs->currbuf = NULL; } else if (cs->pg) { if (cs->write) { + struct folio *folio = page_folio(cs->pg); + flush_dcache_page(cs->pg); - set_page_dirty_lock(cs->pg); + if (!folio_test_dirty(folio)) + set_page_dirty_lock(cs->pg); } put_page(cs->pg); } Have you seen any problems with spin locks being way too costly while doing writes? That was actually why I started looking into this. Thanks, Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-18 14:03 ` Horst Birthelmer @ 2026-03-18 21:19 ` Joanne Koong 2026-03-18 21:52 ` Bernd Schubert 0 siblings, 1 reply; 14+ messages in thread From: Joanne Koong @ 2026-03-18 21:19 UTC (permalink / raw) To: Horst Birthelmer Cc: Horst Birthelmer, Miklos Szeredi, Bernd Schubert, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > Hi Joanne, > > I wonder, would something like this help for large folios? Hi Horst, I don't think it's likely that the pages backing the userspace buffer are large folios, so I think this may actually add extra overhead with the extra folio_test_dirty() check. From what I've seen, the main cost that dwarfs everything else for writes/reads is the actual IO, the context switches, and the memcpys. I think compared to these things, the set_page_dirty_lock() cost is negligible and pretty much undetectable. Thanks, Joanne > > @@ -856,8 +856,11 @@ void fuse_copy_finish(struct fuse_copy_state *cs) > cs->currbuf = NULL; > } else if (cs->pg) { > if (cs->write) { > + struct folio *folio = page_folio(cs->pg); > + > flush_dcache_page(cs->pg); > - set_page_dirty_lock(cs->pg); > + if (!folio_test_dirty(folio)) > + set_page_dirty_lock(cs->pg); > } > put_page(cs->pg); > } > > Do you have seen any problems with spin locks being way too costly while > doing writes? > > That was actually why I started looking into this. > > Thanks, > Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-18 21:19 ` Joanne Koong @ 2026-03-18 21:52 ` Bernd Schubert 2026-03-19 1:32 ` Joanne Koong 0 siblings, 1 reply; 14+ messages in thread From: Bernd Schubert @ 2026-03-18 21:52 UTC (permalink / raw) To: Joanne Koong Cc: Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer Hi Joanne, On 3/18/26 22:19, Joanne Koong wrote: > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: >> >> Hi Joanne, >> >> I wonder, would something like this help for large folios? > > Hi Horst, > > I don't think it's likely that the pages backing the userspace buffer > are large folios, so I think this may actually add extra overhead with > the extra folio_test_dirty() check. > > From what I've seen, the main cost that dwarfs everything else for > writes/reads is the actual IO, the context switches, and the memcpys. > I think compared to these things, the set_page_dirty_lock() cost is > negligible and pretty much undetectable. A little bit of background here. We see in cpu flame graphs that the spin lock taken in lock_request() and unlock_request() takes about the same amount of CPU time as the memcpy. Interestingly, only on Intel, but not AMD CPUs. Note that we are running with our custom page pinning, which just takes the pages from an array, so iov_iter_get_pages2() is not used. The reason for that unlock/lock is documented at the end of Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. Well we don't have that, so for now these checks are modified in our branches to avoid the lock. Although that is not upstreamable. The right solution here is to extract an array of pages and do that unlock/lock per pagevec. Next in the flame graph is that set_page_dirty_lock, which also takes as much CPU time as the memcpy. Again, Intel CPUs only. 
In combination with the above pagevec method, I think the right solution is to iterate over the pages, store the last folio, and then set dirty once per folio. Also, I disagree that the userspace buffers are unlikely to be large folios, see commit 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio with SPLICE_F_MOVE". Especially Horst persistently runs into it when doing xfstests with recent kernels. I think the issue first came up with 3.18ish. One can further enforce that by setting "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', which is what I did when I tested the above commit. And actually that points out that libfuse allocations should do the madvise. I'm going to do that during the next days, maybe tomorrow. Thanks, Bernd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-18 21:52 ` Bernd Schubert @ 2026-03-19 1:32 ` Joanne Koong 2026-03-19 4:27 ` Darrick J. Wong 2026-03-19 8:32 ` Horst Birthelmer 0 siblings, 2 replies; 14+ messages in thread From: Joanne Koong @ 2026-03-19 1:32 UTC (permalink / raw) To: Bernd Schubert Cc: Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > Hi Joanne, > > On 3/18/26 22:19, Joanne Koong wrote: > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > >> > >> Hi Joanne, > >> > >> I wonder, would something like this help for large folios? > > > > Hi Horst, > > > > I don't think it's likely that the pages backing the userspace buffer > > are large folios, so I think this may actually add extra overhead with > > the extra folio_test_dirty() check. > > > > From what I've seen, the main cost that dwarfs everything else for > > writes/reads is the actual IO, the context switches, and the memcpys. > > I think compared to these things, the set_page_dirty_lock() cost is > > negligible and pretty much undetectable. > > > a little bit background here. We see in cpu flame graphs that the spin > lock taken in unlock_request() and unlock_request() takes about the same > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > AMD CPUs. Note that we are running with out custom page pinning, which > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > The reason for that unlock/lock is documented at the end of > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. Well we > don't have that, so for now these checks are modified in our branches to > avoid the lock. Although that is not upstreamable. Right solution is > here to extract an array of pages and do that unlock/lock per pagevec. 
> > Next in the flame graph is setting that set_page_dirty_lock which also > takes as much CPU time as the memcpy. Again, Intel CPUs only. > In the combination with the above pagevec method, I think right solution > is to iterate over the pages, stores the last folio and then set to > dirty once per folio. Thanks for the background context. The intel vs amd difference is interesting. The approaches you mention sound reasonable. Are you able to share the flame graph or is this easily repro-able using fio on the passthrough_hp server? > Also, I disagree about that the userspace buffers are not likely large > folios, see commit > 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio) > with SPLICE_F_MOVE". Especially Horst persistently runs into it when > doing xfstests with recent kernels. I think the issue came up first time I think that's because xfstests uses /tmp for scratch space, so the "This is easily reproducible (on 6.19) with CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_ALWAYS=y CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_ALWAYS=y" triggers it but on production workloads I don't think it's likely that those source pages are backed by shmem/tmpfs or exist in the page cache already as a large folio as the server has no control over that. I also don't think most applications use splice, though maybe I'm wrong here. For non-splice, even if the user sets "/sys/kernel/mm/transparent_hugepage/enabled" to 'always' or in libfuse we do madvise on the buffer allocation for huge pages, that has a 2 MB granularity requirement which depends on the user system also having explicitly upped the max pages limit through the sysctl since the kernel fuse max pages limit is 256 (1 MB) by default. I don't think that is common on most servers. Thanks, Joanne > with 3.18ish. > > One can further enforce that by setting > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', what I did > when I tested the above commit. 
And actually that points out that > libfuse allocations should do the madvise. I'm going to do that during > the next days, maybe tomorrow. > > > Thanks, > Bernd ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 1:32 ` Joanne Koong @ 2026-03-19 4:27 ` Darrick J. Wong 2026-03-20 17:24 ` Joanne Koong 2026-03-19 8:32 ` Horst Birthelmer 1 sibling, 1 reply; 14+ messages in thread From: Darrick J. Wong @ 2026-03-19 4:27 UTC (permalink / raw) To: Joanne Koong Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > Hi Joanne, > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > >> > > >> Hi Joanne, > > >> > > >> I wonder, would something like this help for large folios? > > > > > > Hi Horst, > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > are large folios, so I think this may actually add extra overhead with > > > the extra folio_test_dirty() check. > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > I think compared to these things, the set_page_dirty_lock() cost is > > > negligible and pretty much undetectable. > > > > > > a little bit background here. We see in cpu flame graphs that the spin > > lock taken in unlock_request() and unlock_request() takes about the same > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > AMD CPUs. Note that we are running with out custom page pinning, which > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > The reason for that unlock/lock is documented at the end of > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. Well we > > don't have that, so for now these checks are modified in our branches to > > avoid the lock. Although that is not upstreamable. 
Right solution is > > here to extract an array of pages and do that unlock/lock per pagevec. > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > In the combination with the above pagevec method, I think right solution > > is to iterate over the pages, stores the last folio and then set to > > dirty once per folio. > > Thanks for the background context. The intel vs amd difference is > interesting. The approaches you mention sound reasonable. Are you able > to share the flame graph or is this easily repro-able using fio on the > passthrough_hp server? > > > > Also, I disagree about that the userspace buffers are not likely large > > folios, see commit > > 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio) > > with SPLICE_F_MOVE". Especially Horst persistently runs into it when > > doing xfstests with recent kernels. I think the issue came up first time > > I think that's because xfstests uses /tmp for scratch space, so the > > "This is easily reproducible (on 6.19) with > CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_ALWAYS=y > CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_ALWAYS=y" > > triggers it but on production workloads I don't think it's likely that > those source pages are backed by shmem/tmpfs or exist in the page > cache already as a large folio as the server has no control over that. /me stumbles in-thread to note that xfs gets large folios for its files' pagecache fairly frequently now, especially as readahead ramps up. Ok back to the hell that is deploying ClownStrike through a Java program while Firefox repeatedly drives my laptop to OOM. --D > I also don't think most applications use splice, though maybe I'm > wrong here. 
> > For non-splice, even if the user sets > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always' or in > libfuse we do madvise on the buffer allocation for huge pages, that > has a 2 MB granularity requirement which depends on the user system > also having explicitly upped the max pages limit through the sysctl > since the kernel fuse max pages limit is 256 (1 MB) by default. I > don't think that is common on most servers. > > Thanks, > Joanne > > > with 3.18ish. > > > > One can further enforce that by setting > > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', what I did > > when I tested the above commit. And actually that points out that > > libfuse allocations should do the madvise. I'm going to do that during > > the next days, maybe tomorrow. > > > > > > Thanks, > > Bernd > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 4:27 ` Darrick J. Wong @ 2026-03-20 17:24 ` Joanne Koong 0 siblings, 0 replies; 14+ messages in thread From: Joanne Koong @ 2026-03-20 17:24 UTC (permalink / raw) To: Darrick J. Wong Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 9:27 PM Darrick J. Wong <djwong@kernel.org> wrote: > > On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > > > Hi Joanne, > > > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > > >> > > > >> Hi Joanne, > > > >> > > > >> I wonder, would something like this help for large folios? > > > > > > > > Hi Horst, > > > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > > are large folios, so I think this may actually add extra overhead with > > > > the extra folio_test_dirty() check. > > > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > > I think compared to these things, the set_page_dirty_lock() cost is > > > > negligible and pretty much undetectable. > > > > > > > > > a little bit background here. We see in cpu flame graphs that the spin > > > lock taken in unlock_request() and unlock_request() takes about the same > > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > > AMD CPUs. Note that we are running with out custom page pinning, which > > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > > > The reason for that unlock/lock is documented at the end of > > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system. 
Well we > > > don't have that, so for now these checks are modified in our branches to > > > avoid the lock. Although that is not upstreamable. Right solution is > > > here to extract an array of pages and do that unlock/lock per pagevec. > > > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > > In the combination with the above pagevec method, I think right solution > > > is to iterate over the pages, stores the last folio and then set to > > > dirty once per folio. > > > > Thanks for the background context. The intel vs amd difference is > > interesting. The approaches you mention sound reasonable. Are you able > > to share the flame graph or is this easily repro-able using fio on the > > passthrough_hp server? > > > > > > > Also, I disagree about that the userspace buffers are not likely large > > > folios, see commit > > > 59ba47b6be9cd0146ef9a55c6e32e337e11e7625 "fuse: Check for large folio) > > > with SPLICE_F_MOVE". Especially Horst persistently runs into it when > > > doing xfstests with recent kernels. I think the issue came up first time > > > > I think that's because xfstests uses /tmp for scratch space, so the > > > > "This is easily reproducible (on 6.19) with > > CONFIG_TRANSPARENT_HUGEPAGE_SHMEM_HUGE_ALWAYS=y > > CONFIG_TRANSPARENT_HUGEPAGE_TMPFS_HUGE_ALWAYS=y" > > > > triggers it but on production workloads I don't think it's likely that > > those source pages are backed by shmem/tmpfs or exist in the page > > cache already as a large folio as the server has no control over that. > > /me stumbles in-thread to note that xfs gets large folios for its files' > pagecache fairly frequently now, especially as readahead ramps up. Oh nice, I didn't realize that. Though I wonder if the pages are backed by xfs/ext4/etc, it seems like any high-performance server would just use passthrough and skip splice altogether? 
Thanks, Joanne > > Ok back to the hell that is deploying ClownStrike through a Java program > while Firefox repeatedly drives my laptop to OOM. > > --D > > > I also don't think most applications use splice, though maybe I'm > > wrong here. > > > > For non-splice, even if the user sets > > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always' or in > > libfuse we do madvise on the buffer allocation for huge pages, that > > has a 2 MB granularity requirement which depends on the user system > > also having explicitly upped the max pages limit through the sysctl > > since the kernel fuse max pages limit is 256 (1 MB) by default. I > > don't think that is common on most servers. > > > > Thanks, > > Joanne > > > > > with 3.18ish. > > > > > > One can further enforce that by setting > > > "/sys/kernel/mm/transparent_hugepage/enabled" to 'always', what I did > > > when I tested the above commit. And actually that points out that > > > libfuse allocations should do the madvise. I'm going to do that during > > > the next days, maybe tomorrow. > > > > > > > > > Thanks, > > > Bernd > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 1:32 ` Joanne Koong 2026-03-19 4:27 ` Darrick J. Wong @ 2026-03-19 8:32 ` Horst Birthelmer 2026-03-20 17:18 ` Joanne Koong 1 sibling, 1 reply; 14+ messages in thread From: Horst Birthelmer @ 2026-03-19 8:32 UTC (permalink / raw) To: Joanne Koong Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > Hi Joanne, > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > >> > > >> Hi Joanne, > > >> > > >> I wonder, would something like this help for large folios? > > > > > > Hi Horst, > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > are large folios, so I think this may actually add extra overhead with > > > the extra folio_test_dirty() check. > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > I think compared to these things, the set_page_dirty_lock() cost is > > > negligible and pretty much undetectable. > > > > > > A little bit of background here. We see in CPU flame graphs that the spin > > lock taken in lock_request() and unlock_request() takes about the same > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > AMD CPUs. Note that we are running with our custom page pinning, which > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > The reason for that unlock/lock is documented at the end of > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system.
The right solution > > here is to extract an array of pages and do that unlock/lock per pagevec. > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > In combination with the above pagevec method, I think the right solution > > is to iterate over the pages, store the last folio and then set it > > dirty once per folio. > > Thanks for the background context. The Intel vs AMD difference is > interesting. The approaches you mention sound reasonable. Are you able > to share the flame graph or is this easily repro-able using fio on the > passthrough_hp server? > > Hi Joanne, I have tried to reproduce this with passthrough_hp and I never saw it. So my answer would be something like: I don't think so. This happens even with large folios disabled. I was just trying to solve it, since I figured it will be worse with large folios. Thanks, Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-19 8:32 ` Horst Birthelmer @ 2026-03-20 17:18 ` Joanne Koong 0 siblings, 0 replies; 14+ messages in thread From: Joanne Koong @ 2026-03-20 17:18 UTC (permalink / raw) To: Horst Birthelmer Cc: Bernd Schubert, Horst Birthelmer, Miklos Szeredi, linux-fsdevel, linux-kernel, Horst Birthelmer On Thu, Mar 19, 2026 at 1:32 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > On Wed, Mar 18, 2026 at 06:32:25PM -0700, Joanne Koong wrote: > > On Wed, Mar 18, 2026 at 2:52 PM Bernd Schubert <bernd@bsbernd.com> wrote: > > > > > > Hi Joanne, > > > > > > On 3/18/26 22:19, Joanne Koong wrote: > > > > On Wed, Mar 18, 2026 at 7:03 AM Horst Birthelmer <horst@birthelmer.de> wrote: > > > >> > > > >> Hi Joanne, > > > >> > > > >> I wonder, would something like this help for large folios? > > > > > > > > Hi Horst, > > > > > > > > I don't think it's likely that the pages backing the userspace buffer > > > > are large folios, so I think this may actually add extra overhead with > > > > the extra folio_test_dirty() check. > > > > > > > > From what I've seen, the main cost that dwarfs everything else for > > > > writes/reads is the actual IO, the context switches, and the memcpys. > > > > I think compared to these things, the set_page_dirty_lock() cost is > > > > negligible and pretty much undetectable. > > > > > > > > > A little bit of background here. We see in CPU flame graphs that the spin > > > lock taken in lock_request() and unlock_request() takes about the same > > > amount of CPU time as the memcpy. Interestingly, only on Intel, but not > > > AMD CPUs. Note that we are running with our custom page pinning, which > > > just takes the pages from an array, so iov_iter_get_pages2() is not used. > > > > > > The reason for that unlock/lock is documented at the end of > > > Documentation/filesystems/fuse/fuse.rst as Kamikaze file system.
Well we > > > don't have that, so for now these checks are modified in our branches to > > > avoid the lock. Although that is not upstreamable. The right solution > > > here is to extract an array of pages and do that unlock/lock per pagevec. > > > > > > Next in the flame graph is setting that set_page_dirty_lock which also > > > takes as much CPU time as the memcpy. Again, Intel CPUs only. > > > In combination with the above pagevec method, I think the right solution > > > is to iterate over the pages, store the last folio and then set it > > > dirty once per folio. > > > > Thanks for the background context. The Intel vs AMD difference is > > interesting. The approaches you mention sound reasonable. Are you able > > to share the flame graph or is this easily repro-able using fio on the > > passthrough_hp server? > > > > > Hi Joanne, > > I have tried to reproduce this with passthrough_hp and I never saw it. > So my answer would be something like: I don't think so. > > This happens even with large folios disabled. I was just trying to > solve it, since I figured it will be worse with large folios. Thanks for the context. I haven't encountered this bottleneck myself (yet) but if you are encountering it pretty regularly, I agree with you that it definitely seems worth addressing. Thanks, Joanne > > Thanks, > Horst ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-16 15:16 [PATCH] fuse: when copying a folio delay the mark dirty until the end Horst Birthelmer 2026-03-16 17:29 ` Joanne Koong @ 2026-03-26 6:35 ` kernel test robot 2026-03-26 15:05 ` [LTP] " Cyril Hrubis 1 sibling, 1 reply; 14+ messages in thread From: kernel test robot @ 2026-03-26 6:35 UTC (permalink / raw) To: Horst Birthelmer Cc: oe-lkp, lkp, linux-fsdevel, ltp, Miklos Szeredi, Joanne Koong, Bernd Schubert, linux-kernel, Horst Birthelmer, oliver.sang Hello, kernel test robot noticed "RIP:fuse_iomap_writeback_range[fuse]" on: commit: 47f8dde97f35e32a1003d54e387273bcdf014ddf ("[PATCH] fuse: when copying a folio delay the mark dirty until the end") url: https://github.com/intel-lab-lkp/linux/commits/Horst-Birthelmer/fuse-when-copying-a-folio-delay-the-mark-dirty-until-the-end/20260316-234418 patch link: https://lore.kernel.org/all/20260316-mark-dirty-per-folio-v1-1-8dc39c94b7ce@ddn.com/ patch subject: [PATCH] fuse: when copying a folio delay the mark dirty until the end in testcase: ltp version: with following parameters: disk: 1HDD fs: ext4 test: fs-03 config: x86_64-rhel-9.4-ltp compiler: gcc-14 test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. 
not just a new version of the same patch/commit), kindly add following tags

| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202603261451.d2a4cd46-lkp@intel.com

kern :warn : [ 623.963830] [ T243] ------------[ cut here ]------------
kern :warn : [ 623.964688] [ T243] WARNING: fs/fuse/file.c:2025 at fuse_iomap_writeback_range+0xeb3/0x17b0 [fuse], CPU#24: 9/243
kern :warn : [ 623.966501] [ T243] Modules linked in: exfat vfat fat xfs loop ext4 mbcache jbd2 dm_mod binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac_common nfit libnvdimm amdgpu x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_alc882 snd_hda_codec_realtek_lib snd_hda_codec_atihdmi coretemp snd_hda_codec_generic snd_hda_codec_hdmi btrfs amdxcp snd_soc_avs drm_panel_backlight_quirks sd_mod kvm_intel gpu_sched snd_soc_hda_codec sg libblake2b drm_buddy snd_hda_intel snd_hda_ext_core xor drm_ttm_helper zstd_compress ttm snd_hda_codec kvm drm_exec snd_hda_core drm_suballoc_helper raid6_pq snd_soc_core snd_intel_dspcfg drm_display_helper snd_intel_sdw_acpi irqbypass snd_hwdep snd_compress ghash_clmulni_intel cec snd_pcm rapl ahci drm_client_lib intel_cstate drm_kms_helper snd_timer libahci wmi_bmof mxm_wmi intel_wmi_thunderbolt nvme snd mei_me i2c_i801 video intel_uncore libata nvme_core wdat_wdt soundcore crc16 pcspkr i2c_smbus ioatdma mei dca wmi drm fuse nfnetlink
kern :warn : [ 623.977867] [ T243] CPU: 24 UID: 0 PID: 243 Comm: kworker/u144:9 Tainted: G S 7.0.0-rc4-00001-g47f8dde97f35 #1 PREEMPT(lazy)
kern :warn : [ 623.979856] [ T243] Tainted: [S]=CPU_OUT_OF_SPEC
kern :warn : [ 623.980727] [ T243] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
kern :warn : [ 623.982101] [ T243] Workqueue: writeback wb_workfn (flush-7:0-fuseblk)
kern :warn : [ 623.983158] [ T243] RIP: 0010:fuse_iomap_writeback_range (fs/fuse/file.c:2025 (discriminator 1) fs/fuse/file.c:2206 (discriminator 1)) fuse
kern :warn : [ 623.984255] [ T243] Code: 03 80 3c 02 00 0f 85 fc 03 00 00 48 8b 44 24 18 49 89 47 08 e9 0e f3 ff ff 0f 0b e9 dd f2 ff ff 48 8b 7c 24 20 e8 4d a0 08 c4 <0f> 0b 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 80 3c 02
All code
========
   0: 03 80 3c 02 00 0f       add    0xf00023c(%rax),%eax
   6: 85 fc                   test   %edi,%esp
   8: 03 00                   add    (%rax),%eax
   a: 00 48 8b                add    %cl,-0x75(%rax)
   d: 44 24 18                rex.R and $0x18,%al
  10: 49 89 47 08             mov    %rax,0x8(%r15)
  14: e9 0e f3 ff ff          jmp    0xfffffffffffff327
  19: 0f 0b                   ud2
  1b: e9 dd f2 ff ff          jmp    0xfffffffffffff2fd
  20: 48 8b 7c 24 20          mov    0x20(%rsp),%rdi
  25: e8 4d a0 08 c4          call   0xffffffffc408a077
  2a:* 0f 0b                  ud2    <-- trapping instruction
  2c: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
  33: fc ff df
  36: 4c 89 ea                mov    %r13,%rdx
  39: 48 c1 ea 03             shr    $0x3,%rdx
  3d: 80                      .byte 0x80
  3e: 3c 02                   cmp    $0x2,%al

Code starting with the faulting instruction
===========================================
   0: 0f 0b                   ud2
   2: 48 b8 00 00 00 00 00    movabs $0xdffffc0000000000,%rax
   9: fc ff df
   c: 4c 89 ea                mov    %r13,%rdx
   f: 48 c1 ea 03             shr    $0x3,%rdx
  13: 80                      .byte 0x80
  14: 3c 02                   cmp    $0x2,%al
kern :warn : [ 623.986908] [ T243] RSP: 0018:ffffc9000131f320 EFLAGS: 00010286
kern :warn : [ 623.987958] [ T243] RAX: ffff888167fd4ea8 RBX: 0000000000000000 RCX: 1ffff1102cffa9d5
kern :warn : [ 623.989181] [ T243] RDX: ffff888167fd4ea8 RSI: 0000000000000004 RDI: ffff888167fd4f28
kern :warn : [ 623.990332] [ T243] RBP: ffff888167fd4c00 R08: 0000000000000001 R09: fffff52000263e59
kern :warn : [ 623.991500] [ T243] R10: 0000000000000003 R11: 0000000000000038 R12: 0000000000000000
kern :warn : [ 623.992678] [ T243] R13: ffffc9000131f590 R14: ffffea000aff1380 R15: ffffc9000131f588
kern :warn : [ 623.993830] [ T243] FS: 0000000000000000(0000) GS:ffff88a09e48c000(0000) knlGS:0000000000000000
kern :warn : [ 623.995069] [ T243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kern :warn : [ 623.996117] [ T243] CR2: 00007f8e6b375000 CR3: 000000209ca72001 CR4: 00000000003726f0
kern :warn : [ 623.997263] [ T243] Call Trace:
kern :warn : [ 623.998054] [ T243] <TASK>
kern :warn : [ 623.998796] [ T243] iomap_writeback_folio (fs/iomap/buffered-io.c:1777 fs/iomap/buffered-io.c:1895)
kern :warn : [ 623.999721] [ T243] ? __pfx_iomap_writeback_folio (fs/iomap/buffered-io.c:1854)
kern :warn : [ 624.000685] [ T243] ? writeback_iter (mm/page-writeback.c:2513)
kern :warn : [ 624.001559] [ T243] iomap_writepages (fs/iomap/buffered-io.c:1959)
kern :warn : [ 624.002410] [ T243] ? __pfx_iomap_writepages (fs/iomap/buffered-io.c:1944)
kern :warn : [ 624.003325] [ T243] ? unwind_next_frame (include/linux/rcupdate.h:1193 arch/x86/kernel/unwind_orc.c:495)
kern :warn : [ 624.004213] [ T243] ? ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
kern :warn : [ 624.005068] [ T243] fuse_writepages (fs/fuse/file.c:2276) fuse
kern :warn : [ 624.005975] [ T243] ? __pfx_fuse_writepages (fs/fuse/file.c:2276) fuse
kern :warn : [ 624.006916] [ T243] ? update_sg_lb_stats (kernel/sched/fair.c:10481 (discriminator 2))
kern :warn : [ 624.007771] [ T243] ? __pfx__raw_spin_lock (kernel/locking/spinlock.c:153)
kern :warn : [ 624.008629] [ T243] do_writepages (mm/page-writeback.c:2558)
kern :warn : [ 624.009436] [ T243] __writeback_single_inode (fs/fs-writeback.c:1759)
kern :warn : [ 624.010308] [ T243] writeback_sb_inodes (fs/fs-writeback.c:2045)
kern :warn : [ 624.011156] [ T243] ? __pfx_writeback_sb_inodes (fs/fs-writeback.c:1946)
kern :warn : [ 624.012141] [ T243] ? __wb_calc_thresh (arch/x86/include/asm/atomic64_64.h:15 include/linux/atomic/atomic-arch-fallback.h:2583 include/linux/atomic/atomic-long.h:38 include/linux/atomic/atomic-instrumented.h:3189 mm/page-writeback.c:160 mm/page-writeback.c:912)
kern :warn : [ 624.012991] [ T243] ? __pfx_down_read_trylock (kernel/locking/rwsem.c:1575)
kern :warn : [ 624.013925] [ T243] ? __pfx_move_expired_inodes (fs/fs-writeback.c:1499)
kern :warn : [ 624.014835] [ T243] __writeback_inodes_wb (fs/fs-writeback.c:2119)
kern :warn : [ 624.015718] [ T243] wb_writeback (fs/fs-writeback.c:2229)
kern :warn : [ 624.016519] [ T243] ? __pfx_wb_writeback (fs/fs-writeback.c:2172)
kern :warn : [ 624.017340] [ T243] ? get_nr_dirty_inodes (fs/inode.c:95 (discriminator 1) fs/inode.c:103 (discriminator 1))
kern :warn : [ 624.018188] [ T243] wb_do_writeback (fs/fs-writeback.c:2387 (discriminator 1))
kern :warn : [ 624.018984] [ T243] ? set_worker_desc (kernel/workqueue.c:6209)
kern :warn : [ 624.019821] [ T243] ? __pfx_wb_do_writeback (fs/fs-writeback.c:2367)
kern :warn : [ 624.020682] [ T243] ? finish_task_switch+0x13b/0x6f0
kern :warn : [ 624.021574] [ T243] ? __switch_to (arch/x86/include/asm/cpufeature.h:101 arch/x86/kernel/process_64.c:377 arch/x86/kernel/process_64.c:665)
kern :warn : [ 624.022332] [ T243] wb_workfn (fs/fs-writeback.c:2414)
kern :warn : [ 624.023026] [ T243] process_one_work (arch/x86/include/asm/jump_label.h:37 include/trace/events/workqueue.h:110 kernel/workqueue.c:3281)
kern :warn : [ 624.023772] [ T243] ? assign_work (kernel/workqueue.c:1219)
kern :warn : [ 624.024470] [ T243] worker_thread (kernel/workqueue.c:3353 (discriminator 2) kernel/workqueue.c:3440 (discriminator 2))
kern :warn : [ 624.025199] [ T243] ? __pfx_worker_thread (kernel/workqueue.c:3386)
kern :warn : [ 624.025970] [ T243] kthread (kernel/kthread.c:436)
kern :warn : [ 624.026642] [ T243] ? recalc_sigpending (arch/x86/include/asm/bitops.h:75 include/asm-generic/bitops/instrumented-atomic.h:42 include/linux/thread_info.h:109 kernel/signal.c:181)
kern :warn : [ 624.027381] [ T243] ? __pfx_kthread (kernel/kthread.c:381)
kern :warn : [ 624.028081] [ T243] ret_from_fork (arch/x86/kernel/process.c:164)
kern :warn : [ 624.028770] [ T243] ? __pfx_ret_from_fork (arch/x86/kernel/process.c:153)
kern :warn : [ 624.029533] [ T243] ? switch_fpu (arch/x86/include/asm/bitops.h:202 (discriminator 1) arch/x86/include/asm/bitops.h:232 (discriminator 1) include/asm-generic/bitops/instrumented-non-atomic.h:142 (discriminator 1) include/linux/thread_info.h:133 (discriminator 1) include/linux/sched.h:2064 (discriminator 1) arch/x86/include/asm/fpu/sched.h:34 (discriminator 1))
kern :warn : [ 624.030233] [ T243] ? __switch_to (arch/x86/include/asm/cpufeature.h:101 arch/x86/kernel/process_64.c:377 arch/x86/kernel/process_64.c:665)
kern :warn : [ 624.030868] [ T243] ? __pfx_kthread (kernel/kthread.c:381)
kern :warn : [ 624.031524] [ T243] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
kern :warn : [ 624.032186] [ T243] </TASK>
kern :warn : [ 624.032704] [ T243] ---[ end trace 0000000000000000 ]---

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20260326/202603261451.d2a4cd46-lkp@intel.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [LTP] [PATCH] fuse: when copying a folio delay the mark dirty until the end 2026-03-26 6:35 ` kernel test robot @ 2026-03-26 15:05 ` Cyril Hrubis 0 siblings, 0 replies; 14+ messages in thread From: Cyril Hrubis @ 2026-03-26 15:05 UTC (permalink / raw) To: kernel test robot Cc: Horst Birthelmer, lkp, Miklos Szeredi, Bernd Schubert, linux-kernel, linux-fsdevel, Horst Birthelmer, ltp, oe-lkp, Joanne Koong Hi! > commit: 47f8dde97f35e32a1003d54e387273bcdf014ddf ("[PATCH] fuse: when copying a folio delay the mark dirty until the end") > url: https://github.com/intel-lab-lkp/linux/commits/Horst-Birthelmer/fuse-when-copying-a-folio-delay-the-mark-dirty-until-the-end/20260316-234418 > patch link: https://lore.kernel.org/all/20260316-mark-dirty-per-folio-v1-1-8dc39c94b7ce@ddn.com/ > patch subject: [PATCH] fuse: when copying a folio delay the mark dirty until the end > > in testcase: ltp > version: > with following parameters: > > disk: 1HDD > fs: ext4 > test: fs-03 Looks like the test that failed was fs_fill, which runs several threads that attempt to fill the filesystem (until they get ENOSPC), then delete the files and try again. And apparently we managed to get EIO when a file was close()d by one of the threads after a few minutes of runtime on NTFS mounted over FUSE. -- Cyril Hrubis chrubis@suse.cz ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2026-03-26 15:05 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-03-16 15:16 [PATCH] fuse: when copying a folio delay the mark dirty until the end Horst Birthelmer 2026-03-16 17:29 ` Joanne Koong 2026-03-16 20:02 ` Horst Birthelmer 2026-03-16 22:06 ` Joanne Koong 2026-03-18 14:03 ` Horst Birthelmer 2026-03-18 21:19 ` Joanne Koong 2026-03-18 21:52 ` Bernd Schubert 2026-03-19 1:32 ` Joanne Koong 2026-03-19 4:27 ` Darrick J. Wong 2026-03-20 17:24 ` Joanne Koong 2026-03-19 8:32 ` Horst Birthelmer 2026-03-20 17:18 ` Joanne Koong 2026-03-26 6:35 ` kernel test robot 2026-03-26 15:05 ` [LTP] " Cyril Hrubis
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox