linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [REGRESSION] fuse: copy_file_range() fails with EIO
@ 2024-08-09 15:53 Jürg Billeter
  2024-08-09 16:03 ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Jürg Billeter @ 2024-08-09 15:53 UTC (permalink / raw)
  To: Miklos Szeredi, Matthew Wilcox (Oracle); +Cc: linux-fsdevel, regressions

Starting with 6.10, I'm seeing `copy_file_range()`, with source and
destination being on the same FUSE filesystem[1], failing with EIO in
some cases. The (low-level libfuse3) userspace filesystem does not
implement `copy_file_range`, so the kernel falls back to the generic
implementation. The userspace filesystem receives read requests and
replies with the `FUSE_BUF_SPLICE_MOVE` flag.

I'm not sure what exactly triggers the issue but it may depend on the
file size, among other things. I can reproduce it fairly reliably
attempting to copy files that are exactly 65536 bytes in size.

6.9 works fine but I see the issue in 6.10, 6.10.3 and also in current
master ee9a43b7cfe2.

413e8f014c8b848e4ce939156f210df59fbd1c24 is the first bad commit
commit 413e8f014c8b848e4ce939156f210df59fbd1c24 (HEAD)
Author: Matthew Wilcox (Oracle) <willy@infradead.org>
Date:   Sat Apr 20 03:50:06 2024 +0100

    fuse: Convert fuse_readpages_end() to use folio_end_read()
    
    Nobody checks the error flag on fuse folios, so stop setting it.
    Optimise the (optional) setting of the uptodate flag and clearing
    of the lock flag by using folio_end_read().
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

I've confirmed the bisection by reverting this commit on top of 6.10.3,
which resolves the issue.

#regzbot introduced: 413e8f014c8b

Cheers,
Jürg

[1] https://gitlab.com/BuildGrid/buildbox/buildbox/-/tree/master/fuse

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-09 15:53 [REGRESSION] fuse: copy_file_range() fails with EIO Jürg Billeter
@ 2024-08-09 16:03 ` Matthew Wilcox
  2024-08-09 16:22   ` [PATCH 1/3] fuse: remove call to SetPageError Matthew Wilcox (Oracle)
                     ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Matthew Wilcox @ 2024-08-09 16:03 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Miklos Szeredi, linux-fsdevel, regressions

On Fri, Aug 09, 2024 at 05:53:26PM +0200, Jürg Billeter wrote:
> Starting with 6.10, I'm seeing `copy_file_range()`, with source and
> destination being on the same FUSE filesystem[1], failing with EIO in
> some cases. The (low-level libfuse3) userspace filesystem does not
> implement `copy_file_range`, so the kernel falls back to the generic
> implementation. The userspace filesystem receives read requests and
> replies with the `FUSE_BUF_SPLICE_MOVE` flag.
> 
> I'm not sure what exactly triggers the issue but it may depend on the
> file size, among other things. I can reproduce it fairly reliably
> attempting to copy files that are exactly 65536 bytes in size.
> 
> 6.9 works fine but I see the issue in 6.10, 6.10.3 and also in current
> master ee9a43b7cfe2.
> 
> 413e8f014c8b848e4ce939156f210df59fbd1c24 is the first bad commit
> commit 413e8f014c8b848e4ce939156f210df59fbd1c24 (HEAD)
> Author: Matthew Wilcox (Oracle) <willy@infradead.org>
> Date:   Sat Apr 20 03:50:06 2024 +0100
> 
>     fuse: Convert fuse_readpages_end() to use folio_end_read()
>     
>     Nobody checks the error flag on fuse folios, so stop setting it.
>     Optimise the (optional) setting of the uptodate flag and clearing
>     of the lock flag by using folio_end_read().
>     
>     Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>     Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
> 
> I've confirmed the bisection by reverting this commit on top of 6.10.3,
> which resolves the issue.

Umm.  I don't see it.  This is all that's in the commit:

        for (i = 0; i < ap->num_pages; i++) {
-               struct page *page = ap->pages[i];
+               struct folio *folio = page_folio(ap->pages[i]);

-               if (!err)
-                       SetPageUptodate(page);
-               else
-                       SetPageError(page);
-               unlock_page(page);
-               put_page(page);
+               folio_end_read(folio, !err);
+               folio_put(folio);
        }

Do you have CONFIG_DEBUG_VM enabled?  There are some debugging asserts
which that will enable that might indicate a problem.

Otherwise we can try splitting the commit into individual steps and
seeing which one shows the problem.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/3] fuse: remove call to SetPageError
  2024-08-09 16:03 ` Matthew Wilcox
@ 2024-08-09 16:22   ` Matthew Wilcox (Oracle)
  2024-08-09 16:22   ` [PATCH 2/3] fuse: Use a folio instead of a page Matthew Wilcox (Oracle)
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-08-09 16:22 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Matthew Wilcox (Oracle), Miklos Szeredi, linux-fsdevel

part one

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/fuse/file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index b57ce4157640..4bf452d5ba9d 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -939,8 +939,6 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 
 		if (!err)
 			SetPageUptodate(page);
-		else
-			SetPageError(page);
 		unlock_page(page);
 		put_page(page);
 	}
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/3] fuse: Use a folio instead of a page
  2024-08-09 16:03 ` Matthew Wilcox
  2024-08-09 16:22   ` [PATCH 1/3] fuse: remove call to SetPageError Matthew Wilcox (Oracle)
@ 2024-08-09 16:22   ` Matthew Wilcox (Oracle)
  2024-08-09 16:22   ` [PATCH 3/3] fuse: use folio_end_read Matthew Wilcox (Oracle)
  2024-08-10  5:56   ` [REGRESSION] fuse: copy_file_range() fails with EIO Jürg Billeter
  3 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-08-09 16:22 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Matthew Wilcox (Oracle), Miklos Szeredi, linux-fsdevel

part two

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/fuse/file.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 4bf452d5ba9d..2b5533e41a62 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -935,12 +935,12 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 	}
 
 	for (i = 0; i < ap->num_pages; i++) {
-		struct page *page = ap->pages[i];
+		struct folio *folio = page_folio(ap->pages[i]);
 
 		if (!err)
-			SetPageUptodate(page);
-		unlock_page(page);
-		put_page(page);
+			folio_mark_uptodate(folio);
+		folio_unlock(folio);
+		folio_put(folio);
 	}
 	if (ia->ff)
 		fuse_file_put(ia->ff, false);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/3] fuse: use folio_end_read
  2024-08-09 16:03 ` Matthew Wilcox
  2024-08-09 16:22   ` [PATCH 1/3] fuse: remove call to SetPageError Matthew Wilcox (Oracle)
  2024-08-09 16:22   ` [PATCH 2/3] fuse: Use a folio instead of a page Matthew Wilcox (Oracle)
@ 2024-08-09 16:22   ` Matthew Wilcox (Oracle)
  2024-08-10 12:24     ` Jürg Billeter
  2024-08-10  5:56   ` [REGRESSION] fuse: copy_file_range() fails with EIO Jürg Billeter
  3 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-08-09 16:22 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Matthew Wilcox (Oracle), Miklos Szeredi, linux-fsdevel

part three

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/fuse/file.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 2b5533e41a62..f39456c65ed7 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -937,9 +937,7 @@ static void fuse_readpages_end(struct fuse_mount *fm, struct fuse_args *args,
 	for (i = 0; i < ap->num_pages; i++) {
 		struct folio *folio = page_folio(ap->pages[i]);
 
-		if (!err)
-			folio_mark_uptodate(folio);
-		folio_unlock(folio);
+		folio_end_read(folio, !err);
 		folio_put(folio);
 	}
 	if (ia->ff)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-09 16:03 ` Matthew Wilcox
                     ` (2 preceding siblings ...)
  2024-08-09 16:22   ` [PATCH 3/3] fuse: use folio_end_read Matthew Wilcox (Oracle)
@ 2024-08-10  5:56   ` Jürg Billeter
  2024-08-10 15:12     ` Matthew Wilcox
  3 siblings, 1 reply; 19+ messages in thread
From: Jürg Billeter @ 2024-08-10  5:56 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Miklos Szeredi, linux-fsdevel, regressions

Hi Matthew,

Thanks for the quick response.

On Fri, 2024-08-09 at 17:03 +0100, Matthew Wilcox wrote:
> Do you have CONFIG_DEBUG_VM enabled?  There are some debugging asserts
> which that will enable that might indicate a problem.

With CONFIG_DEBUG_VM enabled, I get:

page: refcount:2 mapcount:0 mapping:00000000b2c30835 index:0x0 pfn:0x12a113
memcg:ffff9d8e3a660800
aops:0xffffffff8a056820 ino:21 dentry name:"bash"
flags: 0x24000000000022d(locked|referenced|uptodate|lru|workingset|node=0|zone=2)
raw: 024000000000022d ffffd9ce04a827c8 ffffd9ce04a84508 ffff9d8e0bbc99f0
raw: 0000000000000000 0000000000000000 00000002ffffffff ffff9d8e3a660800
page dumped because: VM_BUG_ON_FOLIO(folio_test_uptodate(folio))
------------[ cut here ]------------
kernel BUG at mm/filemap.c:1534!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 0 UID: 1000 PID: 1638 Comm: buildbox-fuse Not tainted 6.11.0-rc2+ #26
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:folio_end_read+0xa2/0xb0
Code: 37 8a e8 21 1b 05 00 0f 0b 48 c7 c6 48 7f 39 8a e8 13 1b 05 00 0f 0b 31 f6 e9 7a f9 ff ff 48 c7 c6 a8 7f 39 8a e8 fe 1a 05 00 <0f> 0b 90 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90
RSP: 0018:ffffaa0ec289fc68 EFLAGS: 00010246
RAX: 0000000000000040 RBX: ffffd9ce04a844c0 RCX: 0000000000000027
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9d8f77c1c8c0
RBP: ffff9d8e07298a38 R08: 00000000ffffefff R09: ffffffff8a6b0d68
R10: 0000000000000003 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000001 R14: ffff9d8e305ba098 R15: 0000000000000000
FS:  00007f83d62b4140(0000) GS:ffff9d8f77c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564ef5598078 CR3: 00000001013be000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 ? __die+0x56/0x97
 ? die+0x2e/0x50
 ? do_trap+0x10a/0x110
 ? do_error_trap+0x65/0x80
 ? folio_end_read+0xa2/0xb0
 ? exc_invalid_op+0x50/0x70
 ? folio_end_read+0xa2/0xb0
 ? asm_exc_invalid_op+0x1a/0x20
 ? folio_end_read+0xa2/0xb0
 ? folio_end_read+0xa2/0xb0
 fuse_readpages_end+0xc3/0x150
 fuse_request_end+0x84/0x170
 fuse_dev_do_write+0x24d/0x1050
 ? __kmalloc_node_noprof+0x25e/0x480
 ? fuse_dev_splice_write+0x9d/0x390
 fuse_dev_splice_write+0x2b0/0x390
 do_splice+0x311/0x8b0
 __do_splice+0x204/0x220
 __x64_sys_splice+0xb2/0x120
 do_syscall_64+0x54/0x140
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f83d5f29a03
Code: 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 80 3d 29 48 0d 00 00 49 89 ca 74 14 b8 13 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 75 c3 0f 1f 40 00 55 48 83 ec 30 44 89 4c 24
RSP: 002b:00007fff2c0b8ce8 EFLAGS: 00000202 ORIG_RAX: 0000000000000113
RAX: ffffffffffffffda RBX: 00007fff2c0b8eb0 RCX: 00007f83d5f29a03
RDX: 0000000000000105 RSI: 0000000000000000 RDI: 0000000000000106
RBP: 0000564ef555a4f0 R08: 0000000000004010 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000202 R12: 00007fff2c0b8e20
R13: 0000000000000001 R14: 00000000000000fb R15: 0000000000000000
 </TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:folio_end_read+0xa2/0xb0
Code: 37 8a e8 21 1b 05 00 0f 0b 48 c7 c6 48 7f 39 8a e8 13 1b 05 00 0f 0b 31 f6 e9 7a f9 ff ff 48 c7 c6 a8 7f 39 8a e8 fe 1a 05 00 <0f> 0b 90 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90
RSP: 0018:ffffaa0ec289fc68 EFLAGS: 00010246
RAX: 0000000000000040 RBX: ffffd9ce04a844c0 RCX: 0000000000000027
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff9d8f77c1c8c0
RBP: ffff9d8e07298a38 R08: 00000000ffffefff R09: ffffffff8a6b0d68
R10: 0000000000000003 R11: 0000000000000002 R12: 0000000000000000
R13: 0000000000000001 R14: ffff9d8e305ba098 R15: 0000000000000000
FS:  00007f83d62b4140(0000) GS:ffff9d8f77c00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000564ef5598078 CR3: 00000001013be000 CR4: 0000000000350ef0
Kernel panic - not syncing: Fatal exception
Kernel Offset: 0x8000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Cheers,
Jürg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] fuse: use folio_end_read
  2024-08-09 16:22   ` [PATCH 3/3] fuse: use folio_end_read Matthew Wilcox (Oracle)
@ 2024-08-10 12:24     ` Jürg Billeter
  2024-08-20 12:57       ` Jürg Billeter
  0 siblings, 1 reply; 19+ messages in thread
From: Jürg Billeter @ 2024-08-10 12:24 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Miklos Szeredi, linux-fsdevel

On Fri, 2024-08-09 at 17:22 +0100, Matthew Wilcox (Oracle) wrote:
> part three
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
>  fs/fuse/file.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> index 2b5533e41a62..f39456c65ed7 100644
> --- a/fs/fuse/file.c
> +++ b/fs/fuse/file.c
> @@ -937,9 +937,7 @@ static void fuse_readpages_end(struct fuse_mount
> *fm, struct fuse_args *args,
>  	for (i = 0; i < ap->num_pages; i++) {
>  		struct folio *folio = page_folio(ap->pages[i]);
>  
> -		if (!err)
> -			folio_mark_uptodate(folio);
> -		folio_unlock(folio);
> +		folio_end_read(folio, !err);
>  		folio_put(folio);
>  	}
>  	if (ia->ff)

Reverting this part is sufficient to fix the issue for me.

Cheers,
Jürg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-10  5:56   ` [REGRESSION] fuse: copy_file_range() fails with EIO Jürg Billeter
@ 2024-08-10 15:12     ` Matthew Wilcox
  2024-08-22 13:04       ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2024-08-10 15:12 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Miklos Szeredi, linux-fsdevel, regressions

On Sat, Aug 10, 2024 at 07:56:21AM +0200, Jürg Billeter wrote:
> Thanks for the quick response.

Thanks for the quick test!

> On Fri, 2024-08-09 at 17:03 +0100, Matthew Wilcox wrote:
> > Do you have CONFIG_DEBUG_VM enabled?  There are some debugging asserts
> > which that will enable that might indicate a problem.
> 
> With CONFIG_DEBUG_VM enabled, I get:
> 
> page: refcount:2 mapcount:0 mapping:00000000b2c30835 index:0x0 pfn:0x12a113
> memcg:ffff9d8e3a660800
> aops:0xffffffff8a056820 ino:21 dentry name:"bash"
> flags: 0x24000000000022d(locked|referenced|uptodate|lru|workingset|node=0|zone=2)
> raw: 024000000000022d ffffd9ce04a827c8 ffffd9ce04a84508 ffff9d8e0bbc99f0
> raw: 0000000000000000 0000000000000000 00000002ffffffff ffff9d8e3a660800
> page dumped because: VM_BUG_ON_FOLIO(folio_test_uptodate(folio))

That's what I suspected was going wrong -- we're trying to end a read on
a folio that is already uptodate.  Miklos, what the hell is FUSE doing
here?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] fuse: use folio_end_read
  2024-08-10 12:24     ` Jürg Billeter
@ 2024-08-20 12:57       ` Jürg Billeter
  2024-08-20 13:50         ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Jürg Billeter @ 2024-08-20 12:57 UTC (permalink / raw)
  To: Matthew Wilcox (Oracle); +Cc: Miklos Szeredi, linux-fsdevel

On Sat, 2024-08-10 at 14:24 +0200, Jürg Billeter wrote:
> On Fri, 2024-08-09 at 17:22 +0100, Matthew Wilcox (Oracle) wrote:
> > part three
> > 
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> >  fs/fuse/file.c | 4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> > 
> > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > index 2b5533e41a62..f39456c65ed7 100644
> > --- a/fs/fuse/file.c
> > +++ b/fs/fuse/file.c
> > @@ -937,9 +937,7 @@ static void fuse_readpages_end(struct
> > fuse_mount
> > *fm, struct fuse_args *args,
> >  	for (i = 0; i < ap->num_pages; i++) {
> >  		struct folio *folio = page_folio(ap->pages[i]);
> >  
> > -		if (!err)
> > -			folio_mark_uptodate(folio);
> > -		folio_unlock(folio);
> > +		folio_end_read(folio, !err);
> >  		folio_put(folio);
> >  	}
> >  	if (ia->ff)
> 
> Reverting this part is sufficient to fix the issue for me.

Would it make sense to get a revert of this part (or a full revert of
commit 413e8f014c8b) into mainline and also 6.10 stable if a proper fix
will take more time?

Cheers,
Jürg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] fuse: use folio_end_read
  2024-08-20 12:57       ` Jürg Billeter
@ 2024-08-20 13:50         ` Matthew Wilcox
  2024-08-22 11:58           ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Matthew Wilcox @ 2024-08-20 13:50 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Miklos Szeredi, linux-fsdevel

On Tue, Aug 20, 2024 at 02:57:04PM +0200, Jürg Billeter wrote:
> On Sat, 2024-08-10 at 14:24 +0200, Jürg Billeter wrote:
> > On Fri, 2024-08-09 at 17:22 +0100, Matthew Wilcox (Oracle) wrote:
> > > part three
> > > 
> > > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > > ---
> > >  fs/fuse/file.c | 4 +---
> > >  1 file changed, 1 insertion(+), 3 deletions(-)
> > > 
> > > diff --git a/fs/fuse/file.c b/fs/fuse/file.c
> > > index 2b5533e41a62..f39456c65ed7 100644
> > > --- a/fs/fuse/file.c
> > > +++ b/fs/fuse/file.c
> > > @@ -937,9 +937,7 @@ static void fuse_readpages_end(struct
> > > fuse_mount
> > > *fm, struct fuse_args *args,
> > >  	for (i = 0; i < ap->num_pages; i++) {
> > >  		struct folio *folio = page_folio(ap->pages[i]);
> > >  
> > > -		if (!err)
> > > -			folio_mark_uptodate(folio);
> > > -		folio_unlock(folio);
> > > +		folio_end_read(folio, !err);
> > >  		folio_put(folio);
> > >  	}
> > >  	if (ia->ff)
> > 
> > Reverting this part is sufficient to fix the issue for me.
> 
> Would it make sense to get a revert of this part (or a full revert of
> commit 413e8f014c8b) into mainline and also 6.10 stable if a proper fix
> will take more time?

As far as I'm concerned, I've found the fix.  It's just that Miklos
isn't responding.  On holiday, perhaps?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] fuse: use folio_end_read
  2024-08-20 13:50         ` Matthew Wilcox
@ 2024-08-22 11:58           ` Miklos Szeredi
  2024-08-22 12:25             ` Matthew Wilcox
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2024-08-22 11:58 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jürg Billeter, linux-fsdevel

On Tue, 20 Aug 2024 at 15:50, Matthew Wilcox <willy@infradead.org> wrote:

> > Would it make sense to get a revert of this part (or a full revert of
> > commit 413e8f014c8b) into mainline and also 6.10 stable if a proper fix
> > will take more time?
>
> As far as I'm concerned, I've found the fix.  It's just that Miklos
> isn't responding.  On holiday, perhaps?

I was, but not anymore.

I'm trying to dig though the unread pile, but haven't seen your fix.

Can you please point me in the right direction?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/3] fuse: use folio_end_read
  2024-08-22 11:58           ` Miklos Szeredi
@ 2024-08-22 12:25             ` Matthew Wilcox
  0 siblings, 0 replies; 19+ messages in thread
From: Matthew Wilcox @ 2024-08-22 12:25 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Jürg Billeter, linux-fsdevel

On Thu, Aug 22, 2024 at 01:58:28PM +0200, Miklos Szeredi wrote:
> On Tue, 20 Aug 2024 at 15:50, Matthew Wilcox <willy@infradead.org> wrote:
> 
> > > Would it make sense to get a revert of this part (or a full revert of
> > > commit 413e8f014c8b) into mainline and also 6.10 stable if a proper fix
> > > will take more time?
> >
> > As far as I'm concerned, I've found the fix.  It's just that Miklos
> > isn't responding.  On holiday, perhaps?
> 
> I was, but not anymore.
> 
> I'm trying to dig though the unread pile, but haven't seen your fix.
> 
> Can you please point me in the right direction?

See the other fork of this thread:

> That's what I suspected was going wrong -- we're trying to end a read on
> a folio that is already uptodate.  Miklos, what the hell is FUSE doing
> here?


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-10 15:12     ` Matthew Wilcox
@ 2024-08-22 13:04       ` Miklos Szeredi
  2024-08-22 13:12         ` Jürg Billeter
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2024-08-22 13:04 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Jürg Billeter, linux-fsdevel, regressions

On Sat, 10 Aug 2024 at 17:12, Matthew Wilcox <willy@infradead.org> wrote:
>
> On Sat, Aug 10, 2024 at 07:56:21AM +0200, Jürg Billeter wrote:
> > Thanks for the quick response.
>
> Thanks for the quick test!
>
> > On Fri, 2024-08-09 at 17:03 +0100, Matthew Wilcox wrote:
> > > Do you have CONFIG_DEBUG_VM enabled?  There are some debugging asserts
> > > which that will enable that might indicate a problem.
> >
> > With CONFIG_DEBUG_VM enabled, I get:
> >
> > page: refcount:2 mapcount:0 mapping:00000000b2c30835 index:0x0 pfn:0x12a113
> > memcg:ffff9d8e3a660800
> > aops:0xffffffff8a056820 ino:21 dentry name:"bash"
> > flags: 0x24000000000022d(locked|referenced|uptodate|lru|workingset|node=0|zone=2)
> > raw: 024000000000022d ffffd9ce04a827c8 ffffd9ce04a84508 ffff9d8e0bbc99f0
> > raw: 0000000000000000 0000000000000000 00000002ffffffff ffff9d8e3a660800
> > page dumped because: VM_BUG_ON_FOLIO(folio_test_uptodate(folio))
>
> That's what I suspected was going wrong -- we're trying to end a read on
> a folio that is already uptodate.  Miklos, what the hell is FUSE doing
> here?

Ah, this is the fancy page cache replacement done in fuse_try_move_page().

I understand how this triggers VM_BUG_ON_FOLIO() in folio_end_read().

What I don't understand is how this results in the -EIO that Jürg reported.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-22 13:04       ` Miklos Szeredi
@ 2024-08-22 13:12         ` Jürg Billeter
  2024-08-22 13:14           ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Jürg Billeter @ 2024-08-22 13:12 UTC (permalink / raw)
  To: Miklos Szeredi, Matthew Wilcox; +Cc: linux-fsdevel, regressions

On Thu, 2024-08-22 at 15:04 +0200, Miklos Szeredi wrote:
> On Sat, 10 Aug 2024 at 17:12, Matthew Wilcox <willy@infradead.org> wrote:
> > That's what I suspected was going wrong -- we're trying to end a read on
> > a folio that is already uptodate.  Miklos, what the hell is FUSE doing
> > here?
> 
> Ah, this is the fancy page cache replacement done in
> fuse_try_move_page().
> 
> I understand how this triggers VM_BUG_ON_FOLIO() in folio_end_read().
> 
> What I don't understand is how this results in the -EIO that Jürg
> reported.

I'm not really familiar with this code but it seems `folio_end_read()`
uses xor to update the `PG_uptodate` flag. So if it was already set, it
will incorrectly clear the `PG_uptodate` set, which I guess triggers
the issue.

Jürg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-22 13:12         ` Jürg Billeter
@ 2024-08-22 13:14           ` Miklos Szeredi
  2024-08-22 13:32             ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2024-08-22 13:14 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Matthew Wilcox, linux-fsdevel, regressions

On Thu, 22 Aug 2024 at 15:12, Jürg Billeter <j@bitron.ch> wrote:
>
> On Thu, 2024-08-22 at 15:04 +0200, Miklos Szeredi wrote:
> > On Sat, 10 Aug 2024 at 17:12, Matthew Wilcox <willy@infradead.org> wrote:
> > > That's what I suspected was going wrong -- we're trying to end a read on
> > > a folio that is already uptodate.  Miklos, what the hell is FUSE doing
> > > here?
> >
> > Ah, this is the fancy page cache replacement done in
> > fuse_try_move_page().
> >
> > I understand how this triggers VM_BUG_ON_FOLIO() in folio_end_read().
> >
> > What I don't understand is how this results in the -EIO that Jürg
> > reported.
>
> I'm not really familiar with this code but it seems `folio_end_read()`
> uses xor to update the `PG_uptodate` flag. So if it was already set, it
> will incorrectly clear the `PG_uptodate` set, which I guess triggers
> the issue.

Indeed, that would explain this.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-22 13:14           ` Miklos Szeredi
@ 2024-08-22 13:32             ` Miklos Szeredi
  2024-08-22 14:14               ` Jürg Billeter
  2024-08-29  9:27               ` Marcel Ziswiler
  0 siblings, 2 replies; 19+ messages in thread
From: Miklos Szeredi @ 2024-08-22 13:32 UTC (permalink / raw)
  To: Jürg Billeter; +Cc: Matthew Wilcox, linux-fsdevel, regressions

[-- Attachment #1: Type: text/plain, Size: 489 bytes --]

On Thu, 22 Aug 2024 at 15:14, Miklos Szeredi <miklos@szeredi.hu> wrote:

> > > What I don't understand is how this results in the -EIO that Jürg
> > > reported.
> >
> > I'm not really familiar with this code but it seems `folio_end_read()`
> > uses xor to update the `PG_uptodate` flag. So if it was already set, it
> > will incorrectly clear the `PG_uptodate` set, which I guess triggers
> > the issue.

Untested patch attached.  Could you please try this?

Thanks,
Miklos

[-- Attachment #2: test.patch --]
[-- Type: text/x-patch, Size: 730 bytes --]

diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 1f1d8a023d35..7cc4f8c9b896 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -789,7 +789,6 @@ static int fuse_check_folio(struct folio *folio)
 	    (folio->flags & PAGE_FLAGS_CHECK_AT_PREP &
 	     ~(1 << PG_locked |
 	       1 << PG_referenced |
-	       1 << PG_uptodate |
 	       1 << PG_lru |
 	       1 << PG_active |
 	       1 << PG_workingset |
@@ -834,9 +833,7 @@ static int fuse_try_move_page(struct fuse_copy_state *cs, struct page **pagep)
 
 	newfolio = page_folio(buf->page);
 
-	if (!folio_test_uptodate(newfolio))
-		folio_mark_uptodate(newfolio);
-
+	folio_clear_uptodate(newfolio);
 	folio_clear_mappedtodisk(newfolio);
 
 	if (fuse_check_folio(newfolio) != 0)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-22 13:32             ` Miklos Szeredi
@ 2024-08-22 14:14               ` Jürg Billeter
  2024-08-29  9:27               ` Marcel Ziswiler
  1 sibling, 0 replies; 19+ messages in thread
From: Jürg Billeter @ 2024-08-22 14:14 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: Matthew Wilcox, linux-fsdevel, regressions

On Thu, 2024-08-22 at 15:32 +0200, Miklos Szeredi wrote:
> On Thu, 22 Aug 2024 at 15:14, Miklos Szeredi <miklos@szeredi.hu>
> wrote:
> 
> > > > What I don't understand is how this results in the -EIO that Jürg
> > > > reported.
> > > 
> > > I'm not really familiar with this code but it seems `folio_end_read()`
> > > uses xor to update the `PG_uptodate` flag. So if it was already set, it
> > > will incorrectly clear the `PG_uptodate` set, which I guess triggers
> > > the issue.
> 
> Untested patch attached.  Could you please try this?

I can no longer reproduce the issue with this patch. So at least with
regards to my test case, it looks good to me.

Cheers,
Jürg

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-22 13:32             ` Miklos Szeredi
  2024-08-22 14:14               ` Jürg Billeter
@ 2024-08-29  9:27               ` Marcel Ziswiler
  2024-08-29  9:30                 ` Miklos Szeredi
  1 sibling, 1 reply; 19+ messages in thread
From: Marcel Ziswiler @ 2024-08-29  9:27 UTC (permalink / raw)
  To: Miklos Szeredi, Jürg Billeter
  Cc: Matthew Wilcox, linux-fsdevel, regressions

Hi Miklos

On Thu, 2024-08-22 at 15:32 +0200, Miklos Szeredi wrote:
> On Thu, 22 Aug 2024 at 15:14, Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > > > What I don't understand is how this results in the -EIO that Jürg
> > > > reported.
> > > 
> > > I'm not really familiar with this code but it seems `folio_end_read()`
> > > uses xor to update the `PG_uptodate` flag. So if it was already set, it
> > > will incorrectly clear the `PG_uptodate` set, which I guess triggers
> > > the issue.
> 
> Untested patch attached.  Could you please try this?

Any progress on this? I can confirm that it does also fix an issue I noticed on Fedora Silverblue 40 running
6.10.6-200.fc40.x86_64. I successfully tested this patch on top of latest kernel-ark 6.11.0-
0.rc4.872cf28b8df9.39.fc40.x86_64. Thanks!

> Thanks,
> Miklos

Cheers

Marcel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [REGRESSION] fuse: copy_file_range() fails with EIO
  2024-08-29  9:27               ` Marcel Ziswiler
@ 2024-08-29  9:30                 ` Miklos Szeredi
  0 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2024-08-29  9:30 UTC (permalink / raw)
  To: Marcel Ziswiler
  Cc: Jürg Billeter, Matthew Wilcox, linux-fsdevel, regressions

On Thu, 29 Aug 2024 at 11:27, Marcel Ziswiler <marcel@ziswiler.com> wrote:

> Any progress on this? I can confirm that it does also fix an issue I noticed on Fedora Silverblue 40 running
> 6.10.6-200.fc40.x86_64. I successfully tested this patch on top of latest kernel-ark 6.11.0-
> 0.rc4.872cf28b8df9.39.fc40.x86_64. Thanks!

Planning to send a pull request with fuse fixes this week.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2024-08-29  9:30 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-09 15:53 [REGRESSION] fuse: copy_file_range() fails with EIO Jürg Billeter
2024-08-09 16:03 ` Matthew Wilcox
2024-08-09 16:22   ` [PATCH 1/3] fuse: remove call to SetPageError Matthew Wilcox (Oracle)
2024-08-09 16:22   ` [PATCH 2/3] fuse: Use a folio instead of a page Matthew Wilcox (Oracle)
2024-08-09 16:22   ` [PATCH 3/3] fuse: use folio_end_read Matthew Wilcox (Oracle)
2024-08-10 12:24     ` Jürg Billeter
2024-08-20 12:57       ` Jürg Billeter
2024-08-20 13:50         ` Matthew Wilcox
2024-08-22 11:58           ` Miklos Szeredi
2024-08-22 12:25             ` Matthew Wilcox
2024-08-10  5:56   ` [REGRESSION] fuse: copy_file_range() fails with EIO Jürg Billeter
2024-08-10 15:12     ` Matthew Wilcox
2024-08-22 13:04       ` Miklos Szeredi
2024-08-22 13:12         ` Jürg Billeter
2024-08-22 13:14           ` Miklos Szeredi
2024-08-22 13:32             ` Miklos Szeredi
2024-08-22 14:14               ` Jürg Billeter
2024-08-29  9:27               ` Marcel Ziswiler
2024-08-29  9:30                 ` Miklos Szeredi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).