* recap of 9p problems in 6.9
@ 2024-04-08 4:17 Dominique Martinet
2024-04-08 23:14 ` Eric Van Hensbergen
2024-04-09 0:11 ` Eric Van Hensbergen
0 siblings, 2 replies; 7+ messages in thread
From: Dominique Martinet @ 2024-04-08 4:17 UTC (permalink / raw)
To: Eric Van Hensbergen; +Cc: v9fs
Hi Eric (& anyone else reading on the list),
I've spent quite a bit of time testing after Kent's report last week and
we have a few problems, so here's a recap before I blow up (I'm really
busy right now, so this is eating into work hours I'm not comfortable
spending on this, and I'll need to tone it down quite fast).
* Kent's weird error when running xfstests from 9p:
https://lkml.kernel.org/r/f6upxoxa6d2c6cbh4ka775msggvuduigiu7xgvfx7qsufg2lo6@2ellaad6b2on
He says reverting the new patches that went into 6.9 fixes it, but he
hasn't had time to bisect yet as the reproduction rate is too low.
I couldn't reproduce it on my end yet, but we need to either confirm a
single patch to revert, or revert the whole thing for another cycle if we
don't figure it out in a few weeks, as I don't like the idea of a stable
release with this bug.
* open / unlink / fstat|ftruncate etc fail
https://lkml.kernel.org/r/E7D462A2-EE93-4A57-9F15-8565EE1567F3@linux.dev
I haven't confirmed it yet but I think it's a new bug? Maybe the 'fix dups
even in uncached mode' patch dropping v9fs_drop_inode(); that's easy
enough to test, and it's just a new bug, so I didn't look yet.
* running apt install in a VM with 9p as rootfs (default cache=none)
got me this warning once:
```
[ 64.291867] ------------[ cut here ]------------
[ 64.292458] WARNING: CPU: 0 PID: 161 at fs/inode.c:332 drop_nlink+0x2a/0x40
[ 64.293380] Modules linked in: 9p netfs
[ 64.293818] CPU: 0 PID: 161 Comm: dpkg Not tainted 6.9.0-rc2+ #20
[ 64.294583] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 64.296043] RIP: 0010:drop_nlink+0x2a/0x40
[ 64.296518] Code: 0f 1f 44 00 00 55 8b 47 38 48 89 e5 85 c0 74 1a 83 e8 01 89 47 38 75 0c 48 8b 47 18 3e 48 ff 80 30 04 00 00 5d c3 cc cc cc cc <0f> 0b c7 47 38 ff ff ff ff 5d c3 cc cc cc cc 0f 1f 80 00 00 00 00
[ 64.298896] RSP: 0018:ffff9e3c8072be18 EFLAGS: 00010246
[ 64.299440] RAX: 0000000000000000 RBX: ffff9c49414c4540 RCX: 000000000002bd46
[ 64.300239] RDX: ffff9c4941bdac0c RSI: 0000000000033990 RDI: ffff9c49414e8dc0
[ 64.301034] RBP: ffff9e3c8072be18 R08: 0000000000000000 R09: ffff9e3c8072bd78
[ 64.302328] R10: ffff9e3c8072bd80 R11: ffff9c4942997298 R12: ffff9c49414e8b00
[ 64.303610] R13: 0000000000000000 R14: ffff9c49414e8dc0 R15: 0000000000000000
[ 64.304904] FS: 00007fdf1a979d00(0000) GS:ffff9c495f200000(0000) knlGS:0000000000000000
[ 64.306407] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 64.307390] CR2: 0000561961b47000 CR3: 0000000002580000 CR4: 00000000000006b0
[ 64.308679] Call Trace:
[ 64.308956] <TASK>
[ 64.309152] ? show_regs+0x64/0x70
[ 64.309644] ? __warn+0x84/0x120
[ 64.310121] ? drop_nlink+0x2a/0x40
[ 64.310620] ? report_bug+0x15d/0x180
[ 64.311182] ? handle_bug+0x44/0x90
[ 64.311684] ? exc_invalid_op+0x18/0x70
[ 64.312268] ? asm_exc_invalid_op+0x1b/0x20
[ 64.312928] ? drop_nlink+0x2a/0x40
[ 64.313429] v9fs_remove+0x132/0x280 [9p]
[ 64.314077] v9fs_vfs_unlink+0x10/0x20 [9p]
[ 64.314732] vfs_unlink+0x135/0x2c0
[ 64.315245] do_unlinkat+0x231/0x2b0
[ 64.315764] __x64_sys_unlink+0x23/0x30
[ 64.316340] do_syscall_64+0x5f/0x130
[ 64.316883] entry_SYSCALL_64_after_hwframe+0x71/0x79
[ 64.317734] RIP: 0033:0x7fdf1ab0ea07
[ 64.318257] Code: f0 ff ff 73 01 c3 48 8b 0d f6 83 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 b8 57 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 83 0d 00 f7 d8 64 89 01 48
[ 64.322093] RSP: 002b:00007fff0b13fc58 EFLAGS: 00000202 ORIG_RAX: 0000000000000057
[ 64.323478] RAX: ffffffffffffffda RBX: 00005619848be5e0 RCX: 00007fdf1ab0ea07
[ 64.324768] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000056198492ad40
[ 64.326081] RBP: 000056198492ad40 R08: 0000000000000007 R09: 00005619848bdfb0
[ 64.327371] R10: ed3dec18d54433a1 R11: 0000000000000202 R12: 00005619848bdef0
[ 64.328657] R13: 0000561961b30040 R14: 0000561961b431d8 R15: 00007fdf1ac6e020
[ 64.329969] </TASK>
[ 64.330185] ---[ end trace 0000000000000000 ]---
```
It doesn't seem to happen every time; I don't think it happened before,
but since I haven't looked into reproducing it I couldn't say for sure.
* some refcounting bug when running this (creating/removing the same files
many times in parallel).
This doesn't seem new, as I can reproduce it without the 6.9 patches;
I'll need to dig into it.
```
mkdir tmp
echo test > tmp/test
for i in 1 2 3 4 5; do
    seq 1 1000 | while read _; do
        cp -a tmp copy 2>/dev/null
        rm -rf copy 2>/dev/null
    done &
done
wait
```
I also had a qemu segfault after a similar script (copying/removing a
larger tree), but looking at the code and locals I don't see how it could
fail there... Giving up on this for now; it happened after the refcount
UAF error on Linux, so it might be caused by garbage we sent, but that's
still a problem obviously.
```
(gdb) bt
#0 __memcmp_avx2_movbe () at ../sysdeps/x86_64/multiarch/memcmp-avx2-movbe.S:415
#1 0x000064d9dd7a5716 in v9fs_mark_fids_unreclaim (pdu=pdu@entry=0x64d9e16ffe58, path=path@entry=0x74d9888c8f70) at ../hw/9pfs/9p.c:545
#2 0x000064d9dd7aa7ad in v9fs_unlinkat (opaque=0x64d9e16ffe58) at ../hw/9pfs/9p.c:3189
#3 0x000064d9ddd4d64b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:175
#4 0x000074da302bf510 in ?? () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:90 from /nix/store/cyrrf49i2hm1w7vn2j945ic3rrzgxbqs-glibc-2.38-44/lib/libc.so.6
#5 0x00007fff7b8c4e30 in ?? ()
#6 0x0000000000000000 in ?? ()
(gdb) p fidp->path
$1 = {size = 29, data = 0x74d96c001010 "./copy/1/tests/btrfs/007.out"}
(gdb) p *path
$2 = {size = 29, data = 0x74da1c001e00 "./copy/4/tests/btrfs/049.out"}
# failing on this line:...
!memcmp(fidp->path.data, path->data, path->size)) {
```
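The failing line is a counted-byte equality check between two fid paths in qemu's v9fs_mark_fids_unreclaim. Here is a minimal self-contained sketch of that comparison (simplified stand-in struct, not the actual qemu V9fsPath type) to make the crash site concrete; note that both pointers and sizes look valid in the gdb output above, which is why a use-after-free on the fid list, rather than the compare itself, is the suspect:

```c
/* Simplified stand-in for the path comparison at the qemu crash site:
 * an equality check of two counted byte strings (size includes the
 * trailing NUL, as in the gdb output: size = 29 for a 28-char path). */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct vpath {
    size_t size;
    const char *data;
};

static bool path_equal(const struct vpath *a, const struct vpath *b)
{
    /* Sizes must match before memcmp may read a->size bytes of both. */
    return a->size == b->size && !memcmp(a->data, b->data, a->size);
}
```

Given valid pointers this cannot fault, which supports the theory that one of the `data` pointers had already been freed.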
--
Dominique Martinet | Asmadeus
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: recap of 9p problems in 6.9
2024-04-08 4:17 recap of 9p problems in 6.9 Dominique Martinet
@ 2024-04-08 23:14 ` Eric Van Hensbergen
2024-04-09 0:11 ` Eric Van Hensbergen
1 sibling, 0 replies; 7+ messages in thread
From: Eric Van Hensbergen @ 2024-04-08 23:14 UTC (permalink / raw)
To: Dominique Martinet; +Cc: v9fs
Thanks for looking into these, Dominique - I've got the reports queued up and am just looking to ring-fence some time to dig into them before I commit to a revert. The changes in handling of inode matching and keeping inodes around in the cache longer are likely the root of all of these, but the old code had so many unexplained corner cases that I really want to understand the problems before reverting.
The create/remove-a-file-multiple-times-in-parallel case could be linked to an underlying issue with only using qid.path for the match -- the old extended match logic also matched the i_generation number -- but I convinced myself it wasn't necessary, and it required a fresh getattr every time, which seemed excessive. The alternative is to work this into a server-side fix, but if that breaks all legacy servers then that's likely unacceptable.
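To make the trade-off concrete, here is a minimal self-contained sketch (simplified stand-in types and names, not the actual fs/9p structures) of matching a cached inode by qid.path alone versus also requiring a generation match:

```c
/* Sketch of the inode-matching trade-off: qid.path alone is cheap but
 * collides when the server reuses an inode number; adding a generation
 * check disambiguates, at the cost of a fresh getattr to learn it. */
#include <stdbool.h>
#include <stdint.h>

struct qid { uint64_t path; };

struct ninode {
    struct qid qid;
    uint32_t i_generation;   /* only known after a fresh getattr */
};

/* Match on qid.path only. */
static bool match_by_path(const struct ninode *cached, const struct qid *q)
{
    return cached->qid.path == q->path;
}

/* Extended match: qid.path plus generation number. */
static bool match_extended(const struct ninode *cached, const struct qid *q,
                           uint32_t gen)
{
    return cached->qid.path == q->path && cached->i_generation == gen;
}
```

With only `match_by_path`, a file deleted and recreated under the same server inode number matches the stale cached inode; `match_extended` rejects it, but requires the extra round-trip each time.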
-eric
April 7, 2024 at 11:17 PM, "Dominique Martinet" <asmadeus@codewreck.org> wrote:
> [...]
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: recap of 9p problems in 6.9
2024-04-08 4:17 recap of 9p problems in 6.9 Dominique Martinet
2024-04-08 23:14 ` Eric Van Hensbergen
@ 2024-04-09 0:11 ` Eric Van Hensbergen
2024-04-09 2:07 ` Dominique Martinet
2024-04-09 2:15 ` Eric Van Hensbergen
1 sibling, 2 replies; 7+ messages in thread
From: Eric Van Hensbergen @ 2024-04-09 0:11 UTC (permalink / raw)
To: Dominique Martinet; +Cc: v9fs
April 7, 2024 at 11:17 PM, "Dominique Martinet" <asmadeus@codewreck.org> wrote:
>
> Hi Eric (& anyone else reading on the list),
>
> * open / unlink / fstat|ftruncate etc fail
>
> https://lkml.kernel.org/r/E7D462A2-EE93-4A57-9F15-8565EE1567F3@linux.dev
>
> I haven't confirmed it yet but I think it's a new bug? Maybe the 'fix dups
> even in uncached mode' patch dropping v9fs_drop_inode(); that's easy
> enough to test, and it's just a new bug, so I didn't look yet.
I was thinking about this one a bit more -- this smells like the old anonymous file bug. fstat ultimately leads to a getattr -- in 9p classic those happen against unopened files only, so it'll go looking for a transient fid, which won't be there if we've unlinked. It's possible that in this scenario we should return the "cached" information for the file (which we have in the inode) if we can't find the fid via a walk (which we won't if it's unlinked).
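The fallback suggested above can be sketched as follows -- a hypothetical, heavily simplified illustration with stand-in types and stub helpers, not the actual fs/9p code:

```c
/* Sketch of "fall back to cached inode attributes when the fid walk
 * fails for an unlinked name".  All types and helpers here are
 * simplified stand-ins for illustration only. */
#include <stddef.h>

struct p9_fid { int id; };
struct kstat { long size; long nlink; };
struct sinode {
    struct kstat cached_attrs;
    int unlinked;            /* stand-in for "name gone on the server" */
};

/* Stand-in for walking a transient fid; NULL when the name is gone. */
static struct p9_fid *walk_fid(struct sinode *inode)
{
    static struct p9_fid fid = { 1 };
    return inode->unlinked ? NULL : &fid;
}

/* Stand-in for a TGETATTR round-trip on a walked fid. */
static int getattr_from_server(struct p9_fid *fid, struct kstat *st)
{
    (void)fid;
    st->size = 0;
    st->nlink = 1;
    return 0;
}

static int getattr_sketch(struct sinode *inode, struct kstat *st)
{
    struct p9_fid *fid = walk_fid(inode);

    if (fid)
        return getattr_from_server(fid, st);

    /* Walk failed (unlinked name): return cached attributes
     * rather than an error. */
    *st = inode->cached_attrs;
    return 0;
}
```

The design question is exactly the one raised above: whether serving possibly-stale cached attributes is acceptable when the transient fid cannot be found.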
Not sure if ftruncate is the same problem or a different one - right now the truncate code seems to be in setattr. I need to do a bit of tracing to see if that's the path we are in. There was a truncate bug spotted in legacy 9p, but I think that's something different.
So... is this a new bug? Maybe. I thought I might have removed the "failsafe" fid lookup of open files off of an inode -- but I see that it is still there; it's only used if the inode is still linked to the dentry, and we've just unlinked it. I can see the VFS code path pulling the inode from the path (d_backing_inode(path->dentry)) -- so maybe we just do the same thing for getattr if we fail to look up the fid.
It would be really nice if we could just get to the file structure, but in both of these cases that seems to be "lost along the way" in the VFS call-chain.
-eric
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: recap of 9p problems in 6.9
2024-04-09 0:11 ` Eric Van Hensbergen
@ 2024-04-09 2:07 ` Dominique Martinet
2024-04-09 11:40 ` Eric Van Hensbergen
2024-04-09 2:15 ` Eric Van Hensbergen
1 sibling, 1 reply; 7+ messages in thread
From: Dominique Martinet @ 2024-04-09 2:07 UTC (permalink / raw)
To: Eric Van Hensbergen; +Cc: v9fs
Replying to both mails at once
Eric Van Hensbergen wrote on Mon, Apr 08, 2024 at 11:14:07PM +0000:
> Thanks for looking into these Dominique - I've got the reports queued
> up and am just looking to ring fence some time to dig into them before
> I commit to a revert. The change in handling of inode matching and
> keeping inodes around in the cache longer are likely the root of all
> of these, but the old code had so much unexplained corner cases I
> really want to understand the problems before revert.
My opinion here is that quite a few people have reported being impacted,
so while I agree we need to understand the old cruft eventually, I think
it's time to roll back, try harder to reproduce the various errors
that have been reported, and investigate in another branch
(not even -next, as that is also used by people for non-regression
testing -- breaking other parts of the kernel's non-reg testing will
just make people stop using 9p).
And when we do finally manage to reproduce these, that'll give us another
workload we can test regularly for future regressions. I think we need
more of those (like my parallel unlink script); everything I'm running is
either too nice (e.g. a proper build without parallel write accesses
within a file) or too synthetic (fsstress etc.); more "real-world" tests
can't hurt.
> The create/remove a file multiple times in parallel could be linked to
> an underlying issue with only using qid.path for match -- the old
> extended match logic also matched i_generation number -- but I
> convinced myself it wasn't necessary and it required a fresh getattr
> every time which seemed excessive. The alternative is to work this
> into a server-side fix, but if that breaks all legacy then that's
> likely unacceptable.
There might be a new problem, but I definitely also hit some refcount bug
in the old code with that, so I'm thinking there's a place where the ref
isn't obtained in the right order, causing a race, or maybe a ref is
missing for the dentry list or something (we properly unref when freeing
the dentry, but I'm not sure I trust all paths to properly take that ref
when adding to the dentry).
That's a hole I dug myself; I've just fixed perf (again[1]) to allow
hooking on the fid refcount tracepoint and I'll dig into this.
With a reproducer that should be easy to figure out even if
there's a bit of noise; I'll work on it over the next few weekends.
[1] https://lore.kernel.org/linux-perf-users/ZhPn0EwXf_bPBj-p@codewreck.org/T/#m8768e2b920e7111364c8f1309f09edb607d5aec5
Eric Van Hensbergen wrote on Tue, Apr 09, 2024 at 12:11:05AM +0000:
> April 7, 2024 at 11:17 PM, "Dominique Martinet" <asmadeus@codewreck.org> wrote:
> > * open / unlink / fstat|ftruncate etc fail
> >
> > https://lkml.kernel.org/r/E7D462A2-EE93-4A57-9F15-8565EE1567F3@linux.dev
> >
> > I didn't confirm yet but I think it's a new bug? maybe the 'fix dups
> > even in uncached mode' patch dropping v9fs_drop_inode(); that's easy
> > enough to test just a new bug so didn't look yet
>
> I was thinking about this one a bit more -- this smells like the old
> anonymous file bug -- but I started thinking a bit about it. fstat
> ultimately leads to a getattr -- in 9p classic those happen against
> unopen files only so it'll go looking for a transient fid which won't
> be there if we've unlinked. It's possible that in this scenario we
> should return the "cached" information for the file (which we have in
> the inode) if we can't find the fid via a walk (which we won't if it's
> unlinked).
The file is still open so we must have a fid around; in cached mode it's
probably OK to return cached information (and a quick look makes me
think we do -- we're not failing fstat here), but in cache=none we just
need to send a TGETATTR with that fid... and I just checked: we're
doing this correctly. It's qemu using fstatat with the parent directory
and file name, and failing with ENOENT, instead of using fstat on the
opened file. I was sure this worked before, but this apparently hasn't
changed with 6.9; sorry for the wrong conclusion.
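For reference, the pattern under discussion is the classic one below: on a local POSIX filesystem, after unlink() the still-open descriptor keeps working with fstat(), while any by-name lookup fails with ENOENT. This is the local behaviour a 9p server would need to mirror; the temp file name is purely illustrative.

```c
/* Demonstrates open -> unlink -> fstat on a local filesystem:
 * fstat() on the open fd succeeds after unlink(), while a by-name
 * stat() fails with ENOENT.  Returns 0 when the expected behaviour
 * is observed, -1 otherwise. */
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

static int open_unlink_fstat_demo(void)
{
    const char *name = "demo_unlink_fstat.tmp"; /* illustrative name */
    struct stat st;
    int fd = open(name, O_CREAT | O_RDWR, 0600);

    if (fd < 0)
        return -1;
    if (unlink(name) != 0) {        /* drop the name; fd stays valid */
        close(fd);
        return -1;
    }
    if (fstat(fd, &st) != 0) {      /* works: inode alive via the fd */
        close(fd);
        return -1;
    }
    /* By-name lookup now fails: the directory entry is gone. */
    if (stat(name, &st) == 0 || errno != ENOENT) {
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}
```

Running this against a 9p mount is the quick way to see whether the server (rather than the client) mishandles the unlinked-but-open case.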
> Not sure if ftruncate is that same problem or different - right now
> the truncate code seems to be in setattr. Need to do a bit of tracing
> to see if that's the path we are in. There was a truncate bug spotted
> in legacy 9p, but I think that's something different.
Yeah, I think we're doing the right thing here: we're sending setattr on
the opened fid and qemu fails as well.
I checked the qemu code and this doesn't seem to have changed recently,
so I'll reply to the other thread saying this probably never worked and
needs addressing on the server side.
That leaves the other bus error problem, which might ultimately be the
same as Kent's, but I can't figure out how to hit it yet.
> It would be really nice if we could just get to the file structure,
> but in both of these cases that seems to be "lost along the way" in
> the VFS call-chain.
Yeah, there are many places where we could really use the file structure
to get the fid the operation was meant for, and we can't; but at
least the lookup from the dentry here seems to work!
Sorry for the noise on this; we can take that bug more leisurely.
--
Dominique Martinet | Asmadeus
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: recap of 9p problems in 6.9
2024-04-09 2:07 ` Dominique Martinet
@ 2024-04-09 11:40 ` Eric Van Hensbergen
2024-04-09 20:57 ` Eric Van Hensbergen
0 siblings, 1 reply; 7+ messages in thread
From: Eric Van Hensbergen @ 2024-04-09 11:40 UTC (permalink / raw)
To: Dominique Martinet; +Cc: v9fs
April 8, 2024 at 9:07 PM, "Dominique Martinet" <asmadeus@codewreck.org> wrote:
> Replying to both mails at once
> Eric Van Hensbergen wrote on Mon, Apr 08, 2024 at 11:14:07PM +0000:
> > Thanks for looking into these Dominique - I've got the reports queued
> > up and am just looking to ring fence some time to dig into them before
> > I commit to a revert. The change in handling of inode matching and
> > keeping inodes around in the cache longer are likely the root of all
> > of these, but the old code had so much unexplained corner cases I
> > really want to understand the problems before revert.
> My opinion here is that there's been quite a few people reporting to be
> impacted, so while I agree we need to understand the old cruft
> eventually I think it's time to rollback and try harder to reproduce
> the various errors we've been reported and investigate in another branch
> (not even -next, as that is also used by people for non-reg -- breaking
> other parts of the kernel's non-reg will just make people stop using 9p)
> And when we do finally manage to reproduce this that'll make another
> workload we can test regularly for future regressions, I think we need
> more of these (like my parallel unlink code); everything I'm running is
> either too nice (e.g. proper build without parallel write accesses
> within a file) or too synthetic (fsstress etc); more "real-world" tests
> can't hurt.
>
I do agree with this, but I want one more day before I give up. I'm going to work on this today and tomorrow to see if I can fix Kent's and Oleg's problems, including looking at a partial revert (as you suggest, maybe just reverting the inode_drop patch would clean up most of these for the time being, since most people don't seem to be operating in cached mode anyway).
Sorry for diving down the rat hole last night, I should have just looked at the trace. I've added access=client to my test sweeps so I can see if Kent's problem is related to ACLs. fstests just take so long to run.
-eric
^ permalink raw reply [flat|nested] 7+ messages in thread

* Re: recap of 9p problems in 6.9
2024-04-09 11:40 ` Eric Van Hensbergen
@ 2024-04-09 20:57 ` Eric Van Hensbergen
0 siblings, 0 replies; 7+ messages in thread
From: Eric Van Hensbergen @ 2024-04-09 20:57 UTC (permalink / raw)
To: Eric Van Hensbergen; +Cc: Dominique Martinet, v9fs
Also just realized that this patch dropped out of my merges -- wonder
if this would help with the nlink race conditions:
https://patchwork.kernel.org/project/v9fs-devel/patch/20240108-fix-nlink-handling-v2-1-8718b4c7199a@kernel.org/
-eric
On Tue, Apr 9, 2024 at 6:40 AM Eric Van Hensbergen
<eric.vanhensbergen@linux.dev> wrote:
> [...]
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: recap of 9p problems in 6.9
2024-04-09 0:11 ` Eric Van Hensbergen
2024-04-09 2:07 ` Dominique Martinet
@ 2024-04-09 2:15 ` Eric Van Hensbergen
1 sibling, 0 replies; 7+ messages in thread
From: Eric Van Hensbergen @ 2024-04-09 2:15 UTC (permalink / raw)
To: Dominique Martinet; +Cc: v9fs
April 8, 2024 at 7:11 PM, "Eric Van Hensbergen" <eric.vanhensbergen@linux.dev> wrote:
> April 7, 2024 at 11:17 PM, "Dominique Martinet" <asmadeus@codewreck.org> wrote:
> > * open / unlink / fstat|ftruncate etc fail
Okay - a lot more digging on this one. I'm wondering if these issues have to do with changes inside vfs_lookup -- there was a bit of code in there that I didn't quite understand. Previously we always created a new inode on lookup (unless we were in cache mode) -- this leads to a lot of duplicate inode structs floating around. One of the simplification changes was to remove this (but I left in the comment that talks about parallel unlinks) -- now, the unlink in this case isn't in parallel, but I wonder if we were sidestepping this bug because unlink was essentially operating on its own inode, and the open fd had a distinct inode structure that wasn't perturbed.
There's potentially a relationship to the ACL issues Kent observed as well, since there's a similar code path in create which uses a new inode and then associates ACLs with it.
I'm just still struggling with why this requires a new inode structure if there is already one associated with the qid.path -- unless the inode numbers stored in qid.path are being reused too quickly and there are unlinked inodes which still maintain that number in the cache??
-eric
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-04-09 20:57 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-08 4:17 recap of 9p problems in 6.9 Dominique Martinet
2024-04-08 23:14 ` Eric Van Hensbergen
2024-04-09 0:11 ` Eric Van Hensbergen
2024-04-09 2:07 ` Dominique Martinet
2024-04-09 11:40 ` Eric Van Hensbergen
2024-04-09 20:57 ` Eric Van Hensbergen
2024-04-09 2:15 ` Eric Van Hensbergen