* [REGRESSION] 9pfs issues on 6.12-rc1
@ 2024-10-02 17:08 Maximilian Bosch
2024-10-02 17:31 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 28+ messages in thread
From: Maximilian Bosch @ 2024-10-02 17:08 UTC (permalink / raw)
To: linux-fsdevel; +Cc: regressions
Hi!
Starting with Linux 6.12-rc1 the automatic VM tests of NixOS don't boot
anymore and fail like this:
mounting nix-store on /nix/.ro-store...
[ 1.604781] 9p: Installing v9fs 9p2000 file system support
mounting tmpfs on /nix/.rw-store...
mounting overlay on /nix/store...
mounting shared on /tmp/shared...
mounting xchg on /tmp/xchg...
switch_root: can't execute '/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init': Exec format error
[ 1.734997] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
[ 1.736002] CPU: 0 UID: 0 PID: 1 Comm: switch_root Not tainted 6.12.0-rc1 #1-NixOS
[ 1.736965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1.738309] Call Trace:
[ 1.738698] <TASK>
[ 1.739034] panic+0x324/0x340
[ 1.739458] do_exit+0x92e/0xa90
[ 1.739919] ? count_memcg_events.constprop.0+0x1a/0x40
[ 1.740568] ? srso_return_thunk+0x5/0x5f
[ 1.741095] ? handle_mm_fault+0xb0/0x2e0
[ 1.741709] do_group_exit+0x30/0x80
[ 1.742229] __x64_sys_exit_group+0x18/0x20
[ 1.742800] x64_sys_call+0x17f3/0x1800
[ 1.743326] do_syscall_64+0xb7/0x210
[ 1.743895] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 1.744530] RIP: 0033:0x7f8e1a7b9d1d
[ 1.745061] Code: 45 31 c0 45 31 d2 45 31 db c3 0f 1f 00 f3 0f 1e fa 48 8b 35 e5 e0 10 00 ba e7 00 00 00 eb 07 66 0f 1f 44 00 00 f4 89 d0 0f 05 <48> 3d 00 f0 ff ff 76 f3 f7 d8 64 89 06 eb ec 0f 1f 40 00 f3 0f 1e
[ 1.747263] RSP: 002b:00007ffcb56d63b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
[ 1.748250] RAX: ffffffffffffffda RBX: 00007f8e1a8c9fa8 RCX: 00007f8e1a7b9d1d
[ 1.749187] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001
[ 1.750050] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000
[ 1.750891] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[ 1.751706] R13: 0000000000000001 R14: 00007f8e1a8c8680 R15: 00007f8e1a8c9fc0
[ 1.752583] </TASK>
[ 1.753010] Kernel Offset: 0xb800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
The failing script here is the initrd's /init when it tries to perform a
switch_root to `/sysroot`:
exec env -i $(type -P switch_root) "$targetRoot" "$stage2Init"
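As a side note on reading the panic line above: the `exitcode=0x00000100` value appears to be a wait-status word rather than a plain exit status, so the exit status sits in bits 8-15 and a terminating signal, if any, in the low 7 bits. A minimal sketch of the decoding, assuming that encoding (the `decode_exitcode` helper is hypothetical and not part of the initrd):

```shell
#!/bin/sh
# Hypothetical helper: split a wait-status style exit code, as printed in
# "Attempted to kill init! exitcode=0x...", into exit status and signal.
decode_exitcode() {
    code=$1
    # Bits 8-15 hold the exit status passed to exit()/exit_group().
    status=$(( ($code >> 8) & 0xff ))
    # The low 7 bits hold the terminating signal number (0 if none).
    signal=$(( $code & 0x7f ))
    echo "status=$status signal=$signal"
}

decode_exitcode 0x00000100
```

For the value from the log this prints `status=1 signal=0`, which fits switch_root exiting with status 1 after the failed exec.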
Said "$stage2Init" file consistently gets a different hash when running
`sha256sum` on it in the initrd script, but looks and behaves correctly
on the host. I have now reproduced the test failures on 4 different build
machines and two architectures (x86_64-linux, aarch64-linux).
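The hash comparison described above can be sketched roughly as follows. This is only an illustration: `verify_sha256` is a hypothetical helper, and the expected digest is assumed to have been computed with `sha256sum` on the host beforehand.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Returns 0 when the digests match and 1 otherwise.
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <path>"; keep only the digest field.
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
        return 0
    fi
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    return 1
}
```

Inside the initrd this could be called as `verify_sha256 "$targetRoot$stage2Init" "$expected_digest"`, where `$expected_digest` is a placeholder for the digest taken on the host. A mismatch on a file that is intact on the host would point at the read path in the guest rather than at the file itself.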
The "$stage2Init" script is itself a shell script. When trying to
start the interpreter from its shebang inside the initrd (via
`$targetRoot/nix/store/...-bash-5.2p32/bin/bash`) and do the
switch_root, I get a different error:
+ exec env -i /nix/store/akm69s5sngxyvqrzys326dss9rsrvbpy-extra-utils/bin/switch_root /mnt-root /nix/store/k3pm4iv44y7x7p74kky6cwxiswmr6kpi-nixos-system-machine-test/init
[ 1.912859] list_del corruption. prev->next should be ffffc5cf80be0248, but was ffffc5cf80bd9208. (prev=ffffc5cf80bb4d48)
[ 1.914237] ------------[ cut here ]------------
[ 1.915059] kernel BUG at lib/list_debug.c:62!
[ 1.915854] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 1.916739] CPU: 0 UID: 0 PID: 17 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1 #1-NixOS
[ 1.917837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1.919354] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0
[ 1.920180] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac
[ 1.922636] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046
[ 1.923563] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000
[ 1.924692] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1.925664] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000
[ 1.926646] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90
[ 1.927584] R13: ffffc5cf80be0240 R14: ffff8fbebd83dc80 R15: 000000000002f809
[ 1.928533] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000
[ 1.929647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.930431] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 0000000000350ef0
[ 1.931333] Call Trace:
[ 1.931727] <TASK>
[ 1.932088] ? die+0x36/0x90
[ 1.932595] ? do_trap+0xed/0x110
[ 1.933047] ? __list_del_entry_valid_or_report+0xb4/0xd0
[ 1.933757] ? do_error_trap+0x6a/0xa0
[ 1.934390] ? __list_del_entry_valid_or_report+0xb4/0xd0
[ 1.935073] ? exc_invalid_op+0x51/0x80
[ 1.935627] ? __list_del_entry_valid_or_report+0xb4/0xd0
[ 1.936326] ? asm_exc_invalid_op+0x1a/0x20
[ 1.936904] ? __list_del_entry_valid_or_report+0xb4/0xd0
[ 1.937622] free_pcppages_bulk+0x130/0x280
[ 1.938151] free_unref_page_commit+0x21c/0x380
[ 1.938753] free_unref_page+0x472/0x4f0
[ 1.939343] __put_partials+0xee/0x130
[ 1.939921] ? rcu_do_batch+0x1f2/0x800
[ 1.940471] kmem_cache_free+0x2c3/0x370
[ 1.940990] rcu_do_batch+0x1f2/0x800
[ 1.941508] ? rcu_do_batch+0x180/0x800
[ 1.942031] rcu_core+0x182/0x340
[ 1.942500] handle_softirqs+0xe4/0x2f0
[ 1.943034] run_ksoftirqd+0x33/0x40
[ 1.943522] smpboot_thread_fn+0xdd/0x1d0
[ 1.944056] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 1.944679] kthread+0xd0/0x100
[ 1.945126] ? __pfx_kthread+0x10/0x10
[ 1.945656] ret_from_fork+0x34/0x50
[ 1.946151] ? __pfx_kthread+0x10/0x10
[ 1.946680] ret_from_fork_asm+0x1a/0x30
[ 1.947269] </TASK>
[ 1.947622] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata uhci_hcd scsi_mod ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring
[ 1.952291] ---[ end trace 0000000000000000 ]---
[ 1.952893] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0
[ 1.953678] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac
[ 1.955888] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046
[ 1.956548] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000
[ 1.957436] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1.958328] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000
[ 1.959166] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90
[ 1.960044] R13: ffffc5cf80be0240 R14: ffff8fbebd83dc80 R15: 000000000002f809
[ 1.960905] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000
[ 1.961926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.962693] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 0000000000350ef0
[ 1.963548] Kernel panic - not syncing: Fatal exception in interrupt
[ 1.964417] Kernel Offset: 0x1ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
On a subsequent run to verify this, it failed earlier while reading
$targetRoot/.../bash like this:
[ 1.871810] BUG: Bad page state in process cat pfn:2e74a
[ 1.872481] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x1e5 pfn:0x2e74a
[ 1.873499] flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff)
[ 1.874260] raw: 00ffffc000000000 dead000000000100 dead000000000122 0000000000000000
[ 1.875250] raw: 00000000000001e5 0000000000000000 00000001ffffffff 0000000000000000
[ 1.876295] page dumped because: nonzero _refcount
[ 1.876910] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring
[ 1.881465] CPU: 0 UID: 0 PID: 315 Comm: cat Not tainted 6.12.0-rc1 #1-NixOS
[ 1.882326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1.883684] Call Trace:
[ 1.884103] <TASK>
[ 1.884440] dump_stack_lvl+0x64/0x90
[ 1.884954] bad_page+0x70/0x110
[ 1.885468] __rmqueue_pcplist+0x209/0xd00
[ 1.886029] ? srso_return_thunk+0x5/0x5f
[ 1.886572] ? pdu_read+0x36/0x50 [9pnet]
[ 1.887177] get_page_from_freelist+0x2df/0x1910
[ 1.887788] ? srso_return_thunk+0x5/0x5f
[ 1.888324] ? enqueue_entity+0xce/0x510
[ 1.888881] ? srso_return_thunk+0x5/0x5f
[ 1.889415] ? pick_eevdf+0x76/0x1a0
[ 1.889970] ? update_curr+0x35/0x270
[ 1.890476] __alloc_pages_noprof+0x1a3/0x1150
[ 1.891158] ? srso_return_thunk+0x5/0x5f
[ 1.891712] ? __mod_memcg_lruvec_state+0xa9/0x160
[ 1.892346] ? srso_return_thunk+0x5/0x5f
[ 1.892919] ? __lruvec_stat_mod_folio+0x83/0xd0
[ 1.893521] alloc_pages_mpol_noprof+0x8f/0x1f0
[ 1.894148] folio_alloc_noprof+0x5b/0xb0
[ 1.894671] page_cache_ra_unbounded+0x11f/0x200
[ 1.895270] filemap_get_pages+0x538/0x6d0
[ 1.895813] ? srso_return_thunk+0x5/0x5f
[ 1.896361] filemap_splice_read+0x136/0x320
[ 1.896948] backing_file_splice_read+0x52/0xa0
[ 1.897522] ovl_splice_read+0xd2/0xf0 [overlay]
[ 1.898160] ? __pfx_ovl_file_accessed+0x10/0x10 [overlay]
[ 1.898817] splice_direct_to_actor+0xb4/0x270
[ 1.899404] ? __pfx_direct_splice_actor+0x10/0x10
[ 1.900103] do_splice_direct+0x77/0xd0
[ 1.900627] ? __pfx_direct_file_splice_eof+0x10/0x10
[ 1.901308] do_sendfile+0x359/0x410
[ 1.901788] __x64_sys_sendfile64+0xb9/0xd0
[ 1.902370] do_syscall_64+0xb7/0x210
[ 1.902904] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 1.903604] RIP: 0033:0x7fa9f3a7289e
[ 1.904214] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db
[ 1.906436] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
[ 1.907400] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e
[ 1.908241] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
[ 1.909184] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000
[ 1.910212] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001
[ 1.911117] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000
[ 1.911998] </TASK>
[ 1.912376] Disabling lock debugging due to kernel taint
[ 1.913479] list_del corruption. next->prev should be ffffc80e40b9d948, but was ffffc80e40b9d0c8. (next=ffffc80e40b9c7c8)
[ 1.914823] ------------[ cut here ]------------
[ 1.915408] kernel BUG at lib/list_debug.c:65!
[ 1.916050] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 1.916785] CPU: 0 UID: 0 PID: 315 Comm: cat Tainted: G B 6.12.0-rc1 #1-NixOS
[ 1.917877] Tainted: [B]=BAD_PAGE
[ 1.918350] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 1.919996] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0
[ 1.920903] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
[ 1.923423] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246
[ 1.924210] RAX: 000000000000006d RBX: ffff94db3d83dc80 RCX: 0000000000000000
[ 1.925147] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1.926051] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000
[ 1.926940] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 1.927809] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180
[ 1.928695] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000
[ 1.929728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.930540] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0
[ 1.931444] Call Trace:
[ 1.931916] <TASK>
[ 1.932357] ? die+0x36/0x90
[ 1.932831] ? do_trap+0xed/0x110
[ 1.933385] ? __list_del_entry_valid_or_report+0xcc/0xd0
[ 1.934073] ? do_error_trap+0x6a/0xa0
[ 1.934583] ? __list_del_entry_valid_or_report+0xcc/0xd0
[ 1.935242] ? exc_invalid_op+0x51/0x80
[ 1.935781] ? __list_del_entry_valid_or_report+0xcc/0xd0
[ 1.936484] ? asm_exc_invalid_op+0x1a/0x20
[ 1.937174] ? __list_del_entry_valid_or_report+0xcc/0xd0
[ 1.937926] ? __list_del_entry_valid_or_report+0xcb/0xd0
[ 1.938685] __rmqueue_pcplist+0xa5/0xd00
[ 1.939292] ? srso_return_thunk+0x5/0x5f
[ 1.940004] ? __mod_memcg_lruvec_state+0xa9/0x160
[ 1.940758] ? srso_return_thunk+0x5/0x5f
[ 1.941417] ? update_load_avg+0x7e/0x7f0
[ 1.942133] ? srso_return_thunk+0x5/0x5f
[ 1.942838] ? srso_return_thunk+0x5/0x5f
[ 1.943508] get_page_from_freelist+0x2df/0x1910
[ 1.944143] ? srso_return_thunk+0x5/0x5f
[ 1.944696] ? check_preempt_wakeup_fair+0x1ee/0x240
[ 1.945335] ? srso_return_thunk+0x5/0x5f
[ 1.945905] __alloc_pages_noprof+0x1a3/0x1150
[ 1.946489] ? __blk_flush_plug+0xf5/0x150
[ 1.947105] ? srso_return_thunk+0x5/0x5f
[ 1.947629] ? __dquot_alloc_space+0x2a8/0x3a0
[ 1.948404] ? srso_return_thunk+0x5/0x5f
[ 1.949116] ? __mod_memcg_lruvec_state+0xa9/0x160
[ 1.949888] alloc_pages_mpol_noprof+0x8f/0x1f0
[ 1.950514] folio_alloc_mpol_noprof+0x14/0x40
[ 1.951153] shmem_alloc_folio+0xa7/0xd0
[ 1.951692] ? shmem_recalc_inode+0x20/0x90
[ 1.952272] shmem_alloc_and_add_folio+0x109/0x490
[ 1.952940] ? filemap_get_entry+0x10f/0x1a0
[ 1.953570] ? srso_return_thunk+0x5/0x5f
[ 1.954185] shmem_get_folio_gfp+0x248/0x610
[ 1.954791] shmem_write_begin+0x64/0x110
[ 1.955484] generic_perform_write+0xdf/0x2a0
[ 1.956239] shmem_file_write_iter+0x8a/0x90
[ 1.956882] iter_file_splice_write+0x33f/0x580
[ 1.957577] direct_splice_actor+0x54/0x140
[ 1.958178] splice_direct_to_actor+0xec/0x270
[ 1.958813] ? __pfx_direct_splice_actor+0x10/0x10
[ 1.959442] do_splice_direct+0x77/0xd0
[ 1.960018] ? __pfx_direct_file_splice_eof+0x10/0x10
[ 1.960726] do_sendfile+0x359/0x410
[ 1.961248] __x64_sys_sendfile64+0xb9/0xd0
[ 1.961905] do_syscall_64+0xb7/0x210
[ 1.962467] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[ 1.963211] RIP: 0033:0x7fa9f3a7289e
[ 1.963711] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db
[ 1.965846] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028
[ 1.966788] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e
[ 1.967644] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001
[ 1.968480] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000
[ 1.969396] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001
[ 1.970315] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000
[ 1.971214] </TASK>
[ 1.971572] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring
[ 1.976558] ---[ end trace 0000000000000000 ]---
[ 1.977219] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0
[ 1.978033] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f
[ 1.980179] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246
[ 1.980847] RAX: 000000000000006d RBX: ffff94db3d83dc80 RCX: 0000000000000000
[ 1.981705] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1.982584] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000
[ 1.983464] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
[ 1.984358] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180
[ 1.987765] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000
[ 1.988805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.989497] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0
[ 1.990418] note: cat[315] exited with preempt_count 2
I bisected it back to ee4cdf7ba857a894ad1650d6ab77669cbbfa329e which
also seems to touch part of the 9p VFS code.
Unfortunately the revert didn't apply cleanly on 6.12-rc1, so I couldn't
meaningfully test whether a simple revert solves the problem.
The VMs get the Nix store mounted via 9p. In the store are basically all
build artifacts including the stage-2 init script of the system that is
booted into in the VM test.
The invocation basically looks like this:
qemu-system-x86_64 -cpu max \
    -name machine \
    -m 1024 \
    -smp 1 \
    -device virtio-rng-pci \
    -net nic,netdev=user.0,model=virtio -netdev user,id=user.0,"$QEMU_NET_OPTS" \
    -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store \
    -virtfs local,path="${SHARED_DIR:-$TMPDIR/xchg}",security_model=none,mount_tag=shared \
    -virtfs local,path="$TMPDIR"/xchg,security_model=none,mount_tag=xchg \
    -drive cache=writeback,file="$NIX_DISK_IMAGE",id=drive1,if=none,index=1,werror=report -device virtio-blk-pci,bootindex=1,drive=drive1,serial=root \
    -device virtio-net-pci,netdev=vlan1,mac=52:54:00:12:01:01 \
    -netdev vde,id=vlan1,sock="$QEMU_VDE_SOCKET_1" \
    -device virtio-keyboard \
    -usb \
    -device usb-tablet,bus=usb-bus.0 \
    -kernel ${NIXPKGS_QEMU_KERNEL_machine:-/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel} \
    -initrd /nix/store/qqalw1iq1wbgq3ndx0cvqn3bfypn56w2-initrd-linux-6.12-rc1/initrd \
    -append "$(cat /nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel-params) init=/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init regInfo=/nix/store/5izvfal6xm2rk51v0r1h2cxcng33paby-closure-info/registration console=ttyS0 $QEMU_KERNEL_PARAMS" \
    $QEMU_OPTS
If you're using Nix, you can also reproduce this by running
nix-build nixos/tests/kernel-generic.nix -A linux_testing
on 5c19646b81db43dd7f4b6954f17d71a523009706 from https://github.com/nixos/nixpkgs.
To me, this seems like a regression in rc1.
Is there anything else I can do to help troubleshoot this?
With best regards
Maximilian
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-02 17:08 [REGRESSION] 9pfs issues on 6.12-rc1 Maximilian Bosch
@ 2024-10-02 17:31 ` Linux regression tracking (Thorsten Leemhuis)
2024-10-02 21:48 ` Maximilian Bosch
0 siblings, 1 reply; 28+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2024-10-02 17:31 UTC (permalink / raw)
To: Maximilian Bosch, David Howells
Cc: regressions, LKML, linux-fsdevel, Christian Brauner
Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
for once, to make this easily accessible to everyone.
Thx for the report. Not my area of expertise (so everyone: correct me if
I'm wrong), but I suspect your problem might be a duplicate of the
following report, which was bisected to the same commit from dhowells
(ee4cdf7ba857a8 ("netfs: Speed up buffered reading") [v6.12-rc1]):
https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/
A fix for it is already pending in the vfs.fixes branch and -next:
https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
On 02.10.24 19:08, Maximilian Bosch wrote:
> Starting with Linux 6.12-rc1 the automatic VM tests of NixOS don't boot
> anymore and fail like this:
> [...]
1.982584] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000 > [ 1.983464] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > [ 1.984358] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180 > [ 1.987765] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000 > [ 1.988805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 1.989497] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0 > [ 1.990418] note: cat[315] exited with preempt_count 2 > > I bisected it back to ee4cdf7ba857a894ad1650d6ab77669cbbfa329e which > also seems to touch part of the 9p VFS code. > > Unfortunately the revert didn't apply cleanly on 6.12-rc1, so I couldn't > meaningfully test whether a simple revert solves the problem. > > The VMs get the Nix store mounted via 9p. In the store are basically all > build artifacts including the stage-2 init script of the system that is > booted into in the VM test. > > The invocation basically looks like this: > > qemu-system-x86_64 -cpu max \ > -name machine \ > -m 1024 \ > -smp 1 \ > -device virtio-rng-pci \ > -net nic,netdev=user.0,model=virtio -netdev user,id=user.0,"$QEMU_NET_OPTS" \ > -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store \ > -virtfs local,path="${SHARED_DIR:-$TMPDIR/xchg}",security_model=none,mount_tag=shared \ > -virtfs local,path="$TMPDIR"/xchg,security_model=none,mount_tag=xchg \ > -drive cache=writeback,file="$NIX_DISK_IMAGE",id=drive1,if=none,index=1,werror=report -device virtio-blk-pci,bootindex=1,drive=drive1,serial=root \ > -device virtio-net-pci,netdev=vlan1,mac=52:54:00:12:01:01 \ > -netdev vde,id=vlan1,sock="$QEMU_VDE_SOCKET_1" \ > -device virtio-keyboard \ > -usb \ > -device usb-tablet,bus=usb-bus.0 \ > -kernel ${NIXPKGS_QEMU_KERNEL_machine:-/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel} \ > -initrd /nix/store/qqalw1iq1wbgq3ndx0cvqn3bfypn56w2-initrd-linux-6.12-rc1/initrd \ > -append "$(cat 
/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel-params) init=/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init regInfo=/nix/store/5izvfal6xm2rk51v0r1h2cxcng33paby-closure-info/registration console=ttyS0 $QEMU_KERNEL_PARAMS" \
> $QEMU_OPTS
>
> If you're using Nix, you can also reproduce this by running
>
> nix-build nixos/tests/kernel-generic.nix -A linux_testing
>
> on 5c19646b81db43dd7f4b6954f17d71a523009706 from https://github.com/nixos/nixpkgs.
>
> To me, this seems like a regression in rc1.
>
> Is there anything else I can do to help troubleshoot this?
>
> With best regards
>
> Maximilian
>
>

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-02 17:31 ` Linux regression tracking (Thorsten Leemhuis)
@ 2024-10-02 21:48 ` Maximilian Bosch
2024-10-03 1:12 ` Sedat Dilek
0 siblings, 1 reply; 28+ messages in thread
From: Maximilian Bosch @ 2024-10-02 21:48 UTC (permalink / raw)
To: Linux regressions mailing list, David Howells
Cc: LKML, linux-fsdevel, Christian Brauner

Good evening,

thanks a lot for the quick reply!

> A fix for it is already pending in the vfs.fixes branch and -next:
> https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/

I applied the patch on top of Linux 6.12-rc1 locally and I can confirm
that this resolves the issue, thanks!

With best regards

Maximilian

On Wed Oct 2, 2024 at 7:31 PM CEST, Linux regression tracking (Thorsten Leemhuis) wrote:
> Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> for once, to make this easily accessible to everyone.
>
> Thx for the report. Not my area of expertise (so everyone: correct me if
> I'm wrong), but I suspect your problem might be a duplicate of the
> following report, which was bisected to the same commit from dhowells
> (ee4cdf7ba857a8 ("netfs: Speed up buffered reading") [v6.12-rc1]):
> https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/
>
> A fix for it is already pending in the vfs.fixes branch and -next:
> https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
> On 02.10.24 19:08, Maximilian Bosch wrote:
> >
> > Starting with Linux 6.12-rc1 the automatic VM tests of NixOS don't boot
> > anymore and fail like this:
> > > mounting nix-store on /nix/.ro-store...
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-02 21:48 ` Maximilian Bosch
@ 2024-10-03 1:12 ` Sedat Dilek
2024-10-17 18:00 ` Antony Antony
0 siblings, 1 reply; 28+ messages in thread
From: Sedat Dilek @ 2024-10-03 1:12 UTC (permalink / raw)
To: Maximilian Bosch
Cc: Linux regressions mailing list, David Howells, LKML, linux-fsdevel, Christian Brauner

On Wed, Oct 2, 2024 at 11:58 PM Maximilian Bosch <maximilian@mbosch.me> wrote:
>
> Good evening,
>
> thanks a lot for the quick reply!
>
> > A fix for it is already pending in the vfs.fixes branch and -next:
> > https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/
>
> I applied the patch on top of Linux 6.12-rc1 locally and I can confirm
> that this resolves the issue, thanks!
>
> With best regards
>
> Maximilian
>

Thanks for testing.

For the record:

iov_iter: fix advancing slot in iter_folioq_get_pages()
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.fixes&id=0d24852bd71ec85ca0016b6d6fc997e6a3381552

https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.fixes

>
> On Wed Oct 2, 2024 at 7:31 PM CEST, Linux regression tracking (Thorsten Leemhuis) wrote:
> > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > for once, to make this easily accessible to everyone.
> >
> > Thx for the report. Not my area of expertise (so everyone: correct me if
> > I'm wrong), but I suspect your problem might be a duplicate of the
> > following report, which was bisected to the same commit from dhowells
> > (ee4cdf7ba857a8 ("netfs: Speed up buffered reading") [v6.12-rc1]):
> > https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/
> >
> > A fix for it is already pending in the vfs.fixes branch and -next:
> > https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/
> >
> > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > --
> > Everything you wanna know about Linux kernel regression tracking:
> > https://linux-regtracking.leemhuis.info/about/#tldr
> > If I did something stupid, please tell me, as explained on that page.
> >
> > On 02.10.24 19:08, Maximilian Bosch wrote:
> > >
> > > Starting with Linux 6.12-rc1 the automatic VM tests of NixOS don't boot
> > > anymore and fail like this:
> > > > mounting nix-store on /nix/.ro-store...
> > > [ 1.604781] 9p: Installing v9fs 9p2000 file system support
> > > mounting tmpfs on /nix/.rw-store...
> > > mounting overlay on /nix/store...
> > > mounting shared on /tmp/shared...
> > > mounting xchg on /tmp/xchg...
> > > switch_root: can't execute '/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init': Exec format error
> > > [ 1.734997] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100
> > > [ 1.736002] CPU: 0 UID: 0 PID: 1 Comm: switch_root Not tainted 6.12.0-rc1 #1-NixOS
> > > [ 1.736965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
> > > [ 1.738309] Call Trace:
> > > [ 1.738698] <TASK>
> > > [ 1.739034] panic+0x324/0x340
> > > [ 1.739458] do_exit+0x92e/0xa90
> > > [ 1.739919] ? count_memcg_events.constprop.0+0x1a/0x40
> > > [ 1.740568] ? srso_return_thunk+0x5/0x5f
> > > [ 1.741095] ?
handle_mm_fault+0xb0/0x2e0 > > > [ 1.741709] do_group_exit+0x30/0x80 > > > [ 1.742229] __x64_sys_exit_group+0x18/0x20 > > > [ 1.742800] x64_sys_call+0x17f3/0x1800 > > > [ 1.743326] do_syscall_64+0xb7/0x210 > > > [ 1.743895] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > [ 1.744530] RIP: 0033:0x7f8e1a7b9d1d > > > [ 1.745061] Code: 45 31 c0 45 31 d2 45 31 db c3 0f 1f 00 f3 0f 1e fa 48 8b 35 e5 e0 10 00 ba e7 00 00 00 eb 07 66 0f 1f 44 00 00 f4 89 d0 0f 05 <48> 3d 00 f0 ff ff 76 f3 f7 d8 64 89 06 eb ec 0f 1f 40 00 f3 0f 1e > > > [ 1.747263] RSP: 002b:00007ffcb56d63b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > > [ 1.748250] RAX: ffffffffffffffda RBX: 00007f8e1a8c9fa8 RCX: 00007f8e1a7b9d1d > > > [ 1.749187] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001 > > > [ 1.750050] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1.750891] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > [ 1.751706] R13: 0000000000000001 R14: 00007f8e1a8c8680 R15: 00007f8e1a8c9fc0 > > > [ 1.752583] </TASK> > > > [ 1.753010] Kernel Offset: 0xb800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > > > > > The failing script here is the initrd's /init when it tries to perform a > > > switch_root to `/sysroot`: > > > > > > exec env -i $(type -P switch_root) "$targetRoot" "$stage2Init" > > > > > > Said "$stage2Init" file consistently gets a different hash when doing > > > `sha256sum` on it in the initrd script, but looks & behaves correct > > > on the host. I reproduced the test failures on 4 different build > > > machines and two architectures (x86_64-linux, aarch64-linux) now. > > > > > > The "$stage2Init" script is a shell-script itself. 
When trying to > > > start the interpreter from its shebang inside the initrd (via > > > `$targetRoot/nix/store/...-bash-5.2p32/bin/bash`) and do the > > > switch_root I get a different error: > > > > > > + exec env -i /nix/store/akm69s5sngxyvqrzys326dss9rsrvbpy-extra-utils/bin/switch_root /mnt-root /nix/store/k3pm4iv44y7x7p74kky6cwxiswmr6kpi-nixos-system-machine-test/init > > > [ 1.912859] list_del corruption. prev->next should be ffffc5cf80be0248, but was ffffc5cf80bd9208. (prev=ffffc5cf80bb4d48) > > > [ 1.914237] ------------[ cut here ]------------ > > > [ 1.915059] kernel BUG at lib/list_debug.c:62! > > > [ 1.915854] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > [ 1.916739] CPU: 0 UID: 0 PID: 17 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1 #1-NixOS > > > [ 1.917837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > [ 1.919354] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0 > > > [ 1.920180] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac > > > [ 1.922636] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046 > > > [ 1.923563] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000 > > > [ 1.924692] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > [ 1.925664] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1.926646] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90 > > > [ 1.927584] R13: ffffc5cf80be0240 R14: ffff8fbebd83dc80 R15: 000000000002f809 > > > [ 1.928533] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000 > > > [ 1.929647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 1.930431] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 0000000000350ef0 > > > [ 1.931333] Call Trace: > > > [ 1.931727] <TASK> > > > [ 1.932088] ? 
die+0x36/0x90 > > > [ 1.932595] ? do_trap+0xed/0x110 > > > [ 1.933047] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > [ 1.933757] ? do_error_trap+0x6a/0xa0 > > > [ 1.934390] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > [ 1.935073] ? exc_invalid_op+0x51/0x80 > > > [ 1.935627] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > [ 1.936326] ? asm_exc_invalid_op+0x1a/0x20 > > > [ 1.936904] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > [ 1.937622] free_pcppages_bulk+0x130/0x280 > > > [ 1.938151] free_unref_page_commit+0x21c/0x380 > > > [ 1.938753] free_unref_page+0x472/0x4f0 > > > [ 1.939343] __put_partials+0xee/0x130 > > > [ 1.939921] ? rcu_do_batch+0x1f2/0x800 > > > [ 1.940471] kmem_cache_free+0x2c3/0x370 > > > [ 1.940990] rcu_do_batch+0x1f2/0x800 > > > [ 1.941508] ? rcu_do_batch+0x180/0x800 > > > [ 1.942031] rcu_core+0x182/0x340 > > > [ 1.942500] handle_softirqs+0xe4/0x2f0 > > > [ 1.943034] run_ksoftirqd+0x33/0x40 > > > [ 1.943522] smpboot_thread_fn+0xdd/0x1d0 > > > [ 1.944056] ? __pfx_smpboot_thread_fn+0x10/0x10 > > > [ 1.944679] kthread+0xd0/0x100 > > > [ 1.945126] ? __pfx_kthread+0x10/0x10 > > > [ 1.945656] ret_from_fork+0x34/0x50 > > > [ 1.946151] ? 
__pfx_kthread+0x10/0x10 > > > [ 1.946680] ret_from_fork_asm+0x1a/0x30 > > > [ 1.947269] </TASK> > > > [ 1.947622] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata uhci_hcd scsi_mod ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > [ 1.952291] ---[ end trace 0000000000000000 ]--- > > > [ 1.952893] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0 > > > [ 1.953678] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac > > > [ 1.955888] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046 > > > [ 1.956548] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000 > > > [ 1.957436] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > [ 1.958328] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1.959166] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90 > > > [ 1.960044] R13: ffffc5cf80be0240 R14: ffff8fbebd83dc80 R15: 000000000002f809 > > > [ 1.960905] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000 > > > [ 1.961926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 1.962693] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 0000000000350ef0 > > > [ 1.963548] Kernel panic - not syncing: Fatal exception in interrupt > > > [ 1.964417] Kernel Offset: 0x1ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > > > > > On a subsequent run to verify this, it failed earlier while reading > > > $targetRoot/.../bash like this: > > > > > > > > > [ 1.871810] BUG: Bad page 
state in process cat pfn:2e74a > > > [ 1.872481] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x1e5 pfn:0x2e74a > > > [ 1.873499] flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff) > > > [ 1.874260] raw: 00ffffc000000000 dead000000000100 dead000000000122 0000000000000000 > > > [ 1.875250] raw: 00000000000001e5 0000000000000000 00000001ffffffff 0000000000000000 > > > [ 1.876295] page dumped because: nonzero _refcount > > > [ 1.876910] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > [ 1.881465] CPU: 0 UID: 0 PID: 315 Comm: cat Not tainted 6.12.0-rc1 #1-NixOS > > > [ 1.882326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > [ 1.883684] Call Trace: > > > [ 1.884103] <TASK> > > > [ 1.884440] dump_stack_lvl+0x64/0x90 > > > [ 1.884954] bad_page+0x70/0x110 > > > [ 1.885468] __rmqueue_pcplist+0x209/0xd00 > > > [ 1.886029] ? srso_return_thunk+0x5/0x5f > > > [ 1.886572] ? pdu_read+0x36/0x50 [9pnet] > > > [ 1.887177] get_page_from_freelist+0x2df/0x1910 > > > [ 1.887788] ? srso_return_thunk+0x5/0x5f > > > [ 1.888324] ? enqueue_entity+0xce/0x510 > > > [ 1.888881] ? srso_return_thunk+0x5/0x5f > > > [ 1.889415] ? pick_eevdf+0x76/0x1a0 > > > [ 1.889970] ? update_curr+0x35/0x270 > > > [ 1.890476] __alloc_pages_noprof+0x1a3/0x1150 > > > [ 1.891158] ? srso_return_thunk+0x5/0x5f > > > [ 1.891712] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > [ 1.892346] ? srso_return_thunk+0x5/0x5f > > > [ 1.892919] ? 
__lruvec_stat_mod_folio+0x83/0xd0 > > > [ 1.893521] alloc_pages_mpol_noprof+0x8f/0x1f0 > > > [ 1.894148] folio_alloc_noprof+0x5b/0xb0 > > > [ 1.894671] page_cache_ra_unbounded+0x11f/0x200 > > > [ 1.895270] filemap_get_pages+0x538/0x6d0 > > > [ 1.895813] ? srso_return_thunk+0x5/0x5f > > > [ 1.896361] filemap_splice_read+0x136/0x320 > > > [ 1.896948] backing_file_splice_read+0x52/0xa0 > > > [ 1.897522] ovl_splice_read+0xd2/0xf0 [overlay] > > > [ 1.898160] ? __pfx_ovl_file_accessed+0x10/0x10 [overlay] > > > [ 1.898817] splice_direct_to_actor+0xb4/0x270 > > > [ 1.899404] ? __pfx_direct_splice_actor+0x10/0x10 > > > [ 1.900103] do_splice_direct+0x77/0xd0 > > > [ 1.900627] ? __pfx_direct_file_splice_eof+0x10/0x10 > > > [ 1.901308] do_sendfile+0x359/0x410 > > > [ 1.901788] __x64_sys_sendfile64+0xb9/0xd0 > > > [ 1.902370] do_syscall_64+0xb7/0x210 > > > [ 1.902904] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > [ 1.903604] RIP: 0033:0x7fa9f3a7289e > > > [ 1.904214] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db > > > [ 1.906436] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 > > > [ 1.907400] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e > > > [ 1.908241] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 > > > [ 1.909184] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1.910212] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001 > > > [ 1.911117] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000 > > > [ 1.911998] </TASK> > > > [ 1.912376] Disabling lock debugging due to kernel taint > > > [ 1.913479] list_del corruption. next->prev should be ffffc80e40b9d948, but was ffffc80e40b9d0c8. 
(next=ffffc80e40b9c7c8) > > > [ 1.914823] ------------[ cut here ]------------ > > > [ 1.915408] kernel BUG at lib/list_debug.c:65! > > > [ 1.916050] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > [ 1.916785] CPU: 0 UID: 0 PID: 315 Comm: cat Tainted: G B 6.12.0-rc1 #1-NixOS > > > [ 1.917877] Tainted: [B]=BAD_PAGE > > > [ 1.918350] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > [ 1.919996] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0 > > > [ 1.920903] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f > > > [ 1.923423] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246 > > > [ 1.924210] RAX: 000000000000006d RBX: ffff94db3d83dc80 RCX: 0000000000000000 > > > [ 1.925147] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > [ 1.926051] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1.926940] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > > > [ 1.927809] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180 > > > [ 1.928695] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000 > > > [ 1.929728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 1.930540] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0 > > > [ 1.931444] Call Trace: > > > [ 1.931916] <TASK> > > > [ 1.932357] ? die+0x36/0x90 > > > [ 1.932831] ? do_trap+0xed/0x110 > > > [ 1.933385] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > [ 1.934073] ? do_error_trap+0x6a/0xa0 > > > [ 1.934583] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > [ 1.935242] ? exc_invalid_op+0x51/0x80 > > > [ 1.935781] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > [ 1.936484] ? asm_exc_invalid_op+0x1a/0x20 > > > [ 1.937174] ? 
__list_del_entry_valid_or_report+0xcc/0xd0 > > > [ 1.937926] ? __list_del_entry_valid_or_report+0xcb/0xd0 > > > [ 1.938685] __rmqueue_pcplist+0xa5/0xd00 > > > [ 1.939292] ? srso_return_thunk+0x5/0x5f > > > [ 1.940004] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > [ 1.940758] ? srso_return_thunk+0x5/0x5f > > > [ 1.941417] ? update_load_avg+0x7e/0x7f0 > > > [ 1.942133] ? srso_return_thunk+0x5/0x5f > > > [ 1.942838] ? srso_return_thunk+0x5/0x5f > > > [ 1.943508] get_page_from_freelist+0x2df/0x1910 > > > [ 1.944143] ? srso_return_thunk+0x5/0x5f > > > [ 1.944696] ? check_preempt_wakeup_fair+0x1ee/0x240 > > > [ 1.945335] ? srso_return_thunk+0x5/0x5f > > > [ 1.945905] __alloc_pages_noprof+0x1a3/0x1150 > > > [ 1.946489] ? __blk_flush_plug+0xf5/0x150 > > > [ 1.947105] ? srso_return_thunk+0x5/0x5f > > > [ 1.947629] ? __dquot_alloc_space+0x2a8/0x3a0 > > > [ 1.948404] ? srso_return_thunk+0x5/0x5f > > > [ 1.949116] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > [ 1.949888] alloc_pages_mpol_noprof+0x8f/0x1f0 > > > [ 1.950514] folio_alloc_mpol_noprof+0x14/0x40 > > > [ 1.951153] shmem_alloc_folio+0xa7/0xd0 > > > [ 1.951692] ? shmem_recalc_inode+0x20/0x90 > > > [ 1.952272] shmem_alloc_and_add_folio+0x109/0x490 > > > [ 1.952940] ? filemap_get_entry+0x10f/0x1a0 > > > [ 1.953570] ? srso_return_thunk+0x5/0x5f > > > [ 1.954185] shmem_get_folio_gfp+0x248/0x610 > > > [ 1.954791] shmem_write_begin+0x64/0x110 > > > [ 1.955484] generic_perform_write+0xdf/0x2a0 > > > [ 1.956239] shmem_file_write_iter+0x8a/0x90 > > > [ 1.956882] iter_file_splice_write+0x33f/0x580 > > > [ 1.957577] direct_splice_actor+0x54/0x140 > > > [ 1.958178] splice_direct_to_actor+0xec/0x270 > > > [ 1.958813] ? __pfx_direct_splice_actor+0x10/0x10 > > > [ 1.959442] do_splice_direct+0x77/0xd0 > > > [ 1.960018] ? 
__pfx_direct_file_splice_eof+0x10/0x10 > > > [ 1.960726] do_sendfile+0x359/0x410 > > > [ 1.961248] __x64_sys_sendfile64+0xb9/0xd0 > > > [ 1.961905] do_syscall_64+0xb7/0x210 > > > [ 1.962467] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > [ 1.963211] RIP: 0033:0x7fa9f3a7289e > > > [ 1.963711] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db > > > [ 1.965846] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 > > > [ 1.966788] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e > > > [ 1.967644] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 > > > [ 1.968480] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1.969396] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001 > > > [ 1.970315] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000 > > > [ 1.971214] </TASK> > > > [ 1.971572] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > [ 1.976558] ---[ end trace 0000000000000000 ]--- > > > [ 1.977219] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0 > > > [ 1.978033] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f > > > [ 1.980179] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246 > > > [ 1.980847] RAX: 000000000000006d RBX: ffff94db3d83dc80 RCX: 0000000000000000 > > > [ 
1.981705] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > [ 1.982584] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000 > > > [ 1.983464] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > > > [ 1.984358] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180 > > > [ 1.987765] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000 > > > [ 1.988805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > [ 1.989497] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0 > > > [ 1.990418] note: cat[315] exited with preempt_count 2 > > > > > > I bisected it back to ee4cdf7ba857a894ad1650d6ab77669cbbfa329e which > > > also seems to touch part of the 9p VFS code. > > > > > > Unfortunately the revert didn't apply cleanly on 6.12-rc1, so I couldn't > > > meaningfully test whether a simple revert solves the problem. > > > > > > The VMs get the Nix store mounted via 9p. In the store are basically all > > > build artifacts including the stage-2 init script of the system that is > > > booted into in the VM test. 
> > > > > > The invocation basically looks like this: > > > > > > qemu-system-x86_64 -cpu max \ > > > -name machine \ > > > -m 1024 \ > > > -smp 1 \ > > > -device virtio-rng-pci \ > > > -net nic,netdev=user.0,model=virtio -netdev user,id=user.0,"$QEMU_NET_OPTS" \ > > > -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store \ > > > -virtfs local,path="${SHARED_DIR:-$TMPDIR/xchg}",security_model=none,mount_tag=shared \ > > > -virtfs local,path="$TMPDIR"/xchg,security_model=none,mount_tag=xchg \ > > > -drive cache=writeback,file="$NIX_DISK_IMAGE",id=drive1,if=none,index=1,werror=report -device virtio-blk-pci,bootindex=1,drive=drive1,serial=root \ > > > -device virtio-net-pci,netdev=vlan1,mac=52:54:00:12:01:01 \ > > > -netdev vde,id=vlan1,sock="$QEMU_VDE_SOCKET_1" \ > > > -device virtio-keyboard \ > > > -usb \ > > > -device usb-tablet,bus=usb-bus.0 \ > > > -kernel ${NIXPKGS_QEMU_KERNEL_machine:-/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel} \ > > > -initrd /nix/store/qqalw1iq1wbgq3ndx0cvqn3bfypn56w2-initrd-linux-6.12-rc1/initrd \ > > > -append "$(cat /nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel-params) init=/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init regInfo=/nix/store/5izvfal6xm2rk51v0r1h2cxcng33paby-closure-info/registration console=ttyS0 $QEMU_KERNEL_PARAMS" \ > > > $QEMU_OPTS > > > > > > If you're using Nix, you can also reproduce this by running > > > > > > nix-build nixos/tests/kernel-generic.nix -A linux_testing > > > > > > on 5c19646b81db43dd7f4b6954f17d71a523009706 from https://github.com/nixos/nixpkgs. > > > > > > To me, this seems like a regression in rc1. > > > > > > Is there anything else I can do to help troubleshooting this? > > > > > > With best regards > > > > > > Maximilian > > > > > > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-03 1:12 ` Sedat Dilek
@ 2024-10-17 18:00 ` Antony Antony
2024-10-21 13:23 ` Christian Brauner
` (5 more replies)
0 siblings, 6 replies; 28+ messages in thread
From: Antony Antony @ 2024-10-17 18:00 UTC (permalink / raw)
To: Sedat Dilek, Maximilian Bosch
Cc: Linux regressions mailing list, David Howells, LKML, linux-fsdevel, Christian Brauner

Hi,

On Thu, Oct 03, 2024 at 03:12:15AM +0200, Sedat Dilek wrote:
> On Wed, Oct 2, 2024 at 11:58 PM Maximilian Bosch <maximilian@mbosch.me> wrote:
> >
> > Good evening,
> >
> > thanks a lot for the quick reply!
> >
> > > A fix for it is already pending in the vfs.fixes branch and -next:
> > > https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/
> >
> > I applied the patch on top of Linux 6.12-rc1 locally and I can confirm
> > that this resolves the issue, thanks!

Maximilian, would you like to re-run the test a few times? I wonder if
there is another intermittent bug related to the same commit.

> >
> > With best regards
> >
> > Maximilian
> >
>
> Thanks for testing.
>
> For the records:
>
> iov_iter: fix advancing slot in iter_folioq_get_pages()
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.fixes&id=0d24852bd71ec85ca0016b6d6fc997e6a3381552
>
> https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.fixes

I’m still seeing a kernel oops after the fix in 6.12-rc3, but I’ve
noticed that the issue is no longer 100% reproducible. Most of the time,
the system crashes. Before this fix it was 100% reproducible.

When using the Nix test, I have to force the test to re-run:

result=$(readlink -f ./result); rm ./result && nix-store --delete $result
nix-build -v nixos/tests/kernel-generic.nix -A linux_testing

So maybe there is a new bug showing up after the fix. I have reported it.
https://lore.kernel.org/regressions/ZxFEi1Tod43pD6JC@moon.secunet.de/T/#u

-antony

>
> >
> > On Wed Oct 2, 2024 at 7:31 PM CEST, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > > for once, to make this easily accessible to everyone.
> > >
> > > Thx for the report. Not my area of expertise (so everyone: corrent me if
> > > I'm wrong), but I suspect your problem might be a duplicate of the
> > > following report, which was bisected to the same commit from dhowells
> > > (ee4cdf7ba857a8 ("netfs: Speed up buffered reading") [v6.12-rc1]):
> > > https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/
> > >
> > > A fix for it is already pending in the vfs.fixes branch and -next:
> > > https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/
> > >
> > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> > > --
> > > Everything you wanna know about Linux kernel regression tracking:
> > > https://linux-regtracking.leemhuis.info/about/#tldr
> > > If I did something stupid, please tell me, as explained on that page.
> > >
> > > On 02.10.24 19:08, Maximilian Bosch wrote:
> > > >
> > > > Starting with Linux 6.12-rc1 the automatic VM tests of NixOS don't boot
> > > > anymore and fail like this:
> > > > > mounting nix-store on /nix/.ro-store...
> > > > [ 1.604781] 9p: Installing v9fs 9p2000 file system support
> > > > mounting tmpfs on /nix/.rw-store...
> > > > mounting overlay on /nix/store...
> > > > mounting shared on /tmp/shared...
> > > > mounting xchg on /tmp/xchg...
> > > > switch_root: can't execute '/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init': Exec format error
> > > > [ 1.734997] Kernel panic - not syncing: Attempted to kill init!
exitcode=0x00000100 > > > > [ 1.736002] CPU: 0 UID: 0 PID: 1 Comm: switch_root Not tainted 6.12.0-rc1 #1-NixOS > > > > [ 1.736965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > [ 1.738309] Call Trace: > > > > [ 1.738698] <TASK> > > > > [ 1.739034] panic+0x324/0x340 > > > > [ 1.739458] do_exit+0x92e/0xa90 > > > > [ 1.739919] ? count_memcg_events.constprop.0+0x1a/0x40 > > > > [ 1.740568] ? srso_return_thunk+0x5/0x5f > > > > [ 1.741095] ? handle_mm_fault+0xb0/0x2e0 > > > > [ 1.741709] do_group_exit+0x30/0x80 > > > > [ 1.742229] __x64_sys_exit_group+0x18/0x20 > > > > [ 1.742800] x64_sys_call+0x17f3/0x1800 > > > > [ 1.743326] do_syscall_64+0xb7/0x210 > > > > [ 1.743895] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > [ 1.744530] RIP: 0033:0x7f8e1a7b9d1d > > > > [ 1.745061] Code: 45 31 c0 45 31 d2 45 31 db c3 0f 1f 00 f3 0f 1e fa 48 8b 35 e5 e0 10 00 ba e7 00 00 00 eb 07 66 0f 1f 44 00 00 f4 89 d0 0f 05 <48> 3d 00 f0 ff ff 76 f3 f7 d8 64 89 06 eb ec 0f 1f 40 00 f3 0f 1e > > > > [ 1.747263] RSP: 002b:00007ffcb56d63b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > > > [ 1.748250] RAX: ffffffffffffffda RBX: 00007f8e1a8c9fa8 RCX: 00007f8e1a7b9d1d > > > > [ 1.749187] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001 > > > > [ 1.750050] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1.750891] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > > [ 1.751706] R13: 0000000000000001 R14: 00007f8e1a8c8680 R15: 00007f8e1a8c9fc0 > > > > [ 1.752583] </TASK> > > > > [ 1.753010] Kernel Offset: 0xb800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > > > > > > > The failing script here is the initrd's /init when it tries to perform a > > > > switch_root to `/sysroot`: > > > > > > > > exec env -i $(type -P switch_root) "$targetRoot" "$stage2Init" > > > > > > > > Said "$stage2Init" file consistently gets a 
different hash when doing > > > > `sha256sum` on it in the initrd script, but looks & behaves correct > > > > on the host. I reproduced the test failures on 4 different build > > > > machines and two architectures (x86_64-linux, aarch64-linux) now. > > > > > > > > The "$stage2Init" script is a shell-script itself. When trying to > > > > start the interpreter from its shebang inside the initrd (via > > > > `$targetRoot/nix/store/...-bash-5.2p32/bin/bash`) and do the > > > > switch_root I get a different error: > > > > > > > > + exec env -i /nix/store/akm69s5sngxyvqrzys326dss9rsrvbpy-extra-utils/bin/switch_root /mnt-root /nix/store/k3pm4iv44y7x7p74kky6cwxiswmr6kpi-nixos-system-machine-test/init > > > > [ 1.912859] list_del corruption. prev->next should be ffffc5cf80be0248, but was ffffc5cf80bd9208. (prev=ffffc5cf80bb4d48) > > > > [ 1.914237] ------------[ cut here ]------------ > > > > [ 1.915059] kernel BUG at lib/list_debug.c:62! > > > > [ 1.915854] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > > [ 1.916739] CPU: 0 UID: 0 PID: 17 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1 #1-NixOS > > > > [ 1.917837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > [ 1.919354] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0 > > > > [ 1.920180] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac > > > > [ 1.922636] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046 > > > > [ 1.923563] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000 > > > > [ 1.924692] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > [ 1.925664] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1.926646] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90 > > > > [ 1.927584] R13: ffffc5cf80be0240 R14: 
ffff8fbebd83dc80 R15: 000000000002f809 > > > > [ 1.928533] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000 > > > > [ 1.929647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 1.930431] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 0000000000350ef0 > > > > [ 1.931333] Call Trace: > > > > [ 1.931727] <TASK> > > > > [ 1.932088] ? die+0x36/0x90 > > > > [ 1.932595] ? do_trap+0xed/0x110 > > > > [ 1.933047] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > [ 1.933757] ? do_error_trap+0x6a/0xa0 > > > > [ 1.934390] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > [ 1.935073] ? exc_invalid_op+0x51/0x80 > > > > [ 1.935627] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > [ 1.936326] ? asm_exc_invalid_op+0x1a/0x20 > > > > [ 1.936904] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > [ 1.937622] free_pcppages_bulk+0x130/0x280 > > > > [ 1.938151] free_unref_page_commit+0x21c/0x380 > > > > [ 1.938753] free_unref_page+0x472/0x4f0 > > > > [ 1.939343] __put_partials+0xee/0x130 > > > > [ 1.939921] ? rcu_do_batch+0x1f2/0x800 > > > > [ 1.940471] kmem_cache_free+0x2c3/0x370 > > > > [ 1.940990] rcu_do_batch+0x1f2/0x800 > > > > [ 1.941508] ? rcu_do_batch+0x180/0x800 > > > > [ 1.942031] rcu_core+0x182/0x340 > > > > [ 1.942500] handle_softirqs+0xe4/0x2f0 > > > > [ 1.943034] run_ksoftirqd+0x33/0x40 > > > > [ 1.943522] smpboot_thread_fn+0xdd/0x1d0 > > > > [ 1.944056] ? __pfx_smpboot_thread_fn+0x10/0x10 > > > > [ 1.944679] kthread+0xd0/0x100 > > > > [ 1.945126] ? __pfx_kthread+0x10/0x10 > > > > [ 1.945656] ret_from_fork+0x34/0x50 > > > > [ 1.946151] ? 
__pfx_kthread+0x10/0x10 > > > > [ 1.946680] ret_from_fork_asm+0x1a/0x30 > > > > [ 1.947269] </TASK> > > > > [ 1.947622] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata uhci_hcd scsi_mod ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > > [ 1.952291] ---[ end trace 0000000000000000 ]--- > > > > [ 1.952893] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0 > > > > [ 1.953678] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac > > > > [ 1.955888] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046 > > > > [ 1.956548] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000 > > > > [ 1.957436] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > [ 1.958328] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1.959166] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90 > > > > [ 1.960044] R13: ffffc5cf80be0240 R14: ffff8fbebd83dc80 R15: 000000000002f809 > > > > [ 1.960905] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000 > > > > [ 1.961926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 1.962693] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 0000000000350ef0 > > > > [ 1.963548] Kernel panic - not syncing: Fatal exception in interrupt > > > > [ 1.964417] Kernel Offset: 0x1ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > > > > > > > On a subsequent run to verify this, it failed earlier while reading > > > > $targetRoot/.../bash like this: > > > 
> > > > > > > > > [ 1.871810] BUG: Bad page state in process cat pfn:2e74a > > > > [ 1.872481] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x1e5 pfn:0x2e74a > > > > [ 1.873499] flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff) > > > > [ 1.874260] raw: 00ffffc000000000 dead000000000100 dead000000000122 0000000000000000 > > > > [ 1.875250] raw: 00000000000001e5 0000000000000000 00000001ffffffff 0000000000000000 > > > > [ 1.876295] page dumped because: nonzero _refcount > > > > [ 1.876910] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > > [ 1.881465] CPU: 0 UID: 0 PID: 315 Comm: cat Not tainted 6.12.0-rc1 #1-NixOS > > > > [ 1.882326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > [ 1.883684] Call Trace: > > > > [ 1.884103] <TASK> > > > > [ 1.884440] dump_stack_lvl+0x64/0x90 > > > > [ 1.884954] bad_page+0x70/0x110 > > > > [ 1.885468] __rmqueue_pcplist+0x209/0xd00 > > > > [ 1.886029] ? srso_return_thunk+0x5/0x5f > > > > [ 1.886572] ? pdu_read+0x36/0x50 [9pnet] > > > > [ 1.887177] get_page_from_freelist+0x2df/0x1910 > > > > [ 1.887788] ? srso_return_thunk+0x5/0x5f > > > > [ 1.888324] ? enqueue_entity+0xce/0x510 > > > > [ 1.888881] ? srso_return_thunk+0x5/0x5f > > > > [ 1.889415] ? pick_eevdf+0x76/0x1a0 > > > > [ 1.889970] ? update_curr+0x35/0x270 > > > > [ 1.890476] __alloc_pages_noprof+0x1a3/0x1150 > > > > [ 1.891158] ? srso_return_thunk+0x5/0x5f > > > > [ 1.891712] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > > [ 1.892346] ? 
srso_return_thunk+0x5/0x5f > > > > [ 1.892919] ? __lruvec_stat_mod_folio+0x83/0xd0 > > > > [ 1.893521] alloc_pages_mpol_noprof+0x8f/0x1f0 > > > > [ 1.894148] folio_alloc_noprof+0x5b/0xb0 > > > > [ 1.894671] page_cache_ra_unbounded+0x11f/0x200 > > > > [ 1.895270] filemap_get_pages+0x538/0x6d0 > > > > [ 1.895813] ? srso_return_thunk+0x5/0x5f > > > > [ 1.896361] filemap_splice_read+0x136/0x320 > > > > [ 1.896948] backing_file_splice_read+0x52/0xa0 > > > > [ 1.897522] ovl_splice_read+0xd2/0xf0 [overlay] > > > > [ 1.898160] ? __pfx_ovl_file_accessed+0x10/0x10 [overlay] > > > > [ 1.898817] splice_direct_to_actor+0xb4/0x270 > > > > [ 1.899404] ? __pfx_direct_splice_actor+0x10/0x10 > > > > [ 1.900103] do_splice_direct+0x77/0xd0 > > > > [ 1.900627] ? __pfx_direct_file_splice_eof+0x10/0x10 > > > > [ 1.901308] do_sendfile+0x359/0x410 > > > > [ 1.901788] __x64_sys_sendfile64+0xb9/0xd0 > > > > [ 1.902370] do_syscall_64+0xb7/0x210 > > > > [ 1.902904] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > [ 1.903604] RIP: 0033:0x7fa9f3a7289e > > > > [ 1.904214] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db > > > > [ 1.906436] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 > > > > [ 1.907400] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e > > > > [ 1.908241] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 > > > > [ 1.909184] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1.910212] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001 > > > > [ 1.911117] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000 > > > > [ 1.911998] </TASK> > > > > [ 1.912376] Disabling lock debugging due to kernel taint > > > > [ 1.913479] list_del corruption. next->prev should be ffffc80e40b9d948, but was ffffc80e40b9d0c8. 
(next=ffffc80e40b9c7c8) > > > > [ 1.914823] ------------[ cut here ]------------ > > > > [ 1.915408] kernel BUG at lib/list_debug.c:65! > > > > [ 1.916050] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > > [ 1.916785] CPU: 0 UID: 0 PID: 315 Comm: cat Tainted: G B 6.12.0-rc1 #1-NixOS > > > > [ 1.917877] Tainted: [B]=BAD_PAGE > > > > [ 1.918350] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > [ 1.919996] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0 > > > > [ 1.920903] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f > > > > [ 1.923423] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246 > > > > [ 1.924210] RAX: 000000000000006d RBX: ffff94db3d83dc80 RCX: 0000000000000000 > > > > [ 1.925147] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > [ 1.926051] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1.926940] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > > > > [ 1.927809] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180 > > > > [ 1.928695] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000 > > > > [ 1.929728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 1.930540] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0 > > > > [ 1.931444] Call Trace: > > > > [ 1.931916] <TASK> > > > > [ 1.932357] ? die+0x36/0x90 > > > > [ 1.932831] ? do_trap+0xed/0x110 > > > > [ 1.933385] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > [ 1.934073] ? do_error_trap+0x6a/0xa0 > > > > [ 1.934583] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > [ 1.935242] ? exc_invalid_op+0x51/0x80 > > > > [ 1.935781] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > [ 1.936484] ? 
asm_exc_invalid_op+0x1a/0x20 > > > > [ 1.937174] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > [ 1.937926] ? __list_del_entry_valid_or_report+0xcb/0xd0 > > > > [ 1.938685] __rmqueue_pcplist+0xa5/0xd00 > > > > [ 1.939292] ? srso_return_thunk+0x5/0x5f > > > > [ 1.940004] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > > [ 1.940758] ? srso_return_thunk+0x5/0x5f > > > > [ 1.941417] ? update_load_avg+0x7e/0x7f0 > > > > [ 1.942133] ? srso_return_thunk+0x5/0x5f > > > > [ 1.942838] ? srso_return_thunk+0x5/0x5f > > > > [ 1.943508] get_page_from_freelist+0x2df/0x1910 > > > > [ 1.944143] ? srso_return_thunk+0x5/0x5f > > > > [ 1.944696] ? check_preempt_wakeup_fair+0x1ee/0x240 > > > > [ 1.945335] ? srso_return_thunk+0x5/0x5f > > > > [ 1.945905] __alloc_pages_noprof+0x1a3/0x1150 > > > > [ 1.946489] ? __blk_flush_plug+0xf5/0x150 > > > > [ 1.947105] ? srso_return_thunk+0x5/0x5f > > > > [ 1.947629] ? __dquot_alloc_space+0x2a8/0x3a0 > > > > [ 1.948404] ? srso_return_thunk+0x5/0x5f > > > > [ 1.949116] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > > [ 1.949888] alloc_pages_mpol_noprof+0x8f/0x1f0 > > > > [ 1.950514] folio_alloc_mpol_noprof+0x14/0x40 > > > > [ 1.951153] shmem_alloc_folio+0xa7/0xd0 > > > > [ 1.951692] ? shmem_recalc_inode+0x20/0x90 > > > > [ 1.952272] shmem_alloc_and_add_folio+0x109/0x490 > > > > [ 1.952940] ? filemap_get_entry+0x10f/0x1a0 > > > > [ 1.953570] ? srso_return_thunk+0x5/0x5f > > > > [ 1.954185] shmem_get_folio_gfp+0x248/0x610 > > > > [ 1.954791] shmem_write_begin+0x64/0x110 > > > > [ 1.955484] generic_perform_write+0xdf/0x2a0 > > > > [ 1.956239] shmem_file_write_iter+0x8a/0x90 > > > > [ 1.956882] iter_file_splice_write+0x33f/0x580 > > > > [ 1.957577] direct_splice_actor+0x54/0x140 > > > > [ 1.958178] splice_direct_to_actor+0xec/0x270 > > > > [ 1.958813] ? __pfx_direct_splice_actor+0x10/0x10 > > > > [ 1.959442] do_splice_direct+0x77/0xd0 > > > > [ 1.960018] ? 
__pfx_direct_file_splice_eof+0x10/0x10 > > > > [ 1.960726] do_sendfile+0x359/0x410 > > > > [ 1.961248] __x64_sys_sendfile64+0xb9/0xd0 > > > > [ 1.961905] do_syscall_64+0xb7/0x210 > > > > [ 1.962467] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > [ 1.963211] RIP: 0033:0x7fa9f3a7289e > > > > [ 1.963711] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db > > > > [ 1.965846] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 > > > > [ 1.966788] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e > > > > [ 1.967644] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 > > > > [ 1.968480] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1.969396] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001 > > > > [ 1.970315] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000 > > > > [ 1.971214] </TASK> > > > > [ 1.971572] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > > [ 1.976558] ---[ end trace 0000000000000000 ]--- > > > > [ 1.977219] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0 > > > > [ 1.978033] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f > > > > [ 1.980179] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246 > > > > [ 1.980847] RAX: 000000000000006d RBX: 
ffff94db3d83dc80 RCX: 0000000000000000 > > > > [ 1.981705] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > [ 1.982584] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000 > > > > [ 1.983464] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > > > > [ 1.984358] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180 > > > > [ 1.987765] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000 > > > > [ 1.988805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > [ 1.989497] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0 > > > > [ 1.990418] note: cat[315] exited with preempt_count 2 > > > > > > > > I bisected it back to ee4cdf7ba857a894ad1650d6ab77669cbbfa329e which > > > > also seems to touch part of the 9p VFS code. > > > > > > > > Unfortunately the revert didn't apply cleanly on 6.12-rc1, so I couldn't > > > > meaningfully test whether a simple revert solves the problem. > > > > > > > > The VMs get the Nix store mounted via 9p. In the store are basically all > > > > build artifacts including the stage-2 init script of the system that is > > > > booted into in the VM test. 
> > > > > > > > The invocation basically looks like this: > > > > > > > > qemu-system-x86_64 -cpu max \ > > > > -name machine \ > > > > -m 1024 \ > > > > -smp 1 \ > > > > -device virtio-rng-pci \ > > > > -net nic,netdev=user.0,model=virtio -netdev user,id=user.0,"$QEMU_NET_OPTS" \ > > > > -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store \ > > > > -virtfs local,path="${SHARED_DIR:-$TMPDIR/xchg}",security_model=none,mount_tag=shared \ > > > > -virtfs local,path="$TMPDIR"/xchg,security_model=none,mount_tag=xchg \ > > > > -drive cache=writeback,file="$NIX_DISK_IMAGE",id=drive1,if=none,index=1,werror=report -device virtio-blk-pci,bootindex=1,drive=drive1,serial=root \ > > > > -device virtio-net-pci,netdev=vlan1,mac=52:54:00:12:01:01 \ > > > > -netdev vde,id=vlan1,sock="$QEMU_VDE_SOCKET_1" \ > > > > -device virtio-keyboard \ > > > > -usb \ > > > > -device usb-tablet,bus=usb-bus.0 \ > > > > -kernel ${NIXPKGS_QEMU_KERNEL_machine:-/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel} \ > > > > -initrd /nix/store/qqalw1iq1wbgq3ndx0cvqn3bfypn56w2-initrd-linux-6.12-rc1/initrd \ > > > > -append "$(cat /nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel-params) init=/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init regInfo=/nix/store/5izvfal6xm2rk51v0r1h2cxcng33paby-closure-info/registration console=ttyS0 $QEMU_KERNEL_PARAMS" \ > > > > $QEMU_OPTS > > > > > > > > If you're using Nix, you can also reproduce this by running > > > > > > > > nix-build nixos/tests/kernel-generic.nix -A linux_testing > > > > > > > > on 5c19646b81db43dd7f4b6954f17d71a523009706 from https://github.com/nixos/nixpkgs. > > > > > > > > To me, this seems like a regression in rc1. > > > > > > > > Is there anything else I can do to help troubleshooting this? > > > > > > > > With best regards > > > > > > > > Maximilian > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-17 18:00 ` Antony Antony
@ 2024-10-21 13:23 ` Christian Brauner
2024-10-21 14:12 ` David Howells
` (4 subsequent siblings)
5 siblings, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2024-10-21 13:23 UTC (permalink / raw)
To: Antony Antony
Cc: Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, David Howells, LKML, linux-fsdevel
On Thu, Oct 17, 2024 at 08:00:35PM +0200, Antony Antony wrote:
> Hi,
>
> On Thu, Oct 03, 2024 at 03:12:15AM +0200, Sedat Dilek wrote:
> > On Wed, Oct 2, 2024 at 11:58 PM Maximilian Bosch <maximilian@mbosch.me> wrote:
> > >
> > > Good evening,
> > >
> > > thanks a lot for the quick reply!
> > >
> > > > A fix for it is already pending in the vfs.fixes branch and -next:
> > > > https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/
> > >
> > > I applied the patch on top of Linux 6.12-rc1 locally and I can confirm
> > > that this resolves the issue, thanks!
>
> Maximilian, would you like to re-run the test a few times? I wonder if there
> is another intermittent bug related to the same commit.
>
> > >
> > > With best regards
> > >
> > > Maximilian
> > >
> >
> > Thanks for testing.
> >
> > For the records:
> >
> > iov_iter: fix advancing slot in iter_folioq_get_pages()
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.fixes&id=0d24852bd71ec85ca0016b6d6fc997e6a3381552
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.fixes
>
> I’m still seeing a kernel oops after the fix in 6.12-rc3, but I’ve noticed
> that the issue is no longer 100% reproducible. Most of the time, the system
> crashes. Before this fix it was 100% reproducible.
>
> When using the nix testing, I have to force the test to re-run.
>
> result=$(readlink -f ./result); rm ./result && nix-store --delete $result
>
> nix-build -v nixos/tests/kernel-generic.nix -A linux_testing
>
> So maybe there is a new bug showing up after the fix. I have reported it.
>
> https://lore.kernel.org/regressions/ZxFEi1Tod43pD6JC@moon.secunet.de/T/#u

I've pinged David again so hopefully we'll have a fix soon.
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-17 18:00 ` Antony Antony
2024-10-21 13:23 ` Christian Brauner
@ 2024-10-21 14:12 ` David Howells
2024-10-21 15:33 ` Antony Antony
2024-10-21 14:45 ` David Howells
` (3 subsequent siblings)
5 siblings, 1 reply; 28+ messages in thread
From: David Howells @ 2024-10-21 14:12 UTC (permalink / raw)
To: Antony Antony
Cc: dhowells, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel, Christian Brauner
Antony Antony <antony@phenome.org> wrote:
> When using the nix testing, I have to force the test to re-run.
>
> result=$(readlink -f ./result); rm ./result && nix-store --delete $result
>
> nix-build -v nixos/tests/kernel-generic.nix -A linux_testing

Is there a way to run this on Fedora?

David
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-21 14:12 ` David Howells
@ 2024-10-21 15:33 ` Antony Antony
0 siblings, 0 replies; 28+ messages in thread
From: Antony Antony @ 2024-10-21 15:33 UTC (permalink / raw)
To: David Howells
Cc: Antony Antony, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel, Christian Brauner
On Mon, Oct 21, 2024 at 03:12:38PM +0100, David Howells wrote:
> Antony Antony <antony@phenome.org> wrote:
>
> > When using the nix testing, I have to force the test to re-run.
> >
> > result=$(readlink -f ./result); rm ./result && nix-store --delete $result
> >
> > nix-build -v nixos/tests/kernel-generic.nix -A linux_testing
>
> Is there a way to run this on Fedora?

Yes. You can run it on Fedora. Try these steps:

1. Install nix.
a: preferred way:
curl --proto '=https' --tlsv1.2 -sSf -L \
https://install.determinate.systems/nix | sh -s -- install
b: maybe use dnf? I am advised dnf is a bad idea!

2. Clone the latest nixpkgs:
git clone https://github.com/NixOS/nixpkgs

3. Build and run the test:
cd nixpkgs
nix-build -v nixos/tests/kernel-generic.nix -A linux_testing

Currently this will run 6.12-rc3.

When the test does not finish running, press "Ctrl + C" to stop it.
When it succeeds, to re-run:

result=$(readlink -f ./result); rm ./result && nix-store --delete $result
nix-build -v nixos/tests/kernel-generic.nix -A linux_testing

-antony
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-17 18:00 ` Antony Antony
2024-10-21 13:23 ` Christian Brauner
2024-10-21 14:12 ` David Howells
@ 2024-10-21 14:45 ` David Howells
2024-10-21 15:53 ` Antony Antony
2025-08-10 5:10 ` Arnout Engelen
2024-10-21 20:38 ` [PATCH] 9p: Don't revert the I/O iterator after reading David Howells
` (2 subsequent siblings)
5 siblings, 2 replies; 28+ messages in thread
From: David Howells @ 2024-10-21 14:45 UTC (permalink / raw)
To: Antony Antony
Cc: dhowells, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel, Christian Brauner
Can you tell me what parameters you're mounting 9p with? Looking at the
backtrace:

[ 32.390878] bad_page+0x70/0x110
[ 32.391056] free_unref_page+0x363/0x4f0
[ 32.391257] p9_release_pages+0x41/0x90 [9pnet]
[ 32.391627] p9_virtio_zc_request+0x3d4/0x720 [9pnet_virtio]
[ 32.391896] ? p9pdu_finalize+0x32/0xa0 [9pnet]
[ 32.392153] p9_client_zc_rpc.constprop.0+0x102/0x310 [9pnet]
[ 32.392447] ? kmem_cache_free+0x36/0x370
[ 32.392703] p9_client_read_once+0x1a6/0x310 [9pnet]
[ 32.392992] p9_client_read+0x56/0x80 [9pnet]
[ 32.393238] v9fs_issue_read+0x50/0xd0 [9p]
[ 32.393467] netfs_read_to_pagecache+0x20c/0x480 [netfs]
[ 32.393832] netfs_readahead+0x225/0x330 [netfs]
[ 32.394154] read_pages+0x6a/0x250

it's using buffered I/O, but when I try and use 9p from qemu, it wants to use
unbuffered/direct I/O.

David
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-21 14:45 ` David Howells
@ 2024-10-21 15:53 ` Antony Antony
2024-10-21 19:48 ` David Howells
2025-08-10 5:10 ` Arnout Engelen
1 sibling, 1 reply; 28+ messages in thread
From: Antony Antony @ 2024-10-21 15:53 UTC (permalink / raw)
To: David Howells
Cc: Antony Antony, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel, Christian Brauner
On Mon, Oct 21, 2024 at 03:45:50PM +0100, David Howells wrote:
> Can you tell me what parameters you're mounting 9p with? Looking at the
> backtrace:
>
> [ 32.390878] bad_page+0x70/0x110
> [ 32.391056] free_unref_page+0x363/0x4f0
> [ 32.391257] p9_release_pages+0x41/0x90 [9pnet]
> [ 32.391627] p9_virtio_zc_request+0x3d4/0x720 [9pnet_virtio]
> [ 32.391896] ? p9pdu_finalize+0x32/0xa0 [9pnet]
> [ 32.392153] p9_client_zc_rpc.constprop.0+0x102/0x310 [9pnet]
> [ 32.392447] ? kmem_cache_free+0x36/0x370
> [ 32.392703] p9_client_read_once+0x1a6/0x310 [9pnet]
> [ 32.392992] p9_client_read+0x56/0x80 [9pnet]
> [ 32.393238] v9fs_issue_read+0x50/0xd0 [9p]
> [ 32.393467] netfs_read_to_pagecache+0x20c/0x480 [netfs]
> [ 32.393832] netfs_readahead+0x225/0x330 [netfs]
> [ 32.394154] read_pages+0x6a/0x250
>
> it's using buffered I/O, but when I try and use 9p from qemu, it wants to use
> unbuffered/direct I/O.

How can I check what it is using? Could you see it from the command line?
/nix/store/s7zgdx5i9gs4abxjl94jcsw3xn4m861i-qemu-host-cpu-only-for-vm-tests-9.1.0/bin/qemu-kvm -cpu max -name machine -m 1024 -smp 1 -device virtio-rng-pci -net nic,netdev=user.0,model=virtio -netdev user,id=user.0, -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store -virtfs local,path=/build/shared-xchg,security_model=none,mount_tag=shared -virtfs local,path=/build/vm-state-machine/xchg,security_model=none,mount_tag=xchg -drive cache=writeback,file=/build/vm-state-machine/machine.qcow2,id=drive1,if=none,index=1,werror=report -device virtio-blk-pci,bootindex=1,drive=drive1,serial=root -device virtio-net-pci,netdev=vlan1,mac=52:54:00:12:01:01 -netdev vde,id=vlan1,sock=/build/vde1.ctl -device virtio-keyboard -usb -device usb-tablet,bus=usb-bus.0 -kernel /nix/store/i4xrqfq4jrk2chv6iqm2rgxdk8biynlr-nixos-system-machine-test/kernel -initrd /nix/store/i06b3wvd4c83x8slnd1f85dj7msjy398-initrd-linux-6.12-rc3/initrd -append console=ttyS0 console=tty0 panic=1 boot.panic_on_fail clocksource=acpi_pm loglevel=7 net.ifnames=0 init=/nix/store/i4xrqfq4jrk2chv6iqm2rgxdk8biynlr-nixos-system-machine-test/init regInfo=/nix/store/5ygkzfld2zk20cy95iipmw2xxfvqalaz-closure-info/registration console=ttyS0 -qmp unix:/build/vm-state-machine/qmp,server=on,wait=off -monitor unix:/build/vm-state-machine/monitor -chardev socket,id=shell,path=/build/vm-state-machine/shell -device virtio-serial -device virtconsole,chardev=shell -device virtio-rng-pci -serial stdio -no-reboot -nographic

Or, inside a guest (running a similar test on an older kernel):

>>> print(alice.execute("mount | grep 9p")[1])
nix-store on /nix/.ro-store type 9p (rw,relatime,dirsync,loose,access=client,msize=16384,trans=virtio)
shared on /tmp/shared type 9p (rw,relatime,sync,dirsync,access=client,msize=16384,trans=virtio)
xchg on /tmp/xchg type 9p (rw,relatime,sync,dirsync,access=client,msize=16384,trans=virtio)

-antony
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1 2024-10-21 15:53 ` Antony Antony @ 2024-10-21 19:48 ` David Howells 0 siblings, 0 replies; 28+ messages in thread From: David Howells @ 2024-10-21 19:48 UTC (permalink / raw) To: Antony Antony Cc: dhowells, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel, Christian Brauner I may have reproduced the bug (see attached), though the symptoms are slightly different. Hopefully, it's just the one bug. David --- page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x5626ca804 pfn:0x7948 flags: 0x2000000000000000(zone=1) raw: 2000000000000000 ffffea000024d3c8 ffffea00001e5248 0000000000000000 raw: 00000005626ca804 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u)) ------------[ cut here ]------------ kernel BUG at include/linux/mm.h:1444! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 UID: 0 PID: 303 Comm: md5sum Not tainted 6.12.0-rc2-ktest-00012-g57e4ac5316ef-dirty #8 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 RIP: 0010:__iov_iter_get_pages_alloc+0x701/0x7e0 Code: 0f 0b 4d 85 f6 0f 85 f8 fc ff ff e9 21 fe ff ff 48 c7 c6 38 6b 40 82 e8 cd f4 aa ff 0f 0b 48 c7 c6 38 6b 40 82 e8 bf f4 aa ff <0f> 0b 4d 89 6a 18 49 89 52 08 4d 89 42 10 45 88 4a 20 e9 26 fb ff RSP: 0018:ffff88800804f5c0 EFLAGS: 00010286 RAX: 000000000000005c RBX: 0000000000001000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000027 RDI: 00000000ffffffff RBP: ffff88800804f648 R08: ffff88807f1d8fa8 R09: 00000000fffc0000 R10: ffff88807dbd9000 R11: 0000000000000002 R12: 0000000000000001 R13: 0000000000001000 R14: 0000000000000000 R15: 0000000000001000 FS: 00007f0c42c29580(0000) GS:ffff88807d8c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000560cff5bb000 CR3: 00000000046a0002 CR4: 0000000000370eb0 DR0: 0000000000000000 DR1: 
0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body.cold+0x19/0x2b ? __die+0x2a/0x40 ? die+0x2f/0x50 ? do_trap+0xb8/0x100 ? do_error_trap+0x6c/0x90 ? __iov_iter_get_pages_alloc+0x701/0x7e0 ? exc_invalid_op+0x52/0x70 ? __iov_iter_get_pages_alloc+0x701/0x7e0 ? asm_exc_invalid_op+0x1b/0x20 ? __iov_iter_get_pages_alloc+0x701/0x7e0 ? radix_tree_node_alloc.constprop.0+0xab/0xf0 iov_iter_get_pages_alloc2+0x20/0x50 p9_get_mapped_pages.part.0+0x77/0x260 ? find_held_lock+0x31/0x90 ? p9_tag_alloc+0x1c8/0x2f0 p9_virtio_zc_request+0x339/0x6f0 ? debug_smp_processor_id+0x17/0x20 ? debug_smp_processor_id+0x17/0x20 ? rcu_is_watching+0x11/0x50 ? p9_client_prepare_req+0x15f/0x190 p9_client_zc_rpc.constprop.0+0xe6/0x330 p9_client_read_once+0x145/0x2b0 p9_client_read+0x59/0x80 v9fs_issue_read+0x3d/0xa0 netfs_read_to_pagecache+0x27b/0x580 netfs_readahead+0x197/0x2f0 read_pages+0x4a/0x300 page_cache_ra_unbounded+0x197/0x250 page_cache_ra_order+0x2f7/0x400 ? __this_cpu_preempt_check+0x13/0x20 ? lock_release+0x168/0x290 page_cache_async_ra+0x1be/0x220 filemap_get_pages+0x2f3/0x870 filemap_read+0xdc/0x470 ? __this_cpu_preempt_check+0x13/0x20 ? lock_acquire+0xcc/0x1c0 ? preempt_count_add+0x4e/0xc0 ? down_read_interruptible+0xb3/0x1b0 netfs_buffered_read_iter+0x5c/0x90 netfs_file_read_iter+0x29/0x40 v9fs_file_read_iter+0x1b/0x30 vfs_read+0x22b/0x330 ksys_read+0x62/0xe0 __x64_sys_read+0x19/0x20 x64_sys_call+0x1b70/0x1d20 do_syscall_64+0x47/0x110 entry_SYSCALL_64_after_hwframe+0x76/0x7e ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-21 14:45 ` David Howells
2024-10-21 15:53 ` Antony Antony
@ 2025-08-10 5:10 ` Arnout Engelen
1 sibling, 0 replies; 28+ messages in thread
From: Arnout Engelen @ 2025-08-10 5:10 UTC (permalink / raw)
To: dhowells
Cc: antony, brauner, linux-fsdevel, linux-kernel, maximilian, regressions, sedat.dilek
> Can you tell me what parameters you're mounting 9p with? Looking at the
> backtrace:
>
> [ 32.390878] bad_page+0x70/0x110
> [ 32.391056] free_unref_page+0x363/0x4f0
> [ 32.391257] p9_release_pages+0x41/0x90 [9pnet]
> [ 32.391627] p9_virtio_zc_request+0x3d4/0x720 [9pnet_virtio]
> [ 32.391896] ? p9pdu_finalize+0x32/0xa0 [9pnet]
> [ 32.392153] p9_client_zc_rpc.constprop.0+0x102/0x310 [9pnet]
> [ 32.392447] ? kmem_cache_free+0x36/0x370
> [ 32.392703] p9_client_read_once+0x1a6/0x310 [9pnet]
> [ 32.392992] p9_client_read+0x56/0x80 [9pnet]
> [ 32.393238] v9fs_issue_read+0x50/0xd0 [9p]
> [ 32.393467] netfs_read_to_pagecache+0x20c/0x480 [netfs]
> [ 32.393832] netfs_readahead+0x225/0x330 [netfs]
> [ 32.394154] read_pages+0x6a/0x250
>
> it's using buffered I/O, but when I try and use 9p from qemu, it wants to use
> unbuffered/direct I/O.

The NixOS integration tests mount 9p with 'cache=loose' which triggers the
buffered I/O.

(I'm still seeing an issue in this area on 6.16-rc6, which remains also with
'only' 'cache=lookahead' - I'll do some more analysis before sharing more
about that, though)

Kind regards,

Arnout
^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH] 9p: Don't revert the I/O iterator after reading
2024-10-17 18:00 ` Antony Antony
` (2 preceding siblings ...)
2024-10-21 14:45 ` David Howells
@ 2024-10-21 20:38 ` David Howells
2024-10-21 23:53 ` Antony Antony
2024-10-22 8:56 ` Christian Brauner
2024-10-23 10:07 ` [REGRESSION] 9pfs issues on 6.12-rc1 David Howells
2024-10-23 18:35 ` Maximilian Bosch
5 siblings, 2 replies; 28+ messages in thread
From: David Howells @ 2024-10-21 20:38 UTC (permalink / raw)
To: Antony Antony
Cc: dhowells, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel, Christian Brauner
Hi Antony,

I think I may have a fix already lurking on my netfs-writeback branch for the
next merge window. Can you try the attached?

David
---
Don't revert the I/O iterator before returning from p9_client_read_once().
netfslib doesn't require the reversion and nor does 9P directory reading.

Make p9_client_read() use a temporary iterator to call down into
p9_client_read_once(), and advance the caller's iterator by the amount read.
Reported-by: Antony Antony <antony@phenome.org> Signed-off-by: David Howells <dhowells@redhat.com> cc: Eric Van Hensbergen <ericvh@kernel.org> cc: Latchesar Ionkov <lucho@ionkov.net> cc: Dominique Martinet <asmadeus@codewreck.org> cc: Christian Schoenebeck <linux_oss@crudebyte.com> cc: v9fs@lists.linux.dev cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org --- net/9p/client.c | 10 +++------- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/net/9p/client.c b/net/9p/client.c index 5cd94721d974..be59b0a94eaf 100644 --- a/net/9p/client.c +++ b/net/9p/client.c @@ -1519,13 +1519,15 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err) *err = 0; while (iov_iter_count(to)) { + struct iov_iter tmp = *to; int count; - count = p9_client_read_once(fid, offset, to, err); + count = p9_client_read_once(fid, offset, &tmp, err); if (!count || *err) break; offset += count; total += count; + iov_iter_advance(to, count); } return total; } @@ -1567,16 +1569,12 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, } if (IS_ERR(req)) { *err = PTR_ERR(req); - if (!non_zc) - iov_iter_revert(to, count - iov_iter_count(to)); return 0; } *err = p9pdu_readf(&req->rc, clnt->proto_version, "D", &received, &dataptr); if (*err) { - if (!non_zc) - iov_iter_revert(to, count - iov_iter_count(to)); trace_9p_protocol_dump(clnt, &req->rc); p9_req_put(clnt, req); return 0; @@ -1596,8 +1594,6 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, p9_req_put(clnt, req); return n; } - } else { - iov_iter_revert(to, count - received - iov_iter_count(to)); } p9_req_put(clnt, req); return received; ^ permalink raw reply related [flat|nested] 28+ messages in thread
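The change to p9_client_read() in the patch above can be pictured with a small userspace model. This is only a sketch, not kernel code: struct toy_iter, toy_read_once() and toy_read() are hypothetical stand-ins for struct iov_iter, p9_client_read_once() and p9_client_read(), and the model assumes that a single read may leave the iterator it was handed advanced further than the number of bytes the server actually returned.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct iov_iter: a cursor over a buffer (hypothetical). */
struct toy_iter {
	size_t pos;	/* bytes already consumed */
	size_t count;	/* bytes still available */
};

static void toy_iter_advance(struct toy_iter *it, size_t n)
{
	assert(n <= it->count);
	it->pos += n;
	it->count -= n;
}

/*
 * Hypothetical single read: it consumes up to msize bytes from the
 * iterator it is given, but the "server" returns at most server_reply
 * bytes.  On a short read the iterator is left over-advanced and is
 * deliberately not reverted.
 */
static size_t toy_read_once(struct toy_iter *it, size_t msize,
			    size_t server_reply)
{
	size_t want = it->count < msize ? it->count : msize;

	toy_iter_advance(it, want);
	return server_reply < want ? server_reply : want;
}

/*
 * Caller loop in the style of the patch: each read gets a throwaway
 * copy of the iterator, and only the caller's iterator is advanced,
 * by exactly the number of bytes received.
 */
static size_t toy_read(struct toy_iter *to, size_t msize, size_t server_reply)
{
	size_t total = 0;

	while (to->count) {
		struct toy_iter tmp = *to;
		size_t n = toy_read_once(&tmp, msize, server_reply);

		if (!n)
			break;
		total += n;
		toy_iter_advance(to, n);
	}
	return total;
}
```

With a 10-byte buffer, an msize of 4 and a server that returns at most 3 bytes per request, the loop issues four reads and the caller's iterator ends up exactly 10 bytes in, even though each temporary copy was advanced by the full request.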
* Re: [PATCH] 9p: Don't revert the I/O iterator after reading 2024-10-21 20:38 ` [PATCH] 9p: Don't revert the I/O iterator after reading David Howells @ 2024-10-21 23:53 ` Antony Antony 2024-10-22 8:56 ` Christian Brauner 1 sibling, 0 replies; 28+ messages in thread From: Antony Antony @ 2024-10-21 23:53 UTC (permalink / raw) To: David Howells Cc: Antony Antony, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel, Christian Brauner Hi David, On Mon, Oct 21, 2024 at 21:38:23 +0100, David Howells wrote: > Hi Antony, > > I think I may have a fix already lurking on my netfs-writeback branch for the > next merge window. Can you try the attached? yes. The fix works. I rebooted a few times and no crash. Tested-by: Antony Antony <antony.antony@secunet.com> I am running test script in a loop over night. thanks, -antony > > David > --- > Don't revert the I/O iterator before returning from p9_client_read_once(). > netfslib doesn't require the reversion and nor doed 9P directory reading. > > Make p9_client_read() use a temporary iterator to call down into > p9_client_read_once(), and advance that by the amount read. 
> > Reported-by: Antony Antony <antony@phenome.org> > Signed-off-by: David Howells <dhowells@redhat.com> > cc: Eric Van Hensbergen <ericvh@kernel.org> > cc: Latchesar Ionkov <lucho@ionkov.net> > cc: Dominique Martinet <asmadeus@codewreck.org> > cc: Christian Schoenebeck <linux_oss@crudebyte.com> > cc: v9fs@lists.linux.dev > cc: netfs@lists.linux.dev > cc: linux-fsdevel@vger.kernel.org > --- > net/9p/client.c | 10 +++------- > 1 file changed, 3 insertions(+), 7 deletions(-) > > diff --git a/net/9p/client.c b/net/9p/client.c > index 5cd94721d974..be59b0a94eaf 100644 > --- a/net/9p/client.c > +++ b/net/9p/client.c > @@ -1519,13 +1519,15 @@ p9_client_read(struct p9_fid *fid, u64 offset, struct iov_iter *to, int *err) > *err = 0; > > while (iov_iter_count(to)) { > + struct iov_iter tmp = *to; > int count; > > - count = p9_client_read_once(fid, offset, to, err); > + count = p9_client_read_once(fid, offset, &tmp, err); > if (!count || *err) > break; > offset += count; > total += count; > + iov_iter_advance(to, count); > } > return total; > } > @@ -1567,16 +1569,12 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, > } > if (IS_ERR(req)) { > *err = PTR_ERR(req); > - if (!non_zc) > - iov_iter_revert(to, count - iov_iter_count(to)); > return 0; > } > > *err = p9pdu_readf(&req->rc, clnt->proto_version, > "D", &received, &dataptr); > if (*err) { > - if (!non_zc) > - iov_iter_revert(to, count - iov_iter_count(to)); > trace_9p_protocol_dump(clnt, &req->rc); > p9_req_put(clnt, req); > return 0; > @@ -1596,8 +1594,6 @@ p9_client_read_once(struct p9_fid *fid, u64 offset, struct iov_iter *to, > p9_req_put(clnt, req); > return n; > } > - } else { > - iov_iter_revert(to, count - received - iov_iter_count(to)); > } > p9_req_put(clnt, req); > return received; > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH] 9p: Don't revert the I/O iterator after reading
2024-10-21 20:38 ` [PATCH] 9p: Don't revert the I/O iterator after reading David Howells
2024-10-21 23:53 ` Antony Antony
@ 2024-10-22 8:56 ` Christian Brauner
1 sibling, 0 replies; 28+ messages in thread
From: Christian Brauner @ 2024-10-22 8:56 UTC (permalink / raw)
To: Antony Antony, David Howells
Cc: Christian Brauner, Sedat Dilek, Maximilian Bosch, Linux regressions mailing list, LKML, linux-fsdevel
On Mon, 21 Oct 2024 21:38:23 +0100, David Howells wrote:
> I think I may have a fix already lurking on my netfs-writeback branch for the
> next merge window. Can you try the attached?
>
> David
>
>

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] 9p: Don't revert the I/O iterator after reading
https://git.kernel.org/vfs/vfs/c/09c729da3283
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-17 18:00 ` Antony Antony
` (3 preceding siblings ...)
2024-10-21 20:38 ` [PATCH] 9p: Don't revert the I/O iterator after reading David Howells
@ 2024-10-23 10:07 ` David Howells
2024-10-23 19:38 ` Antony Antony
2024-10-23 18:35 ` Maximilian Bosch
5 siblings, 1 reply; 28+ messages in thread
From: David Howells @ 2024-10-23 10:07 UTC (permalink / raw)
To: Antony Antony
Cc: dhowells, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel

Hi Antony,

I think the attached should fix it properly rather than working around it as
the previous patch did. If you could give it a whirl?

Thanks,
David
---
commit 68dddbfdf45e8f176cc8556a3db69af24dfb8519
Author: David Howells <dhowells@redhat.com>
Date: Wed Oct 23 10:24:12 2024 +0100

    iov_iter: Fix iov_iter_get_pages*() for folio_queue

    p9_get_mapped_pages() uses iov_iter_get_pages_alloc2() to extract pages
    from an iterator when performing a zero-copy request and under some
    circumstances, this crashes with odd page errors[1], for example, I see:

        page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xbcf0
        flags: 0x2000000000000000(zone=1)
        ...
        page dumped because: VM_BUG_ON_FOLIO(((unsigned int) folio_ref_count(folio) + 127u <= 127u))
        ------------[ cut here ]------------
        kernel BUG at include/linux/mm.h:1444!

    This is because, unlike in iov_iter_extract_folioq_pages(), the
    iter_folioq_get_pages() helper function doesn't skip the current folio
    when iov_offset points to the end of it, but rather extracts the next
    page beyond the end of the folio and adds it to the list. Reading will
    then clobber the contents of this page, leading to system corruption,
    and if the page is not in use, put_page() may try to clean up the unused
    page.

    This can be worked around by copying the iterator before each
    extraction[2] and using iov_iter_advance() on the original as the
    advance function steps over the page we're at the end of.

    Fix this by skipping the page extraction if we're at the end of the
    folio.

    This was reproduced in the ktest environment[3] by forcing 9p to use the
    fscache caching mode and then reading a file through 9p.

    Fixes: db0aa2e9566f ("mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios")
    Reported-by: Antony Antony <antony@phenome.org>
    Closes: https://lore.kernel.org/r/ZxFQw4OI9rrc7UYc@Antony2201.local/
    Signed-off-by: David Howells <dhowells@redhat.com>
    cc: Eric Van Hensbergen <ericvh@kernel.org>
    cc: Latchesar Ionkov <lucho@ionkov.net>
    cc: Dominique Martinet <asmadeus@codewreck.org>
    cc: Christian Schoenebeck <linux_oss@crudebyte.com>
    cc: v9fs@lists.linux.dev
    cc: netfs@lists.linux.dev
    cc: linux-fsdevel@vger.kernel.org
    Link: https://lore.kernel.org/r/ZxFEi1Tod43pD6JC@moon.secunet.de/ [1]
    Link: https://lore.kernel.org/r/2299159.1729543103@warthog.procyon.org.uk/ [2]
    Link: https://github.com/koverstreet/ktest.git [3]

diff --git a/lib/iov_iter.c b/lib/iov_iter.c
index 1abb32c0da50..cc4b5541eef8 100644
--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -1021,15 +1021,18 @@ static ssize_t iter_folioq_get_pages(struct iov_iter *iter,
 		size_t offset = iov_offset, fsize = folioq_folio_size(folioq, slot);
 		size_t part = PAGE_SIZE - offset % PAGE_SIZE;
 
-		part = umin(part, umin(maxsize - extracted, fsize - offset));
-		count -= part;
-		iov_offset += part;
-		extracted += part;
-
-		*pages = folio_page(folio, offset / PAGE_SIZE);
-		get_page(*pages);
-		pages++;
-		maxpages--;
+		if (offset < fsize) {
+			part = umin(part, umin(maxsize - extracted, fsize - offset));
+			count -= part;
+			iov_offset += part;
+			extracted += part;
+
+			*pages = folio_page(folio, offset / PAGE_SIZE);
+			get_page(*pages);
+			pages++;
+			maxpages--;
+		}
+
 		if (maxpages == 0 || extracted >= maxsize)
 			break;

^ permalink raw reply related [flat|nested] 28+ messages in thread
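The end-of-folio case described in the commit message above can be illustrated outside the kernel. What follows is only a toy model, not kernel code: `toy_folio`, `toy_page`, `toy_get_pages()` and `toy_page_in_range()` are hypothetical names, a 4096-byte page size is assumed, and the `skip_at_end` flag merely switches between the pre-patch and post-patch loop shapes shown in the diff. The caller is assumed to pass a valid starting slot and non-zero `maxsize` and `maxpages`.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Hypothetical stand-in for one folio_queue slot; only its size matters. */
struct toy_folio {
	size_t size; /* bytes in the folio, a multiple of TOY_PAGE_SIZE */
};

/* One "extracted" page: the slot it came from and its index in that folio. */
struct toy_page {
	size_t slot;
	size_t index;
};

static size_t toy_min(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* True if the recorded page really lies inside its folio. */
static int toy_page_in_range(const struct toy_folio *folios,
			     const struct toy_page *page)
{
	return page->index < folios[page->slot].size / TOY_PAGE_SIZE;
}

/*
 * Record the pages covering up to maxsize bytes starting at (slot, offset).
 * With skip_at_end == 0 the loop mimics the pre-patch shape: a page is
 * recorded even when offset already equals the folio size. With
 * skip_at_end != 0 it mimics the patched shape and moves to the next folio.
 * Returns the number of pages recorded.
 */
static size_t toy_get_pages(const struct toy_folio *folios, size_t nr_folios,
			    size_t slot, size_t offset, size_t maxsize,
			    struct toy_page *pages, size_t maxpages,
			    int skip_at_end)
{
	size_t extracted = 0, n = 0;

	for (;;) {
		size_t fsize = folios[slot].size;

		if (offset < fsize || !skip_at_end) {
			size_t part = TOY_PAGE_SIZE - offset % TOY_PAGE_SIZE;

			/* At the end of the folio, fsize - offset is 0, so part is 0. */
			part = toy_min(part, toy_min(maxsize - extracted,
						     fsize - offset));
			pages[n].slot = slot;
			pages[n].index = offset / TOY_PAGE_SIZE;
			n++;
			offset += part;
			extracted += part;
		}

		if (n == maxpages || extracted >= maxsize)
			break;

		if (offset >= fsize) {
			offset = 0;
			slot++;
			if (slot == nr_folios)
				break;
		}
	}
	return n;
}
```

With a two-page folio followed by a one-page folio and a starting offset equal to the first folio's size, the pre-patch shape records a zero-length entry whose page index lies one past the end of the first folio, while the patched shape records only the first page of the second folio.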
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-23 10:07 ` [REGRESSION] 9pfs issues on 6.12-rc1 David Howells
@ 2024-10-23 19:38 ` Antony Antony
2025-06-12 22:24 ` Ryan Lahfa
0 siblings, 1 reply; 28+ messages in thread
From: Antony Antony @ 2024-10-23 19:38 UTC (permalink / raw)
To: David Howells
Cc: Antony Antony, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel, Antony Antony

On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
> Hi Antony,
>
> I think the attached should fix it properly rather than working around it as
> the previous patch did. If you could give it a whirl?

Yes, this also fixes the crash.

Tested-by: Antony Antony <antony.antony@secunet.com>

thanks,
-antony
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2024-10-23 19:38 ` Antony Antony
@ 2025-06-12 22:24 ` Ryan Lahfa
2025-06-27 5:44 ` Christian Theune
` (2 more replies)
0 siblings, 3 replies; 28+ messages in thread
From: Ryan Lahfa @ 2025-06-12 22:24 UTC (permalink / raw)
To: Antony Antony
Cc: David Howells, Antony Antony, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel

Hi everyone,

On Wed, Oct 23, 2024 at 09:38:39PM +0200, Antony Antony wrote:
> On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
> > Hi Antony,
> >
> > I think the attached should fix it properly rather than working around it as
> > the previous patch did. If you could give it a whirl?
>
> Yes this also fix the crash.
>
> Tested-by: Antony Antony <antony.antony@secunet.com>

I cannot confirm this fixes the crash for me. My reproducer is slightly
more complicated than Max's original one, although it is still on NixOS,
and it probably uses 9p more intensively than the automated NixOS test
workload.

Here is how to reproduce it:

$ git clone https://gerrit.lix.systems/lix
$ cd lix
$ git fetch https://gerrit.lix.systems/lix refs/changes/29/3329/8 && git checkout FETCH_HEAD
$ nix-build -A hydraJobs.tests.local-releng

I suspect the reason Antony considers the crash to be fixed is that the
workload used to test it requires a significant amount of chance and
retries to trigger the bug.

On my end, you can see our CI showing the symptoms:
https://buildkite.com/organizations/lix-project/pipelines/lix/builds/2357/jobs/019761e7-784e-4790-8c1b-f609270d9d19/log.

We retried probably hundreds of times and saw different corruption
patterns: Python getting confused, ld.so getting confused, systemd
sometimes too. Python had a much higher chance of crashing in many of
our tests. We reproduced it on aarch64-linux (Ampere Altra Q80-30) but
also on Intel and AMD CPUs (~5 different systems).

As soon as we reverted to the Linux 6.6 series, the bug went away.

We bisected, but we started to have weirder problems. This is because we
encountered the original regression mentioned in October 2024 and, for a
certain range of commits, we were unable to bisect anything further.

So I switched my bisection strategy to understand when the bug was
fixed. This led me to commit
e65a0dc1cabe71b91ef5603e5814359451b74ca7, which is the proper fix
mentioned here and in this discussion.

Reverting this on top of 6.12 does indeed cause a massive number of
traces, see this gist [1] for examples.

Applying the "workaround patch" aka "[PATCH] 9p: Don't revert the I/O
iterator after reading" after reverting e65a0dc1cabe makes the problem
go away: it did not show up in 5 tries (5 tries were sufficient to
trigger it with the proper fix).

If this can be helpful, the nature of the test above is to copy a
significant amount of assets to an S3 implementation (Garage) running
inside of the VM. Many of these assets come from the Nix store, which
sits on 9p.

Anyhow, I see three patterns:

- Kernel panic when starting /init. This is the crash Max reported
  back in October 2024 and the one we started to encounter while
  bisecting this problem in the range between v6.11 and v6.12.
- systemd crashing very quickly. This is what we see when reverting
  e65a0dc1cabe71b91ef5603e5814359451b74ca7 on top of v6.12 *OR* when
  we are around v6.12-rc5.
- What the CI above shows: userspace programs crashing after some
  serious I/O exercising has been done. This happens on top of v6.12,
  v6.14 and v6.15 (incl. stable kernels).

If you need me to test things, please let me know.

[1]: https://gist.dgnum.eu/raito/3d1fa61ebaf642218342ffe644fb6efd
--
Ryan Lahfa
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-06-12 22:24 ` Ryan Lahfa
@ 2025-06-27 5:44 ` Christian Theune
2025-06-27 6:44 ` Dominique Martinet
2025-06-27 10:00 ` David Howells
2025-08-10 17:57 ` Arnout Engelen
2 siblings, 1 reply; 28+ messages in thread
From: Christian Theune @ 2025-06-27 5:44 UTC (permalink / raw)
To: Ryan Lahfa
Cc: Antony Antony, David Howells, Antony Antony, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel

Hi,

we’re experiencing the same issue with a number of NixOS tests that are
heavy in operations copying from the v9fs-mounted nix store.

> On 13. Jun 2025, at 00:24, Ryan Lahfa <ryan@lahfa.xyz> wrote:
>
> Hi everyone,
>
> On Wed, Oct 23, 2024 at 09:38:39PM +0200, Antony Antony wrote:
>> On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
>>> Hi Antony,
>>>
>>> I think the attached should fix it properly rather than working around it as
>>> the previous patch did. If you could give it a whirl?
>>
>> Yes this also fix the crash.
>>
>> Tested-by: Antony Antony <antony.antony@secunet.com>
>
> I cannot confirm this fixes the crash for me. My reproducer is slightly
> more complicated than Max's original one, albeit, still on NixOS and
> probably uses 9p more intensively than the automated NixOS testings
> workload.
>
> Here is how to reproduce it:
>
> $ git clone https://gerrit.lix.systems/lix
> $ cd lix
> $ git fetch https://gerrit.lix.systems/lix refs/changes/29/3329/8 && git checkout FETCH_HEAD
> $ nix-build -A hydraJobs.tests.local-releng
>
> I suspect the reason for why Antony considers the crash to be fixed is
> that the workload used to test it requires a significant amount of
> chance and retries to trigger the bug.
>
> On my end, you can see our CI showing the symptoms:
> https://buildkite.com/organizations/lix-project/pipelines/lix/builds/2357/jobs/019761e7-784e-4790-8c1b-f609270d9d19/log.
>
> We retried probably hundreds of times and saw different corruption
> patterns, Python getting confused, ld.so getting confused, systemd
> sometimes too. Python had a much higher chance of crashing in many of
> our tests. We reproduced it over aarch64-linux (Ampere Altra Q80-30) but
> also Intel and AMD CPUs (~5 different systems).

Yeah. We’re on AMD CPUs and it wasn’t hardware-bound.

The errors we saw were:

- malloc(): unaligned tcache chunk detected
- segfaulting java processes
- misbehaving filesystems (errors about internal structures in ext4, incorrect file content in xfs)
- crashing kernels when dealing with the fallout of those errors

> As soon as we reverted to Linux 6.6 series, the bug went away.

Same here, the other way around: we came from 6.6.94 and updated to
6.12.34 and immediately saw a number of tests failing, all of which were
heavy in copying data from v9fs to the root filesystem in the VM.

> We bisected but we started to have weirder problems, this is because we
> encountered the original regression mentioned in October 2024 and for a
> certain range of commits, we were unable to bisect anything further.

I had already found the issue from last October when I started
bisecting. I later got in touch with Ryan, who recognized that we were
chasing the same issue. I stopped bisecting at that point - the bisect
was already homing in around the time of the changes in last October.

> So I switched my bisection strategy to understand when the bug was
> fixed, this lead me on the commit
> e65a0dc1cabe71b91ef5603e5814359451b74ca7 which is the proper fix
> mentioned here and on this discussion.
>
> Reverting this on the top of 6.12 cause indeed a massive amount of
> traces, see this gist [1] for examples.

Yeah. During bisect I noticed it flapping around with the original
October issues crashing immediately during boot.

> Applying the "workaround patch" aka "[PATCH] 9p: Don't revert the I/O
> iterator after reading" after reverting e65a0dc1cabe makes the problem
> go away after 5 tries (5 tries were sufficient to trigger with the
> proper fix).

Yup, I applied the revert and workaround patch on top of 6.12.34 and the
reliably broken test became reliably green again.

Our test can be reproduced, too:

$ git clone https://github.com/flyingcircusio/fc-nixos.git
$ cd fc-nixos
$ eval $(./dev-setup)
$ nix-build tests/matomo.nix

The test will fail with ext4 complaining something like this:

machine # [ 42.596728] vn2haz1283lxz6iy0rai850a7jlgxbja-matomo-setup-update-pre[1233]: Copied files, updating package link in /var/lib/matomo/current-package.
machine # [ 42.788956] EXT4-fs error (device vda): htree_dirblock_to_tree:1109: inode #13138: block 5883: comm setfacl: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=606087968, rec_len=31074, size=4096 fake=0
machine # [ 42.958590] EXT4-fs error (device vda): htree_dirblock_to_tree:1109: inode #13138: block 5883: comm chown: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=606087968, rec_len=31074, size=4096 fake=0
machine # [ 43.068003] EXT4-fs error (device vda): htree_dirblock_to_tree:1109: inode #13138: block 5883: comm chmod: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=606087968, rec_len=31074, size=4096 fake=0
machine # [ 43.004098] vn2haz1283lxz6iy0rai850a7jlgxbja-matomo-setup-update-pre[1233]: Giving matomo read+write access to /var/lib/matomo/share/matomo.js, /var/lib/matomo/share/piwik.js, /var/lib/matomo/share/config, /var/lib/matomo/share/misc/user, /var/lib/matomo/share/js, /var/lib/matomo/share/tmp, /var/lib/matomo/share/misc
machine # [ 43.201319] EXT4-fs error (device vda): htree_dirblock_to_tree:1109: inode #13138: block 5883: comm setfacl: bad entry in directory: rec_len % 4 != 0 - offset=0, inode=606087968, rec_len=31074, size=4096 fake=0

I’m also available for testing and further diagnosis.

Christian
--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-06-27 5:44 ` Christian Theune
@ 2025-06-27 6:44 ` Dominique Martinet
2025-06-27 8:19 ` Christian Theune
0 siblings, 1 reply; 28+ messages in thread
From: Dominique Martinet @ 2025-06-27 6:44 UTC (permalink / raw)
To: Christian Theune, David Howells
Cc: Ryan Lahfa, Antony Antony, Antony Antony, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel

Hi all,

sorry for the slow reply; I wasn't in Cc of most of the mails back in
October so this is a pain to navigate... Let me recap a bit:
- stuff started failing in 6.12-rc1
- David first posted "9p: Don't revert the I/O iterator after
reading"[1], which fixed the bug, but then found a "better" fix as
"iov_iter: Fix iov_iter_get_pages*() for folio_queue" [2] which was
merged instead (so the first patch was not merged)

But it turns out the second patch is not enough (or causes another
issue?), and reverting it + applying the first one works, is that
correct?
What happens if you keep [2] and just apply [1], does that still bug?

(I've tried reading through the thread now and I don't even see what was
the "bad" patch in the first place, although I assume it's ee4cdf7ba857
("netfs: Speed up buffered reading") -- was that confirmed?)

David, as you worked on this at the time it'd be great if you could have
another look. I have no idea what made you try [1] in the first place,
but unless you think 9p is doing something wrong like double-reverting
on error or something like that, I'd like to understand a bit more what
happens... Although given 6.12 is getting used more now it could make
sense to just apply [1] first until we understand, and have a proper fix
come second -- if someone can confirm we don't need to revert [2].

[1] https://lore.kernel.org/all/3327438.1729678025@warthog.procyon.org.uk/T/#mc97a248b0f673dff6dc8613b508ca4fd45c4fefe
[2] https://lore.kernel.org/all/3327438.1729678025@warthog.procyon.org.uk/T/#m89597a1144806db4ae89992953031cdffa0b0bf9

Thanks,
--
Dominique
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-06-27 6:44 ` Dominique Martinet
@ 2025-06-27 8:19 ` Christian Theune
0 siblings, 0 replies; 28+ messages in thread
From: Christian Theune @ 2025-06-27 8:19 UTC (permalink / raw)
To: Dominique Martinet
Cc: David Howells, Ryan Lahfa, Antony Antony, Antony Antony, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel

Hi,

> On 27. Jun 2025, at 08:44, Dominique Martinet <asmadeus@codewreck.org> wrote:
>
> Hi all,
>
> sorry for the slow reply; I wasn't in Cc of most of the mails back in
> October so this is a pain to navigate... Let me recap a bit:
> - stuff started failing in 6.12-rc1

Yes, to my knowledge and interpretation of this thread.

> - David first posted "9p: Don't revert the I/O iterator after
> reading"[1], which fixed the bug, but then found a "better" fix as
> "iov_iter: Fix iov_iter_get_pages*() for folio_queue" [2] which was
> merged instead (so the first patch was not merged)
>
> But it turns out the second patch is not enough (or causes another
> issue?), and the reverting it + applying first one works, is that
> correct?
> What happens if you keep [2] and just apply [1], does that still bug?

I tried that. The test that has so far reliably crashed (or not) under
all the variations does not crash in this case.

> (I've tried reading through the thread now and I don't even see what was
> the "bad" patch in the first place, although I assume it's ee4cdf7ba857
> ("netfs: Speed up buffered reading") -- was that confirmed?)

I was late to the party, too, so I’ll defer to the others.

> David, as you worked on this at the time it'd be great if you could have
> another look, I have no idea what made you try [1] in the first place
> but unless you think 9p is doing something wrong like double-reverting
> on error or something like that I'd like to understand a bit more what
> happens... Although given 6.12 is getting used more now it could make
> sense to just apply [1] first until we understand, and have a proper fix
> come second -- if someone can confirm we don't need to revert [2].

I guess I confirmed this. However, I’m just barely better than a monkey
here so I can’t tell whether this makes sense from the internal logic of
things.

To repeat, for safety: my test case worked with the situation you
described and suggested: [1] applied on top of 6.12.34 and *not* having
[2] reverted.

Christian
--
Christian Theune · ct@flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-06-12 22:24 ` Ryan Lahfa
2025-06-27 5:44 ` Christian Theune
@ 2025-06-27 10:00 ` David Howells
2025-06-27 10:33 ` Ryan Lahfa
2025-08-10 17:57 ` Arnout Engelen
2 siblings, 1 reply; 28+ messages in thread
From: David Howells @ 2025-06-27 10:00 UTC (permalink / raw)
To: Ryan Lahfa
Cc: dhowells, Antony Antony, Antony Antony, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel

Ryan Lahfa <ryan@lahfa.xyz> wrote:

> Here is how to reproduce it:
>
> $ git clone https://gerrit.lix.systems/lix
> $ cd lix
> $ git fetch https://gerrit.lix.systems/lix refs/changes/29/3329/8 && git checkout FETCH_HEAD
> $ nix-build -A hydraJobs.tests.local-releng

How do I build and run this on Fedora is the problem :-/

> [1]: https://gist.dgnum.eu/raito/3d1fa61ebaf642218342ffe644fb6efd

Looking at this, it looks very much like a page may have been double-freed.

Just to check, what are you using 9p for? Containers? And which transport is
being used, the virtio one?

David
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-06-27 10:00 ` David Howells
@ 2025-06-27 10:33 ` Ryan Lahfa
0 siblings, 0 replies; 28+ messages in thread
From: Ryan Lahfa @ 2025-06-27 10:33 UTC (permalink / raw)
To: David Howells
Cc: Antony Antony, Antony Antony, Christian Brauner, Eric Van Hensbergen, Latchesar Ionkov, Dominique Martinet, Christian Schoenebeck, Sedat Dilek, Maximilian Bosch, regressions, v9fs, netfs, linux-fsdevel, linux-kernel

Hi David,

On Fri, Jun 27, 2025 at 11:00:06AM +0100, David Howells wrote:
> Ryan Lahfa <ryan@lahfa.xyz> wrote:
>
> > Here is how to reproduce it:
> >
> > $ git clone https://gerrit.lix.systems/lix
> > $ cd lix
> > $ git fetch https://gerrit.lix.systems/lix refs/changes/29/3329/8 && git checkout FETCH_HEAD
> > $ nix-build -A hydraJobs.tests.local-releng
>
> How do I build and run this on Fedora is the problem :-/

This may introduce another layer, but you could use a Docker container
(Lix has http://ghcr.io/lix-project/lix) and run these instructions
inside that context.

Alternatives are the following:

- A static binary for Nix; I can build one for you and make it available.
- The Lix installer, https://lix.systems/install/ (curl | sh, but it does
  prompt you at every step and tells you what it does; it should also be
  very easy to uninstall!).
- Debian has Nix packaged: https://packages.debian.org/sid/nix-bin (not
  Lix, but that doesn't matter for this reproducer).
- I can set up a remote Fedora VM for you with one of the previous
  options and give you root@ over there.

Let me know how I can help.

> > [1]: https://gist.dgnum.eu/raito/3d1fa61ebaf642218342ffe644fb6efd
>
> Looking at this, it looks very much like a page may have been double-freed.
>
> Just to check, what are you using 9p for? Containers? And which transport is
> being used, the virtio one?

9p is used in QEMU in this context. NixOS has a framework of end-to-end
testing à la OpenQA from OpenSUSE that makes use of 9pfs to mount the
host Nix store inside the guest VM, to avoid copying things that are not
under test back and forth.

Yep, transport is virtio.

Kind regards,
--
Ryan Lahfa
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-06-12 22:24 ` Ryan Lahfa
2025-06-27 5:44 ` Christian Theune
2025-06-27 10:00 ` David Howells
@ 2025-08-10 17:57 ` Arnout Engelen
2025-08-11 0:57 ` asmadeus
2 siblings, 1 reply; 28+ messages in thread
From: Arnout Engelen @ 2025-08-10 17:57 UTC (permalink / raw)
To: ryan
Cc: antony.antony, antony, asmadeus, brauner, dhowells, ericvh, linux-fsdevel, linux-kernel, linux_oss, lucho, maximilian, netfs, regressions, sedat.dilek, v9fs

On Fri, 13 Jun 2025 00:24:13 +0200, Ryan Lahfa wrote:
> On Wed, Oct 23, 2024 at 09:38:39PM +0200, Antony Antony wrote:
> > On Wed, Oct 23, 2024 at 11:07:05 +0100, David Howells wrote:
> > > Hi Antony,
> > >
> > > I think the attached should fix it properly rather than working around it as
> > > the previous patch did. If you could give it a whirl?
> >
> > Yes this also fix the crash.
> >
> > Tested-by: Antony Antony <antony.antony@secunet.com>
>
> I cannot confirm this fixes the crash for me. My reproducer is slightly
> more complicated than Max's original one, albeit, still on NixOS and
> probably uses 9p more intensively than the automated NixOS testings
> workload.

I'm seeing a problem in the same area - the symptom is slightly different,
but the location seems very similar. I'm also running a NixOS image.
Mounting a 9p filesystem in qemu with `cache=readahead`, reading a
12943-byte file, in the guest I do see a 12943-byte file, but only the
first 12288 bytes are populated: the rest are zero. This also reproduces
(most but not all of the time) on 6.16-rc7, but not on all host machines
I've tried.

After applying a simplified version of [1] (i.e. [2]), the problem does
not reproduce anymore. It seems something in `p9_client_read_once` somehow
leaves the iov_iter in an unhealthy state. It would be good to understand
exactly what, but I haven't been able to figure that out yet.

I have a smallish nix-based reproducer at [3], and a more involved setup
with a lot of logging enabled and a convenient way to attach gdb at [4].
You start the VM and then 'cat /repro/default.json' manually, and see if
it looks 'truncated'.

Interestingly, the file is read in two p9 read calls: one of 12288 bytes and
one of 655 bytes. The first read is a zero-copy one, the second is not
zero-copy (because it is smaller than 1024). I've also tried with a slightly
larger version of the file, that is read as 2 zero-copy reads, and I have not
been able to reproduce the problem with that. From my (admittedly limited)
understanding the non-zerocopy code path looks fine, though.

I hope this is helpful - I'd be happy to keep looking into this further,
but any help pointing me in the right direction would be much appreciated :)

Kind regards,

Arnout

[1] https://lore.kernel.org/all/3327438.1729678025@warthog.procyon.org.uk/T/#mc97a248b0f673dff6dc8613b508ca4fd45c4fefe
[2] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging/kernel-use-copied-iov_iter.patch
[3] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/small-reproducer
[4] https://codeberg.org/raboof/nextcloud-onlyoffice-test-vm/src/branch/reproducer-with-debugging
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-08-10 17:57 ` Arnout Engelen
@ 2025-08-11 0:57 ` asmadeus
2025-08-11 7:43 ` Dominique Martinet
0 siblings, 1 reply; 28+ messages in thread
From: asmadeus @ 2025-08-11 0:57 UTC (permalink / raw)
To: Arnout Engelen
Cc: ryan, antony.antony, antony, brauner, dhowells, ericvh, linux-fsdevel, linux-kernel, linux_oss, lucho, maximilian, netfs, regressions, sedat.dilek, v9fs

Arnout Engelen wrote on Sun, Aug 10, 2025 at 07:57:11PM +0200:
> I have a smallish nix-based reproducer at [3], and a more involved setup
> with a lot of logging enabled and a convenient way to attach gdb at [4].
> You start the VM and then 'cat /repro/default.json' manually, and see if
> it looks 'truncated'.

Thank you!!! I was able to reproduce with this!
(well, `nix -L build .#nixosConfigurations.default.config.system.build.vm`
to build the VM as this machine isn't running nixos and doesn't have
nixos-rebuild...)

> Interestingly, the file is read in two p9 read calls: one of 12288 bytes and
> one of 655 bytes. The first read is a zero-copy one, the second is not
> zero-copy (because it is smaller than 1024).

Yes, your msize is set to 16k but with the 9p overhead the largest
4k-aligned read that can be done is 12k, so that's coherent.
(Changing the msize to 32k, so that the file is read in a single
zero-copy read, obviously makes this particular error go away, but it's
a huge hint)

Removing readahead also makes the problem go away, which is also
surprising because from looking at traces it's only calling into
p9_client_read() once (which forks the two p9_client_read_once, one with
zc and the other without), so readahead shouldn't matter at all but it
obviously does...

Also I haven't been able to reproduce it with a kernel I built myself/my
environment, but it reproduces reliably 99% of the time in the nixos
VM, so we're missing a last piece for a "simple" (non-nix) reproducer,
but I think it's good enough for me to dig into this; I'll try to find
time to check in detail this afternoon...
Basically I "just" have to follow where the data is written and why it
doesn't end up in the iov, and fix that, but I'll need to reproduce on a
kernel I built first to be able to validate the fix.

Anyway this is a huge leap forward (hopefully it's the same problem and
we don't have two similar issues lurking here...), we can't thank you
enough.

I'll report back ASAP.
--
Dominique Martinet | Asmadeus
^ permalink raw reply [flat|nested] 28+ messages in thread
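The read sizes discussed in the message above (one 12288-byte zero-copy read followed by one 655-byte copied read for a 12943-byte file with a 16k msize) can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the 24-byte header overhead and the exact form of the 1024-byte zero-copy threshold are assumed values chosen to match the numbers quoted in this thread, and `toy_max_aligned_read()`, `toy_split_read()` and `struct toy_chunk` are hypothetical names, not kernel functions.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumed page size */
#define TOY_IOHDRSZ   24u   /* assumed per-message 9p read header overhead */
#define TOY_ZC_MIN    1024u /* assumed: reads of this size or less are copied */

/* One read request as it would be issued in this model. */
struct toy_chunk {
	size_t len;
	int zero_copy;
};

/* Largest page-aligned payload that fits in one message of msize bytes. */
static size_t toy_max_aligned_read(size_t msize)
{
	size_t payload;

	if (msize <= TOY_IOHDRSZ)
		return 0;
	payload = msize - TOY_IOHDRSZ;
	return payload - payload % TOY_PAGE_SIZE;
}

/*
 * Split a read of file_size bytes into chunks of at most the aligned
 * payload size, marking each chunk as zero-copy or copied.
 * Returns the number of chunks written to chunks[].
 */
static size_t toy_split_read(size_t file_size, size_t msize,
			     struct toy_chunk *chunks, size_t max_chunks)
{
	size_t max_read = toy_max_aligned_read(msize);
	size_t n = 0;

	if (max_read == 0)
		return 0;

	while (file_size > 0 && n < max_chunks) {
		size_t len = file_size < max_read ? file_size : max_read;

		chunks[n].len = len;
		chunks[n].zero_copy = len > TOY_ZC_MIN;
		n++;
		file_size -= len;
	}
	return n;
}
```

In this model a 16384-byte msize gives a 12288-byte aligned payload, so the 12943-byte file splits into a zero-copy chunk and a small copied chunk, whereas a 32768-byte msize lets the whole file fit in a single zero-copy chunk, which matches the observation that raising msize hides the problem.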
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-08-11 0:57 ` asmadeus
@ 2025-08-11 7:43 ` Dominique Martinet
2025-08-11 12:43 ` Arnout Engelen
0 siblings, 1 reply; 28+ messages in thread
From: Dominique Martinet @ 2025-08-11 7:43 UTC (permalink / raw)
To: Arnout Engelen
Cc: ryan, antony.antony, antony, brauner, ericvh, linux-fsdevel, linux-kernel, linux_oss, lucho, maximilian, netfs, regressions, sedat.dilek, v9fs, Matthew Wilcox, dhowells

asmadeus@codewreck.org wrote on Mon, Aug 11, 2025 at 09:57:51AM +0900:
> Also I haven't been able to reproduce it with a kernel I built myself/my
> environment, but it reproduces reliably 99% of the times in the nixos
> VM, so we're missing a last piece for a "simple" (non-nix) reproducer,
> but I think it's good enough for me to dig into this;

(okay, I got this to work wedging my kernel into the nixos initrd; this
requires the iov to be built off non-contiguous pages and so having
systemd thrash around was apparently a required step...)

So that wasn't a 9p bug, I'm not sure if I should be happy or not?

I've sent "proper-ish" patches at [1] which most concerned people should
be in Cc; I'm fairly confident this will make the bug go away but any
testing is appreciated, please reply to the patches with a Tested-by if
you have time.

[1] https://lkml.kernel.org/r/20250811-iot_iter_folio-v1-0-d9c223adf93c@codewreck.org

Thank you all again, and sorry I haven't had the time to look further
into this without a clean reproducer, this really shouldn't have taken
that long...
--
Dominique Martinet | Asmadeus
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1
2025-08-11 7:43 ` Dominique Martinet
@ 2025-08-11 12:43 ` Arnout Engelen
0 siblings, 0 replies; 28+ messages in thread
From: Arnout Engelen @ 2025-08-11 12:43 UTC (permalink / raw)
To: Dominique Martinet
Cc: ryan, antony.antony, antony, brauner, ericvh, linux-fsdevel, linux-kernel, linux_oss, lucho, maximilian, netfs, regressions, sedat.dilek, v9fs, Matthew Wilcox, dhowells

On Mon, Aug 11, 2025, at 02:57, asmadeus@codewreck.org wrote:
> Arnout Engelen wrote on Sun, Aug 10, 2025 at 07:57:11PM +0200:
> > I have a smallish nix-based reproducer at [3], and a more involved setup
> > with a lot of logging enabled and a convenient way to attach gdb at [4].
> > You start the VM and then 'cat /repro/default.json' manually, and see if
> > it looks 'truncated'.
>
> Thank you!!! I was able to reproduce with this!
>
> Anyway this is a huge leap forward (hopeful it's the same problem and we
> don't have two similar issues lurking here...), we can't thank you
> enough.

Great - that means a lot ;)

On Mon, Aug 11, 2025, at 09:43, Dominique Martinet wrote:
> So that wasn't a 9p bug, I'm not sure if I should be happy or not?

:D

> I've sent "proper-ish" patches at [1] which most concerned people should
> be in Cc; I'm fairly confident this will make the bug go away but any
> testing is appreciated, please reply to the patches with a Tested-by if
> you have time.
>
> [1] https://lkml.kernel.org/r/20250811-iot_iter_folio-v1-0-d9c223adf93c@codewreck.org

Awesome!

Kind regards,

--
Arnout Engelen
Engelen Open Source
https://engelen.eu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [REGRESSION] 9pfs issues on 6.12-rc1 2024-10-17 18:00 ` Antony Antony ` (4 preceding siblings ...) 2024-10-23 10:07 ` [REGRESSION] 9pfs issues on 6.12-rc1 David Howells @ 2024-10-23 18:35 ` Maximilian Bosch 5 siblings, 0 replies; 28+ messages in thread From: Maximilian Bosch @ 2024-10-23 18:35 UTC (permalink / raw) To: Antony Antony, Sedat Dilek Cc: Linux regressions mailing list, David Howells, LKML, linux-fsdevel, Christian Brauner Hi, On Thu Oct 17, 2024 at 8:00 PM CEST, Antony Antony wrote: > Hi, > > On Thu, Oct 03, 2024 at 03:12:15AM +0200, Sedat Dilek wrote: > > On Wed, Oct 2, 2024 at 11:58 PM Maximilian Bosch <maximilian@mbosch.me> wrote: > > > > > > Good evening, > > > > > > thanks a lot for the quick reply! > > > > > > > A fix for it is already pending in the vfs.fixes branch and -next: > > > > https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/ > > > > > > I applied the patch on top of Linux 6.12-rc1 locally and I can confirm > > > that this resolves the issue, thanks! > > Maximilian, would you like to re-run the test a few times? I wonder if there > is another intermittend bug related to the same commit. Apologies for getting back to you that late, have had a few busy days, but it seems you figured out how to test it anyways? For completeness sake, I ran the `linux_testing` test-case from `kernel-generic.nix` in a loop 40 times each on Linux 6.12-rc4 and every single run booted and succeeded on two different machines. So I'm not sure if the test-case is sufficient to trigger it. Let me know if there's anything else I can do to help. > result=$(readlink -f ./result); rm ./result && nix-store --delete $result Just as a well-meant hint, it's possible to rebuild existing things with `nix-build --check` (which also gives feedback on if the new build result has changed). > > > > > > > With best regards > > > > > > Maximilian > > > > > > > Thanks for testing. 
> >
> > For the records:
> >
> > iov_iter: fix advancing slot in iter_folioq_get_pages()
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/commit/?h=vfs.fixes&id=0d24852bd71ec85ca0016b6d6fc997e6a3381552
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs.fixes
>
> I’m still seeing a kernel oops after the fix in 6.12-rc3, but I’ve noticed
> that the issue is no longer 100% reproducible. Most of the time, the system
> crashes. Before this fix it was 100% reproducible.
>
> When using the nix testing, I have to force the test to re-run.
>
> result=$(readlink -f ./result); rm ./result && nix-store --delete $result
>
> nix-build -v nixos/tests/kernel-generic.nix -A linux_testing
>
> So maybe there is a new bug showing up after the fix. I have reported it.
>
> https://lore.kernel.org/regressions/ZxFEi1Tod43pD6JC@moon.secunet.de/T/#u
>
> -antony
>
> >
> > >
> > > On Wed Oct 2, 2024 at 7:31 PM CEST, Linux regression tracking (Thorsten Leemhuis) wrote:
> > > > Hi, Thorsten here, the Linux kernel's regression tracker. Top-posting
> > > > for once, to make this easily accessible to everyone.
> > > >
> > > > Thx for the report.
Not my area of expertise (so everyone: corrent me if > > > > I'm wrong), but I suspect your problem might be a duplicate of the > > > > following report, which was bisected to the same commit from dhowells > > > > (ee4cdf7ba857a8 ("netfs: Speed up buffered reading") [v6.12-rc1]): > > > > https://lore.kernel.org/all/20240923183432.1876750-1-chantr4@gmail.com/ > > > > > > > > A fix for it is already pending in the vfs.fixes branch and -next: > > > > https://lore.kernel.org/all/cbaf141ba6c0e2e209717d02746584072844841a.1727722269.git.osandov@fb.com/ > > > > > > > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > > > > -- > > > > Everything you wanna know about Linux kernel regression tracking: > > > > https://linux-regtracking.leemhuis.info/about/#tldr > > > > If I did something stupid, please tell me, as explained on that page. > > > > > > > > On 02.10.24 19:08, Maximilian Bosch wrote: > > > > > > > > > > Starting with Linux 6.12-rc1 the automatic VM tests of NixOS don't boot > > > > > anymore and fail like this: > > > > > > mounting nix-store on /nix/.ro-store... > > > > > [ 1.604781] 9p: Installing v9fs 9p2000 file system support > > > > > mounting tmpfs on /nix/.rw-store... > > > > > mounting overlay on /nix/store... > > > > > mounting shared on /tmp/shared... > > > > > mounting xchg on /tmp/xchg... > > > > > switch_root: can't execute '/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init': Exec format error > > > > > [ 1.734997] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000100 > > > > > [ 1.736002] CPU: 0 UID: 0 PID: 1 Comm: switch_root Not tainted 6.12.0-rc1 #1-NixOS > > > > > [ 1.736965] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > > [ 1.738309] Call Trace: > > > > > [ 1.738698] <TASK> > > > > > [ 1.739034] panic+0x324/0x340 > > > > > [ 1.739458] do_exit+0x92e/0xa90 > > > > > [ 1.739919] ? 
count_memcg_events.constprop.0+0x1a/0x40 > > > > > [ 1.740568] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.741095] ? handle_mm_fault+0xb0/0x2e0 > > > > > [ 1.741709] do_group_exit+0x30/0x80 > > > > > [ 1.742229] __x64_sys_exit_group+0x18/0x20 > > > > > [ 1.742800] x64_sys_call+0x17f3/0x1800 > > > > > [ 1.743326] do_syscall_64+0xb7/0x210 > > > > > [ 1.743895] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > [ 1.744530] RIP: 0033:0x7f8e1a7b9d1d > > > > > [ 1.745061] Code: 45 31 c0 45 31 d2 45 31 db c3 0f 1f 00 f3 0f 1e fa 48 8b 35 e5 e0 10 00 ba e7 00 00 00 eb 07 66 0f 1f 44 00 00 f4 89 d0 0f 05 <48> 3d 00 f0 ff ff 76 f3 f7 d8 64 89 06 eb ec 0f 1f 40 00 f3 0f 1e > > > > > [ 1.747263] RSP: 002b:00007ffcb56d63b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > > > > [ 1.748250] RAX: ffffffffffffffda RBX: 00007f8e1a8c9fa8 RCX: 00007f8e1a7b9d1d > > > > > [ 1.749187] RDX: 00000000000000e7 RSI: ffffffffffffff88 RDI: 0000000000000001 > > > > > [ 1.750050] RBP: 0000000000000001 R08: 0000000000000000 R09: 0000000000000000 > > > > > [ 1.750891] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 > > > > > [ 1.751706] R13: 0000000000000001 R14: 00007f8e1a8c8680 R15: 00007f8e1a8c9fc0 > > > > > [ 1.752583] </TASK> > > > > > [ 1.753010] Kernel Offset: 0xb800000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > > > > > > > > > The failing script here is the initrd's /init when it tries to perform a > > > > > switch_root to `/sysroot`: > > > > > > > > > > exec env -i $(type -P switch_root) "$targetRoot" "$stage2Init" > > > > > > > > > > Said "$stage2Init" file consistently gets a different hash when doing > > > > > `sha256sum` on it in the initrd script, but looks & behaves correct > > > > > on the host. I reproduced the test failures on 4 different build > > > > > machines and two architectures (x86_64-linux, aarch64-linux) now. > > > > > > > > > > The "$stage2Init" script is a shell-script itself. 
When trying to > > > > > start the interpreter from its shebang inside the initrd (via > > > > > `$targetRoot/nix/store/...-bash-5.2p32/bin/bash`) and do the > > > > > switch_root I get a different error: > > > > > > > > > > + exec env -i /nix/store/akm69s5sngxyvqrzys326dss9rsrvbpy-extra-utils/bin/switch_root /mnt-root /nix/store/k3pm4iv44y7x7p74kky6cwxiswmr6kpi-nixos-system-machine-test/init > > > > > [ 1.912859] list_del corruption. prev->next should be ffffc5cf80be0248, but was ffffc5cf80bd9208. (prev=ffffc5cf80bb4d48) > > > > > [ 1.914237] ------------[ cut here ]------------ > > > > > [ 1.915059] kernel BUG at lib/list_debug.c:62! > > > > > [ 1.915854] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > > > [ 1.916739] CPU: 0 UID: 0 PID: 17 Comm: ksoftirqd/0 Not tainted 6.12.0-rc1 #1-NixOS > > > > > [ 1.917837] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > > [ 1.919354] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0 > > > > > [ 1.920180] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac > > > > > [ 1.922636] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046 > > > > > [ 1.923563] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000 > > > > > [ 1.924692] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > > [ 1.925664] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000 > > > > > [ 1.926646] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90 > > > > > [ 1.927584] R13: ffffc5cf80be0240 R14: ffff8fbebd83dc80 R15: 000000000002f809 > > > > > [ 1.928533] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000 > > > > > [ 1.929647] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > [ 1.930431] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 
0000000000350ef0 > > > > > [ 1.931333] Call Trace: > > > > > [ 1.931727] <TASK> > > > > > [ 1.932088] ? die+0x36/0x90 > > > > > [ 1.932595] ? do_trap+0xed/0x110 > > > > > [ 1.933047] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > > [ 1.933757] ? do_error_trap+0x6a/0xa0 > > > > > [ 1.934390] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > > [ 1.935073] ? exc_invalid_op+0x51/0x80 > > > > > [ 1.935627] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > > [ 1.936326] ? asm_exc_invalid_op+0x1a/0x20 > > > > > [ 1.936904] ? __list_del_entry_valid_or_report+0xb4/0xd0 > > > > > [ 1.937622] free_pcppages_bulk+0x130/0x280 > > > > > [ 1.938151] free_unref_page_commit+0x21c/0x380 > > > > > [ 1.938753] free_unref_page+0x472/0x4f0 > > > > > [ 1.939343] __put_partials+0xee/0x130 > > > > > [ 1.939921] ? rcu_do_batch+0x1f2/0x800 > > > > > [ 1.940471] kmem_cache_free+0x2c3/0x370 > > > > > [ 1.940990] rcu_do_batch+0x1f2/0x800 > > > > > [ 1.941508] ? rcu_do_batch+0x180/0x800 > > > > > [ 1.942031] rcu_core+0x182/0x340 > > > > > [ 1.942500] handle_softirqs+0xe4/0x2f0 > > > > > [ 1.943034] run_ksoftirqd+0x33/0x40 > > > > > [ 1.943522] smpboot_thread_fn+0xdd/0x1d0 > > > > > [ 1.944056] ? __pfx_smpboot_thread_fn+0x10/0x10 > > > > > [ 1.944679] kthread+0xd0/0x100 > > > > > [ 1.945126] ? __pfx_kthread+0x10/0x10 > > > > > [ 1.945656] ret_from_fork+0x34/0x50 > > > > > [ 1.946151] ? 
__pfx_kthread+0x10/0x10 > > > > > [ 1.946680] ret_from_fork_asm+0x1a/0x30 > > > > > [ 1.947269] </TASK> > > > > > [ 1.947622] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata uhci_hcd scsi_mod ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > > > [ 1.952291] ---[ end trace 0000000000000000 ]--- > > > > > [ 1.952893] RIP: 0010:__list_del_entry_valid_or_report+0xb4/0xd0 > > > > > [ 1.953678] Code: 0f 0b 48 89 fe 48 89 ca 48 c7 c7 38 52 41 9f e8 42 91 ac ff 90 0f 0b 48 89 fe 48 89 c2 48 c7 c7 70 52 41 9f e8 2d 91 ac ff 90 <0f> 0b 48 89 d1 48 c7 c7 c0 52 41 9f 48 89 f2 48 89 c6 e8 15 91 ac > > > > > [ 1.955888] RSP: 0018:ffff96f800093c00 EFLAGS: 00010046 > > > > > [ 1.956548] RAX: 000000000000006d RBX: 0000000000000001 RCX: 0000000000000000 > > > > > [ 1.957436] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > > [ 1.958328] RBP: 0000000000000341 R08: 0000000000000000 R09: 0000000000000000 > > > > > [ 1.959166] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8fbebd83dc90 > > > > > [ 1.960044] R13: ffffc5cf80be0240 R14: ffff8fbebd83dc80 R15: 000000000002f809 > > > > > [ 1.960905] FS: 0000000000000000(0000) GS:ffff8fbebd800000(0000) knlGS:0000000000000000 > > > > > [ 1.961926] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > [ 1.962693] CR2: 00007fed6f09b000 CR3: 0000000001e02000 CR4: 0000000000350ef0 > > > > > [ 1.963548] Kernel panic - not syncing: Fatal exception in interrupt > > > > > [ 1.964417] Kernel Offset: 0x1ce00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) > > > > > > > > > > On a subsequent run to verify this, it failed earlier while reading > > > > 
> $targetRoot/.../bash like this: > > > > > > > > > > > > > > > [ 1.871810] BUG: Bad page state in process cat pfn:2e74a > > > > > [ 1.872481] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x1e5 pfn:0x2e74a > > > > > [ 1.873499] flags: 0xffffc000000000(node=0|zone=1|lastcpupid=0x1ffff) > > > > > [ 1.874260] raw: 00ffffc000000000 dead000000000100 dead000000000122 0000000000000000 > > > > > [ 1.875250] raw: 00000000000001e5 0000000000000000 00000001ffffffff 0000000000000000 > > > > > [ 1.876295] page dumped because: nonzero _refcount > > > > > [ 1.876910] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > > > [ 1.881465] CPU: 0 UID: 0 PID: 315 Comm: cat Not tainted 6.12.0-rc1 #1-NixOS > > > > > [ 1.882326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > > [ 1.883684] Call Trace: > > > > > [ 1.884103] <TASK> > > > > > [ 1.884440] dump_stack_lvl+0x64/0x90 > > > > > [ 1.884954] bad_page+0x70/0x110 > > > > > [ 1.885468] __rmqueue_pcplist+0x209/0xd00 > > > > > [ 1.886029] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.886572] ? pdu_read+0x36/0x50 [9pnet] > > > > > [ 1.887177] get_page_from_freelist+0x2df/0x1910 > > > > > [ 1.887788] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.888324] ? enqueue_entity+0xce/0x510 > > > > > [ 1.888881] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.889415] ? pick_eevdf+0x76/0x1a0 > > > > > [ 1.889970] ? update_curr+0x35/0x270 > > > > > [ 1.890476] __alloc_pages_noprof+0x1a3/0x1150 > > > > > [ 1.891158] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.891712] ? 
__mod_memcg_lruvec_state+0xa9/0x160 > > > > > [ 1.892346] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.892919] ? __lruvec_stat_mod_folio+0x83/0xd0 > > > > > [ 1.893521] alloc_pages_mpol_noprof+0x8f/0x1f0 > > > > > [ 1.894148] folio_alloc_noprof+0x5b/0xb0 > > > > > [ 1.894671] page_cache_ra_unbounded+0x11f/0x200 > > > > > [ 1.895270] filemap_get_pages+0x538/0x6d0 > > > > > [ 1.895813] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.896361] filemap_splice_read+0x136/0x320 > > > > > [ 1.896948] backing_file_splice_read+0x52/0xa0 > > > > > [ 1.897522] ovl_splice_read+0xd2/0xf0 [overlay] > > > > > [ 1.898160] ? __pfx_ovl_file_accessed+0x10/0x10 [overlay] > > > > > [ 1.898817] splice_direct_to_actor+0xb4/0x270 > > > > > [ 1.899404] ? __pfx_direct_splice_actor+0x10/0x10 > > > > > [ 1.900103] do_splice_direct+0x77/0xd0 > > > > > [ 1.900627] ? __pfx_direct_file_splice_eof+0x10/0x10 > > > > > [ 1.901308] do_sendfile+0x359/0x410 > > > > > [ 1.901788] __x64_sys_sendfile64+0xb9/0xd0 > > > > > [ 1.902370] do_syscall_64+0xb7/0x210 > > > > > [ 1.902904] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > [ 1.903604] RIP: 0033:0x7fa9f3a7289e > > > > > [ 1.904214] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db > > > > > [ 1.906436] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 > > > > > [ 1.907400] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e > > > > > [ 1.908241] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 > > > > > [ 1.909184] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000 > > > > > [ 1.910212] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001 > > > > > [ 1.911117] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000 > > > > > [ 1.911998] </TASK> > > > > > [ 1.912376] Disabling lock debugging due to kernel taint > > > > > [ 
1.913479] list_del corruption. next->prev should be ffffc80e40b9d948, but was ffffc80e40b9d0c8. (next=ffffc80e40b9c7c8) > > > > > [ 1.914823] ------------[ cut here ]------------ > > > > > [ 1.915408] kernel BUG at lib/list_debug.c:65! > > > > > [ 1.916050] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI > > > > > [ 1.916785] CPU: 0 UID: 0 PID: 315 Comm: cat Tainted: G B 6.12.0-rc1 #1-NixOS > > > > > [ 1.917877] Tainted: [B]=BAD_PAGE > > > > > [ 1.918350] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > > > > [ 1.919996] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0 > > > > > [ 1.920903] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f > > > > > [ 1.923423] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246 > > > > > [ 1.924210] RAX: 000000000000006d RBX: ffff94db3d83dc80 RCX: 0000000000000000 > > > > > [ 1.925147] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > > [ 1.926051] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000 > > > > > [ 1.926940] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > > > > > [ 1.927809] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180 > > > > > [ 1.928695] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000 > > > > > [ 1.929728] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > [ 1.930540] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0 > > > > > [ 1.931444] Call Trace: > > > > > [ 1.931916] <TASK> > > > > > [ 1.932357] ? die+0x36/0x90 > > > > > [ 1.932831] ? do_trap+0xed/0x110 > > > > > [ 1.933385] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > > [ 1.934073] ? do_error_trap+0x6a/0xa0 > > > > > [ 1.934583] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > > [ 1.935242] ? 
exc_invalid_op+0x51/0x80 > > > > > [ 1.935781] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > > [ 1.936484] ? asm_exc_invalid_op+0x1a/0x20 > > > > > [ 1.937174] ? __list_del_entry_valid_or_report+0xcc/0xd0 > > > > > [ 1.937926] ? __list_del_entry_valid_or_report+0xcb/0xd0 > > > > > [ 1.938685] __rmqueue_pcplist+0xa5/0xd00 > > > > > [ 1.939292] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.940004] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > > > [ 1.940758] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.941417] ? update_load_avg+0x7e/0x7f0 > > > > > [ 1.942133] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.942838] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.943508] get_page_from_freelist+0x2df/0x1910 > > > > > [ 1.944143] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.944696] ? check_preempt_wakeup_fair+0x1ee/0x240 > > > > > [ 1.945335] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.945905] __alloc_pages_noprof+0x1a3/0x1150 > > > > > [ 1.946489] ? __blk_flush_plug+0xf5/0x150 > > > > > [ 1.947105] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.947629] ? __dquot_alloc_space+0x2a8/0x3a0 > > > > > [ 1.948404] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.949116] ? __mod_memcg_lruvec_state+0xa9/0x160 > > > > > [ 1.949888] alloc_pages_mpol_noprof+0x8f/0x1f0 > > > > > [ 1.950514] folio_alloc_mpol_noprof+0x14/0x40 > > > > > [ 1.951153] shmem_alloc_folio+0xa7/0xd0 > > > > > [ 1.951692] ? shmem_recalc_inode+0x20/0x90 > > > > > [ 1.952272] shmem_alloc_and_add_folio+0x109/0x490 > > > > > [ 1.952940] ? filemap_get_entry+0x10f/0x1a0 > > > > > [ 1.953570] ? srso_return_thunk+0x5/0x5f > > > > > [ 1.954185] shmem_get_folio_gfp+0x248/0x610 > > > > > [ 1.954791] shmem_write_begin+0x64/0x110 > > > > > [ 1.955484] generic_perform_write+0xdf/0x2a0 > > > > > [ 1.956239] shmem_file_write_iter+0x8a/0x90 > > > > > [ 1.956882] iter_file_splice_write+0x33f/0x580 > > > > > [ 1.957577] direct_splice_actor+0x54/0x140 > > > > > [ 1.958178] splice_direct_to_actor+0xec/0x270 > > > > > [ 1.958813] ? 
__pfx_direct_splice_actor+0x10/0x10 > > > > > [ 1.959442] do_splice_direct+0x77/0xd0 > > > > > [ 1.960018] ? __pfx_direct_file_splice_eof+0x10/0x10 > > > > > [ 1.960726] do_sendfile+0x359/0x410 > > > > > [ 1.961248] __x64_sys_sendfile64+0xb9/0xd0 > > > > > [ 1.961905] do_syscall_64+0xb7/0x210 > > > > > [ 1.962467] entry_SYSCALL_64_after_hwframe+0x77/0x7f > > > > > [ 1.963211] RIP: 0033:0x7fa9f3a7289e > > > > > [ 1.963711] Code: 75 0e 00 f7 d8 64 89 02 b8 ff ff ff ff 31 d2 31 c9 31 ff 45 31 db c3 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 28 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 12 31 d2 31 c9 31 f6 31 ff 45 31 d2 45 31 db > > > > > [ 1.965846] RSP: 002b:00007ffe6a82bde8 EFLAGS: 00000246 ORIG_RAX: 0000000000000028 > > > > > [ 1.966788] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa9f3a7289e > > > > > [ 1.967644] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 0000000000000001 > > > > > [ 1.968480] RBP: 00007ffe6a82be50 R08: 0000000000000000 R09: 0000000000000000 > > > > > [ 1.969396] R10: 0000000001000000 R11: 0000000000000246 R12: 0000000000000001 > > > > > [ 1.970315] R13: 0000000001000000 R14: 0000000000000001 R15: 0000000000000000 > > > > > [ 1.971214] </TASK> > > > > > [ 1.971572] Modules linked in: overlay 9p ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid hid 9pnet_virtio 9pnet netfs sr_mod virtio_net cdrom virtio_blk net_failover atkbd failover libps2 vivaldi_fmap crc32c_intel ata_piix libata scsi_mod uhci_hcd ehci_hcd virtio_pci virtio_pci_legacy_dev virtio_pci_modern_dev scsi_common i8042 serio rtc_cmos dm_mod dax virtio_gpu virtio_dma_buf virtio_rng rng_core virtio_console virtio_balloon virtio virtio_ring > > > > > [ 1.976558] ---[ end trace 0000000000000000 ]--- > > > > > [ 1.977219] RIP: 0010:__list_del_entry_valid_or_report+0xcc/0xd0 > > > > > [ 1.978033] Code: 89 fe 48 89 c2 48 c7 c7 70 52 41 ba e8 2d 91 ac ff 90 0f 0b 48 89 d1 48 c7 c7 c0 52 41 ba 48 89 f2 48 89 c6 e8 15 91 ac ff 90 <0f> 0b 66 90 90 90 90 90 90 90 90 90 90 90 90 90 
90 90 90 90 f3 0f > > > > > [ 1.980179] RSP: 0018:ffff9ed880187748 EFLAGS: 00010246 > > > > > [ 1.980847] RAX: 000000000000006d RBX: ffff94db3d83dc80 RCX: 0000000000000000 > > > > > [ 1.981705] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > > > > [ 1.982584] RBP: ffffc80e40b9d940 R08: 0000000000000000 R09: 0000000000000000 > > > > > [ 1.983464] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 > > > > > [ 1.984358] R13: ffff94db3d83dc80 R14: ffffc80e40b9d948 R15: ffff94db3ffd6180 > > > > > [ 1.987765] FS: 00007fa9f396eb80(0000) GS:ffff94db3d800000(0000) knlGS:0000000000000000 > > > > > [ 1.988805] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > > > > [ 1.989497] CR2: 00000000004d1829 CR3: 0000000001dd2000 CR4: 0000000000350ef0 > > > > > [ 1.990418] note: cat[315] exited with preempt_count 2 > > > > > > > > > > I bisected it back to ee4cdf7ba857a894ad1650d6ab77669cbbfa329e which > > > > > also seems to touch part of the 9p VFS code. > > > > > > > > > > Unfortunately the revert didn't apply cleanly on 6.12-rc1, so I couldn't > > > > > meaningfully test whether a simple revert solves the problem. > > > > > > > > > > The VMs get the Nix store mounted via 9p. In the store are basically all > > > > > build artifacts including the stage-2 init script of the system that is > > > > > booted into in the VM test. 
> > > > > > > > > > The invocation basically looks like this: > > > > > > > > > > qemu-system-x86_64 -cpu max \ > > > > > -name machine \ > > > > > -m 1024 \ > > > > > -smp 1 \ > > > > > -device virtio-rng-pci \ > > > > > -net nic,netdev=user.0,model=virtio -netdev user,id=user.0,"$QEMU_NET_OPTS" \ > > > > > -virtfs local,path=/nix/store,security_model=none,mount_tag=nix-store \ > > > > > -virtfs local,path="${SHARED_DIR:-$TMPDIR/xchg}",security_model=none,mount_tag=shared \ > > > > > -virtfs local,path="$TMPDIR"/xchg,security_model=none,mount_tag=xchg \ > > > > > -drive cache=writeback,file="$NIX_DISK_IMAGE",id=drive1,if=none,index=1,werror=report -device virtio-blk-pci,bootindex=1,drive=drive1,serial=root \ > > > > > -device virtio-net-pci,netdev=vlan1,mac=52:54:00:12:01:01 \ > > > > > -netdev vde,id=vlan1,sock="$QEMU_VDE_SOCKET_1" \ > > > > > -device virtio-keyboard \ > > > > > -usb \ > > > > > -device usb-tablet,bus=usb-bus.0 \ > > > > > -kernel ${NIXPKGS_QEMU_KERNEL_machine:-/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel} \ > > > > > -initrd /nix/store/qqalw1iq1wbgq3ndx0cvqn3bfypn56w2-initrd-linux-6.12-rc1/initrd \ > > > > > -append "$(cat /nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/kernel-params) init=/nix/store/zv87gw0yxfsslq0mcc35a99k54da9a4z-nixos-system-machine-test/init regInfo=/nix/store/5izvfal6xm2rk51v0r1h2cxcng33paby-closure-info/registration console=ttyS0 $QEMU_KERNEL_PARAMS" \ > > > > > $QEMU_OPTS > > > > > > > > > > If you're using Nix, you can also reproduce this by running > > > > > > > > > > nix-build nixos/tests/kernel-generic.nix -A linux_testing > > > > > > > > > > on 5c19646b81db43dd7f4b6954f17d71a523009706 from https://github.com/nixos/nixpkgs. > > > > > > > > > > To me, this seems like a regression in rc1. > > > > > > > > > > Is there anything else I can do to help troubleshooting this? 
> > > > > > > > > > With best regards > > > > > > > > > > Maximilian > > > > > > > > > > > > > > > > With best regards Maximilian ^ permalink raw reply [flat|nested] 28+ messages in thread
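The repeated-run check described in the message above (the `linux_testing` test run 40 times in a loop) could be scripted roughly as follows. This is a minimal sketch, not the loop that was actually used: `run_repeated` is a hypothetical helper name, and the commented invocation assumes the test derivation has already been built once, so that `nix-build --check` rebuilds it as the hint in the message suggests.

```shell
# Run a command a fixed number of times and report how many runs failed.
# Usage: run_repeated <count> <command> [args...]
# The body runs in a subshell, so its variables do not leak to the caller.
run_repeated() (
    count=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        if ! "$@"; then
            failures=$((failures + 1))
            echo "run $i/$count failed" >&2
        fi
        i=$((i + 1))
    done
    echo "$failures of $count runs failed"
    # Exit status is non-zero if any run failed.
    [ "$failures" -eq 0 ]
)

# Hypothetical invocation (assumption: run from a nixpkgs checkout):
# run_repeated 40 nix-build --check nixos/tests/kernel-generic.nix -A linux_testing
```

Counting failures instead of stopping at the first one is a deliberate choice here: for a bug that is no longer 100% reproducible, the failure rate across runs is the more useful number.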
end of thread, other threads:[~2025-08-11 12:43 UTC | newest]

Thread overview: 28+ messages
2024-10-02 17:08 [REGRESSION] 9pfs issues on 6.12-rc1 Maximilian Bosch
2024-10-02 17:31 ` Linux regression tracking (Thorsten Leemhuis)
2024-10-02 21:48 ` Maximilian Bosch
2024-10-03 1:12 ` Sedat Dilek
2024-10-17 18:00 ` Antony Antony
2024-10-21 13:23 ` Christian Brauner
2024-10-21 14:12 ` David Howells
2024-10-21 15:33 ` Antony Antony
2024-10-21 14:45 ` David Howells
2024-10-21 15:53 ` Antony Antony
2024-10-21 19:48 ` David Howells
2025-08-10 5:10 ` Arnout Engelen
2024-10-21 20:38 ` [PATCH] 9p: Don't revert the I/O iterator after reading David Howells
2024-10-21 23:53 ` Antony Antony
2024-10-22 8:56 ` Christian Brauner
2024-10-23 10:07 ` [REGRESSION] 9pfs issues on 6.12-rc1 David Howells
2024-10-23 19:38 ` Antony Antony
2025-06-12 22:24 ` Ryan Lahfa
2025-06-27 5:44 ` Christian Theune
2025-06-27 6:44 ` Dominique Martinet
2025-06-27 8:19 ` Christian Theune
2025-06-27 10:00 ` David Howells
2025-06-27 10:33 ` Ryan Lahfa
2025-08-10 17:57 ` Arnout Engelen
2025-08-11 0:57 ` asmadeus
2025-08-11 7:43 ` Dominique Martinet
2025-08-11 12:43 ` Arnout Engelen
2024-10-23 18:35 ` Maximilian Bosch