* Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
@ 2023-05-22 13:25 Bagas Sanjaya
2023-05-22 16:00 ` Uladzislau Rezki
2023-05-22 19:04 ` Forza
0 siblings, 2 replies; 14+ messages in thread
From: Bagas Sanjaya @ 2023-05-22 13:25 UTC (permalink / raw)
To: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
Linux Stable
Cc: Chris Mason, Josef Bacik, David Sterba, a1bert
Hi,
I notice a regression report on Bugzilla [1]. Quoting from it:
> after updating from 6.2.x to 6.3.x, vmalloc error messages started to appear in the dmesg
>
>
>
> # free
> total used free shared buff/cache available
> Mem: 16183724 1473068 205664 33472 14504992 14335700
> Swap: 16777212 703596 16073616
>
>
> (zswap enabled)
See bugzilla for the full thread and attached dmesg.
Unfortunately, the reporter can't perform the required bisection.
Anyway, I'm adding it to regzbot:
#regzbot introduced: v6.2..v6.3 https://bugzilla.kernel.org/show_bug.cgi?id=217466
#regzbot title: btrfs_work_helper dealloc error in v6.3.x
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217466
--
An old man doll... just what I always wanted! - Clara
^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: Uladzislau Rezki @ 2023-05-22 16:00 UTC (permalink / raw)
To: Bagas Sanjaya
Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert

> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
[...]
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217466
>
According to the dmesg output in the bugzilla, vmalloc tries to
allocate high-order pages: 1 << 9. Since it fails to get an order-9
page, you get the warning:

<snip>
	if (area->nr_pages != nr_small_pages) {
		/* vm_area_alloc_pages() can also fail due to a fatal signal */
		if (!fatal_signal_pending(current))
			warn_alloc(gfp_mask, NULL,
				"vmalloc error: size %lu, page order %u, failed to allocate pages",
				area->nr_pages * PAGE_SIZE, page_order);
		goto fail;
	}
<snip>

and it fails.

If __GFP_NOFAIL is passed, the vm_area_alloc_pages() function switches
to allocating order-0 pages instead. I think the fix is to call
kvmalloc_node() with the __GFP_NOFAIL flag.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: David Sterba @ 2023-05-22 19:09 UTC (permalink / raw)
To: Uladzislau Rezki
Cc: Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List,
Linux Regressions, Linux Stable, Chris Mason, Josef Bacik,
David Sterba, a1bert, linux-mm

On Mon, May 22, 2023 at 06:00:42PM +0200, Uladzislau Rezki wrote:
[...]
> According to dmesg output from the bugzilla, the vmalloc tries to
> allocate high order pages: 1 << 9. Since it fails to get a order-9 page
> you get the warning:

That we want an order-9 allocation is intentional; it's for a compression
workspace (bugzilla comment 5). It's allocated with kvzalloc, i.e. with
the fallback to vmalloc in case the first attempt fails.

> <snip>
> [...]
> <snip>
>
> and it fails.
>
> If the __GFP_NOFAIL is passed, the vm_area_alloc_pages() function switches
> to allocate 0-order pages instead. I think the fix is to call the
> kvmalloc_node() with __GFP_NOFAIL flag.

__GFP_NOFAIL does not make sense here, and we've tried hard not to use it
anywhere because of its deadlock-prone effects. Did you mean __GFP_NOWARN?
That's a patch I sent today, but there's another comment in the bugzilla
reporting more allocation warnings for huge (2M) allocations, this time
for a deduplication ioctl.

This seems to be a noticeable change in 6.3. Before we disable the
warning in our code, I think the MM guys could have a look. So far it
seems we're about to paper over a problem.

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: Forza @ 2023-05-22 19:04 UTC (permalink / raw)
To: Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List,
Linux Regressions, Linux Stable
Cc: Chris Mason, Josef Bacik, David Sterba, a1bert

---- From: Bagas Sanjaya <bagasdotme@gmail.com> -- Sent: 2023-05-22 - 15:25 ----

> Hi,
>
> I notice a regression report on Bugzilla [1]. Quoting from it:
[...]
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217466
>

I have had a similar experience with kernel 6.3, where vmalloc fails in a
similar way. I was able to reproduce it in a QEMU VM as well as on my
system.

https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: Bagas Sanjaya @ 2023-05-23 1:52 UTC (permalink / raw)
To: Forza, Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
Linux Stable
Cc: Chris Mason, Josef Bacik, David Sterba, a1bert

On Mon, May 22, 2023 at 09:04:05PM +0200, Forza wrote:
> I have a similar experience with kernel 6.3 where vmalloc fails in a similar way. I was able to reproduce it in a QEMU VM as well as on my system.
>
> https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/

Thanks for your similar report. Telling regzbot about it:

#regzbot link: https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: Uladzislau Rezki @ 2023-05-23 10:28 UTC (permalink / raw)
To: Bagas Sanjaya, Forza
Cc: Forza, Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert, urezki

On Tue, May 23, 2023 at 08:52:21AM +0700, Bagas Sanjaya wrote:
> On Mon, May 22, 2023 at 09:04:05PM +0200, Forza wrote:
> > I have a similar experience with kernel 6.3 where vmalloc fails in a similar way. I was able to reproduce it in a QEMU VM as well as on my system.
> >
> > https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/
>
> Thanks for your similar report. Telling regzbot about it:
>
> #regzbot link: https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/
>
It is good that you can reproduce it. Could you please test the patch below?

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 31ff782d368b..7a06452f7807 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
 			page = alloc_pages(alloc_gfp, order);
 		else
 			page = alloc_pages_node(nid, alloc_gfp, order);
+
 		if (unlikely(!page)) {
-			if (!nofail)
-				break;
+			if (nofail)
+				alloc_gfp |= __GFP_NOFAIL;
 
-			/* fall back to the zero order allocations */
-			alloc_gfp |= __GFP_NOFAIL;
-			order = 0;
-			continue;
+			/* Fall back to the zero order allocations. */
+			if (order || nofail) {
+				order = 0;
+				continue;
+			}
+
+			break;
 		}
 
 		/*
<snip>

Thanks!

--
Uladzislau Rezki

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: Forza @ 2023-05-23 21:25 UTC (permalink / raw)
To: Uladzislau Rezki, Bagas Sanjaya
Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert, urezki

---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-23 - 12:28 ----

> On Tue, May 23, 2023 at 08:52:21AM +0700, Bagas Sanjaya wrote:
[...]
> It is good that you can reproduce it. Could you please test below patch?

Yes, I applied it to my test VM and will let it run overnight to see how
it turns out. I'll post again tomorrow. Thanks.

> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> [...]
> <snip>
>
> Thanks!
>
> --
> Uladzislau Rezki

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: Forza @ 2023-05-24 5:57 UTC (permalink / raw)
To: Uladzislau Rezki, Bagas Sanjaya
Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert, urezki

---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-23 - 12:28 ----

> On Tue, May 23, 2023 at 08:52:21AM +0700, Bagas Sanjaya wrote:
[...]
> It is good that you can reproduce it. Could you please test below patch?
>
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> [...]
> <snip>
>
> Thanks!
>
> --
> Uladzislau Rezki

There is a different result now that I have not seen before.
The full dmesg is available at https://paste.tnonline.net/files/pnnW6gYASxWX_dmesg-mm-patch.txt

[ 8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0
[ 13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[ 13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[13917.280527] ------------[ cut here ]------------
[13917.280753] default_enter_idle leaked IRQ state
[13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430
[13917.281046] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
[13917.281140] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.3.1-gentoo-mm-patched #4
[13917.281150] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[13917.281154] RIP: 0010:cpuidle_enter_state+0x3bb/0x430
[13917.281162] Code: 0f 1f 44 00 00 e9 a7 fd ff ff 80 3d 3a 3b d7 00 00 75 19 49 8b 75 50 48 c7 c7 ab b6 79 ac c6 05 26 3b d7 00 01 e8 a5 c4 20 ff <0f> 0b fa 0f 1f 44 00 00 e9 ca fc ff ff 83 c0 01 48 83 c1 40 39 f8
[13917.281176] RSP: 0018:ffffa153c00b7ea0 EFLAGS: 00010286
[13917.281182] RAX: ffff8c15ebfafa28 RBX: ffffc153bfd80900 RCX: 000000000000083f
[13917.281186] RDX: 000000000118feed RSI: 00000000000000f6 RDI: 000000000000083f
[13917.281189] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffa153c00b7d60
[13917.281193] R10: 0000000000000003 R11: ffffffffacb399e8 R12: ffffffffacc2e320
[13917.281196] R13: ffffffffacc2e3a0 R14: 0000000000000001 R15: 0000000000000000
[13917.281202] FS: 0000000000000000(0000) GS:ffff8c15ebf80000(0000) knlGS:0000000000000000
[13917.281206] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13917.281210] CR2: 00007f71840b39c8 CR3: 0000000102998000 CR4: 00000000003506e0
[13917.281217] Call Trace:
[13917.281221] <TASK>
[13917.281228] cpuidle_enter+0x29/0x40
[13917.281244] do_idle+0x19b/0x200
[13917.281292] cpu_startup_entry+0x19/0x20
[13917.281297] start_secondary+0x101/0x120
[13917.281324] secondary_startup_64_no_verify+0xe5/0xeb
[13917.281343] </TASK>
[13917.281346] ---[ end trace 0000000000000000 ]---
[17206.750165] BTRFS info (device vdb): using xxhash64 (xxhash64-generic) checksum algorithm
[17206.750190] BTRFS info (device vdb): using free space tree
[17206.904010] BTRFS info (device vdb): auto enabling async discard
[17206.933302] BTRFS info (device vdb): checking UUID tree
[17344.541839] sched: RT throttling activated
[18284.216538] hrtimer: interrupt took 23434934 ns
[18737.100477] BUG: unable to handle page fault for address: 0000000079e0afc0
[18737.100883] #PF: supervisor read access in kernel mode
[18737.101155] #PF: error_code(0x0000) - not-present page
[18737.101462] PGD 0 P4D 0
[18737.101715] Oops: 0000 [#1] PREEMPT SMP NOPTI
[18737.101968] CPU: 1 PID: 25287 Comm: kworker/u8:7 Tainted: G W 6.3.1-gentoo-mm-patched #4
[18737.102391] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[18737.102860] Workqueue: btrfs-delalloc btrfs_work_helper
[18737.103346] RIP: 0010:find_free_extent+0x20a/0x15c0
[18737.103900] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[18737.104851] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
[18737.105456] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[18737.106044] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[18737.106519] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[18737.107036] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
[18737.107363] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
[18737.107676] FS: 0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
[18737.107971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18737.108260] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
[18737.108606] Call Trace:
[18737.108964] <TASK>
[18737.109273] btrfs_reserve_extent+0x148/0x260
[18737.109601] submit_compressed_extents+0x14f/0x490
[18737.109934] async_cow_submit+0x37/0x90
[18737.110237] btrfs_work_helper+0x13d/0x360
[18737.110542] process_one_work+0x20f/0x410
[18737.110883] worker_thread+0x4a/0x3b0
[18737.111185] ? __pfx_worker_thread+0x10/0x10
[18737.111482] kthread+0xda/0x100
[18737.111800] ? __pfx_kthread+0x10/0x10
[18737.112097] ret_from_fork+0x2c/0x50
[18737.112387] </TASK>
[18737.112676] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
[18737.114021] CR2: 0000000079e0afc0
[18737.114366] ---[ end trace 0000000000000000 ]---
[18737.114712] RIP: 0010:find_free_extent+0x20a/0x15c0
[18737.115059] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[18737.115864] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
[18737.116415] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[18737.117090] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[18737.117882] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[18737.118611] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
[18737.119416] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
[18737.120221] FS: 0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
[18737.120994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18737.121868] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
[18737.122624] note: kworker/u8:7[25287] exited with irqs disabled
[19006.920558] BUG: unable to handle page fault for address: 0000000079e0afc0
[19006.922015] #PF: supervisor read access in kernel mode
[19006.923354] #PF: error_code(0x0000) - not-present page
[19006.924636] PGD 0 P4D 0
[19006.925868] Oops: 0000 [#2] PREEMPT SMP NOPTI
[19006.927066] CPU: 0 PID: 24329 Comm: crawl_writeback Tainted: G D W 6.3.1-gentoo-mm-patched #4
[19006.928510] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[19006.929817] RIP: 0010:find_free_extent+0x20a/0x15c0
[19006.931050] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[19006.933653] RSP: 0018:ffffa153c0d0f568 EFLAGS: 00010203
[19006.934972] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[19006.936236] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[19006.937480] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[19006.938750] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0d0f757
[19006.939986] R13: ffffa153c0d0f628 R14: 0000000000000001 R15: 0000000079e0af10
[19006.941255] FS: 00007fb245ffb6c0(0000) GS:ffff8c15ebe00000(0000) knlGS:0000000000000000
[19006.942579] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19006.943830] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506f0
[19006.945278] Call Trace:
[19006.946730] <TASK>
[19006.947792] ? release_pages+0x13e/0x490
[19006.948741] btrfs_reserve_extent+0x148/0x260
[19006.949637] cow_file_range+0x199/0x610
[19006.950396] btrfs_run_delalloc_range+0x103/0x520
[19006.951135] ? find_lock_delalloc_range+0x1ea/0x210
[19006.951802] writepage_delalloc+0xb9/0x180
[19006.952401] __extent_writepage+0xeb/0x410
[19006.952985] extent_write_cache_pages+0x152/0x3d0
[19006.953552] extent_writepages+0x4c/0x100
[19006.954116] do_writepages+0xbe/0x1d0
[19006.954672] ? memcmp_extent_buffer+0xa2/0xe0
[19006.955199] filemap_fdatawrite_wbc+0x5f/0x80
[19006.955726] __filemap_fdatawrite_range+0x4a/0x60
[19006.956219] btrfs_rename+0x529/0xb60
[19006.956711] ? psi_group_change+0x168/0x400
[19006.957280] btrfs_rename2+0x2a/0x60
[19006.957799] vfs_rename+0x5d4/0xeb0
[19006.958308] ? lookup_dcache+0x17/0x60
[19006.958784] ? do_renameat2+0x507/0x580
[19006.959239] do_renameat2+0x507/0x580
[19006.959702] __x64_sys_renameat+0x45/0x60
[19006.960293] do_syscall_64+0x5b/0xc0
[19006.960848] ? syscall_exit_to_user_mode+0x17/0x40
[19006.961331] ? do_syscall_64+0x67/0xc0
[19006.961812] ? syscall_exit_to_user_mode+0x17/0x40
[19006.962401] ? do_syscall_64+0x67/0xc0
[19006.963371] ? do_syscall_64+0x67/0xc0
[19006.964020] ? do_syscall_64+0x67/0xc0
[19006.965001] entry_SYSCALL_64_after_hwframe+0x72/0xdc
[19006.965952] RIP: 0033:0x7fb25eba492a
[19006.966485] Code: 48 8b 15 d9 44 17 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 08 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 a1 44 17 00 f7
[19006.967545] RSP: 002b:00007fb245ff8a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000108
[19006.968076] RAX: ffffffffffffffda RBX: 0000559a70a039f0 RCX: 00007fb25eba492a
[19006.968623] RDX: 0000000000000004 RSI: 00007fb134000fc0 RDI: 0000000000000004
[19006.977319] RBP: 00007fb245ff8c60 R08: 0000000000000000 R09: 0000000000000000
[19006.977877] R10: 0000559a70a03a00 R11: 0000000000000246 R12: 00007fb245ff8c80
[19006.978301] R13: 0000000000000004 R14: 00007fb245ff8c60 R15: 00000000000070b5
[19006.978727] </TASK>
[19006.979118] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
[19006.981463] CR2: 0000000079e0afc0
[19006.982193] ---[ end trace 0000000000000000 ]---
[19006.982938] RIP: 0010:find_free_extent+0x20a/0x15c0
[19006.983565] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[19006.984863] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
[19006.985500] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[19006.986195] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[19006.986877] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[19006.987581] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
[19006.988252] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
[19006.988984] FS: 00007fb245ffb6c0(0000) GS:ffff8c15ebe00000(0000) knlGS:0000000000000000
[19006.989646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19006.990336] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506f0
[19006.991037] note: crawl_writeback[24329] exited with irqs disabled

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
From: David Sterba @ 2023-05-24 9:13 UTC (permalink / raw)
To: Forza
Cc: Uladzislau Rezki, Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List,
Linux Regressions, Linux Stable, Chris Mason, Josef Bacik, David Sterba,
a1bert

This looks like a different set of problems, though all of them seem to
start on the compression write path in btrfs.

On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote:
> [...]
> [13917.280527] ------------[ cut here ]------------
> [13917.280753] default_enter_idle leaked IRQ state
> [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430

Warning in cpuidle

> [...]
> [18737.100477] BUG: unable to handle page fault for address: 0000000079e0afc0

BUG

> [...]
> [19006.920558] BUG: unable to handle page fault for address: 0000000079e0afc0

And again, so something is going wrong

> [...]
> [19006.979118] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich
virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg > [19006.981463] CR2: 0000000079e0afc0 > [19006.982193] ---[ end trace 0000000000000000 ]--- > [19006.982938] RIP: 0010:find_free_extent+0x20a/0x15c0 > [19006.983565] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89 > [19006.984863] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203 > [19006.985500] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001 > [19006.986195] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00 > [19006.986877] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000 > [19006.987581] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7 > [19006.988252] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10 > [19006.988984] FS: 00007fb245ffb6c0(0000) GS:ffff8c15ebe00000(0000) knlGS:0000000000000000 > [19006.989646] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [19006.990336] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506f0 > [19006.991037] note: crawl_writeback[24329] exited with irqs disabled > > > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x 2023-05-24 9:13 ` David Sterba @ 2023-05-26 12:24 ` Uladzislau Rezki 2023-07-02 23:28 ` Forza 0 siblings, 1 reply; 14+ messages in thread From: Uladzislau Rezki @ 2023-05-26 12:24 UTC (permalink / raw) To: Forza, Bagas Sanjaya Cc: Forza, Uladzislau Rezki, Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List, Linux Regressions, Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote: > This looks like a different set of problems, though all of them seem to > start on the compression write path in btrfs. > > On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote: > > [ 8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0 > > [ 13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information. > > [ 13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. > > [13917.280527] ------------[ cut here ]------------ > > [13917.280753] default_enter_idle leaked IRQ state > > [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430 > > Warning in cpuilde > > > [13917.281046] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg > > [13917.281140] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.3.1-gentoo-mm-patched #4 > > [13917.281150] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
1.16.0-20220807_005459-localhost 04/01/2014 > > [13917.281154] RIP: 0010:cpuidle_enter_state+0x3bb/0x430 > > [13917.281176] RSP: 0018:ffffa153c00b7ea0 EFLAGS: 00010286 > > [13917.281182] RAX: ffff8c15ebfafa28 RBX: ffffc153bfd80900 RCX: 000000000000083f > > [13917.281186] RDX: 000000000118feed RSI: 00000000000000f6 RDI: 000000000000083f > > [13917.281189] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffa153c00b7d60 > > [13917.281193] R10: 0000000000000003 R11: ffffffffacb399e8 R12: ffffffffacc2e320 > > [13917.281196] R13: ffffffffacc2e3a0 R14: 0000000000000001 R15: 0000000000000000 > > [13917.281202] FS: 0000000000000000(0000) GS:ffff8c15ebf80000(0000) knlGS:0000000000000000 > > [13917.281206] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [13917.281210] CR2: 00007f71840b39c8 CR3: 0000000102998000 CR4: 00000000003506e0 > > [13917.281217] Call Trace: > > [13917.281221] <TASK> > > [13917.281228] cpuidle_enter+0x29/0x40 > > [13917.281244] do_idle+0x19b/0x200 > > [13917.281292] cpu_startup_entry+0x19/0x20 > > [13917.281297] start_secondary+0x101/0x120 > > [13917.281324] secondary_startup_64_no_verify+0xe5/0xeb > > [13917.281343] </TASK> > > [13917.281346] ---[ end trace 0000000000000000 ]--- > > [17206.750165] BTRFS info (device vdb): using xxhash64 (xxhash64-generic) checksum algorithm > > [17206.750190] BTRFS info (device vdb): using free space tree > > [17206.904010] BTRFS info (device vdb): auto enabling async discard > > [17206.933302] BTRFS info (device vdb): checking UUID tree > > [17344.541839] sched: RT throttling activated > > [18284.216538] hrtimer: interrupt took 23434934 ns > > [18737.100477] BUG: unable to handle page fault for address: 0000000079e0afc0 > > BUG > > > [18737.100883] #PF: supervisor read access in kernel mode > > [18737.101155] #PF: error_code(0x0000) - not-present page > > [18737.101462] PGD 0 P4D 0 > > [18737.101715] Oops: 0000 [#1] PREEMPT SMP NOPTI > > [18737.101968] CPU: 1 PID: 25287 Comm: kworker/u8:7 Tainted: G W 
6.3.1-gentoo-mm-patched #4 > > [18737.102391] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 > > [18737.102860] Workqueue: btrfs-delalloc btrfs_work_helper > > [18737.103346] RIP: 0010:find_free_extent+0x20a/0x15c0 > > [18737.103900] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89 > > [18737.104851] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203 > > [18737.105456] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001 > > [18737.106044] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00 > > [18737.106519] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000 > > [18737.107036] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7 > > [18737.107363] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10 > > [18737.107676] FS: 0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000 > > [18737.107971] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [18737.108260] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0 > > [18737.108606] Call Trace: > > [18737.108964] <TASK> > > [18737.109273] btrfs_reserve_extent+0x148/0x260 > > [18737.109601] submit_compressed_extents+0x14f/0x490 > > [18737.109934] async_cow_submit+0x37/0x90 > > [18737.110237] btrfs_work_helper+0x13d/0x360 > > [18737.110542] process_one_work+0x20f/0x410 > > [18737.110883] worker_thread+0x4a/0x3b0 > > [18737.111185] ? __pfx_worker_thread+0x10/0x10 > > [18737.111482] kthread+0xda/0x100 > > [18737.111800] ? 
__pfx_kthread+0x10/0x10 > > [18737.112097] ret_from_fork+0x2c/0x50 > > [18737.112387] </TASK> > > [18737.112676] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg > > [18737.114021] CR2: 0000000079e0afc0 > > [18737.114366] ---[ end trace 0000000000000000 ]--- > > [18737.114712] RIP: 0010:find_free_extent+0x20a/0x15c0 > > [18737.115059] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89 > > [18737.115864] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203 > > [18737.116415] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001 > > [18737.117090] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00 > > [18737.117882] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000 > > [18737.118611] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7 > > [18737.119416] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10 > > [18737.120221] FS: 0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000 > > [18737.120994] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > > [18737.121868] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0 > > [18737.122624] note: kworker/u8:7[25287] exited with irqs disabled > > [19006.920558] BUG: unable to handle page fault for address: 0000000079e0afc0 > > And again, so something is going wrong > Indeed. 
I suggest you run your kernel with CONFIG_KASAN=y to see if there are
any use-after-free or out-of-bounds bugs.
--
Uladzislau Rezki
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
2023-05-26 12:24 ` Uladzislau Rezki
@ 2023-07-02 23:28 ` Forza
2023-07-06 8:08 ` Forza
0 siblings, 1 reply; 14+ messages in thread
From: Forza @ 2023-07-02 23:28 UTC (permalink / raw)
To: Uladzislau Rezki, Bagas Sanjaya
Cc: Uladzislau Rezki, Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List, Linux Regressions, Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert
---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-26 - 14:24 ----
> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
>> This looks like a different set of problems, though all of them seem to
>> start on the compression write path in btrfs.
>>
>> On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote:
>> > [ 8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0
>> > [ 13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
>> > [ 13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
>> > [13917.280527] ------------[ cut here ]------------
>> > [13917.280753] default_enter_idle leaked IRQ state
>> > [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430
>>
>> ... Snip
>>
>> And again, so something is going wrong
>>
> Indeed.
>
> I suggest you run your kernel with CONFIG_KASAN=y to see if there are
> any use-after-free or out-of-bounds bugs.
>
> --
> Uladzislau Rezki
Pardon the delay... I have enabled KASAN and UBSAN on this kernel. It produced a lot of output, including plenty of misalignment warnings.
The full dmesg is at https://paste.tnonline.net/files/aBoUMuTd5KBC_dmesg.ubsan.txt (approx. 1.7 MiB).
The full kernel config is at https://paste.tnonline.net/files/z1mX8TWFgZQ3_kernel.conf-kasan-ubsan.txt
Below is a small extract from around what I think is the default_enter_idle leaked IRQ event. Is this helpful?
================================================================================ Jul 03 00:33:57 git kernel: UBSAN: misaligned-access in net/ipv4/tcp_ipv4.c:1848:13 Jul 03 00:33:57 git kernel: member access within misaligned address 000000007604d82f for type 'const struct tcphdr' Jul 03 00:33:57 git kernel: which requires 4 byte alignment Jul 03 00:33:57 git kernel: CPU: 2 PID: 29 Comm: ksoftirqd/2 Not tainted 6.3.10-ksan-ubsan #8 Jul 03 00:33:57 git kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 Jul 03 00:33:57 git kernel: Call Trace: Jul 03 00:33:57 git kernel: <TASK> Jul 03 00:33:57 git kernel: dump_stack_lvl+0x86/0xd0 Jul 03 00:33:57 git kernel: ubsan_type_mismatch_common+0xdf/0x240 Jul 03 00:33:57 git kernel: __ubsan_handle_type_mismatch_v1+0x44/0x60 Jul 03 00:33:57 git kernel: tcp_add_backlog+0x1fac/0x3ab0 Jul 03 00:33:57 git kernel: ? sk_filter_trim_cap+0xcc/0xb60 Jul 03 00:33:57 git kernel: ? __pfx_tcp_add_backlog+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock+0x10/0x10 Jul 03 00:33:57 git kernel: tcp_v4_rcv+0x3583/0x4c40 Jul 03 00:33:57 git kernel: ? __pfx_tcp_v4_rcv+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock_irqsave+0x10/0x10 Jul 03 00:33:57 git kernel: ip_protocol_deliver_rcu+0x6c/0x480 Jul 03 00:33:57 git kernel: ip_local_deliver_finish+0x2ae/0x4d0 Jul 03 00:33:57 git kernel: ? __pfx_ip_local_deliver+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx_ip_local_deliver+0x10/0x10 Jul 03 00:33:57 git kernel: ip_local_deliver+0x1ba/0x380 Jul 03 00:33:57 git kernel: ? __pfx_ip_local_deliver+0x10/0x10 Jul 03 00:33:57 git kernel: ? ipv4_dst_check+0x104/0x250 Jul 03 00:33:57 git kernel: ? 
__ubsan_handle_type_mismatch_v1+0x44/0x60 Jul 03 00:33:57 git kernel: ip_sublist_rcv_finish+0x172/0x380 Jul 03 00:33:57 git kernel: ------------[ cut here ]------------ Jul 03 00:33:57 git kernel: ip_sublist_rcv+0x3cd/0x900 Jul 03 00:33:57 git kernel: default_enter_idle leaked IRQ state Jul 03 00:33:57 git kernel: ? __pfx_ip_sublist_rcv+0x10/0x10 Jul 03 00:33:57 git kernel: ? __ubsan_handle_type_mismatch_v1+0x44/0x60 Jul 03 00:33:57 git kernel: ? ip_rcv_core+0x972/0x1b20 Jul 03 00:33:57 git kernel: ip_list_rcv+0x318/0x750 Jul 03 00:33:57 git kernel: ? __pfx_ip_list_rcv+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx_ip_list_rcv+0x10/0x10 Jul 03 00:33:57 git kernel: __netif_receive_skb_list_core+0x5ad/0x1170 Jul 03 00:33:57 git kernel: ? tcp_gro_receive+0x1f45/0x2990 Jul 03 00:33:57 git kernel: ? __pfx___netif_receive_skb_list_core+0x10/0x10 Jul 03 00:33:57 git kernel: ? kvm_clock_read+0x16/0x40 Jul 03 00:33:57 git kernel: ? ktime_get_with_offset+0xd0/0x1f0 Jul 03 00:33:57 git kernel: netif_receive_skb_list_internal+0x76f/0x1530 Jul 03 00:33:57 git kernel: ? __pfx_netif_receive_skb_list_internal+0x10/0x10 Jul 03 00:33:57 git kernel: ? dev_gro_receive+0x67f/0x4900 Jul 03 00:33:57 git kernel: ? free_unref_page+0x2fd/0x680 Jul 03 00:33:57 git kernel: ? put_page+0x69/0x2b0 Jul 03 00:33:57 git kernel: ? __pfx_eth_type_trans+0x10/0x10 Jul 03 00:33:57 git kernel: napi_gro_receive+0x77b/0xdc0 Jul 03 00:33:57 git kernel: receive_buf+0x1001/0xac40 Jul 03 00:33:57 git kernel: ? _raw_spin_lock_irqsave+0xaa/0x180 Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock_irqsave+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx_receive_buf+0x10/0x10 Jul 03 00:33:57 git kernel: ? _raw_spin_unlock_irqrestore+0x40/0x80 Jul 03 00:33:57 git kernel: ? trace_hardirqs_on+0x2d/0xd0 Jul 03 00:33:57 git kernel: ? detach_buf_split+0x27e/0xa70 Jul 03 00:33:57 git kernel: ? virtqueue_get_buf_ctx_split+0x3c3/0x1400 Jul 03 00:33:57 git kernel: ? 
virtqueue_enable_cb_delayed+0x5d0/0x1180 Jul 03 00:33:57 git kernel: virtnet_poll+0x7c7/0x2030 Jul 03 00:33:57 git kernel: ? __pfx_virtnet_poll+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock+0x10/0x10 Jul 03 00:33:57 git kernel: ? __run_timers+0x43d/0xf70 Jul 03 00:33:57 git kernel: __napi_poll.constprop.0+0xd4/0x840 Jul 03 00:33:57 git kernel: net_rx_action+0x7a0/0x26e0 Jul 03 00:33:57 git kernel: ? __pfx_net_rx_action+0x10/0x10 Jul 03 00:33:57 git kernel: __do_softirq+0x277/0x95d Jul 03 00:33:57 git kernel: ? __pfx___do_softirq+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx_run_ksoftirqd+0x10/0x10 Jul 03 00:33:57 git kernel: ? __pfx_run_ksoftirqd+0x10/0x10 Jul 03 00:33:57 git kernel: run_ksoftirqd+0x2c/0x40 Jul 03 00:33:57 git kernel: smpboot_thread_fn+0x380/0xbc0 Jul 03 00:33:57 git kernel: ? __kthread_parkme+0xdc/0x280 Jul 03 00:33:57 git kernel: ? schedule+0x158/0x360 Jul 03 00:33:57 git kernel: ? __pfx_smpboot_thread_fn+0x10/0x10 Jul 03 00:33:57 git kernel: kthread+0x259/0x3d0 Jul 03 00:33:57 git kernel: ? __pfx_kthread+0x10/0x10 Jul 03 00:33:57 git kernel: ret_from_fork+0x2c/0x50 Jul 03 00:33:57 git kernel: </TASK> Jul 03 00:33:57 git kernel: ================================================================================ ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x 2023-07-02 23:28 ` Forza @ 2023-07-06 8:08 ` Forza 2023-07-06 10:54 ` Linux regression tracking (Thorsten Leemhuis) 0 siblings, 1 reply; 14+ messages in thread From: Forza @ 2023-07-06 8:08 UTC (permalink / raw) To: Uladzislau Rezki, Bagas Sanjaya Cc: Uladzislau Rezki, Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List, Linux Regressions, Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert ---- From: Forza <forza@tnonline.net> -- Sent: 2023-07-03 - 01:28 ---- > > > ---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-26 - 14:24 ---- > >> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote: >>> This looks like a different set of problems, though all of them seem to >>> start on the compression write path in btrfs. >>> >>> On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote: >>> > [ 8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0 >>> > [ 13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information. >>> > [ 13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved. >>> > [13917.280527] ------------[ cut here ]------------ >>> > [13917.280753] default_enter_idle leaked IRQ state >>> > [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430 >>> >>> > ... Snip >>> >>> And again, so something is going wrong >>> >> Indeed. >> >> I suggest you run your kernel with CONFIG_KASAN=y to see if there are >> any use-after-free or out-of-bounds bugs. >> >> -- >> Uladzislau Rezki > > > Pardon the delay... I have enabled KASAN and UBSAN on this kernel. It produced a lot of output and plenty of warnings for misalignment. > > The full dmesg is at https://paste.tnonline.net/files/aBoUMuTd5KBC_dmesg.ubsan.txt (approx 1.7MiB) > > The full kernel .conf is. 
at https://paste.tnonline.net/files/z1mX8TWFgZQ3_kernel.conf-kasan-ubsan.txt > > A small exctract around what I think is the default_enter_idle leaked IRQ event. Is this helpful? > > ================================================================================ > Jul 03 00:33:57 git kernel: UBSAN: misaligned-access in net/ipv4/tcp_ipv4.c:1848:13 > Jul 03 00:33:57 git kernel: member access within misaligned address 000000007604d82f for type 'const struct tcphdr' > Jul 03 00:33:57 git kernel: which requires 4 byte alignment > Jul 03 00:33:57 git kernel: CPU: 2 PID: 29 Comm: ksoftirqd/2 Not tainted 6.3.10-ksan-ubsan #8 > Jul 03 00:33:57 git kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014 > Jul 03 00:33:57 git kernel: Call Trace: > Jul 03 00:33:57 git kernel: <TASK> > Jul 03 00:33:57 git kernel: dump_stack_lvl+0x86/0xd0 > Jul 03 00:33:57 git kernel: ubsan_type_mismatch_common+0xdf/0x240 > Jul 03 00:33:57 git kernel: __ubsan_handle_type_mismatch_v1+0x44/0x60 > Jul 03 00:33:57 git kernel: tcp_add_backlog+0x1fac/0x3ab0 > Jul 03 00:33:57 git kernel: ? sk_filter_trim_cap+0xcc/0xb60 > Jul 03 00:33:57 git kernel: ? __pfx_tcp_add_backlog+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock+0x10/0x10 > Jul 03 00:33:57 git kernel: tcp_v4_rcv+0x3583/0x4c40 > Jul 03 00:33:57 git kernel: ? __pfx_tcp_v4_rcv+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > Jul 03 00:33:57 git kernel: ip_protocol_deliver_rcu+0x6c/0x480 > Jul 03 00:33:57 git kernel: ip_local_deliver_finish+0x2ae/0x4d0 > Jul 03 00:33:57 git kernel: ? __pfx_ip_local_deliver+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx_ip_local_deliver+0x10/0x10 > Jul 03 00:33:57 git kernel: ip_local_deliver+0x1ba/0x380 > Jul 03 00:33:57 git kernel: ? __pfx_ip_local_deliver+0x10/0x10 > Jul 03 00:33:57 git kernel: ? ipv4_dst_check+0x104/0x250 > Jul 03 00:33:57 git kernel: ? 
__ubsan_handle_type_mismatch_v1+0x44/0x60 > Jul 03 00:33:57 git kernel: ip_sublist_rcv_finish+0x172/0x380 > Jul 03 00:33:57 git kernel: ------------[ cut here ]------------ > Jul 03 00:33:57 git kernel: ip_sublist_rcv+0x3cd/0x900 > Jul 03 00:33:57 git kernel: default_enter_idle leaked IRQ state > Jul 03 00:33:57 git kernel: ? __pfx_ip_sublist_rcv+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __ubsan_handle_type_mismatch_v1+0x44/0x60 > Jul 03 00:33:57 git kernel: ? ip_rcv_core+0x972/0x1b20 > Jul 03 00:33:57 git kernel: ip_list_rcv+0x318/0x750 > Jul 03 00:33:57 git kernel: ? __pfx_ip_list_rcv+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx_ip_list_rcv+0x10/0x10 > Jul 03 00:33:57 git kernel: __netif_receive_skb_list_core+0x5ad/0x1170 > Jul 03 00:33:57 git kernel: ? tcp_gro_receive+0x1f45/0x2990 > Jul 03 00:33:57 git kernel: ? __pfx___netif_receive_skb_list_core+0x10/0x10 > Jul 03 00:33:57 git kernel: ? kvm_clock_read+0x16/0x40 > Jul 03 00:33:57 git kernel: ? ktime_get_with_offset+0xd0/0x1f0 > Jul 03 00:33:57 git kernel: netif_receive_skb_list_internal+0x76f/0x1530 > Jul 03 00:33:57 git kernel: ? __pfx_netif_receive_skb_list_internal+0x10/0x10 > Jul 03 00:33:57 git kernel: ? dev_gro_receive+0x67f/0x4900 > Jul 03 00:33:57 git kernel: ? free_unref_page+0x2fd/0x680 > Jul 03 00:33:57 git kernel: ? put_page+0x69/0x2b0 > Jul 03 00:33:57 git kernel: ? __pfx_eth_type_trans+0x10/0x10 > Jul 03 00:33:57 git kernel: napi_gro_receive+0x77b/0xdc0 > Jul 03 00:33:57 git kernel: receive_buf+0x1001/0xac40 > Jul 03 00:33:57 git kernel: ? _raw_spin_lock_irqsave+0xaa/0x180 > Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock_irqsave+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx_receive_buf+0x10/0x10 > Jul 03 00:33:57 git kernel: ? _raw_spin_unlock_irqrestore+0x40/0x80 > Jul 03 00:33:57 git kernel: ? trace_hardirqs_on+0x2d/0xd0 > Jul 03 00:33:57 git kernel: ? detach_buf_split+0x27e/0xa70 > Jul 03 00:33:57 git kernel: ? virtqueue_get_buf_ctx_split+0x3c3/0x1400 > Jul 03 00:33:57 git kernel: ? 
virtqueue_enable_cb_delayed+0x5d0/0x1180 > Jul 03 00:33:57 git kernel: virtnet_poll+0x7c7/0x2030 > Jul 03 00:33:57 git kernel: ? __pfx_virtnet_poll+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx__raw_spin_lock+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __run_timers+0x43d/0xf70 > Jul 03 00:33:57 git kernel: __napi_poll.constprop.0+0xd4/0x840 > Jul 03 00:33:57 git kernel: net_rx_action+0x7a0/0x26e0 > Jul 03 00:33:57 git kernel: ? __pfx_net_rx_action+0x10/0x10 > Jul 03 00:33:57 git kernel: __do_softirq+0x277/0x95d > Jul 03 00:33:57 git kernel: ? __pfx___do_softirq+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx_run_ksoftirqd+0x10/0x10 > Jul 03 00:33:57 git kernel: ? __pfx_run_ksoftirqd+0x10/0x10 > Jul 03 00:33:57 git kernel: run_ksoftirqd+0x2c/0x40 > Jul 03 00:33:57 git kernel: smpboot_thread_fn+0x380/0xbc0 > Jul 03 00:33:57 git kernel: ? __kthread_parkme+0xdc/0x280 > Jul 03 00:33:57 git kernel: ? schedule+0x158/0x360 > Jul 03 00:33:57 git kernel: ? __pfx_smpboot_thread_fn+0x10/0x10 > Jul 03 00:33:57 git kernel: kthread+0x259/0x3d0 > Jul 03 00:33:57 git kernel: ? __pfx_kthread+0x10/0x10 > Jul 03 00:33:57 git kernel: ret_from_fork+0x2c/0x50 > Jul 03 00:33:57 git kernel: </TASK> > Jul 03 00:33:57 git kernel: ================================================================================ > A small update. I have been able test 6.2.16, all 6.3.x and 6.4.1 and they all show the same issue. I am now trying 6.1.37 since two days and have not been able to reproduce this issue on any of my virtual qemu/kvm machines. Perhaps this information is helpful in finding the root cause? ~Forza ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
2023-07-06 8:08 ` Forza
@ 2023-07-06 10:54 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-07 10:13 ` Forza
0 siblings, 1 reply; 14+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-06 10:54 UTC (permalink / raw)
To: Forza, Uladzislau Rezki, Bagas Sanjaya
Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions, Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert
On 06.07.23 10:08, Forza wrote:
>>> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
> [...]
> A small update.
Thx for this.
> I have been able test 6.2.16, all 6.3.x and 6.4.1 and they all show
> the same issue.
>
> I am now trying 6.1.37 since two days and have not been able to
> reproduce this issue on any of my virtual qemu/kvm machines. Perhaps
> this information is helpful in finding the root cause?
That means it's most likely a regression between v6.1..v6.2 (or
v6.1..v6.2.16 if we are unlucky) somewhere (from earlier in the thread
it sounds like it might not be Btrfs).
Which makes me wonder: how long do you usually need to reproduce the
issue? If it's not too long, a bisection might be the best
way forward, unless some developer sits down and looks closely at the
logs. With a bit of luck some dev will do that; but if we are unlucky we
likely will need a bisection.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
2023-07-06 10:54 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-07 10:13 ` Forza
0 siblings, 0 replies; 14+ messages in thread
From: Forza @ 2023-07-07 10:13 UTC (permalink / raw)
To: Linux regressions mailing list, Uladzislau Rezki, Bagas Sanjaya
Cc: Linux btrfs, Linux Kernel Mailing List, Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert
On 2023-07-06 12:54, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 06.07.23 10:08, Forza wrote:
>>>> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
>> [...]
>> A small update.
>
> Thx for this.
>
>> I have been able test 6.2.16, all 6.3.x and 6.4.1 and they all show
>> the same issue.
>>
>> I am now trying 6.1.37 since two days and have not been able to
>> reproduce this issue on any of my virtual qemu/kvm machines. Perhaps
>> this information is helpful in finding the root cause?
>
> That means it's most likely a regression between v6.1..v6.2 (or
> v6.1..v6.2.16 if we are unlucky) somewhere (from earlier in the thread
> it sounds like it might not be Btrfs).
Agreed, I do not think this specific bug (cpuidle / default_enter_idle leaked IRQ state) is Btrfs-related. Some of the virtual machines I test on do not use Btrfs.
>
> Which makes we wonder: how long do you usually need to reproduce the
> issue? If it's not too long it might mean that a bisection is the best
> way forward, unless some developer sits down and looks closely at the
> logs. With a bit of luck some dev will do that; but if we are unlucky we
> likely will need a bisection.
>
It has varied. Sometimes it shows up immediately upon boot, but it can take several hours or a day.
Also, I forgot to say that I was basing my kernels on gentoo-kernels, which carry some patches on top of vanilla. Therefore I will compile a set of vanilla kernels from 6.1.37 to 6.4.2 and run them on my testing machines to see where the problem first appears.
This is not a fast system, so it will likely take several days, but I will keep you posted.
Meanwhile, if you think of any specific kernel debug options, tracing, etc. that I should enable, let me know.
Should we change the Subject line of this email thread?
Thanks
~Forza
Thread overview: 14+ messages
2023-05-22 13:25 Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x Bagas Sanjaya
2023-05-22 16:00 ` Uladzislau Rezki
2023-05-22 19:09 ` David Sterba
2023-05-22 19:04 ` Forza
2023-05-23 1:52 ` Bagas Sanjaya
2023-05-23 10:28 ` Uladzislau Rezki
2023-05-23 21:25 ` Forza
2023-05-24 5:57 ` Forza
2023-05-24 9:13 ` David Sterba
2023-05-26 12:24 ` Uladzislau Rezki
2023-07-02 23:28 ` Forza
2023-07-06 8:08 ` Forza
2023-07-06 10:54 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-07 10:13 ` Forza