Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x

public inbox for linux-btrfs@vger.kernel.org

* Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
@ 2023-05-22 13:25 Bagas Sanjaya
  2023-05-22 16:00 ` Uladzislau Rezki
  2023-05-22 19:04 ` Forza
  0 siblings, 2 replies; 14+ messages in thread
From: Bagas Sanjaya @ 2023-05-22 13:25 UTC (permalink / raw)
  To: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable
  Cc: Chris Mason, Josef Bacik, David Sterba, a1bert

Hi,

I notice a regression report on Bugzilla [1]. Quoting from it:

> after updating from 6.2.x to 6.3.x, vmalloc error messages started to appear in the dmesg
> 
> 
> 
> # free 
>                total        used        free      shared  buff/cache   available
> Mem:        16183724     1473068      205664       33472    14504992    14335700
> Swap:       16777212      703596    16073616
> 
> 
> (zswap enabled)

See bugzilla for the full thread and attached dmesg.

On the report, the reporter can't perform the required bisection,
unfortunately.

Anyway, I'm adding it to regzbot:

#regzbot introduced: v6.2..v6.3 https://bugzilla.kernel.org/show_bug.cgi?id=217466
#regzbot title: btrfs_work_helper dealloc error in v6.3.x

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217466

-- 
An old man doll... just what I always wanted! - Clara


* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-22 13:25 Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x Bagas Sanjaya
@ 2023-05-22 16:00 ` Uladzislau Rezki
  2023-05-22 19:09   ` David Sterba
  2023-05-22 19:04 ` Forza
  1 sibling, 1 reply; 14+ messages in thread
From: Uladzislau Rezki @ 2023-05-22 16:00 UTC (permalink / raw)
  To: Bagas Sanjaya
  Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert

> Hi,
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
> > after updating from 6.2.x to 6.3.x, vmalloc error messages started to appear in the dmesg
> > 
> > 
> > 
> > # free 
> >                total        used        free      shared  buff/cache   available
> > Mem:        16183724     1473068      205664       33472    14504992    14335700
> > Swap:       16777212      703596    16073616
> > 
> > 
> > (zswap enabled)
> 
> See bugzilla for the full thread and attached dmesg.
> 
> On the report, the reporter can't perform the required bisection,
> unfortunately.
> 
> Anyway, I'm adding it to regzbot:
> 
> #regzbot introduced: v6.2..v6.3 https://bugzilla.kernel.org/show_bug.cgi?id=217466
> #regzbot title: btrfs_work_helper dealloc error in v6.3.x
> 
> Thanks.
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217466
> 
According to the dmesg output from the bugzilla, vmalloc tries to
allocate high-order pages: 1 << 9. Since it fails to get an order-9 page,
you get the warning:

<snip>
	if (area->nr_pages != nr_small_pages) {
		/* vm_area_alloc_pages() can also fail due to a fatal signal */
		if (!fatal_signal_pending(current))
			warn_alloc(gfp_mask, NULL,
				"vmalloc error: size %lu, page order %u, failed to allocate pages",
				area->nr_pages * PAGE_SIZE, page_order);
		goto fail;
	}
<snip>

and it fails.

If __GFP_NOFAIL is passed, vm_area_alloc_pages() falls back to
allocating order-0 pages instead. I think the fix is to call
kvmalloc_node() with the __GFP_NOFAIL flag.

--
Uladzislau Rezki


* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-22 16:00 ` Uladzislau Rezki
@ 2023-05-22 19:09   ` David Sterba
  0 siblings, 0 replies; 14+ messages in thread
From: David Sterba @ 2023-05-22 19:09 UTC (permalink / raw)
  To: Uladzislau Rezki
  Cc: Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List,
	Linux Regressions, Linux Stable, Chris Mason, Josef Bacik,
	David Sterba, a1bert, linux-mm

On Mon, May 22, 2023 at 06:00:42PM +0200, Uladzislau Rezki wrote:
> > Hi,
> > 
> > I notice a regression report on Bugzilla [1]. Quoting from it:
> > 
> > > after updating from 6.2.x to 6.3.x, vmalloc error messages started to appear in the dmesg
> > > 
> > > 
> > > 
> > > # free 
> > >                total        used        free      shared  buff/cache   available
> > > Mem:        16183724     1473068      205664       33472    14504992    14335700
> > > Swap:       16777212      703596    16073616
> > > 
> > > 
> > > (zswap enabled)
> > 
> > See bugzilla for the full thread and attached dmesg.
> > 
> > On the report, the reporter can't perform the required bisection,
> > unfortunately.
> > 
> > Anyway, I'm adding it to regzbot:
> > 
> > #regzbot introduced: v6.2..v6.3 https://bugzilla.kernel.org/show_bug.cgi?id=217466
> > #regzbot title: btrfs_work_helper dealloc error in v6.3.x
> > 
> > Thanks.
> > 
> > [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217466
> > 
> According to the dmesg output from the bugzilla, vmalloc tries to
> allocate high-order pages: 1 << 9. Since it fails to get an order-9 page,
> you get the warning:

That we want an order-9 allocation is intentional; it's for a compression
workspace (bugzilla comment 5). It's allocated with kvzalloc, i.e. with the
fallback to vmalloc in case the kmalloc attempt fails.

> <snip>
> 	if (area->nr_pages != nr_small_pages) {
> 		/* vm_area_alloc_pages() can also fail due to a fatal signal */
> 		if (!fatal_signal_pending(current))
> 			warn_alloc(gfp_mask, NULL,
> 				"vmalloc error: size %lu, page order %u, failed to allocate pages",
> 				area->nr_pages * PAGE_SIZE, page_order);
> 		goto fail;
> 	}
> <snip>
> 
> and it fails.
> 
> If __GFP_NOFAIL is passed, vm_area_alloc_pages() falls back to
> allocating order-0 pages instead. I think the fix is to call
> kvmalloc_node() with the __GFP_NOFAIL flag.

__GFP_NOFAIL does not make sense here and we've tried hard not to use
it anywhere because of the deadlocky effects. Did you mean __GFP_NOWARN?
That's a patch I sent today, but there's another comment in the bugzilla
saying that we got more allocation warnings for huge (2M) allocations, this
time for a deduplication ioctl.

This seems to be a noticeable change in 6.3. Before we disable the
warning in our code, I think the MM guys could have a look. So far it
seems we're about to paper over a problem.


* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-22 13:25 Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x Bagas Sanjaya
  2023-05-22 16:00 ` Uladzislau Rezki
@ 2023-05-22 19:04 ` Forza
  2023-05-23  1:52   ` Bagas Sanjaya
  1 sibling, 1 reply; 14+ messages in thread
From: Forza @ 2023-05-22 19:04 UTC (permalink / raw)
  To: Bagas Sanjaya, Linux btrfs, Linux Kernel Mailing List,
	Linux Regressions, Linux Stable
  Cc: Chris Mason, Josef Bacik, David Sterba, a1bert



---- From: Bagas Sanjaya <bagasdotme@gmail.com> -- Sent: 2023-05-22 - 15:25 ----

> Hi,
> 
> I notice a regression report on Bugzilla [1]. Quoting from it:
> 
>> after updating from 6.2.x to 6.3.x, vmalloc error messages started to appear in the dmesg
>> 
>> 
>> 
>> # free 
>>                total        used        free      shared  buff/cache   available
>> Mem:        16183724     1473068      205664       33472    14504992    14335700
>> Swap:       16777212      703596    16073616
>> 
>> 
>> (zswap enabled)
> 
> See bugzilla for the full thread and attached dmesg.
> 
> On the report, the reporter can't perform the required bisection,
> unfortunately.
> 
> Anyway, I'm adding it to regzbot:
> 
> #regzbot introduced: v6.2..v6.3 https://bugzilla.kernel.org/show_bug.cgi?id=217466
> #regzbot title: btrfs_work_helper dealloc error in v6.3.x
> 
> Thanks.
> 
> [1]: https://bugzilla.kernel.org/show_bug.cgi?id=217466
> 

I have a similar experience with kernel 6.3 where vmalloc fails in a similar way. I was able to reproduce it in a QEMU VM as well as on my system. 

https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/





* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-22 19:04 ` Forza
@ 2023-05-23  1:52   ` Bagas Sanjaya
  2023-05-23 10:28     ` Uladzislau Rezki
  0 siblings, 1 reply; 14+ messages in thread
From: Bagas Sanjaya @ 2023-05-23  1:52 UTC (permalink / raw)
  To: Forza, Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable
  Cc: Chris Mason, Josef Bacik, David Sterba, a1bert


On Mon, May 22, 2023 at 09:04:05PM +0200, Forza wrote:
> I have a similar experience with kernel 6.3 where vmalloc fails in a similar way. I was able to reproduce it in a QEMU VM as well as on my system. 
> 
> https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/

Thanks for your similar report. Telling regzbot about it:

#regzbot link: https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/

-- 
An old man doll... just what I always wanted! - Clara



* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-23  1:52   ` Bagas Sanjaya
@ 2023-05-23 10:28     ` Uladzislau Rezki
  2023-05-23 21:25       ` Forza
  2023-05-24  5:57       ` Forza
  0 siblings, 2 replies; 14+ messages in thread
From: Uladzislau Rezki @ 2023-05-23 10:28 UTC (permalink / raw)
  To: Bagas Sanjaya, Forza
  Cc: Forza, Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert,
	urezki

On Tue, May 23, 2023 at 08:52:21AM +0700, Bagas Sanjaya wrote:
> On Mon, May 22, 2023 at 09:04:05PM +0200, Forza wrote:
> > I have a similar experience with kernel 6.3 where vmalloc fails in a similar way. I was able to reproduce it in a QEMU VM as well as on my system. 
> > 
> > https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/
> 
> Thanks for your similar report. Telling regzbot about it:
> 
> #regzbot link: https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/
> 
It is good that you can reproduce it. Could you please test the patch below?

<snip>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 31ff782d368b..7a06452f7807 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
                        page = alloc_pages(alloc_gfp, order);
                else
                        page = alloc_pages_node(nid, alloc_gfp, order);
+
                if (unlikely(!page)) {
-                       if (!nofail)
-                               break;
+                       if (nofail)
+                               alloc_gfp |= __GFP_NOFAIL;

-                       /* fall back to the zero order allocations */
-                       alloc_gfp |= __GFP_NOFAIL;
-                       order = 0;
-                       continue;
+                       /* Fall back to the zero order allocations. */
+                       if (order || nofail) {
+                               order = 0;
+                               continue;
+                       }
+
+                       break;
                }

                /*
<snip>

Thanks!

--
Uladzislau Rezki


* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-23 10:28     ` Uladzislau Rezki
@ 2023-05-23 21:25       ` Forza
  2023-05-24  5:57       ` Forza
  1 sibling, 0 replies; 14+ messages in thread
From: Forza @ 2023-05-23 21:25 UTC (permalink / raw)
  To: Uladzislau Rezki, Bagas Sanjaya
  Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert,
	urezki



---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-23 - 12:28 ----

> On Tue, May 23, 2023 at 08:52:21AM +0700, Bagas Sanjaya wrote:
>> On Mon, May 22, 2023 at 09:04:05PM +0200, Forza wrote:
>> > I have a similar experience with kernel 6.3 where vmalloc fails in a similar way. I was able to reproduce it in a QEMU VM as well as on my system. 
>> > 
>> > https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/
>> 
>> Thanks for your similar report. Telling regzbot about it:
>> 
>> #regzbot link: https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/
>> 
> It is good that you can reproduce it. Could you please test the patch below?

Yes, I applied it to my test VM and will let it run overnight to see how it turns out. I'll post again tomorrow.

Thanks. 
> 
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 31ff782d368b..7a06452f7807 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>                         page = alloc_pages(alloc_gfp, order);
>                 else
>                         page = alloc_pages_node(nid, alloc_gfp, order);
> +
>                 if (unlikely(!page)) {
> -                       if (!nofail)
> -                               break;
> +                       if (nofail)
> +                               alloc_gfp |= __GFP_NOFAIL;
> 
> -                       /* fall back to the zero order allocations */
> -                       alloc_gfp |= __GFP_NOFAIL;
> -                       order = 0;
> -                       continue;
> +                       /* Fall back to the zero order allocations. */
> +                       if (order || nofail) {
> +                               order = 0;
> +                               continue;
> +                       }
> +
> +                       break;
>                 }
> 
>                 /*
> <snip>
> 
> Thanks!
> 
> --
> Uladzislau Rezki




* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-23 10:28     ` Uladzislau Rezki
  2023-05-23 21:25       ` Forza
@ 2023-05-24  5:57       ` Forza
  2023-05-24  9:13         ` David Sterba
  1 sibling, 1 reply; 14+ messages in thread
From: Forza @ 2023-05-24  5:57 UTC (permalink / raw)
  To: Uladzislau Rezki, Bagas Sanjaya
  Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert,
	urezki



---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-23 - 12:28 ----

> On Tue, May 23, 2023 at 08:52:21AM +0700, Bagas Sanjaya wrote:
>> On Mon, May 22, 2023 at 09:04:05PM +0200, Forza wrote:
>> > I have a similar experience with kernel 6.3 where vmalloc fails in a similar way. I was able to reproduce it in a QEMU VM as well as on my system. 
>> > 
>> > https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/T/
>> 
>> Thanks for your similar report. Telling regzbot about it:
>> 
>> #regzbot link: https://lore.kernel.org/all/d11418b6-38e5-eb78-1537-c39245dc0b78@tnonline.net/
>> 
> It is good that you can reproduce it. Could you please test the patch below?
> 
> <snip>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 31ff782d368b..7a06452f7807 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2957,14 +2957,18 @@ vm_area_alloc_pages(gfp_t gfp, int nid,
>                         page = alloc_pages(alloc_gfp, order);
>                 else
>                         page = alloc_pages_node(nid, alloc_gfp, order);
> +
>                 if (unlikely(!page)) {
> -                       if (!nofail)
> -                               break;
> +                       if (nofail)
> +                               alloc_gfp |= __GFP_NOFAIL;
> 
> -                       /* fall back to the zero order allocations */
> -                       alloc_gfp |= __GFP_NOFAIL;
> -                       order = 0;
> -                       continue;
> +                       /* Fall back to the zero order allocations. */
> +                       if (order || nofail) {
> +                               order = 0;
> +                               continue;
> +                       }
> +
> +                       break;
>                 }
> 
>                 /*
> <snip>
> 
> Thanks!
> 
> --
> Uladzislau Rezki


There is a different result now that I have not seen before. The full dmesg is available at https://paste.tnonline.net/files/pnnW6gYASxWX_dmesg-mm-patch.txt


[   8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0
[   13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
[   13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
[13917.280527] ------------[ cut here ]------------
[13917.280753] default_enter_idle leaked IRQ state
[13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430
[13917.281046] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
[13917.281140] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.3.1-gentoo-mm-patched #4
[13917.281150] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[13917.281154] RIP: 0010:cpuidle_enter_state+0x3bb/0x430
[13917.281162] Code: 0f 1f 44 00 00 e9 a7 fd ff ff 80 3d 3a 3b d7 00 00 75 19 49 8b 75 50 48 c7 c7 ab b6 79 ac c6 05 26 3b d7 00 01 e8 a5 c4 20 ff <0f> 0b fa 0f 1f 44 00 00 e9 ca fc ff ff 83 c0 01 48 83 c1 40 39 f8
[13917.281176] RSP: 0018:ffffa153c00b7ea0 EFLAGS: 00010286
[13917.281182] RAX: ffff8c15ebfafa28 RBX: ffffc153bfd80900 RCX: 000000000000083f
[13917.281186] RDX: 000000000118feed RSI: 00000000000000f6 RDI: 000000000000083f
[13917.281189] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffa153c00b7d60
[13917.281193] R10: 0000000000000003 R11: ffffffffacb399e8 R12: ffffffffacc2e320
[13917.281196] R13: ffffffffacc2e3a0 R14: 0000000000000001 R15: 0000000000000000
[13917.281202] FS:  0000000000000000(0000) GS:ffff8c15ebf80000(0000) knlGS:0000000000000000
[13917.281206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13917.281210] CR2: 00007f71840b39c8 CR3: 0000000102998000 CR4: 00000000003506e0
[13917.281217] Call Trace:
[13917.281221]  <TASK>
[13917.281228]  cpuidle_enter+0x29/0x40
[13917.281244]  do_idle+0x19b/0x200
[13917.281292]  cpu_startup_entry+0x19/0x20
[13917.281297]  start_secondary+0x101/0x120
[13917.281324]  secondary_startup_64_no_verify+0xe5/0xeb
[13917.281343]  </TASK>
[13917.281346] ---[ end trace 0000000000000000 ]---
[17206.750165] BTRFS info (device vdb): using xxhash64 (xxhash64-generic) checksum algorithm
[17206.750190] BTRFS info (device vdb): using free space tree
[17206.904010] BTRFS info (device vdb): auto enabling async discard
[17206.933302] BTRFS info (device vdb): checking UUID tree
[17344.541839] sched: RT throttling activated
[18284.216538] hrtimer: interrupt took 23434934 ns
[18737.100477] BUG: unable to handle page fault for address: 0000000079e0afc0
[18737.100883] #PF: supervisor read access in kernel mode
[18737.101155] #PF: error_code(0x0000) - not-present page
[18737.101462] PGD 0 P4D 0 
[18737.101715] Oops: 0000 [#1] PREEMPT SMP NOPTI
[18737.101968] CPU: 1 PID: 25287 Comm: kworker/u8:7 Tainted: G        W          6.3.1-gentoo-mm-patched #4
[18737.102391] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[18737.102860] Workqueue: btrfs-delalloc btrfs_work_helper
[18737.103346] RIP: 0010:find_free_extent+0x20a/0x15c0
[18737.103900] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[18737.104851] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
[18737.105456] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[18737.106044] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[18737.106519] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[18737.107036] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
[18737.107363] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
[18737.107676] FS:  0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
[18737.107971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18737.108260] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
[18737.108606] Call Trace:
[18737.108964]  <TASK>
[18737.109273]  btrfs_reserve_extent+0x148/0x260
[18737.109601]  submit_compressed_extents+0x14f/0x490
[18737.109934]  async_cow_submit+0x37/0x90
[18737.110237]  btrfs_work_helper+0x13d/0x360
[18737.110542]  process_one_work+0x20f/0x410
[18737.110883]  worker_thread+0x4a/0x3b0
[18737.111185]  ? __pfx_worker_thread+0x10/0x10
[18737.111482]  kthread+0xda/0x100
[18737.111800]  ? __pfx_kthread+0x10/0x10
[18737.112097]  ret_from_fork+0x2c/0x50
[18737.112387]  </TASK>
[18737.112676] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
[18737.114021] CR2: 0000000079e0afc0
[18737.114366] ---[ end trace 0000000000000000 ]---
[18737.114712] RIP: 0010:find_free_extent+0x20a/0x15c0
[18737.115059] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[18737.115864] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
[18737.116415] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[18737.117090] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[18737.117882] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[18737.118611] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
[18737.119416] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
[18737.120221] FS:  0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
[18737.120994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[18737.121868] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
[18737.122624] note: kworker/u8:7[25287] exited with irqs disabled
[19006.920558] BUG: unable to handle page fault for address: 0000000079e0afc0
[19006.922015] #PF: supervisor read access in kernel mode
[19006.923354] #PF: error_code(0x0000) - not-present page
[19006.924636] PGD 0 P4D 0 
[19006.925868] Oops: 0000 [#2] PREEMPT SMP NOPTI
[19006.927066] CPU: 0 PID: 24329 Comm: crawl_writeback Tainted: G      D W          6.3.1-gentoo-mm-patched #4
[19006.928510] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[19006.929817] RIP: 0010:find_free_extent+0x20a/0x15c0
[19006.931050] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[19006.933653] RSP: 0018:ffffa153c0d0f568 EFLAGS: 00010203
[19006.934972] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[19006.936236] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[19006.937480] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[19006.938750] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0d0f757
[19006.939986] R13: ffffa153c0d0f628 R14: 0000000000000001 R15: 0000000079e0af10
[19006.941255] FS:  00007fb245ffb6c0(0000) GS:ffff8c15ebe00000(0000) knlGS:0000000000000000
[19006.942579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19006.943830] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506f0
[19006.945278] Call Trace:
[19006.946730]  <TASK>
[19006.947792]  ? release_pages+0x13e/0x490
[19006.948741]  btrfs_reserve_extent+0x148/0x260
[19006.949637]  cow_file_range+0x199/0x610
[19006.950396]  btrfs_run_delalloc_range+0x103/0x520
[19006.951135]  ? find_lock_delalloc_range+0x1ea/0x210
[19006.951802]  writepage_delalloc+0xb9/0x180
[19006.952401]  __extent_writepage+0xeb/0x410
[19006.952985]  extent_write_cache_pages+0x152/0x3d0
[19006.953552]  extent_writepages+0x4c/0x100
[19006.954116]  do_writepages+0xbe/0x1d0
[19006.954672]  ? memcmp_extent_buffer+0xa2/0xe0
[19006.955199]  filemap_fdatawrite_wbc+0x5f/0x80
[19006.955726]  __filemap_fdatawrite_range+0x4a/0x60
[19006.956219]  btrfs_rename+0x529/0xb60
[19006.956711]  ? psi_group_change+0x168/0x400
[19006.957280]  btrfs_rename2+0x2a/0x60
[19006.957799]  vfs_rename+0x5d4/0xeb0
[19006.958308]  ? lookup_dcache+0x17/0x60
[19006.958784]  ? do_renameat2+0x507/0x580
[19006.959239]  do_renameat2+0x507/0x580
[19006.959702]  __x64_sys_renameat+0x45/0x60
[19006.960293]  do_syscall_64+0x5b/0xc0
[19006.960848]  ? syscall_exit_to_user_mode+0x17/0x40
[19006.961331]  ? do_syscall_64+0x67/0xc0
[19006.961812]  ? syscall_exit_to_user_mode+0x17/0x40
[19006.962401]  ? do_syscall_64+0x67/0xc0
[19006.963371]  ? do_syscall_64+0x67/0xc0
[19006.964020]  ? do_syscall_64+0x67/0xc0
[19006.965001]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
[19006.965952] RIP: 0033:0x7fb25eba492a
[19006.966485] Code: 48 8b 15 d9 44 17 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 08 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 a1 44 17 00 f7
[19006.967545] RSP: 002b:00007fb245ff8a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000108
[19006.968076] RAX: ffffffffffffffda RBX: 0000559a70a039f0 RCX: 00007fb25eba492a
[19006.968623] RDX: 0000000000000004 RSI: 00007fb134000fc0 RDI: 0000000000000004
[19006.977319] RBP: 00007fb245ff8c60 R08: 0000000000000000 R09: 0000000000000000
[19006.977877] R10: 0000559a70a03a00 R11: 0000000000000246 R12: 00007fb245ff8c80
[19006.978301] R13: 0000000000000004 R14: 00007fb245ff8c60 R15: 00000000000070b5
[19006.978727]  </TASK>
[19006.979118] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
[19006.981463] CR2: 0000000079e0afc0
[19006.982193] ---[ end trace 0000000000000000 ]---
[19006.982938] RIP: 0010:find_free_extent+0x20a/0x15c0
[19006.983565] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
[19006.984863] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
[19006.985500] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
[19006.986195] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
[19006.986877] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
[19006.987581] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
[19006.988252] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
[19006.988984] FS:  00007fb245ffb6c0(0000) GS:ffff8c15ebe00000(0000) knlGS:0000000000000000
[19006.989646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[19006.990336] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506f0
[19006.991037] note: crawl_writeback[24329] exited with irqs disabled

 







* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-24  5:57       ` Forza
@ 2023-05-24  9:13         ` David Sterba
  2023-05-26 12:24           ` Uladzislau Rezki
  0 siblings, 1 reply; 14+ messages in thread
From: David Sterba @ 2023-05-24  9:13 UTC (permalink / raw)
  To: Forza
  Cc: Uladzislau Rezki, Bagas Sanjaya, Linux btrfs,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable,
	Chris Mason, Josef Bacik, David Sterba, a1bert

This looks like a different set of problems, though all of them seem to
start on the compression write path in btrfs.

On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote:
> [   8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0
> [   13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
> [   13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> [13917.280527] ------------[ cut here ]------------
> [13917.280753] default_enter_idle leaked IRQ state
> [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430

Warning in cpuidle

> [13917.281046] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
> [13917.281140] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.3.1-gentoo-mm-patched #4
> [13917.281150] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> [13917.281154] RIP: 0010:cpuidle_enter_state+0x3bb/0x430
> [13917.281176] RSP: 0018:ffffa153c00b7ea0 EFLAGS: 00010286
> [13917.281182] RAX: ffff8c15ebfafa28 RBX: ffffc153bfd80900 RCX: 000000000000083f
> [13917.281186] RDX: 000000000118feed RSI: 00000000000000f6 RDI: 000000000000083f
> [13917.281189] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffa153c00b7d60
> [13917.281193] R10: 0000000000000003 R11: ffffffffacb399e8 R12: ffffffffacc2e320
> [13917.281196] R13: ffffffffacc2e3a0 R14: 0000000000000001 R15: 0000000000000000
> [13917.281202] FS:  0000000000000000(0000) GS:ffff8c15ebf80000(0000) knlGS:0000000000000000
> [13917.281206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [13917.281210] CR2: 00007f71840b39c8 CR3: 0000000102998000 CR4: 00000000003506e0
> [13917.281217] Call Trace:
> [13917.281221]  <TASK>
> [13917.281228]  cpuidle_enter+0x29/0x40
> [13917.281244]  do_idle+0x19b/0x200
> [13917.281292]  cpu_startup_entry+0x19/0x20
> [13917.281297]  start_secondary+0x101/0x120
> [13917.281324]  secondary_startup_64_no_verify+0xe5/0xeb
> [13917.281343]  </TASK>
> [13917.281346] ---[ end trace 0000000000000000 ]---
> [17206.750165] BTRFS info (device vdb): using xxhash64 (xxhash64-generic) checksum algorithm
> [17206.750190] BTRFS info (device vdb): using free space tree
> [17206.904010] BTRFS info (device vdb): auto enabling async discard
> [17206.933302] BTRFS info (device vdb): checking UUID tree
> [17344.541839] sched: RT throttling activated
> [18284.216538] hrtimer: interrupt took 23434934 ns
> [18737.100477] BUG: unable to handle page fault for address: 0000000079e0afc0

BUG

> [18737.100883] #PF: supervisor read access in kernel mode
> [18737.101155] #PF: error_code(0x0000) - not-present page
> [18737.101462] PGD 0 P4D 0 
> [18737.101715] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [18737.101968] CPU: 1 PID: 25287 Comm: kworker/u8:7 Tainted: G        W          6.3.1-gentoo-mm-patched #4
> [18737.102391] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> [18737.102860] Workqueue: btrfs-delalloc btrfs_work_helper
> [18737.103346] RIP: 0010:find_free_extent+0x20a/0x15c0
> [18737.103900] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
> [18737.104851] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
> [18737.105456] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
> [18737.106044] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
> [18737.106519] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
> [18737.107036] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
> [18737.107363] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
> [18737.107676] FS:  0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
> [18737.107971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [18737.108260] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
> [18737.108606] Call Trace:
> [18737.108964]  <TASK>
> [18737.109273]  btrfs_reserve_extent+0x148/0x260
> [18737.109601]  submit_compressed_extents+0x14f/0x490
> [18737.109934]  async_cow_submit+0x37/0x90
> [18737.110237]  btrfs_work_helper+0x13d/0x360
> [18737.110542]  process_one_work+0x20f/0x410
> [18737.110883]  worker_thread+0x4a/0x3b0
> [18737.111185]  ? __pfx_worker_thread+0x10/0x10
> [18737.111482]  kthread+0xda/0x100
> [18737.111800]  ? __pfx_kthread+0x10/0x10
> [18737.112097]  ret_from_fork+0x2c/0x50
> [18737.112387]  </TASK>
> [18737.112676] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
> [18737.114021] CR2: 0000000079e0afc0
> [18737.114366] ---[ end trace 0000000000000000 ]---
> [18737.114712] RIP: 0010:find_free_extent+0x20a/0x15c0
> [18737.115059] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
> [18737.115864] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
> [18737.116415] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
> [18737.117090] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
> [18737.117882] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
> [18737.118611] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
> [18737.119416] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
> [18737.120221] FS:  0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
> [18737.120994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [18737.121868] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
> [18737.122624] note: kworker/u8:7[25287] exited with irqs disabled
> [19006.920558] BUG: unable to handle page fault for address: 0000000079e0afc0

And again, so something is going wrong

> [19006.922015] #PF: supervisor read access in kernel mode
> [19006.923354] #PF: error_code(0x0000) - not-present page
> [19006.924636] PGD 0 P4D 0 
> [19006.925868] Oops: 0000 [#2] PREEMPT SMP NOPTI
> [19006.927066] CPU: 0 PID: 24329 Comm: crawl_writeback Tainted: G      D W          6.3.1-gentoo-mm-patched #4
> [19006.928510] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> [19006.929817] RIP: 0010:find_free_extent+0x20a/0x15c0
> [19006.931050] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
> [19006.933653] RSP: 0018:ffffa153c0d0f568 EFLAGS: 00010203
> [19006.934972] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
> [19006.936236] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
> [19006.937480] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
> [19006.938750] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0d0f757
> [19006.939986] R13: ffffa153c0d0f628 R14: 0000000000000001 R15: 0000000079e0af10
> [19006.941255] FS:  00007fb245ffb6c0(0000) GS:ffff8c15ebe00000(0000) knlGS:0000000000000000
> [19006.942579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19006.943830] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506f0
> [19006.945278] Call Trace:
> [19006.946730]  <TASK>
> [19006.947792]  ? release_pages+0x13e/0x490
> [19006.948741]  btrfs_reserve_extent+0x148/0x260
> [19006.949637]  cow_file_range+0x199/0x610
> [19006.950396]  btrfs_run_delalloc_range+0x103/0x520
> [19006.951135]  ? find_lock_delalloc_range+0x1ea/0x210
> [19006.951802]  writepage_delalloc+0xb9/0x180
> [19006.952401]  __extent_writepage+0xeb/0x410
> [19006.952985]  extent_write_cache_pages+0x152/0x3d0
> [19006.953552]  extent_writepages+0x4c/0x100
> [19006.954116]  do_writepages+0xbe/0x1d0
> [19006.954672]  ? memcmp_extent_buffer+0xa2/0xe0
> [19006.955199]  filemap_fdatawrite_wbc+0x5f/0x80
> [19006.955726]  __filemap_fdatawrite_range+0x4a/0x60
> [19006.956219]  btrfs_rename+0x529/0xb60
> [19006.956711]  ? psi_group_change+0x168/0x400
> [19006.957280]  btrfs_rename2+0x2a/0x60
> [19006.957799]  vfs_rename+0x5d4/0xeb0
> [19006.958308]  ? lookup_dcache+0x17/0x60
> [19006.958784]  ? do_renameat2+0x507/0x580
> [19006.959239]  do_renameat2+0x507/0x580
> [19006.959702]  __x64_sys_renameat+0x45/0x60
> [19006.960293]  do_syscall_64+0x5b/0xc0
> [19006.960848]  ? syscall_exit_to_user_mode+0x17/0x40
> [19006.961331]  ? do_syscall_64+0x67/0xc0
> [19006.961812]  ? syscall_exit_to_user_mode+0x17/0x40
> [19006.962401]  ? do_syscall_64+0x67/0xc0
> [19006.963371]  ? do_syscall_64+0x67/0xc0
> [19006.964020]  ? do_syscall_64+0x67/0xc0
> [19006.965001]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> [19006.965952] RIP: 0033:0x7fb25eba492a
> [19006.966485] Code: 48 8b 15 d9 44 17 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 49 89 ca b8 08 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 06 c3 0f 1f 44 00 00 48 8b 15 a1 44 17 00 f7
> [19006.967545] RSP: 002b:00007fb245ff8a08 EFLAGS: 00000246 ORIG_RAX: 0000000000000108
> [19006.968076] RAX: ffffffffffffffda RBX: 0000559a70a039f0 RCX: 00007fb25eba492a
> [19006.968623] RDX: 0000000000000004 RSI: 00007fb134000fc0 RDI: 0000000000000004
> [19006.977319] RBP: 00007fb245ff8c60 R08: 0000000000000000 R09: 0000000000000000
> [19006.977877] R10: 0000559a70a03a00 R11: 0000000000000246 R12: 00007fb245ff8c80
> [19006.978301] R13: 0000000000000004 R14: 00007fb245ff8c60 R15: 00000000000070b5
> [19006.978727]  </TASK>
> [19006.979118] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
> [19006.981463] CR2: 0000000079e0afc0
> [19006.982193] ---[ end trace 0000000000000000 ]---
> [19006.982938] RIP: 0010:find_free_extent+0x20a/0x15c0
> [19006.983565] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
> [19006.984863] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
> [19006.985500] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
> [19006.986195] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
> [19006.986877] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
> [19006.987581] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
> [19006.988252] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
> [19006.988984] FS:  00007fb245ffb6c0(0000) GS:ffff8c15ebe00000(0000) knlGS:0000000000000000
> [19006.989646] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [19006.990336] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506f0
> [19006.991037] note: crawl_writeback[24329] exited with irqs disabled
> 
>  
> 
> 
> 
> 
> 
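
The oops above can be cross-checked by hand: the bytes around the faulting
instruction in the "Code:" line, combined with the register dump, reproduce
the CR2 value. A minimal sketch, assuming the usual x86-64 decoding of those
bytes (4d 8d ba = lea r15, [r10 + disp32] and 45 8b 9f = mov r11d,
[r15 + disp32]); the helper name disp32 is made up for illustration:

```python
# Register values copied from the first oops in the quoted dmesg.
R10 = 0x0000000079E0B000
R15 = 0x0000000079E0AF10
CR2 = 0x0000000079E0AFC0  # faulting address

def disp32(hex_bytes: str) -> int:
    """Decode a little-endian signed 32-bit displacement (hypothetical helper)."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little", signed=True)

# "4d 8d ba 10 ff ff ff": lea r15, [r10 + disp32]
lea_disp = disp32("10ffffff")
# "<45> 8b 9f b0 00 00 00": mov r11d, [r15 + disp32] (the faulting load)
mov_disp = disp32("b0000000")

computed_r15 = R10 + lea_disp
computed_fault = computed_r15 + mov_disp

print(f"lea displacement: {lea_disp:#x}")
print(f"r15 = {computed_r15:#x}, matches dump: {computed_r15 == R15}")
print(f"fault = {computed_fault:#x}, matches CR2: {computed_fault == CR2}")
```

Both comparisons hold, so the faulting read is at offset 0xb0 from a pointer
derived from R10 = 0x79e0b000. That value is not in the kernel half of the
address space (compare the ffff8c14... pointers in RAX and RBX), which is
consistent with a bad pointer being followed, although the dump alone does not
say where it came from.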

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-24  9:13         ` David Sterba
@ 2023-05-26 12:24           ` Uladzislau Rezki
  2023-07-02 23:28             ` Forza
  0 siblings, 1 reply; 14+ messages in thread
From: Uladzislau Rezki @ 2023-05-26 12:24 UTC (permalink / raw)
  To: Forza, Bagas Sanjaya
  Cc: Forza, Uladzislau Rezki, Bagas Sanjaya, Linux btrfs,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable,
	Chris Mason, Josef Bacik, David Sterba, a1bert

On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
> This looks like a different set of problems, though all of them seem to
> start on the compression write path in btrfs.
> 
> On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote:
> > [   8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0
> > [   13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
> > [   13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> > [13917.280527] ------------[ cut here ]------------
> > [13917.280753] default_enter_idle leaked IRQ state
> > [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430
> 
> Warning in cpuidle
> 
> > [13917.281046] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
> > [13917.281140] CPU: 3 PID: 0 Comm: swapper/3 Not tainted 6.3.1-gentoo-mm-patched #4
> > [13917.281150] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> > [13917.281154] RIP: 0010:cpuidle_enter_state+0x3bb/0x430
> > [13917.281176] RSP: 0018:ffffa153c00b7ea0 EFLAGS: 00010286
> > [13917.281182] RAX: ffff8c15ebfafa28 RBX: ffffc153bfd80900 RCX: 000000000000083f
> > [13917.281186] RDX: 000000000118feed RSI: 00000000000000f6 RDI: 000000000000083f
> > [13917.281189] RBP: 0000000000000001 R08: 0000000000000000 R09: ffffa153c00b7d60
> > [13917.281193] R10: 0000000000000003 R11: ffffffffacb399e8 R12: ffffffffacc2e320
> > [13917.281196] R13: ffffffffacc2e3a0 R14: 0000000000000001 R15: 0000000000000000
> > [13917.281202] FS:  0000000000000000(0000) GS:ffff8c15ebf80000(0000) knlGS:0000000000000000
> > [13917.281206] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [13917.281210] CR2: 00007f71840b39c8 CR3: 0000000102998000 CR4: 00000000003506e0
> > [13917.281217] Call Trace:
> > [13917.281221]  <TASK>
> > [13917.281228]  cpuidle_enter+0x29/0x40
> > [13917.281244]  do_idle+0x19b/0x200
> > [13917.281292]  cpu_startup_entry+0x19/0x20
> > [13917.281297]  start_secondary+0x101/0x120
> > [13917.281324]  secondary_startup_64_no_verify+0xe5/0xeb
> > [13917.281343]  </TASK>
> > [13917.281346] ---[ end trace 0000000000000000 ]---
> > [17206.750165] BTRFS info (device vdb): using xxhash64 (xxhash64-generic) checksum algorithm
> > [17206.750190] BTRFS info (device vdb): using free space tree
> > [17206.904010] BTRFS info (device vdb): auto enabling async discard
> > [17206.933302] BTRFS info (device vdb): checking UUID tree
> > [17344.541839] sched: RT throttling activated
> > [18284.216538] hrtimer: interrupt took 23434934 ns
> > [18737.100477] BUG: unable to handle page fault for address: 0000000079e0afc0
> 
> BUG
> 
> > [18737.100883] #PF: supervisor read access in kernel mode
> > [18737.101155] #PF: error_code(0x0000) - not-present page
> > [18737.101462] PGD 0 P4D 0 
> > [18737.101715] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [18737.101968] CPU: 1 PID: 25287 Comm: kworker/u8:7 Tainted: G        W          6.3.1-gentoo-mm-patched #4
> > [18737.102391] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> > [18737.102860] Workqueue: btrfs-delalloc btrfs_work_helper
> > [18737.103346] RIP: 0010:find_free_extent+0x20a/0x15c0
> > [18737.103900] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
> > [18737.104851] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
> > [18737.105456] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
> > [18737.106044] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
> > [18737.106519] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
> > [18737.107036] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
> > [18737.107363] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
> > [18737.107676] FS:  0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
> > [18737.107971] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [18737.108260] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
> > [18737.108606] Call Trace:
> > [18737.108964]  <TASK>
> > [18737.109273]  btrfs_reserve_extent+0x148/0x260
> > [18737.109601]  submit_compressed_extents+0x14f/0x490
> > [18737.109934]  async_cow_submit+0x37/0x90
> > [18737.110237]  btrfs_work_helper+0x13d/0x360
> > [18737.110542]  process_one_work+0x20f/0x410
> > [18737.110883]  worker_thread+0x4a/0x3b0
> > [18737.111185]  ? __pfx_worker_thread+0x10/0x10
> > [18737.111482]  kthread+0xda/0x100
> > [18737.111800]  ? __pfx_kthread+0x10/0x10
> > [18737.112097]  ret_from_fork+0x2c/0x50
> > [18737.112387]  </TASK>
> > [18737.112676] Modules linked in: wireguard curve25519_x86_64 libcurve25519_generic ip6_udp_tunnel udp_tunnel cfg80211 rfkill 8021q garp mrp stp llc binfmt_misc intel_rapl_msr intel_rapl_common kvm_amd iTCO_wdt ccp intel_pmc_bxt iTCO_vendor_support kvm i2c_i801 virtio_gpu irqbypass pcspkr virtio_dma_buf joydev i2c_smbus drm_shmem_helper lpc_ich virtio_balloon drm_kms_helper crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_console virtio_net net_failover virtio_scsi failover serio_raw virtio_blk qemu_fw_cfg
> > [18737.114021] CR2: 0000000079e0afc0
> > [18737.114366] ---[ end trace 0000000000000000 ]---
> > [18737.114712] RIP: 0010:find_free_extent+0x20a/0x15c0
> > [18737.115059] Code: 4d 8d ba 10 ff ff ff 48 83 c0 0f 49 8d 97 f0 00 00 00 48 c1 e0 04 48 01 d8 48 39 c2 0f 84 c5 03 00 00 41 c6 85 84 00 00 00 00 <45> 8b 9f b0 00 00 00 45 85 db 0f 85 d8 0c 00 00 45 8b 75 28 4c 89
> > [18737.115864] RSP: 0018:ffffa153c0923bd0 EFLAGS: 00010203
> > [18737.116415] RAX: ffff8c14869240f0 RBX: ffff8c1486924000 RCX: 0000000000000001
> > [18737.117090] RDX: 0000000079e0b000 RSI: 0000000000000100 RDI: ffff8c14869bcc00
> > [18737.117882] RBP: ffff8c148b100000 R08: 0000000000000000 R09: 0000000000000000
> > [18737.118611] R10: 0000000079e0b000 R11: 000000000000151b R12: ffffa153c0923dd7
> > [18737.119416] R13: ffffa153c0923c90 R14: 0000000000000001 R15: 0000000079e0af10
> > [18737.120221] FS:  0000000000000000(0000) GS:ffff8c15ebe80000(0000) knlGS:0000000000000000
> > [18737.120994] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [18737.121868] CR2: 0000000079e0afc0 CR3: 00000001055e8000 CR4: 00000000003506e0
> > [18737.122624] note: kworker/u8:7[25287] exited with irqs disabled
> > [19006.920558] BUG: unable to handle page fault for address: 0000000079e0afc0
> 
> And again, so something is going wrong
> 
Indeed.

I suggest you run your kernel with CONFIG_KASAN=y to see if there are
any use-after-free or out-of-bounds bugs.

--
Uladzislau Rezki

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-05-26 12:24           ` Uladzislau Rezki
@ 2023-07-02 23:28             ` Forza
  2023-07-06  8:08               ` Forza
  0 siblings, 1 reply; 14+ messages in thread
From: Forza @ 2023-07-02 23:28 UTC (permalink / raw)
  To: Uladzislau Rezki, Bagas Sanjaya
  Cc: Uladzislau Rezki, Bagas Sanjaya, Linux btrfs,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable,
	Chris Mason, Josef Bacik, David Sterba, a1bert



---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-26 - 14:24 ----

> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
>> This looks like a different set of problems, though all of them seem to
>> start on the compression write path in btrfs.
>> 
>> On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote:
>> > [   8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0
>> > [   13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
>> > [   13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
>> > [13917.280527] ------------[ cut here ]------------
>> > [13917.280753] default_enter_idle leaked IRQ state
>> > [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430
>> 
>> 
... Snip
>> 
>> And again, so something is going wrong
>> 
> Indeed.
> 
> I suggest you run your kernel with CONFIG_KASAN=y to see if there are
> any use-after-free or out-of-bounds bugs.
> 
> --
> Uladzislau Rezki


Pardon the delay... I have enabled KASAN and UBSAN on this kernel. It produced a lot of output and plenty of warnings for misalignment. 

The full dmesg is at https://paste.tnonline.net/files/aBoUMuTd5KBC_dmesg.ubsan.txt (approx 1.7MiB)

The full kernel .config is at https://paste.tnonline.net/files/z1mX8TWFgZQ3_kernel.conf-kasan-ubsan.txt

A small extract around what I think is the default_enter_idle leaked IRQ event. Is this helpful?

================================================================================
Jul 03 00:33:57 git kernel: UBSAN: misaligned-access in net/ipv4/tcp_ipv4.c:1848:13
Jul 03 00:33:57 git kernel: member access within misaligned address 000000007604d82f for type 'const struct tcphdr'
Jul 03 00:33:57 git kernel: which requires 4 byte alignment
Jul 03 00:33:57 git kernel: CPU: 2 PID: 29 Comm: ksoftirqd/2 Not tainted 6.3.10-ksan-ubsan #8
Jul 03 00:33:57 git kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
Jul 03 00:33:57 git kernel: Call Trace:
Jul 03 00:33:57 git kernel:  <TASK>
Jul 03 00:33:57 git kernel:  dump_stack_lvl+0x86/0xd0
Jul 03 00:33:57 git kernel:  ubsan_type_mismatch_common+0xdf/0x240
Jul 03 00:33:57 git kernel:  __ubsan_handle_type_mismatch_v1+0x44/0x60
Jul 03 00:33:57 git kernel:  tcp_add_backlog+0x1fac/0x3ab0
Jul 03 00:33:57 git kernel:  ? sk_filter_trim_cap+0xcc/0xb60
Jul 03 00:33:57 git kernel:  ? __pfx_tcp_add_backlog+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock+0x10/0x10
Jul 03 00:33:57 git kernel:  tcp_v4_rcv+0x3583/0x4c40
Jul 03 00:33:57 git kernel:  ? __pfx_tcp_v4_rcv+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
Jul 03 00:33:57 git kernel:  ip_protocol_deliver_rcu+0x6c/0x480
Jul 03 00:33:57 git kernel:  ip_local_deliver_finish+0x2ae/0x4d0
Jul 03 00:33:57 git kernel:  ? __pfx_ip_local_deliver+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx_ip_local_deliver+0x10/0x10
Jul 03 00:33:57 git kernel:  ip_local_deliver+0x1ba/0x380
Jul 03 00:33:57 git kernel:  ? __pfx_ip_local_deliver+0x10/0x10
Jul 03 00:33:57 git kernel:  ? ipv4_dst_check+0x104/0x250
Jul 03 00:33:57 git kernel:  ? __ubsan_handle_type_mismatch_v1+0x44/0x60
Jul 03 00:33:57 git kernel:  ip_sublist_rcv_finish+0x172/0x380
Jul 03 00:33:57 git kernel: ------------[ cut here ]------------
Jul 03 00:33:57 git kernel:  ip_sublist_rcv+0x3cd/0x900
Jul 03 00:33:57 git kernel: default_enter_idle leaked IRQ state
Jul 03 00:33:57 git kernel:  ? __pfx_ip_sublist_rcv+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __ubsan_handle_type_mismatch_v1+0x44/0x60
Jul 03 00:33:57 git kernel:  ? ip_rcv_core+0x972/0x1b20
Jul 03 00:33:57 git kernel:  ip_list_rcv+0x318/0x750
Jul 03 00:33:57 git kernel:  ? __pfx_ip_list_rcv+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx_ip_list_rcv+0x10/0x10
Jul 03 00:33:57 git kernel:  __netif_receive_skb_list_core+0x5ad/0x1170
Jul 03 00:33:57 git kernel:  ? tcp_gro_receive+0x1f45/0x2990
Jul 03 00:33:57 git kernel:  ? __pfx___netif_receive_skb_list_core+0x10/0x10
Jul 03 00:33:57 git kernel:  ? kvm_clock_read+0x16/0x40
Jul 03 00:33:57 git kernel:  ? ktime_get_with_offset+0xd0/0x1f0
Jul 03 00:33:57 git kernel:  netif_receive_skb_list_internal+0x76f/0x1530
Jul 03 00:33:57 git kernel:  ? __pfx_netif_receive_skb_list_internal+0x10/0x10
Jul 03 00:33:57 git kernel:  ? dev_gro_receive+0x67f/0x4900
Jul 03 00:33:57 git kernel:  ? free_unref_page+0x2fd/0x680
Jul 03 00:33:57 git kernel:  ? put_page+0x69/0x2b0
Jul 03 00:33:57 git kernel:  ? __pfx_eth_type_trans+0x10/0x10
Jul 03 00:33:57 git kernel:  napi_gro_receive+0x77b/0xdc0
Jul 03 00:33:57 git kernel:  receive_buf+0x1001/0xac40
Jul 03 00:33:57 git kernel:  ? _raw_spin_lock_irqsave+0xaa/0x180
Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx_receive_buf+0x10/0x10
Jul 03 00:33:57 git kernel:  ? _raw_spin_unlock_irqrestore+0x40/0x80
Jul 03 00:33:57 git kernel:  ? trace_hardirqs_on+0x2d/0xd0
Jul 03 00:33:57 git kernel:  ? detach_buf_split+0x27e/0xa70
Jul 03 00:33:57 git kernel:  ? virtqueue_get_buf_ctx_split+0x3c3/0x1400
Jul 03 00:33:57 git kernel:  ? virtqueue_enable_cb_delayed+0x5d0/0x1180
Jul 03 00:33:57 git kernel:  virtnet_poll+0x7c7/0x2030
Jul 03 00:33:57 git kernel:  ? __pfx_virtnet_poll+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __run_timers+0x43d/0xf70
Jul 03 00:33:57 git kernel:  __napi_poll.constprop.0+0xd4/0x840
Jul 03 00:33:57 git kernel:  net_rx_action+0x7a0/0x26e0
Jul 03 00:33:57 git kernel:  ? __pfx_net_rx_action+0x10/0x10
Jul 03 00:33:57 git kernel:  __do_softirq+0x277/0x95d
Jul 03 00:33:57 git kernel:  ? __pfx___do_softirq+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx_run_ksoftirqd+0x10/0x10
Jul 03 00:33:57 git kernel:  ? __pfx_run_ksoftirqd+0x10/0x10
Jul 03 00:33:57 git kernel:  run_ksoftirqd+0x2c/0x40
Jul 03 00:33:57 git kernel:  smpboot_thread_fn+0x380/0xbc0
Jul 03 00:33:57 git kernel:  ? __kthread_parkme+0xdc/0x280
Jul 03 00:33:57 git kernel:  ? schedule+0x158/0x360
Jul 03 00:33:57 git kernel:  ? __pfx_smpboot_thread_fn+0x10/0x10
Jul 03 00:33:57 git kernel:  kthread+0x259/0x3d0
Jul 03 00:33:57 git kernel:  ? __pfx_kthread+0x10/0x10
Jul 03 00:33:57 git kernel:  ret_from_fork+0x2c/0x50
Jul 03 00:33:57 git kernel:  </TASK>
Jul 03 00:33:57 git kernel: ================================================================================


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-07-02 23:28             ` Forza
@ 2023-07-06  8:08               ` Forza
  2023-07-06 10:54                 ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 14+ messages in thread
From: Forza @ 2023-07-06  8:08 UTC (permalink / raw)
  To: Uladzislau Rezki, Bagas Sanjaya
  Cc: Uladzislau Rezki, Bagas Sanjaya, Linux btrfs,
	Linux Kernel Mailing List, Linux Regressions, Linux Stable,
	Chris Mason, Josef Bacik, David Sterba, a1bert



---- From: Forza <forza@tnonline.net> -- Sent: 2023-07-03 - 01:28 ----

> 
> 
> ---- From: Uladzislau Rezki <urezki@gmail.com> -- Sent: 2023-05-26 - 14:24 ----
> 
>> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
>>> This looks like a different set of problems, though all of them seem to
>>> start on the compression write path in btrfs.
>>> 
>>> On Wed, May 24, 2023 at 07:57:19AM +0200, Forza wrote:
>>> > [   8.641506] 8021q: adding VLAN 0 to HW filter on device enp4s0
>>> > [   13.841691] wireguard: WireGuard 1.0.0 loaded. See www.wireguard.com for information.
>>> > [   13.841705] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
>>> > [13917.280527] ------------[ cut here ]------------
>>> > [13917.280753] default_enter_idle leaked IRQ state
>>> > [13917.281004] WARNING: CPU: 3 PID: 0 at drivers/cpuidle/cpuidle.c:269 cpuidle_enter_state+0x3bb/0x430
>>> 
>>> 
> ... Snip
>>> 
>>> And again, so something is going wrong
>>> 
>> Indeed.
>> 
>> I suggest you run your kernel with CONFIG_KASAN=y to see if there are
>> any use-after-free or out-of-bounds bugs.
>> 
>> --
>> Uladzislau Rezki
> 
> 
> Pardon the delay... I have enabled KASAN and UBSAN on this kernel. It produced a lot of output and plenty of warnings for misalignment. 
> 
> The full dmesg is at https://paste.tnonline.net/files/aBoUMuTd5KBC_dmesg.ubsan.txt (approx 1.7MiB)
> 
> The full kernel .config is at https://paste.tnonline.net/files/z1mX8TWFgZQ3_kernel.conf-kasan-ubsan.txt
> 
> A small extract around what I think is the default_enter_idle leaked IRQ event. Is this helpful?
> 
> ================================================================================
> Jul 03 00:33:57 git kernel: UBSAN: misaligned-access in net/ipv4/tcp_ipv4.c:1848:13
> Jul 03 00:33:57 git kernel: member access within misaligned address 000000007604d82f for type 'const struct tcphdr'
> Jul 03 00:33:57 git kernel: which requires 4 byte alignment
> Jul 03 00:33:57 git kernel: CPU: 2 PID: 29 Comm: ksoftirqd/2 Not tainted 6.3.10-ksan-ubsan #8
> Jul 03 00:33:57 git kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
> Jul 03 00:33:57 git kernel: Call Trace:
> Jul 03 00:33:57 git kernel:  <TASK>
> Jul 03 00:33:57 git kernel:  dump_stack_lvl+0x86/0xd0
> Jul 03 00:33:57 git kernel:  ubsan_type_mismatch_common+0xdf/0x240
> Jul 03 00:33:57 git kernel:  __ubsan_handle_type_mismatch_v1+0x44/0x60
> Jul 03 00:33:57 git kernel:  tcp_add_backlog+0x1fac/0x3ab0
> Jul 03 00:33:57 git kernel:  ? sk_filter_trim_cap+0xcc/0xb60
> Jul 03 00:33:57 git kernel:  ? __pfx_tcp_add_backlog+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock+0x10/0x10
> Jul 03 00:33:57 git kernel:  tcp_v4_rcv+0x3583/0x4c40
> Jul 03 00:33:57 git kernel:  ? __pfx_tcp_v4_rcv+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> Jul 03 00:33:57 git kernel:  ip_protocol_deliver_rcu+0x6c/0x480
> Jul 03 00:33:57 git kernel:  ip_local_deliver_finish+0x2ae/0x4d0
> Jul 03 00:33:57 git kernel:  ? __pfx_ip_local_deliver+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx_ip_local_deliver+0x10/0x10
> Jul 03 00:33:57 git kernel:  ip_local_deliver+0x1ba/0x380
> Jul 03 00:33:57 git kernel:  ? __pfx_ip_local_deliver+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? ipv4_dst_check+0x104/0x250
> Jul 03 00:33:57 git kernel:  ? __ubsan_handle_type_mismatch_v1+0x44/0x60
> Jul 03 00:33:57 git kernel:  ip_sublist_rcv_finish+0x172/0x380
> Jul 03 00:33:57 git kernel: ------------[ cut here ]------------
> Jul 03 00:33:57 git kernel:  ip_sublist_rcv+0x3cd/0x900
> Jul 03 00:33:57 git kernel: default_enter_idle leaked IRQ state
> Jul 03 00:33:57 git kernel:  ? __pfx_ip_sublist_rcv+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __ubsan_handle_type_mismatch_v1+0x44/0x60
> Jul 03 00:33:57 git kernel:  ? ip_rcv_core+0x972/0x1b20
> Jul 03 00:33:57 git kernel:  ip_list_rcv+0x318/0x750
> Jul 03 00:33:57 git kernel:  ? __pfx_ip_list_rcv+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx_ip_list_rcv+0x10/0x10
> Jul 03 00:33:57 git kernel:  __netif_receive_skb_list_core+0x5ad/0x1170
> Jul 03 00:33:57 git kernel:  ? tcp_gro_receive+0x1f45/0x2990
> Jul 03 00:33:57 git kernel:  ? __pfx___netif_receive_skb_list_core+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? kvm_clock_read+0x16/0x40
> Jul 03 00:33:57 git kernel:  ? ktime_get_with_offset+0xd0/0x1f0
> Jul 03 00:33:57 git kernel:  netif_receive_skb_list_internal+0x76f/0x1530
> Jul 03 00:33:57 git kernel:  ? __pfx_netif_receive_skb_list_internal+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? dev_gro_receive+0x67f/0x4900
> Jul 03 00:33:57 git kernel:  ? free_unref_page+0x2fd/0x680
> Jul 03 00:33:57 git kernel:  ? put_page+0x69/0x2b0
> Jul 03 00:33:57 git kernel:  ? __pfx_eth_type_trans+0x10/0x10
> Jul 03 00:33:57 git kernel:  napi_gro_receive+0x77b/0xdc0
> Jul 03 00:33:57 git kernel:  receive_buf+0x1001/0xac40
> Jul 03 00:33:57 git kernel:  ? _raw_spin_lock_irqsave+0xaa/0x180
> Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx_receive_buf+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? _raw_spin_unlock_irqrestore+0x40/0x80
> Jul 03 00:33:57 git kernel:  ? trace_hardirqs_on+0x2d/0xd0
> Jul 03 00:33:57 git kernel:  ? detach_buf_split+0x27e/0xa70
> Jul 03 00:33:57 git kernel:  ? virtqueue_get_buf_ctx_split+0x3c3/0x1400
> Jul 03 00:33:57 git kernel:  ? virtqueue_enable_cb_delayed+0x5d0/0x1180
> Jul 03 00:33:57 git kernel:  virtnet_poll+0x7c7/0x2030
> Jul 03 00:33:57 git kernel:  ? __pfx_virtnet_poll+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx__raw_spin_lock+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __run_timers+0x43d/0xf70
> Jul 03 00:33:57 git kernel:  __napi_poll.constprop.0+0xd4/0x840
> Jul 03 00:33:57 git kernel:  net_rx_action+0x7a0/0x26e0
> Jul 03 00:33:57 git kernel:  ? __pfx_net_rx_action+0x10/0x10
> Jul 03 00:33:57 git kernel:  __do_softirq+0x277/0x95d
> Jul 03 00:33:57 git kernel:  ? __pfx___do_softirq+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx_run_ksoftirqd+0x10/0x10
> Jul 03 00:33:57 git kernel:  ? __pfx_run_ksoftirqd+0x10/0x10
> Jul 03 00:33:57 git kernel:  run_ksoftirqd+0x2c/0x40
> Jul 03 00:33:57 git kernel:  smpboot_thread_fn+0x380/0xbc0
> Jul 03 00:33:57 git kernel:  ? __kthread_parkme+0xdc/0x280
> Jul 03 00:33:57 git kernel:  ? schedule+0x158/0x360
> Jul 03 00:33:57 git kernel:  ? __pfx_smpboot_thread_fn+0x10/0x10
> Jul 03 00:33:57 git kernel:  kthread+0x259/0x3d0
> Jul 03 00:33:57 git kernel:  ? __pfx_kthread+0x10/0x10
> Jul 03 00:33:57 git kernel:  ret_from_fork+0x2c/0x50
> Jul 03 00:33:57 git kernel:  </TASK>
> Jul 03 00:33:57 git kernel: ================================================================================
> 


A small update.

I have been able to test 6.2.16, all 6.3.x and 6.4.1, and they all show the same issue.

I have now been running 6.1.37 for two days and have not been able to reproduce this issue on any of my virtual qemu/kvm machines. Perhaps this information is helpful in finding the root cause?

~Forza 



* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-07-06  8:08               ` Forza
@ 2023-07-06 10:54                 ` Linux regression tracking (Thorsten Leemhuis)
  2023-07-07 10:13                   ` Forza
  0 siblings, 1 reply; 14+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-07-06 10:54 UTC (permalink / raw)
  To: Forza, Uladzislau Rezki, Bagas Sanjaya
  Cc: Linux btrfs, Linux Kernel Mailing List, Linux Regressions,
	Linux Stable, Chris Mason, Josef Bacik, David Sterba, a1bert

On 06.07.23 10:08, Forza wrote:
>>> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
> [...]
> A small update.

Thx for this.

> I have been able to test 6.2.16, all 6.3.x and 6.4.1, and they all show
> the same issue.
> 
> I have now been running 6.1.37 for two days and have not been able to
> reproduce this issue on any of my virtual qemu/kvm machines. Perhaps
> this information is helpful in finding the root cause?

That means it's most likely a regression between v6.1..v6.2 (or
v6.1..v6.2.16 if we are unlucky) somewhere (from earlier in the thread
it sounds like it might not be Btrfs).

Which makes me wonder: how long do you usually need to reproduce the
issue? If it's not too long it might mean that a bisection is the best
way forward, unless some developer sits down and looks closely at the
logs. With a bit of luck some dev will do that; but if we are unlucky we
likely will need a bisection.
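
To put rough numbers on that trade-off, here is a minimal Python sketch of
how bisection cost scales. It assumes every step needs one kernel build plus
a soak period long enough to trust a "good" verdict. The function names, the
commit count and the timings are illustrative placeholders, not measurements
from this thread.

```python
def bisect_steps(n_commits: int) -> int:
    """Worst-case number of test builds needed to isolate one commit.

    Each step halves the candidate range, so this is ceil(log2(n_commits)),
    computed with integer arithmetic to avoid floating-point rounding.
    """
    if n_commits < 1:
        raise ValueError("n_commits must be at least 1")
    return (n_commits - 1).bit_length()


def bisect_days(n_commits: int, build_hours: float, soak_hours: float) -> float:
    """Rough wall-clock estimate: every step costs one build plus one soak."""
    return bisect_steps(n_commits) * (build_hours + soak_hours) / 24


# Hypothetical inputs: 16000 commits in the range, 2 h per build, 24 h soak.
print(bisect_steps(16000))
print(round(bisect_days(16000, 2, 24), 1))
```

The soak time dominates the estimate, which is why the time needed to
reproduce the issue largely decides whether a bisection is practical.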

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.



* Re: Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x
  2023-07-06 10:54                 ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-07-07 10:13                   ` Forza
  0 siblings, 0 replies; 14+ messages in thread
From: Forza @ 2023-07-07 10:13 UTC (permalink / raw)
  To: Linux regressions mailing list, Uladzislau Rezki, Bagas Sanjaya
  Cc: Linux btrfs, Linux Kernel Mailing List, Linux Stable, Chris Mason,
	Josef Bacik, David Sterba, a1bert



On 2023-07-06 12:54, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 06.07.23 10:08, Forza wrote:
>>>> On Wed, May 24, 2023 at 11:13:57AM +0200, David Sterba wrote:
>> [...]
>> A small update.
> 
> Thx for this.
> 
>> I have been able to test 6.2.16, all 6.3.x and 6.4.1, and they all show
>> the same issue.
>>
>> I have now been running 6.1.37 for two days and have not been able to
>> reproduce this issue on any of my virtual qemu/kvm machines. Perhaps
>> this information is helpful in finding the root cause?
> 
> That means it's most likely a regression between v6.1..v6.2 (or
> v6.1..v6.2.16 if we are unlucky) somewhere (from earlier in the thread
> it sounds like it might not be Btrfs).

Agreed, I do not think this specific bug (cpuidle / default_enter_idle
leaked IRQ state) is Btrfs-related. Some of the virtual machines I test
on do not use Btrfs.
> 
> Which makes me wonder: how long do you usually need to reproduce the
> issue? If it's not too long it might mean that a bisection is the best
> way forward, unless some developer sits down and looks closely at the
> logs. With a bit of luck some dev will do that; but if we are unlucky we
> likely will need a bisection.
> 

It has varied. Sometimes it shows up immediately upon boot, but it can
take several hours or a day.


Also, I forgot to say that I was basing my kernels on gentoo-kernels, which
carries some patches on top of vanilla. Therefore I will compile a set
of vanilla kernels from 6.1.37 through 6.4.2 and run them on my test
machines to see where the problem starts.
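
The search over a set of release builds can be sketched as a plain binary
search. This is only an illustration under stated assumptions: the version
list, the `first_bad` helper and the `pretend_bad` stand-in are hypothetical,
the oldest build is taken to be good and the newest bad, and stable releases
are treated as a simple ordered line, which is only a coarse approximation
of real kernel history.

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad version.

    Assumes versions[0] is good, versions[-1] is bad, and every version
    after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # invariant: versions[lo] good, versions[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid
        else:
            lo = mid
    return versions[hi]


# Hypothetical list of vanilla releases to build, oldest first.
candidates = ["6.1.37", "6.2", "6.2.16", "6.3", "6.4.1", "6.4.2"]
# Stand-in for "boot it and watch dmesg"; here 6.2 is pretended to be first bad.
pretend_bad = {"6.2", "6.2.16", "6.3", "6.4.1", "6.4.2"}
print(first_bad(candidates, lambda v: v in pretend_bad))
```

Because the problem shows up intermittently, a build can only be called good
after it has run for a while without the error; a bad verdict is immediate.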

This is not a fast system, so it will likely take several days. But I 
will keep you posted.

Meanwhile, if you think of any specific kernel debug options, tracing,
etc., that I should enable, let me know.

Should we change the Subject line of this email thread?

Thanks

~Forza

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 


Thread overview: 14+ messages
2023-05-22 13:25 Fwd: vmalloc error: btrfs-delalloc btrfs_work_helper [btrfs] in kernel 6.3.x Bagas Sanjaya
2023-05-22 16:00 ` Uladzislau Rezki
2023-05-22 19:09   ` David Sterba
2023-05-22 19:04 ` Forza
2023-05-23  1:52   ` Bagas Sanjaya
2023-05-23 10:28     ` Uladzislau Rezki
2023-05-23 21:25       ` Forza
2023-05-24  5:57       ` Forza
2023-05-24  9:13         ` David Sterba
2023-05-26 12:24           ` Uladzislau Rezki
2023-07-02 23:28             ` Forza
2023-07-06  8:08               ` Forza
2023-07-06 10:54                 ` Linux regression tracking (Thorsten Leemhuis)
2023-07-07 10:13                   ` Forza
