* Panic/lockup in z3fold_zpool_free
From: Oleksandr Natalenko @ 2022-09-22 6:53 UTC
To: linux-kernel
Cc: linux-mm, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, Miaohe Lin

Hello.

Since the 5.19 series, zswap has become unstable for me under memory
pressure, and occasionally I get the following:

```
watchdog: BUG: soft lockup - CPU#0 stuck for 10195s! [mariadbd:478]
Modules linked in: netconsole joydev mousedev intel_agp psmouse pcspkr
intel_gtt cfg80211 cirrus i2c_piix4 tun rfkill mac_hid nft_ct tcp_bbr2
nft_chain_nat nf_tables nfnetlink nf_nat nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 fuse qemu_fw_cfg ip_tables x_tables xfs libcrc32c
crc32c_generic dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm
rng_core dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel
ghash_clmulni_intel virtio_net aesni_intel serio_raw net_failover
ata_generic virtio_balloon failover pata_acpi crypto_simd virtio_blk atkbd
libps2 vivaldi_fmap virtio_pci cryptd virtio_pci_legacy_dev ata_piix
virtio_pci_modern_dev i8042 floppy serio usbhid
Unloaded tainted modules: intel_cstate():1 intel_uncore():1 pcc_cpufreq():1 acpi_cpufreq():1
CPU: 0 PID: 478 Comm: mariadbd Tainted: G L 5.19.0-pf5 #1 12baccda8e49539e158b9dd97cbda6c7317d73af
Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:z3fold_zpool_free+0x4c/0x5e0
Code: 7c 24 08 48 89 04 24 0f 85 e0 00 00 00 48 89 f5 41 bd 00 00 00 80 48 83 e5 c0 48 83 c5 28 eb 0a 48 89 df e8 b6 8d 9f 00 f3 90 <48> 89 ef e8 bc 8b 9f 00 4d 8b 34 24 49 81 e6 00 f0 ff ff 49 8d 5e
RSP: 0000:ffffbeadc0e87b68 EFLAGS: 00000202
RAX: 0000000000000030 RBX: ffff99ac73d2c010 RCX: ffff99ac4e4ba380
RDX: 0000665340000000 RSI: ffffe3b540000000 RDI: ffff99ac73d2c010
RBP: ffff99ac55ef3a68 R08: ffff99ac422f0bf0 R09: 000000000000c60b
R10: ffffffffffffffc0 R11: 0000000000000000 R12: ffff99ac55ef3a50
R13: 0000000080000000 R14: ffff99ac73d2c000 R15: ffff99acf3d2c000
FS: 00007f587fcd66c0(0000) GS:ffff99ac7ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f587ce8bec8 CR3: 0000000005b48006 CR4: 00000000000206f0
Call Trace:
 <TASK>
 zswap_free_entry+0xb5/0x110
 zswap_frontswap_invalidate_page+0x72/0xa0
 __frontswap_invalidate_page+0x3a/0x60
 swap_range_free+0xb5/0xd0
 swapcache_free_entries+0x16e/0x2e0
 free_swap_slot+0xb4/0xc0
 put_swap_page+0x259/0x420
 delete_from_swap_cache+0x63/0xb0
 try_to_free_swap+0x1b5/0x2a0
 do_swap_page+0x24c/0xb80
 __handle_mm_fault+0xa59/0xf70
 handle_mm_fault+0x100/0x2f0
 do_user_addr_fault+0x1c7/0x6a0
 exc_page_fault+0x74/0x170
 asm_exc_page_fault+0x26/0x30
RIP: 0033:0x556e96280428
Code: a0 03 00 00 67 e8 28 64 ff ff 48 8b 83 b0 00 00 00 48 8b 0d da 18 72 00 48 8b 10 66 48 0f 6e c1 48 85 d2 74 27 0f 1f 44 00 00 <48> c7 82 98 00 00 00 00 00 00 00 48 8b 10 48 83 c0 08 f2 0f 11 82
RSP: 002b:00007f587fcd3980 EFLAGS: 00010206
RAX: 00007f587d028468 RBX: 00007f587cb1a818 RCX: 3ff0000000000000
RDX: 00007f587ce8be30 RSI: 0000000000000000 RDI: 00007f587cedd030
RBP: 00007f587fcd39c0 R08: 0000000000000016 R09: 0000000000000000
R10: 0000000000000008 R11: 0000556e970961a0 R12: 00007f587d1f17b8
R13: 00007f5883595598 R14: 00007f587d1f17a8 R15: 00007f587cb1a928
 </TASK>
```

This happens on the latest v5.19.10 kernel as well.

Sometimes it's not a soft lockup but a GPF, although the stack trace is
the same. So, to me it looks like memory corruption, a UAF, a double
free, or something like that.

Have you got any idea what's going on?

Thanks.

-- 
Oleksandr Natalenko (post-factum)
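For what it's worth, this class of corruption is usually much easier to pin
down on a debug build, which turns a silent use-after-free into an immediate
splat rather than a delayed lockup. Below is a minimal sketch of such a
configuration; all of the symbols are standard upstream Kconfig options, and
the selection (and its runtime overhead) is only a suggestion, not something
prescribed anywhere in this thread.

```
# Sketch of a debug configuration for chasing a suspected UAF/double free.
# All symbols are standard upstream Kconfig options; which ones are worth
# the runtime overhead on a given machine is a judgement call.
CONFIG_DEBUG_VM=y        # extra VM sanity checks (VM_BUG_ON/VM_WARN_ON)
CONFIG_KASAN=y           # report use-after-free/out-of-bounds at the access
CONFIG_DEBUG_LIST=y      # catch corrupted list manipulation early
CONFIG_PAGE_POISONING=y  # poison freed pages so stale users fail loudly
```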
* Re: Panic/lockup in z3fold_zpool_free
From: Brian Foster @ 2022-09-22 11:37 UTC
To: Oleksandr Natalenko
Cc: linux-kernel, linux-mm, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, Miaohe Lin

On Thu, Sep 22, 2022 at 08:53:09AM +0200, Oleksandr Natalenko wrote:
> Hello.
>
> Since the 5.19 series, zswap has become unstable for me under memory
> pressure, and occasionally I get the following:
>
> [... soft lockup splat in z3fold_zpool_free snipped; see the report above ...]
>
> This happens on the latest v5.19.10 kernel as well.
>
> Sometimes it's not a soft lockup but a GPF, although the stack trace is
> the same. So, to me it looks like memory corruption, a UAF, a double
> free, or something like that.
>
> Have you got any idea what's going on?
>

It might be unrelated, but this looks somewhat similar to a problem I hit
recently that is caused by swap entry data stored in page->private being
clobbered when splitting a huge page. That problem was introduced in
v5.19, so that potentially lines up as well.

More details in the links below. [1] includes a VM_BUG_ON() splat with
DEBUG_VM enabled, but the problem originally manifested as a soft lockup
without the debug checks enabled. [2] includes a properly formatted
patch. Any chance you could give that a try?

Brian

[1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
[2] https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/
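To make the failure mode Brian describes above more concrete, here is a
minimal, illustrative C sketch. The helper name is hypothetical and this is
not the patch from [2]; it only shows the invariant involved: for a
transparent huge page sitting in the swap cache, each subpage's swp_entry_t
is kept in page->private, so a split path that zeroes ->private
unconditionally leaves tail pages holding a stale handle that zswap later
tries to free.

```
#include <linux/mm.h>
#include <linux/page-flags.h>

/*
 * Illustrative sketch only; not the patch referenced in [2] above.
 * add_to_swap_cache() stores one swp_entry_t per subpage in page->private,
 * so a huge-page split must preserve ->private on the tails of a
 * swap-cache page.  Clearing it unconditionally hands zswap/z3fold a
 * bogus handle on the next swap-slot free, which can surface as the
 * lockup/GPF reported in this thread.
 */
static void sketch_fixup_tail_private(struct page *head, struct page *tail)
{
	if (PageSwapCache(head))
		return;	/* tail->private already holds its swap entry; keep it */

	/* not swap-backed: nothing meaningful in ->private, safe to clear */
	set_page_private(tail, 0);
}
```

Whatever shape the real fix takes, the invariant is the same: tail pages of
a swap-cache huge page must keep their swap entries across the split.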
* Re: Panic/lockup in z3fold_zpool_free
From: Oleksandr Natalenko @ 2022-09-23 8:33 UTC
To: Brian Foster
Cc: linux-kernel, linux-mm, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, Miaohe Lin

Hello.

On Thursday, September 22, 2022, 13:37:36 CEST Brian Foster wrote:
> On Thu, Sep 22, 2022 at 08:53:09AM +0200, Oleksandr Natalenko wrote:
> > Since the 5.19 series, zswap has become unstable for me under memory
> > pressure, and occasionally I get the following:
> >
> > [... soft lockup splat in z3fold_zpool_free snipped; see the report above ...]
> >
> > This happens on the latest v5.19.10 kernel as well.
> >
> > Sometimes it's not a soft lockup but a GPF, although the stack trace is
> > the same. So, to me it looks like memory corruption, a UAF, a double
> > free, or something like that.
> >
> > Have you got any idea what's going on?
>
> It might be unrelated, but this looks somewhat similar to a problem I hit
> recently that is caused by swap entry data stored in page->private being
> clobbered when splitting a huge page. That problem was introduced in
> v5.19, so that potentially lines up as well.
>
> More details in the links below. [1] includes a VM_BUG_ON() splat with
> DEBUG_VM enabled, but the problem originally manifested as a soft lockup
> without the debug checks enabled. [2] includes a properly formatted
> patch. Any chance you could give that a try?

Thanks for your reply.

I'll give it a try. The only problem is that for me the issue is not
reproducible at will; it can take one day, or it can take two weeks,
before the panic is hit.

> [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
> [2] https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/

-- 
Oleksandr Natalenko (post-factum)
* Re: Panic/lockup in z3fold_zpool_free
From: Oleksandr Natalenko @ 2022-10-06 15:52 UTC
To: Brian Foster
Cc: linux-kernel, linux-mm, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, Miaohe Lin

Hello.

On Friday, September 23, 2022, 10:33:14 CEST Oleksandr Natalenko wrote:
> On Thursday, September 22, 2022, 13:37:36 CEST Brian Foster wrote:
> > On Thu, Sep 22, 2022 at 08:53:09AM +0200, Oleksandr Natalenko wrote:
> > > [... original zswap/z3fold soft lockup report snipped; see above ...]
> >
> > It might be unrelated, but this looks somewhat similar to a problem I hit
> > recently that is caused by swap entry data stored in page->private being
> > clobbered when splitting a huge page. That problem was introduced in
> > v5.19, so that potentially lines up as well.
> >
> > More details in the links below. [1] includes a VM_BUG_ON() splat with
> > DEBUG_VM enabled, but the problem originally manifested as a soft lockup
> > without the debug checks enabled. [2] includes a properly formatted
> > patch. Any chance you could give that a try?
>
> Thanks for your reply.
>
> I'll give it a try. The only problem is that for me the issue is not
> reproducible at will; it can take one day, or it can take two weeks,
> before the panic is hit.
>
> > [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
> > [2] https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/

So far, I haven't reproduced this issue with your patch. I haven't run the
machine for sufficiently long (just under a week), so this is rather to
let you know that I haven't abandoned testing.

Thanks.

-- 
Oleksandr Natalenko (post-factum)
* Re: Panic/lockup in z3fold_zpool_free
From: Brian Foster @ 2022-10-17 16:13 UTC
To: Oleksandr Natalenko
Cc: linux-kernel, linux-mm, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, Miaohe Lin

On Thu, Oct 06, 2022 at 05:52:52PM +0200, Oleksandr Natalenko wrote:
> On Friday, September 23, 2022, 10:33:14 CEST Oleksandr Natalenko wrote:
> > On Thursday, September 22, 2022, 13:37:36 CEST Brian Foster wrote:
> > > [... original report and page->private discussion snipped; see above ...]
> >
> > I'll give it a try. The only problem is that for me the issue is not
> > reproducible at will; it can take one day, or it can take two weeks,
> > before the panic is hit.
> >
> > > [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
> > > [2] https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/
>
> So far, I haven't reproduced this issue with your patch. I haven't run the
> machine for sufficiently long (just under a week), so this is rather to
> let you know that I haven't abandoned testing.
>

Thanks for the update. Is this still going well, or have you reached a
point where you would typically see the problem? I can still reproduce the
original problem, so I may have to ping the patch again.

Brian
* Re: Panic/lockup in z3fold_zpool_free
From: Oleksandr Natalenko @ 2022-10-17 16:34 UTC
To: Brian Foster
Cc: linux-kernel, linux-mm, Seth Jennings, Dan Streetman, Vitaly Wool, Andrew Morton, Miaohe Lin

Hello.

On Monday, October 17, 2022, 18:13:00 CEST Brian Foster wrote:
> On Thu, Oct 06, 2022 at 05:52:52PM +0200, Oleksandr Natalenko wrote:
> > On Friday, September 23, 2022, 10:33:14 CEST Oleksandr Natalenko wrote:
> > > [... original report and page->private discussion snipped; see above ...]
> > >
> > > I'll give it a try. The only problem is that for me the issue is not
> > > reproducible at will; it can take one day, or it can take two weeks,
> > > before the panic is hit.
> >
> > So far, I haven't reproduced this issue with your patch. I haven't run the
> > machine for sufficiently long (just under a week), so this is rather to
> > let you know that I haven't abandoned testing.
>
> Thanks for the update. Is this still going well, or have you reached a
> point where you would typically see the problem? I can still reproduce the
> original problem, so I may have to ping the patch again.

So far, no issue observed with your patch. Thanks.

-- 
Oleksandr Natalenko (post-factum)
* Re: Panic/lockup in z3fold_zpool_free
From: Andrew Morton @ 2022-10-17 22:24 UTC
To: Oleksandr Natalenko
Cc: Brian Foster, linux-kernel, linux-mm, Seth Jennings, Dan Streetman, Vitaly Wool, Miaohe Lin, Matthew Wilcox

On Mon, 17 Oct 2022 18:34:50 +0200 Oleksandr Natalenko <oleksandr@natalenko.name> wrote:

> > > > I'll give it a try. The only problem is that for me the issue is not
> > > > reproducible at will; it can take one day, or it can take two weeks,
> > > > before the panic is hit.
> > > >
> > > > > [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
> > > > > [2] https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/
> > >
> > > So far, I haven't reproduced this issue with your patch. I haven't run the
> > > machine for sufficiently long (just under a week), so this is rather to
> > > let you know that I haven't abandoned testing.
> >
> > Thanks for the update. Is this still going well, or have you reached a
> > point where you would typically see the problem? I can still reproduce the
> > original problem, so I may have to ping the patch again.
>
> So far, no issue observed with your patch. Thanks.

It's actually unclear (to me) why Matthew's b653db77350c73 ("mm: Clear
page->private when splitting or migrating a page") was considered
necessary. What problem did it solve?

https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/
is a partial undoing of that change, but should we simply revert
b653db77350c73?