From: Oleksandr Natalenko <oleksandr@natalenko.name>
To: Brian Foster <bfoster@redhat.com>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Seth Jennings <sjenning@redhat.com>,
	Dan Streetman <ddstreet@ieee.org>,
	Vitaly Wool <vitaly.wool@konsulko.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Miaohe Lin <linmiaohe@huawei.com>
Subject: Re: Panic/lockup in z3fold_zpool_free
Date: Mon, 17 Oct 2022 18:34:50 +0200
Message-ID: <4433520.LvFx2qVVIh@natalenko.name>
In-Reply-To: <Y01/DOMdAJQPDaxZ@bfoster>

Hello.

On Monday, 17 October 2022 18:13:00 CEST Brian Foster wrote:
> On Thu, Oct 06, 2022 at 05:52:52PM +0200, Oleksandr Natalenko wrote:
> > On Friday, 23 September 2022 10:33:14 CEST Oleksandr Natalenko wrote:
> > > On Thursday, 22 September 2022 13:37:36 CEST Brian Foster wrote:
> > > > On Thu, Sep 22, 2022 at 08:53:09AM +0200, Oleksandr Natalenko wrote:
> > > > > Since 5.19 series, zswap went unstable for me under memory pressure, and
> > > > > occasionally I get the following:
> > > > > 
> > > > > ```
> > > > > watchdog: BUG: soft lockup - CPU#0 stuck for 10195s! [mariadbd:478]
> > > > > Modules linked in: netconsole joydev mousedev intel_agp psmouse pcspkr
> > > > > intel_gtt cfg80211 cirrus i2c_piix4 tun rfkill mac_hid nft_ct tcp_bbr2
> > > > > nft_chain_nat nf_tables nfnetlink nf_nat nf_conntrack nf_defrag_ipv6
> > > > > nf_defrag_ipv4 fuse qemu_fw_cfg ip_tables x_tables xfs libcrc32c
> > > > > crc32c_generic dm_crypt cbc encrypted_keys trusted asn1_encoder tee tpm
> > > > > rng_core dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel
> > > > > ghash_clmulni_intel virtio_net aesni_intel serio_raw net_failover
> > > > > ata_generic virtio_balloon failover pata_acpi crypto_simd virtio_blk atkbd
> > > > > libps2 vivaldi_fmap virtio_pci cryptd virtio_pci_legacy_dev ata_piix
> > > > > virtio_pci_modern_dev i8042 floppy serio usbhid
> > > > > Unloaded tainted modules: intel_cstate():1 intel_uncore():1 pcc_cpufreq():1
> > > > > acpi_cpufreq():1
> > > > > CPU: 0 PID: 478 Comm: mariadbd Tainted: G             L    5.19.0-pf5 #1
> > > > > 12baccda8e49539e158b9dd97cbda6c7317d73af
> > > > > Hardware name: Red Hat KVM, BIOS 1.11.0-2.el7 04/01/2014
> > > > > RIP: 0010:z3fold_zpool_free+0x4c/0x5e0
> > > > > Code: 7c 24 08 48 89 04 24 0f 85 e0 00 00 00 48 89 f5 41 bd 00 00 00 80 48
> > > > > 83 e5 c0 48 83 c5 28 eb 0a 48 89 df e8 b6 8d 9f 00 f3 90 <48> 89 ef e8 bc 8b
> > > > > 9f 00 4d 8b 34 24 49 81 e6 00 f0 ff ff 49 8d 5e
> > > > > RSP: 0000:ffffbeadc0e87b68 EFLAGS: 00000202
> > > > > RAX: 0000000000000030 RBX: ffff99ac73d2c010 RCX: ffff99ac4e4ba380
> > > > > RDX: 0000665340000000 RSI: ffffe3b540000000 RDI: ffff99ac73d2c010
> > > > > RBP: ffff99ac55ef3a68 R08: ffff99ac422f0bf0 R09: 000000000000c60b
> > > > > R10: ffffffffffffffc0 R11: 0000000000000000 R12: ffff99ac55ef3a50
> > > > > R13: 0000000080000000 R14: ffff99ac73d2c000 R15: ffff99acf3d2c000
> > > > > FS:  00007f587fcd66c0(0000) GS:ffff99ac7ec00000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 00007f587ce8bec8 CR3: 0000000005b48006 CR4: 00000000000206f0
> > > > > Call Trace:
> > > > >  <TASK>
> > > > >  zswap_free_entry+0xb5/0x110
> > > > >  zswap_frontswap_invalidate_page+0x72/0xa0
> > > > >  __frontswap_invalidate_page+0x3a/0x60
> > > > >  swap_range_free+0xb5/0xd0
> > > > >  swapcache_free_entries+0x16e/0x2e0
> > > > >  free_swap_slot+0xb4/0xc0
> > > > >  put_swap_page+0x259/0x420
> > > > >  delete_from_swap_cache+0x63/0xb0
> > > > >  try_to_free_swap+0x1b5/0x2a0
> > > > >  do_swap_page+0x24c/0xb80
> > > > >  __handle_mm_fault+0xa59/0xf70
> > > > >  handle_mm_fault+0x100/0x2f0
> > > > >  do_user_addr_fault+0x1c7/0x6a0
> > > > >  exc_page_fault+0x74/0x170
> > > > >  asm_exc_page_fault+0x26/0x30
> > > > > RIP: 0033:0x556e96280428
> > > > > Code: a0 03 00 00 67 e8 28 64 ff ff 48 8b 83 b0 00 00 00 48 8b 0d da 18 72
> > > > > 00 48 8b 10 66 48 0f 6e c1 48 85 d2 74 27 0f 1f 44 00 00 <48> c7 82 98 00 00
> > > > > 00 00 00 00 00 48 8b 10 48 83 c0 08 f2 0f 11 82
> > > > > RSP: 002b:00007f587fcd3980 EFLAGS: 00010206
> > > > > RAX: 00007f587d028468 RBX: 00007f587cb1a818 RCX: 3ff0000000000000
> > > > > RDX: 00007f587ce8be30 RSI: 0000000000000000 RDI: 00007f587cedd030
> > > > > RBP: 00007f587fcd39c0 R08: 0000000000000016 R09: 0000000000000000
> > > > > R10: 0000000000000008 R11: 0000556e970961a0 R12: 00007f587d1f17b8
> > > > > R13: 00007f5883595598 R14: 00007f587d1f17a8 R15: 00007f587cb1a928
> > > > >  </TASK>
> > > > > ```
> > > > > 
> > > > > This happens on the latest v5.19.10 kernel as well.
> > > > > 
> > > > > Sometimes it's not a soft lockup but GPF, although the stack trace is the
> > > > > same. So, to me it looks like a memory corruption, UAF, double free or
> > > > > something like that.
> > > > > 
> > > > > Have you got any idea regarding what's going on?
> > > > > 
> > > > 
> > > > It might be unrelated, but this looks somewhat similar to a problem I
> > > > hit recently that is caused by swap entry data stored in page->private
> > > > being clobbered when splitting a huge page. That problem was introduced
> > > > in v5.19, so that potentially lines up as well.
> > > > 
> > > > More details in the links below. [1] includes a VM_BUG_ON() splat with
> > > > DEBUG_VM enabled, but the problem originally manifested as a soft lockup
> > > > without the debug checks enabled. [2] includes a properly formatted
> > > > patch. Any chance you could give that a try?
> > > 
> > > Thanks for your reply.
> > > 
> > > I'll give it a try. The only problem is that for me the issue is not reproducible at will: it can take 1 day or 2 weeks before the panic is hit.
> > > 
> > > > [1] https://lore.kernel.org/linux-mm/YxDyZLfBdFHK1Y1P@bfoster/
> > > > [2] https://lore.kernel.org/linux-mm/20220906190602.1626037-1-bfoster@redhat.com/
> > 
> > So far, I haven't reproduced this issue with your patch. I haven't run the machine sufficiently long, just under a week, so this is rather to let you know that I haven't abandoned testing.
> > 
> 
> Thanks for the update. Is this still going well, or has it reached a point
> where you typically see the problem? I can still reproduce the original
> problem so I may have to ping the patch again..

So far, no issue observed with your patch.

Thanks.

-- 
Oleksandr Natalenko (post-factum)




Thread overview: 7+ messages
2022-09-22  6:53 Panic/lockup in z3fold_zpool_free Oleksandr Natalenko
2022-09-22 11:37 ` Brian Foster
2022-09-23  8:33   ` Oleksandr Natalenko
2022-10-06 15:52     ` Oleksandr Natalenko
2022-10-17 16:13       ` Brian Foster
2022-10-17 16:34         ` Oleksandr Natalenko [this message]
2022-10-17 22:24           ` Andrew Morton
