From: Antal Nemes <antal.nemes@hycu.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
	Daniel Dao <dqminh@cloudflare.com>
Subject: Re: [BUG] soft lockup in filemap_get_read_batch
Date: Wed, 04 Oct 2023 10:36:33 +0200
Message-ID: <53bb6e7a159cef2942e0e4cd9509847a@hycu.com>
In-Reply-To: <ZRycfLxGP1CSd/ud@dread.disaster.area>

On Wed, Oct 04, 2023 at 09:58:04AM +1100, Dave Chinner wrote:
> On Tue, Oct 03, 2023 at 03:48:14PM +0200, antal.nemes@hycu.com wrote:
> > Hi Matthew,
> > 
> > We have observed intermittent soft lockups on at least seven different hosts:
> > - six hosts ran 6.2.8.fc37-200
> > - one host ran 6.0.13.fc37-200
> > 
> > The list of affected hosts is growing.
> > 
> > Stack traces are all similar:
> > 
> > emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
> > warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
> > warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
> > warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> > warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
> > warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> > warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
> > warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
> > warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
> > warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
> > warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
> > warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
> > warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
> > warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
> > warning kern kernel - - Call Trace:
> > warning kern kernel - -  <TASK>
> > warning kern kernel - -  xas_load+0x3d/0x50
> > warning kern kernel - -  filemap_get_read_batch+0x179/0x270
> > warning kern kernel - -  filemap_get_pages+0xa9/0x690
> > warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > warning kern kernel - -  filemap_read+0xd2/0x340
> > warning kern kernel - -  ? filemap_read+0x32f/0x340
> > warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> > warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
> > warning kern kernel - -  vfs_read+0x23c/0x310
> > warning kern kernel - -  ksys_read+0x6b/0xf0
> > warning kern kernel - -  do_syscall_64+0x5b/0x80
> > warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
> > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
> > warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Fixed by commit cbc02854331e ("XArray: Do not return sibling entries
> from xa_load()").
> 
> Should already be backported to the latest stable kernels.

The commit seems to be the same as the patch referenced in 
https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 

We have been running 6.2.8 with this patch, but the soft lockup still occurred.

From https://lore.kernel.org/linux-fsdevel/CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com/
it looks like there could be a different issue at play (locked folio with null 
mapping)?
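
For comparing reports across the affected hosts, here is a minimal sketch
(not part of the original report) that pulls the lockup header and the
reliable call-trace frames out of log lines shaped like the ones quoted
above. The names LOCKUP_RE, FRAME_RE and summarize() are hypothetical, and
the regular expressions assume the exact log layout shown in this message;
other syslog formats would need adjusting.

```python
import re

# Matches the watchdog line as it appears in the quoted log, e.g.
# "watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]"
LOCKUP_RE = re.compile(
    r"soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)

# Matches a call-trace frame such as "xas_load+0x3d/0x50". A leading "? "
# marks a frame the unwinder considers unreliable.
FRAME_RE = re.compile(
    r"(?P<unreliable>\? )?(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def summarize(lines):
    """Return (lockup_info, reliable_frames) for one soft-lockup report."""
    info = None
    frames = []
    in_trace = False
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m:
            info = {
                "cpu": int(m["cpu"]),
                "secs": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
            }
            continue
        if "Call Trace:" in line:
            in_trace = True
            continue
        if in_trace:
            f = FRAME_RE.search(line)
            # Keep only frames without the "? " marker.
            if f and not f["unreliable"]:
                frames.append(f["sym"])
    return info, frames
```

Feeding each host's report through summarize() and comparing the returned
frame lists would show whether every lockup goes through
filemap_get_read_batch and xas_load, or whether some hosts hit a different
path.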


Thread overview: 5+ messages
2023-10-03 13:48 [BUG] soft lockup in filemap_get_read_batch antal.nemes
2023-10-03 22:58 ` Dave Chinner
2023-10-04  8:36   ` Antal Nemes [this message]
2023-10-11 13:20     ` Antal Nemes
2024-04-16  9:31 ` [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration zhaoyang.huang
