[BUG] soft lockup in filemap_get_read

All of lore.kernel.org
 help / color / mirror / Atom feed

* [BUG] soft lockup in filemap_get_read_batch
@ 2023-10-03 13:48 antal.nemes
  2023-10-03 22:58 ` Dave Chinner
  2024-04-16  9:31 ` [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration zhaoyang.huang
  0 siblings, 2 replies; 5+ messages in thread
From: antal.nemes @ 2023-10-03 13:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-mm, linux-fsdevel, Daniel Dao

Hi Matthew,

We have observed intermittent soft lockups on at least seven different hosts:
- six hosts ran 6.2.8.fc37-200
- one host ran 6.0.13.fc37-200

The list of affected hosts is growing.

Stack traces are all similar:

emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
warning kern kernel - - Call Trace:
warning kern kernel - -  <TASK>
warning kern kernel - -  xas_load+0x3d/0x50
warning kern kernel - -  filemap_get_read_batch+0x179/0x270
warning kern kernel - -  filemap_get_pages+0xa9/0x690
warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
warning kern kernel - -  filemap_read+0xd2/0x340
warning kern kernel - -  ? filemap_read+0x32f/0x340
warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
warning kern kernel - -  vfs_read+0x23c/0x310
warning kern kernel - -  ksys_read+0x6b/0xf0
warning kern kernel - -  do_syscall_64+0x5b/0x80
warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
warning kern kernel - -  ? do_syscall_64+0x67/0x80
warning kern kernel - -  ? do_syscall_64+0x67/0x80
warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc
warning kern kernel - - RIP: 0033:0x7ff1e5b20b25
warning kern kernel - - Code: fe ff ff 50 48 8d 3d 0a c9 06 00 e8 25 ee 01 00 0f 1f 44 00 00 f3 0f 1e fa 48 8d 05 f5 4b 2a 00 8b 00 85 c0 75 0f 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 53 c3 66 90 41 54 49 89 d4 55 48 89 f5 53 89
warning kern kernel - - RSP: 002b:00007ffe1a5d8d78 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
warning kern kernel - - RAX: ffffffffffffffda RBX: 00000000035345c0 RCX: 00007ff1e5b20b25
warning kern kernel - - RDX: 0000000000002000 RSI: 00007ff1dc9c3080 RDI: 0000000000000032
warning kern kernel - - RBP: 0000000000000000 R08: 0000000000000009 R09: 0000000000000000
warning kern kernel - - R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000002000
warning kern kernel - - R13: 00007ff1dc9c3080 R14: 0000000000000000 R15: 0000000001452148
warning kern kernel - -  </TASK>

Lockup is always reported from postgres process with all data and config on a XFS filesystem. 
Because this blocks a postgres process, lockup has a bunch of knock-on effects 
(invalid page errors, hanged or aborted transactions, tuple accumulation, etc). 
All occurrences eventually required a reboot to remedy.

Issue coincided with our rollout with the 6.x kernel. Previously we ran Rocky 
Linux 8 with 4.18.* (clone of RHEL8 kernel), so I recognize that this issue may 
not be new (AFAICT, livelocks were sporadically reported since folio merge in 5.17).

Issue takes anywhere from 2 days to 30+ days since boot to materialize, and lockups
are reported for duration ranging from 1min to 7 hours (the latter until it was 
manually rebooted). This is followed by a  period of relatively high load averages
(~2*#cpus), but low CPU usage. Memory usage was < 70%, so it does not appear 
to be a high-psi condition.

We are unable to reproduce the issue at will (i.e. by load/stress testing), but
the affected hosts have had multiple occurrences across reboots, so we should
be able to observe effects of any patches over a longer span.

From what I can tell, this appears to be similar to what was reported in
https://lore.kernel.org/linux-kernel/CA+wXwBS7YTHUmxGP3JrhcKMnYQJcd6=7HE+E1v-guk01L2K3Zw@mail.gmail.com/
and 
https://lore.kernel.org/linux-fsdevel/CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com/

> > We also have a deadlock reading a very specific file on this host. We managed to
> > do a kdump on this host and extracted out the state of the mapping.
>
> This is almost certainly a different bug, but alos XArray related, so
> I'll keep looking at this one.

I am not sure if the deadlock that Daniel observed matches our stack trace. 
Assuming yes, has there been any follow-up on this?

We tried the patch from https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 , but the
soft lockup reoccurred with the same signature.

Is there anything we can do to further aid in troubleshooting? If this is a folio
lock issue, would it be possible to trace where the lock was taken?

Best regards,
Antal


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] soft lockup in filemap_get_read_batch
  2023-10-03 13:48 [BUG] soft lockup in filemap_get_read_batch antal.nemes
@ 2023-10-03 22:58 ` Dave Chinner
  2023-10-04  8:36   ` Antal Nemes
  2024-04-16  9:31 ` [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration zhaoyang.huang
  1 sibling, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2023-10-03 22:58 UTC (permalink / raw)
  To: antal.nemes; +Cc: Matthew Wilcox, linux-mm, linux-fsdevel, Daniel Dao

On Tue, Oct 03, 2023 at 03:48:14PM +0200, antal.nemes@hycu.com wrote:
> Hi Matthew,
> 
> We have observed intermittent soft lockups on at least seven different hosts:
> - six hosts ran 6.2.8.fc37-200
> - one host ran 6.0.13.fc37-200
> 
> The list of affected hosts is growing.
> 
> Stack traces are all similar:
> 
> emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
> warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
> warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
> warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
> warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
> warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
> warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
> warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
> warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
> warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
> warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
> warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
> warning kern kernel - - Call Trace:
> warning kern kernel - -  <TASK>
> warning kern kernel - -  xas_load+0x3d/0x50
> warning kern kernel - -  filemap_get_read_batch+0x179/0x270
> warning kern kernel - -  filemap_get_pages+0xa9/0x690
> warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> warning kern kernel - -  filemap_read+0xd2/0x340
> warning kern kernel - -  ? filemap_read+0x32f/0x340
> warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
> warning kern kernel - -  vfs_read+0x23c/0x310
> warning kern kernel - -  ksys_read+0x6b/0xf0
> warning kern kernel - -  do_syscall_64+0x5b/0x80
> warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
> warning kern kernel - -  ? do_syscall_64+0x67/0x80
> warning kern kernel - -  ? do_syscall_64+0x67/0x80
> warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
> warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc

Fixed by commit cbc02854331e ("XArray: Do not return sibling entries
from xa_load()").

Should already be backported to the lastest stable kernels.

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] soft lockup in filemap_get_read_batch
  2023-10-03 22:58 ` Dave Chinner
@ 2023-10-04  8:36   ` Antal Nemes
  2023-10-11 13:20     ` Antal Nemes
  0 siblings, 1 reply; 5+ messages in thread
From: Antal Nemes @ 2023-10-04  8:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Matthew Wilcox, linux-mm, linux-fsdevel, Daniel Dao

On Wed, Oct 04, 2023 at 09:58:04AM +1100, Dave Chinner wrote:
> On Tue, Oct 03, 2023 at 03:48:14PM +0200, antal.nemes@hycu.com wrote:
> > Hi Matthew,
> > 
> > We have observed intermittent soft lockups on at least seven different hosts:
> > - six hosts ran 6.2.8.fc37-200
> > - one host ran 6.0.13.fc37-200
> > 
> > The list of affected hosts is growing.
> > 
> > Stack traces are all similar:
> > 
> > emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
> > warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
> > warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
> > warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> > warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
> > warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> > warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
> > warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
> > warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
> > warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
> > warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
> > warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
> > warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
> > warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
> > warning kern kernel - - Call Trace:
> > warning kern kernel - -  <TASK>
> > warning kern kernel - -  xas_load+0x3d/0x50
> > warning kern kernel - -  filemap_get_read_batch+0x179/0x270
> > warning kern kernel - -  filemap_get_pages+0xa9/0x690
> > warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > warning kern kernel - -  filemap_read+0xd2/0x340
> > warning kern kernel - -  ? filemap_read+0x32f/0x340
> > warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> > warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
> > warning kern kernel - -  vfs_read+0x23c/0x310
> > warning kern kernel - -  ksys_read+0x6b/0xf0
> > warning kern kernel - -  do_syscall_64+0x5b/0x80
> > warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
> > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
> > warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> 
> Fixed by commit cbc02854331e ("XArray: Do not return sibling entries
> from xa_load()").
> 
> Should already be backported to the lastest stable kernels.

The commit seems to be the same as the patch referenced in 
https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 

We have been running 6.2.8 with this patch, but the soft lockup still ocurred.

From https://lore.kernel.org/linux-fsdevel/CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com/
it looks like there could be a different issue at play (locked folio with null 
mapping)?

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [BUG] soft lockup in filemap_get_read_batch
  2023-10-04  8:36   ` Antal Nemes
@ 2023-10-11 13:20     ` Antal Nemes
  0 siblings, 0 replies; 5+ messages in thread
From: Antal Nemes @ 2023-10-11 13:20 UTC (permalink / raw)
  To: Antal Nemes
  Cc: Dave Chinner, Matthew Wilcox, linux-mm, linux-fsdevel, Daniel Dao

On Wed, Oct 04, 2023 at 10:36:33AM +0200, Antal Nemes wrote:
> On Wed, Oct 04, 2023 at 09:58:04AM +1100, Dave Chinner wrote:
> > On Tue, Oct 03, 2023 at 03:48:14PM +0200, antal.nemes@hycu.com wrote:
> > > Hi Matthew,
> > > 
> > > We have observed intermittent soft lockups on at least seven different hosts:
> > > - six hosts ran 6.2.8.fc37-200
> > > - one host ran 6.0.13.fc37-200
> > > 
> > > The list of affected hosts is growing.
> > > 
> > > Stack traces are all similar:
> > > 
> > > emerg kern kernel - - watchdog: BUG: soft lockup - CPU#7 stuck for 17117s! [postmaster:2238460]
> > > warning kern kernel - - Modules linked in: target_core_user uio target_core_pscsi target_core_file target_core_iblock nbd loop nls_utf8 cifs cifs_arc4 cifs_md4 dns_resolver fscache netfs veth iscsi_tcp libiscsi_tcp libiscsi iscsi_target_mod target_core_mod scsi_transport_iscsi nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink sunrpc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua bochs drm_vram_helper drm_ttm_helper ttm crct10dif_pclmul i2c_piix4 crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 virtio_balloon joydev pcspkr xfs crc32c_intel virtio_net serio_raw ata_generic net_failover failover virtio_scsi pata_acpi qemu_fw_cfg fuse [last unloaded: nbd]
> > > warning kern kernel - - CPU: 7 PID: 2238460 Comm: postmaster Kdump: loaded Tainted: G             L     6.2.8-200.fc37.x86_64 #1
> > > warning kern kernel - - Hardware name: Nutanix AHV, BIOS 1.11.0-2.el7 04/01/2014
> > > warning kern kernel - - RIP: 0010:xas_descend+0x28/0x70
> > > warning kern kernel - - Code: 90 90 0f b6 0e 48 8b 57 08 48 d3 ea 83 e2 3f 89 d0 48 83 c0 04 48 8b 44 c6 08 48 89 77 18 48 89 c1 83 e1 03 48 83 f9 02 75 08 <48> 3d fd 00 00 00 76 08 88 57 12 c3 cc cc cc cc 48 c1 e8 02 89 c2
> > > warning kern kernel - - RSP: 0018:ffffab66c9f4bb98 EFLAGS: 00000246
> > > warning kern kernel - - RAX: 00000000000000c2 RBX: ffffab66c9f4bbb8 RCX: 0000000000000002
> > > warning kern kernel - - RDX: 0000000000000032 RSI: ffff89cd6c8cd6d0 RDI: ffffab66c9f4bbb8
> > > warning kern kernel - - RBP: ffff89cd6c8cd6d0 R08: ffffab66c9f4be20 R09: 0000000000000000
> > > warning kern kernel - - R10: 0000000000000001 R11: 0000000000000100 R12: 00000000000000b3
> > > warning kern kernel - - R13: 00000000000000b2 R14: 00000000000000b2 R15: ffffab66c9f4be48
> > > warning kern kernel - - FS:  00007ff1e8bfb540(0000) GS:ffff89d35fbc0000(0000) knlGS:0000000000000000
> > > warning kern kernel - - CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > warning kern kernel - - CR2: 00007ff1e8af0768 CR3: 000000016fdde001 CR4: 00000000003706e0
> > > warning kern kernel - - Call Trace:
> > > warning kern kernel - -  <TASK>
> > > warning kern kernel - -  xas_load+0x3d/0x50
> > > warning kern kernel - -  filemap_get_read_batch+0x179/0x270
> > > warning kern kernel - -  filemap_get_pages+0xa9/0x690
> > > warning kern kernel - -  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> > > warning kern kernel - -  filemap_read+0xd2/0x340
> > > warning kern kernel - -  ? filemap_read+0x32f/0x340
> > > warning kern kernel - -  xfs_file_buffered_read+0x4f/0xd0 [xfs]
> > > warning kern kernel - -  xfs_file_read_iter+0x70/0xe0 [xfs]
> > > warning kern kernel - -  vfs_read+0x23c/0x310
> > > warning kern kernel - -  ksys_read+0x6b/0xf0
> > > warning kern kernel - -  do_syscall_64+0x5b/0x80
> > > warning kern kernel - -  ? syscall_exit_to_user_mode+0x17/0x40
> > > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > > warning kern kernel - -  ? do_syscall_64+0x67/0x80
> > > warning kern kernel - -  ? __irq_exit_rcu+0x3d/0x140
> > > warning kern kernel - -  entry_SYSCALL_64_after_hwframe+0x72/0xdc
> > 
> > Fixed by commit cbc02854331e ("XArray: Do not return sibling entries
> > from xa_load()").
> > 
> > Should already be backported to the lastest stable kernels.
> 
> The commit seems to be the same as the patch referenced in 
> https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 
> 
> We have been running 6.2.8 with this patch, but the soft lockup still ocurred.
> 
> >From https://lore.kernel.org/linux-fsdevel/CA+wXwBRGab3UqbLqsr8xG=ZL2u9bgyDNNea4RGfTDjqB=J3geQ@mail.gmail.com/
> it looks like there could be a different issue at play (locked folio with null 
> mapping)?
>

Daniel successfully worked around this issue by reverting 
6795801366da0cd3d99e27c37f020a8f16714886 (xfs: Support large folios).

We will follow suit for the time being.


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration
  2023-10-03 13:48 [BUG] soft lockup in filemap_get_read_batch antal.nemes
  2023-10-03 22:58 ` Dave Chinner
@ 2024-04-16  9:31 ` zhaoyang.huang
  1 sibling, 0 replies; 5+ messages in thread
From: zhaoyang.huang @ 2024-04-16  9:31 UTC (permalink / raw)
  To: antal.nemes; +Cc: dqminh, linux-fsdevel, linux-mm, steve.kang, huangzhaoyang

From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

Livelock in [1] is reported multitimes since v515, where the zero-ref
folio is repeatly found on the page cache by find_get_entry. A possible
timing sequence is proposed in [2], which can be described briefly as
the lockless xarray operation could get harmed by an illegal folio
remaining on the slot[offset]. This commit would like to protect
the xa split stuff(folio_ref_freeze and __split_huge_page) under
lruvec->lock to remove the race window.

[1]
[167789.800297] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[167726.780305] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P155
[167726.780319] (detected by 3, t=17256977 jiffies, g=19883597, q=2397394)
[167726.780325] task:kswapd0         state:R  running task     stack:   24 pid:  155 ppid:     2 flags:0x00000008
[167789.800308] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P155
[167789.800322] (detected by 3, t=17272732 jiffies, g=19883597, q=2397470)
[167789.800328] task:kswapd0         state:R  running task     stack:   24 pid:  155 ppid:     2 flags:0x00000008
[167789.800339] Call trace:
[167789.800342]  dump_backtrace.cfi_jt+0x0/0x8
[167789.800355]  show_stack+0x1c/0x2c
[167789.800363]  sched_show_task+0x1ac/0x27c
[167789.800370]  print_other_cpu_stall+0x314/0x4dc
[167789.800377]  check_cpu_stall+0x1c4/0x36c
[167789.800382]  rcu_sched_clock_irq+0xe8/0x388
[167789.800389]  update_process_times+0xa0/0xe0
[167789.800396]  tick_sched_timer+0x7c/0xd4
[167789.800404]  __run_hrtimer+0xd8/0x30c
[167789.800408]  hrtimer_interrupt+0x1e4/0x2d0
[167789.800414]  arch_timer_handler_phys+0x5c/0xa0
[167789.800423]  handle_percpu_devid_irq+0xbc/0x318
[167789.800430]  handle_domain_irq+0x7c/0xf0
[167789.800437]  gic_handle_irq+0x54/0x12c
[167789.800445]  call_on_irq_stack+0x40/0x70
[167789.800451]  do_interrupt_handler+0x44/0xa0
[167789.800457]  el1_interrupt+0x34/0x64
[167789.800464]  el1h_64_irq_handler+0x1c/0x2c
[167789.800470]  el1h_64_irq+0x7c/0x80
[167789.800474]  xas_find+0xb4/0x28c
[167789.800481]  find_get_entry+0x3c/0x178
[167789.800487]  find_lock_entries+0x98/0x2f8
[167789.800492]  __invalidate_mapping_pages.llvm.3657204692649320853+0xc8/0x224
[167789.800500]  invalidate_mapping_pages+0x18/0x28
[167789.800506]  inode_lru_isolate+0x140/0x2a4
[167789.800512]  __list_lru_walk_one+0xd8/0x204
[167789.800519]  list_lru_walk_one+0x64/0x90
[167789.800524]  prune_icache_sb+0x54/0xe0
[167789.800529]  super_cache_scan+0x160/0x1ec
[167789.800535]  do_shrink_slab+0x20c/0x5c0
[167789.800541]  shrink_slab+0xf0/0x20c
[167789.800546]  shrink_node_memcgs+0x98/0x320
[167789.800553]  shrink_node+0xe8/0x45c
[167789.800557]  balance_pgdat+0x464/0x814
[167789.800563]  kswapd+0xfc/0x23c
[167789.800567]  kthread+0x164/0x1c8
[167789.800573]  ret_from_fork+0x10/0x20

[2]
Thread_isolate:
1. alloc_contig_range->isolate_migratepages_block isolate a certain of
pages to cc->migratepages via pfn
       (folio has refcount: 1 + n (alloc_pages, page_cache))

2. alloc_contig_range->migrate_pages->folio_ref_freeze(folio, 1 +
extra_pins) set the folio->refcnt to 0

3. alloc_contig_range->migrate_pages->xas_split split the folios to
each slot as folio from slot[offset] to slot[offset + sibs]

4. alloc_contig_range->migrate_pages->__split_huge_page->folio_lruvec_lock
failed which have the folio be failed in setting refcnt to 2

5. Thread_kswapd enter the livelock by the chain below
      rcu_read_lock();
   retry:
        find_get_entry
            folio = xas_find
            if(!folio_try_get_rcu)
                xas_reset;
            goto retry;
      rcu_read_unlock();

5'. Thread_holdlock as the lruvec->lru_lock holder could be stalled in
the same core of Thread_kswapd.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 mm/huge_memory.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9859aa4f7553..418e8d03480a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2891,7 +2891,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 {
 	struct folio *folio = page_folio(page);
 	struct page *head = &folio->page;
-	struct lruvec *lruvec;
+	struct lruvec *lruvec = folio_lruvec(folio);
 	struct address_space *swap_cache = NULL;
 	unsigned long offset = 0;
 	int i, nr_dropped = 0;
@@ -2908,8 +2908,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		xa_lock(&swap_cache->i_pages);
 	}
 
-	/* lock lru list/PageCompound, ref frozen by page_ref_freeze */
-	lruvec = folio_lruvec_lock(folio);
 
 	ClearPageHasHWPoisoned(head);
 
@@ -2942,7 +2940,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 		folio_set_order(new_folio, new_order);
 	}
-	unlock_page_lruvec(lruvec);
 	/* Caller disabled irqs, so they are still disabled here */
 
 	split_page_owner(head, order, new_order);
@@ -2961,7 +2958,6 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		folio_ref_add(folio, 1 + new_nr);
 		xa_unlock(&folio->mapping->i_pages);
 	}
-	local_irq_enable();
 
 	if (nr_dropped)
 		shmem_uncharge(folio->mapping->host, nr_dropped);
@@ -3048,6 +3044,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 	int extra_pins, ret;
 	pgoff_t end;
 	bool is_hzp;
+	struct lruvec *lruvec;
 
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_large(folio), folio);
@@ -3159,6 +3156,14 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 
 	/* block interrupt reentry in xa_lock and spinlock */
 	local_irq_disable();
+
+	/*
+	 * take lruvec's lock before freeze the folio to prevent the folio
+	 * remains in the page cache with refcnt == 0, which could lead to
+	 * find_get_entry enters livelock by iterating the xarray.
+	 */
+	lruvec = folio_lruvec_lock(folio);
+
 	if (mapping) {
 		/*
 		 * Check if the folio is present in page cache.
@@ -3203,12 +3208,16 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		}
 
 		__split_huge_page(page, list, end, new_order);
+		unlock_page_lruvec(lruvec);
+		local_irq_enable();
 		ret = 0;
 	} else {
 		spin_unlock(&ds_queue->split_queue_lock);
 fail:
 		if (mapping)
 			xas_unlock(&xas);
+
+		unlock_page_lruvec(lruvec);
 		local_irq_enable();
 		remap_page(folio, folio_nr_pages(folio));
 		ret = -EAGAIN;
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-04-16  9:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-03 13:48 [BUG] soft lockup in filemap_get_read_batch antal.nemes
2023-10-03 22:58 ` Dave Chinner
2023-10-04  8:36   ` Antal Nemes
2023-10-11 13:20     ` Antal Nemes
2024-04-16  9:31 ` [PATCH 1/1] mm: protect xa split stuff under lruvec->lru_lock during migration zhaoyang.huang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.