* [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
@ 2024-11-29 15:19 kernel test robot
2024-12-03 2:14 ` Yafang Shao
0 siblings, 1 reply; 8+ messages in thread
From: kernel test robot @ 2024-11-29 15:19 UTC (permalink / raw)
To: Yafang Shao
Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel,
oliver.sang
Hello,
kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
[test failed on linux-next/master cfba9f07a1d6aeca38f47f1f472cfb0ba133d341]
in testcase: vm-scalability
version: vm-scalability-x86_64-6f4ef16-0_20241103
with following parameters:
runtime: 300s
test: mmap-xread-seq-mt
cpufreq_governor: performance
config: x86_64-rhel-9.4
compiler: gcc-12
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202411292300.61edbd37-lkp@intel.com
[ 133.054592][ C1] watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [usemem:5463]
[ 133.062611][ C1] Modules linked in: xfs intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_mbox_msr isst_if_common skx_edac skx_edac_common nfit libnvdimm x86_pkg_temp_thermal coretemp btrfs blake2b_generic xor kvm_intel raid6_pq libcrc32c kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sd_mod rapl sg intel_cstate ipmi_ssif acpi_power_meter binfmt_misc snd_pcm dax_hmem cxl_acpi snd_timer cxl_port snd ast ahci mei_me cxl_core libahci soundcore drm_shmem_helper ioatdma i2c_i801 intel_uncore einj pcspkr libata megaraid_sas drm_kms_helper mei ipmi_si acpi_ipmi i2c_smbus dca intel_pch_thermal wmi ipmi_devintf ipmi_msghandler joydev drm fuse loop dm_mod ip_tables
[ 133.127927][ C1] CPU: 1 UID: 0 PID: 5463 Comm: usemem Not tainted 6.12.0-rc6-00041-g13da30d6f915 #1
[ 133.137519][ C1] Hardware name: Inspur NF8260M6/NF8260M6, BIOS 06.00.01 04/22/2022
[ 133.145595][ C1] RIP: 0010:memset_orig (arch/x86/lib/memset_64.S:71)
[ 133.150781][ C1] Code: c1 41 89 f9 41 83 e1 07 75 70 48 89 d1 48 c1 e9 06 74 35 0f 1f 44 00 00 48 ff c9 48 89 07 48 89 47 08 48 89 47 10 48 89 47 18 <48> 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d8
All code
========
0: c1 41 89 f9 roll $0xf9,-0x77(%rcx)
4: 41 83 e1 07 and $0x7,%r9d
8: 75 70 jne 0x7a
a: 48 89 d1 mov %rdx,%rcx
d: 48 c1 e9 06 shr $0x6,%rcx
11: 74 35 je 0x48
13: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1)
18: 48 ff c9 dec %rcx
1b: 48 89 07 mov %rax,(%rdi)
1e: 48 89 47 08 mov %rax,0x8(%rdi)
22: 48 89 47 10 mov %rax,0x10(%rdi)
26: 48 89 47 18 mov %rax,0x18(%rdi)
2a:* 48 89 47 20 mov %rax,0x20(%rdi) <-- trapping instruction
2e: 48 89 47 28 mov %rax,0x28(%rdi)
32: 48 89 47 30 mov %rax,0x30(%rdi)
36: 48 89 47 38 mov %rax,0x38(%rdi)
3a: 48 8d 7f 40 lea 0x40(%rdi),%rdi
3e: 75 d8 jne 0x18
Code starting with the faulting instruction
===========================================
0: 48 89 47 20 mov %rax,0x20(%rdi)
4: 48 89 47 28 mov %rax,0x28(%rdi)
8: 48 89 47 30 mov %rax,0x30(%rdi)
c: 48 89 47 38 mov %rax,0x38(%rdi)
10: 48 8d 7f 40 lea 0x40(%rdi),%rdi
14: 75 d8 jne 0xffffffffffffffee
[ 133.170775][ C1] RSP: 0018:ffffc900126efa20 EFLAGS: 00000206
[ 133.177015][ C1] RAX: 0000000000000000 RBX: ffffea00a7c878c0 RCX: 0000000000000030
[ 133.185139][ C1] RDX: 0000000000001000 RSI: 0000000000000000 RDI: ffff88a9f21e33c0
[ 133.193229][ C1] RBP: ffff88a9f21e3000 R08: 0000000000000000 R09: 0000000000000000
[ 133.201373][ C1] R10: ffff88a9f21e3000 R11: 0000000000001000 R12: 0000000000000000
[ 133.209522][ C1] R13: 0000000000000000 R14: 0000000000000000 R15: 00000026b5fdf000
[ 133.217642][ C1] FS: 00007f21a47e86c0(0000) GS:ffff888c0f680000(0000) knlGS:0000000000000000
[ 133.226703][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 133.233410][ C1] CR2: 00005641d476a000 CR3: 0000000c4b6b6003 CR4: 00000000007726f0
[ 133.241514][ C1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 133.249679][ C1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 133.257776][ C1] PKRU: 55555554
[ 133.261446][ C1] Call Trace:
[ 133.264848][ C1] <IRQ>
[ 133.267875][ C1] ? watchdog_timer_fn (kernel/watchdog.c:762)
[ 133.273139][ C1] ? __pfx_watchdog_timer_fn (kernel/watchdog.c:677)
[ 133.278704][ C1] ? __hrtimer_run_queues (kernel/time/hrtimer.c:1691 kernel/time/hrtimer.c:1755)
[ 133.284250][ C1] ? hrtimer_interrupt (kernel/time/hrtimer.c:1820)
[ 133.289443][ C1] ? __sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1038 arch/x86/kernel/apic/apic.c:1055)
[ 133.295587][ C1] ? sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1049 arch/x86/kernel/apic/apic.c:1049)
[ 133.301543][ C1] </IRQ>
[ 133.304608][ C1] <TASK>
[ 133.307641][ C1] ? asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:702)
[ 133.313886][ C1] ? memset_orig (arch/x86/lib/memset_64.S:71)
[ 133.318457][ C1] zero_user_segments (include/linux/highmem.h:280)
[ 133.323465][ C1] iomap_readpage_iter (fs/iomap/buffered-io.c:392)
[ 133.328698][ C1] ? xas_load (include/linux/xarray.h:175 include/linux/xarray.h:1264 lib/xarray.c:240)
[ 133.332919][ C1] iomap_readahead (fs/iomap/buffered-io.c:514 fs/iomap/buffered-io.c:550)
[ 133.337765][ C1] read_pages (mm/readahead.c:160)
[ 133.342137][ C1] ? alloc_pages_mpol_noprof (mm/mempolicy.c:2267)
[ 133.347774][ C1] page_cache_ra_unbounded (include/linux/fs.h:882 mm/readahead.c:291)
[ 133.353303][ C1] filemap_fault (mm/filemap.c:3230 mm/filemap.c:3329)
[ 133.357982][ C1] __do_fault (mm/memory.c:4882)
[ 133.362292][ C1] do_read_fault (mm/memory.c:5297)
[ 133.366985][ C1] do_pte_missing (mm/memory.c:5431 mm/memory.c:3965)
[ 133.371754][ C1] __handle_mm_fault (mm/memory.c:5909)
[ 133.376818][ C1] handle_mm_fault (mm/memory.c:6077)
[ 133.381717][ C1] do_user_addr_fault (arch/x86/mm/fault.c:1339)
[ 133.386820][ C1] exc_page_fault (arch/x86/include/asm/irqflags.h:37 arch/x86/include/asm/irqflags.h:92 arch/x86/mm/fault.c:1489 arch/x86/mm/fault.c:1539)
[ 133.391500][ C1] asm_exc_page_fault (arch/x86/include/asm/idtentry.h:623)
[ 133.396396][ C1] RIP: 0033:0x55578aeb9acc
[ 133.400849][ C1] Code: 00 00 e8 b7 f8 ff ff bf 01 00 00 00 e8 0d f9 ff ff 89 c7 e8 6c ff ff ff bf 00 00 00 00 e8 fc f8 ff ff 85 d2 74 08 48 8d 04 f7 <48> 8b 00 c3 48 8d 04 f7 48 89 30 b8 00 00 00 00 c3 41 54 55 53 48
All code
========
0: 00 00 add %al,(%rax)
2: e8 b7 f8 ff ff call 0xfffffffffffff8be
7: bf 01 00 00 00 mov $0x1,%edi
c: e8 0d f9 ff ff call 0xfffffffffffff91e
11: 89 c7 mov %eax,%edi
13: e8 6c ff ff ff call 0xffffffffffffff84
18: bf 00 00 00 00 mov $0x0,%edi
1d: e8 fc f8 ff ff call 0xfffffffffffff91e
22: 85 d2 test %edx,%edx
24: 74 08 je 0x2e
26: 48 8d 04 f7 lea (%rdi,%rsi,8),%rax
2a:* 48 8b 00 mov (%rax),%rax <-- trapping instruction
2d: c3 ret
2e: 48 8d 04 f7 lea (%rdi,%rsi,8),%rax
32: 48 89 30 mov %rsi,(%rax)
35: b8 00 00 00 00 mov $0x0,%eax
3a: c3 ret
3b: 41 54 push %r12
3d: 55 push %rbp
3e: 53 push %rbx
3f: 48 rex.W
Code starting with the faulting instruction
===========================================
0: 48 8b 00 mov (%rax),%rax
3: c3 ret
4: 48 8d 04 f7 lea (%rdi,%rsi,8),%rax
8: 48 89 30 mov %rsi,(%rax)
b: b8 00 00 00 00 mov $0x0,%eax
10: c3 ret
11: 41 54 push %r12
13: 55 push %rbp
14: 53 push %rbx
15: 48 rex.W
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241129/202411292300.61edbd37-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
2024-11-29 15:19 [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#] kernel test robot
@ 2024-12-03 2:14 ` Yafang Shao
2024-12-03 3:04 ` Oliver Sang
0 siblings, 1 reply; 8+ messages in thread
From: Yafang Shao @ 2024-12-03 2:14 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel
On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
<oliver.sang@intel.com> wrote:
>
> kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
>
> commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
>
> [...]
Is this issue consistently reproducible?
I attempted to reproduce it using the mmap-xread-seq-mt test case but
was unsuccessful.
--
Regards
Yafang
* Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
2024-12-03 2:14 ` Yafang Shao
@ 2024-12-03 3:04 ` Oliver Sang
2024-12-03 4:01 ` Yafang Shao
2024-12-03 9:33 ` Yafang Shao
0 siblings, 2 replies; 8+ messages in thread
From: Oliver Sang @ 2024-12-03 3:04 UTC (permalink / raw)
To: Yafang Shao
Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel,
oliver.sang
hi, Yafang,
On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> <oliver.sang@intel.com> wrote:
> > [...]
>
> Is this issue consistently reproducible?
> I attempted to reproduce it using the mmap-xread-seq-mt test case but
> was unsuccessful.
in our tests, the issue is quite persistent. as below, it is 100% reproduced in
all 8 runs on the commit, while the parent stays clean.
d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:8 100% 8:8 dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
:8 100% 8:8 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
to rule out any env issue, we will rebuild the kernel and rerun more tests to
check. if it is still consistently reproduced, we will follow your further
requests. thanks
>
> --
> Regards
> Yafang
* Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
2024-12-03 3:04 ` Oliver Sang
@ 2024-12-03 4:01 ` Yafang Shao
2024-12-03 9:33 ` Yafang Shao
1 sibling, 0 replies; 8+ messages in thread
From: Yafang Shao @ 2024-12-03 4:01 UTC (permalink / raw)
To: Oliver Sang; +Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel
On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yafang,
>
> On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > <oliver.sang@intel.com> wrote:
> > > [...]
>
> [...]
>
> >
> > Is this issue consistently reproducible?
> > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > was unsuccessful.
>
> in our tests, the issue is quite persistent. as below, 100% reproduced in all
> 8 runs, keeps clean on parent.
>
> d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :8 100% 8:8 dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> :8 100% 8:8 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> to avoid any env issue, we rebuild kernel and rerun more to check. if still
> consistently reproduced, we will follow your further requests. thanks
In your environment, can this issue be reproduced using the following
simple command?
vm-scalability/run -c case-mmap-xread-seq-mt
--
Regards
Yafang
* Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
2024-12-03 3:04 ` Oliver Sang
2024-12-03 4:01 ` Yafang Shao
@ 2024-12-03 9:33 ` Yafang Shao
2024-12-04 13:38 ` Oliver Sang
2024-12-06 2:27 ` Oliver Sang
1 sibling, 2 replies; 8+ messages in thread
From: Yafang Shao @ 2024-12-03 9:33 UTC (permalink / raw)
To: Oliver Sang; +Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel
On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yafang,
>
> On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > <oliver.sang@intel.com> wrote:
> > > [...]
>
> [...]
>
> >
> > Is this issue consistently reproducible?
> > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > was unsuccessful.
>
> in our tests, the issue is quite persistent. as below, 100% reproduced in all
> 8 runs, keeps clean on parent.
>
> d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> ---------------- ---------------------------
> fail:runs %reproduction fail:runs
> | | |
> :8 100% 8:8 dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> :8 100% 8:8 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> to avoid any env issue, we rebuild kernel and rerun more to check. if still
> consistently reproduced, we will follow your further requests. thanks
Although I’ve made extensive attempts, I haven’t been able to
reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
case, ra->size might be increasing to an unexpectedly large value. If
that’s the case, I believe the issue can be resolved with the
following additional change:
diff --git a/mm/readahead.c b/mm/readahead.c
index 9b8a48e736c6..e30132bc2593 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
 		return 4 * cur;
 	if (cur <= max / 2)
 		return 2 * cur;
-	if (cur > max)
-		return cur;
 	return max;
 }

@@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
 					  1UL << order);
 	if (index == expected) {
 		ra->start += ra->size;
-		ra->size = get_next_ra_size(ra, max_pages);
+		/*
+		 * For the MADV_HUGEPAGE case, the ra->size might be larger than
+		 * the max_pages.
+		 */
+		ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
 		ra->async_size = ra->size;
 		goto readit;
 	}
Could you please test this if you can consistently reproduce the bug?
--
Regards
Yafang
* Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
2024-12-03 9:33 ` Yafang Shao
@ 2024-12-04 13:38 ` Oliver Sang
2024-12-06 2:27 ` Oliver Sang
1 sibling, 0 replies; 8+ messages in thread
From: Oliver Sang @ 2024-12-04 13:38 UTC (permalink / raw)
To: Yafang Shao
Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel,
oliver.sang
hi, Yafang,
On Tue, Dec 03, 2024 at 05:33:16PM +0800, Yafang Shao wrote:
> On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Yafang,
> >
> > On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > > <oliver.sang@intel.com> wrote:
> > > > [...]
> >
> > [...]
> >
> > >
> > > Is this issue consistently reproducible?
> > > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > > was unsuccessful.
> >
> > in our tests, the issue is quite persistent. as below, 100% reproduced in all
> > 8 runs, keeps clean on parent.
> >
> > d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
> > ---------------- ---------------------------
> > fail:runs %reproduction fail:runs
> > | | |
> > :8 100% 8:8 dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> > :8 100% 8:8 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> >
> > to avoid any env issue, we rebuild kernel and rerun more to check. if still
> > consistently reproduced, we will follow your further requests. thanks
>
> Although I’ve made extensive attempts, I haven’t been able to
> reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
> case, ra->size might be increasing to an unexpectedly large value. If
> that’s the case, I believe the issue can be resolved with the
> following additional change:
sorry, our service ran into some problems these two days and we have been busy
fixing them, so I could not address your request quickly.
here is a quick update. we rebuilt the kernel and reran the tests more times;
the issue still seems persistent.
d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:20 75% 15:20 dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
:20 75% 15:20 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
in order to rule out env issues on this machine, we tried the same tests on
another Ice Lake platform and still see similar issues, though the reproduction
rate seems a little lower.
d1aa0c04294e2988 13da30d6f9150dff876f94a3f32
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
:10 50% 5:10 dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
:10 50% 5:10 dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
we will test your patch and update you with the results. thanks.
>
> [...]
>
> --
> Regards
> Yafang
* Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
2024-12-03 9:33 ` Yafang Shao
2024-12-04 13:38 ` Oliver Sang
@ 2024-12-06 2:27 ` Oliver Sang
2024-12-06 5:38 ` Yafang Shao
1 sibling, 1 reply; 8+ messages in thread
From: Oliver Sang @ 2024-12-06 2:27 UTC (permalink / raw)
To: Yafang Shao
Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel,
oliver.sang
hi, Yafang,
On Tue, Dec 03, 2024 at 05:33:16PM +0800, Yafang Shao wrote:
> On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@intel.com> wrote:
> >
> > hi, Yafang,
> >
> > On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > > <oliver.sang@intel.com> wrote:
> > > > [...]
> >
> > >
> > > Is this issue consistently reproducible?
> > > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > > was unsuccessful.
> >
> > In our tests, the issue is quite persistent. As shown below, it reproduced
> > 100% in all 8 runs and stays clean on the parent commit.
> >
> > d1aa0c04294e2988  13da30d6f9150dff876f94a3f32
> > ----------------  ---------------------------
> >        fail:runs  %reproduction    fail:runs
> >              |          |                |
> >              :8       100%             8:8  dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> >              :8       100%             8:8  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> >
> > To avoid any environment issue, we rebuilt the kernel and reran more tests
> > to check. If it is still consistently reproduced, we will follow your
> > further requests. Thanks.
>
> Although I’ve made extensive attempts, I haven’t been able to
> reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
> case, ra->size might be increasing to an unexpectedly large value. If
> that’s the case, I believe the issue can be resolved with the
> following additional change:
>
> diff --git a/mm/readahead.c b/mm/readahead.c
> index 9b8a48e736c6..e30132bc2593 100644
> --- a/mm/readahead.c
> +++ b/mm/readahead.c
> @@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
>                  return 4 * cur;
>          if (cur <= max / 2)
>                  return 2 * cur;
> -        if (cur > max)
> -                return cur;
>          return max;
>  }
>
> @@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
>                          1UL << order);
>          if (index == expected) {
>                  ra->start += ra->size;
> -                ra->size = get_next_ra_size(ra, max_pages);
> +                /*
> +                 * For the MADV_HUGEPAGE case, the ra->size might be larger than
> +                 * the max_pages.
> +                 */
> +                ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
>                  ra->async_size = ra->size;
>                  goto readit;
>          }
>
> Could you please test this if you can consistently reproduce the bug?
With this patch applied, we confirmed the issue is gone on both platforms.
Tested-by: kernel test robot <oliver.sang@intel.com>
Below, d18114f8dcb33d7ed6216673903 is just your patch applied on top.
On the Cooper Lake machine from our original report:
d1aa0c04294e2988  13da30d6f9150dff876f94a3f32  d18114f8dcb33d7ed6216673903
----------------  ---------------------------  ---------------------------
       fail:runs  %reproduction     fail:runs  %reproduction     fail:runs
             |          |                 |          |                 |
             :20       75%            15:20         0%               :20  dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
             :20       75%            15:20         0%               :20  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
On another Ice Lake platform:
d1aa0c04294e2988  13da30d6f9150dff876f94a3f32  d18114f8dcb33d7ed6216673903
----------------  ---------------------------  ---------------------------
       fail:runs  %reproduction     fail:runs  %reproduction     fail:runs
             |          |                 |          |                 |
             :10       50%             5:10         0%               :20  dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
             :10       50%             5:10         0%               :20  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
>
> --
> Regards
> Yafang
* Re: [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
2024-12-06 2:27 ` Oliver Sang
@ 2024-12-06 5:38 ` Yafang Shao
0 siblings, 0 replies; 8+ messages in thread
From: Yafang Shao @ 2024-12-06 5:38 UTC (permalink / raw)
To: Oliver Sang; +Cc: oe-lkp, lkp, Andrew Morton, Matthew Wilcox, linux-fsdevel
On Fri, Dec 6, 2024 at 10:27 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yafang,
>
> On Tue, Dec 03, 2024 at 05:33:16PM +0800, Yafang Shao wrote:
> > On Tue, Dec 3, 2024 at 11:04 AM Oliver Sang <oliver.sang@intel.com> wrote:
> > >
> > > hi, Yafang,
> > >
> > > On Tue, Dec 03, 2024 at 10:14:50AM +0800, Yafang Shao wrote:
> > > > On Fri, Nov 29, 2024 at 11:19 PM kernel test robot
> > > > <oliver.sang@intel.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > Hello,
> > > > >
> > > > > kernel test robot noticed "BUG:soft_lockup-CPU##stuck_for#s![usemem:#]" on:
> > > > >
> > > > > commit: 13da30d6f9150dff876f94a3f32d555e484ad04f ("mm/readahead: fix large folio support in async readahead")
> > > > > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> > > > >
> > > > > [test failed on linux-next/master cfba9f07a1d6aeca38f47f1f472cfb0ba133d341]
> > > > >
> > > > > in testcase: vm-scalability
> > > > > version: vm-scalability-x86_64-6f4ef16-0_20241103
> > > > > with following parameters:
> > > > >
> > > > > runtime: 300s
> > > > > test: mmap-xread-seq-mt
> > > > > cpufreq_governor: performance
> > > > >
> > > > >
> > > > >
> > > > > config: x86_64-rhel-9.4
> > > > > compiler: gcc-12
> > > > > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > > > >
> > > > > (please refer to attached dmesg/kmsg for entire log/backtrace)
> > > > >
> > > > >
> > > > >
> > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > > > > the same patch/commit), kindly add following tags
> > > > > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > > > > | Closes: https://lore.kernel.org/oe-lkp/202411292300.61edbd37-lkp@intel.com
> > > > >
> > > > >
> > >
> > > [...]
> > >
> > > >
> > > > Is this issue consistently reproducible?
> > > > I attempted to reproduce it using the mmap-xread-seq-mt test case but
> > > > was unsuccessful.
> > >
> > > In our tests, the issue is quite persistent. As shown below, it reproduced
> > > 100% in all 8 runs and stays clean on the parent commit.
> > >
> > > d1aa0c04294e2988  13da30d6f9150dff876f94a3f32
> > > ----------------  ---------------------------
> > >        fail:runs  %reproduction    fail:runs
> > >              |          |                |
> > >              :8       100%             8:8  dmesg.BUG:soft_lockup-CPU##stuck_for#s![usemem:#]
> > >              :8       100%             8:8  dmesg.Kernel_panic-not_syncing:softlockup:hung_tasks
> > >
> > > To avoid any environment issue, we rebuilt the kernel and reran more tests
> > > to check. If it is still consistently reproduced, we will follow your
> > > further requests. Thanks.
> >
> > Although I’ve made extensive attempts, I haven’t been able to
> > reproduce the issue. My best guess is that, in the non-MADV_HUGEPAGE
> > case, ra->size might be increasing to an unexpectedly large value. If
> > that’s the case, I believe the issue can be resolved with the
> > following additional change:
> >
> > diff --git a/mm/readahead.c b/mm/readahead.c
> > index 9b8a48e736c6..e30132bc2593 100644
> > --- a/mm/readahead.c
> > +++ b/mm/readahead.c
> > @@ -385,8 +385,6 @@ static unsigned long get_next_ra_size(struct file_ra_state *ra,
> >                  return 4 * cur;
> >          if (cur <= max / 2)
> >                  return 2 * cur;
> > -        if (cur > max)
> > -                return cur;
> >          return max;
> >  }
> >
> > @@ -644,7 +642,11 @@ void page_cache_async_ra(struct readahead_control *ractl,
> >                          1UL << order);
> >          if (index == expected) {
> >                  ra->start += ra->size;
> > -                ra->size = get_next_ra_size(ra, max_pages);
> > +                /*
> > +                 * For the MADV_HUGEPAGE case, the ra->size might be larger than
> > +                 * the max_pages.
> > +                 */
> > +                ra->size = max(ra->size, get_next_ra_size(ra, max_pages));
> >                  ra->async_size = ra->size;
> >                  goto readit;
> >          }
> >
> > Could you please test this if you can consistently reproduce the bug?
>
> With this patch applied, we confirmed the issue is gone on both platforms.
>
> Tested-by: kernel test robot <oliver.sang@intel.com>
Great! Thanks for your work. I'll send a new version.
--
Regards
Yafang
end of thread, other threads:[~2024-12-06 5:39 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-11-29 15:19 [linux-next:master] [mm/readahead] 13da30d6f9: BUG:soft_lockup-CPU##stuck_for#s![usemem:#] kernel test robot
2024-12-03 2:14 ` Yafang Shao
2024-12-03 3:04 ` Oliver Sang
2024-12-03 4:01 ` Yafang Shao
2024-12-03 9:33 ` Yafang Shao
2024-12-04 13:38 ` Oliver Sang
2024-12-06 2:27 ` Oliver Sang
2024-12-06 5:38 ` Yafang Shao