* 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 @ 2024-09-24 22:28 Mikhail Gavrilov 2024-10-02 17:34 ` Mikhail Gavrilov 0 siblings, 1 reply; 8+ messages in thread From: Mikhail Gavrilov @ 2024-09-24 22:28 UTC (permalink / raw) To: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 7384 bytes --]

Hi,
I am testing kernel snapshots on Fedora Rawhide, and today, with a build of commit de5cb0dcb74c, I saw "KASAN: slab-use-after-free in m_next+0x13b" for the first time.
Unfortunately it is not clear what triggered this problem, because it happened after 21 hours of uptime.

Full trace looks like:

input: Noble FoKus Mystique (AVRCP) as /devices/virtual/input/input26 ================================================================== BUG: KASAN: slab-use-after-free in m_next+0x13b/0x170 Read of size 8 at addr ffff8885609b40f0 by task htop/3847 CPU: 14 UID: 1000 PID: 3847 Comm: htop Tainted: G W L ------- --- 6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64+debug #1 Tainted: [W]=WARN, [L]=SOFTLOCKUP Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, BIOS 3040 09/12/2024 Call Trace: <TASK> dump_stack_lvl+0x84/0xd0 ? m_next+0x13b/0x170 print_report+0x174/0x505 ? m_next+0x13b/0x170 ? __virt_addr_valid+0x231/0x420 ? m_next+0x13b/0x170 kasan_report+0xab/0x180 ? m_next+0x13b/0x170 m_next+0x13b/0x170 seq_read_iter+0x8e5/0x1130 seq_read+0x2b4/0x3c0 ? __pfx_seq_read+0x10/0x10 ? inode_security+0x54/0xf0 ? rw_verify_area+0x3b2/0x5e0 vfs_read+0x165/0xa20 ? __pfx_vfs_read+0x10/0x10 ? ktime_get_coarse_real_ts64+0x41/0xd0 ? local_clock_noinstr+0xd/0x100 ? __pfx_lock_release+0x10/0x10 ksys_read+0xfb/0x1d0 ? __pfx_ksys_read+0x10/0x10 ? ktime_get_coarse_real_ts64+0x41/0xd0 do_syscall_64+0x97/0x190 ? __lock_acquire+0xdcd/0x62c0 ? __pfx___lock_acquire+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? __pfx___lock_acquire+0x10/0x10 ? audit_filter_inodes.part.0+0x12d/0x220 ? local_clock_noinstr+0xd/0x100 ? __pfx_lock_release+0x10/0x10 ? rcu_is_watching+0x12/0xc0 ? kfree+0x27c/0x4d0 ? audit_reset_context+0x8c5/0xee0 ? lockdep_hardirqs_on_prepare+0x171/0x400 ? do_syscall_64+0xa3/0x190 ? lockdep_hardirqs_on+0x7c/0x100 ? do_syscall_64+0xa3/0x190 ?
do_syscall_64+0xa3/0x190 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4190dcac36 Code: 89 df e8 2d c1 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 15 83 e2 39 83 fa 08 75 0d e8 32 ff ff ff 66 90 48 8b 45 10 0f 05 <48> 8b 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 RSP: 002b:00007ffcde82b690 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007f4190ce3740 RCX: 00007f4190dcac36 RDX: 0000000000000400 RSI: 000055bf5e823a20 RDI: 0000000000000005 RBP: 00007ffcde82b6a0 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007f4190f44fd0 R13: 00007f4190f44e80 R14: 000055bf5e823e20 R15: 000055bf5ecc9160 </TASK> Allocated by task 176289: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_slab_alloc+0x6e/0x70 kmem_cache_alloc_noprof+0x15a/0x3d0 vm_area_dup+0x23/0x190 __split_vma+0x137/0xd40 vms_gather_munmap_vmas+0x29d/0xfc0 mmap_region+0x35a/0x1f50 do_mmap+0x8e7/0x1020 vm_mmap_pgoff+0x178/0x2f0 __do_fast_syscall_32+0x86/0x110 do_fast_syscall_32+0x32/0x80 sysret32_from_system_call+0x0/0x4a Freed by task 0: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x37/0x50 kmem_cache_free+0x1a7/0x5a0 rcu_do_batch+0x3fd/0x1120 rcu_core+0x636/0x9b0 handle_softirqs+0x1e9/0x8d0 __irq_exit_rcu+0xbb/0x1c0 irq_exit_rcu+0xe/0x30 sysvec_apic_timer_interrupt+0xa1/0xd0 asm_sysvec_apic_timer_interrupt+0x1a/0x20 Last potentially related work creation: kasan_save_stack+0x30/0x50 __kasan_record_aux_stack+0x8e/0xa0 __call_rcu_common.constprop.0+0xf4/0x10d0 vma_complete+0x720/0x10b0 commit_merge+0x42a/0x1310 vma_expand+0x313/0xad0 vma_merge_new_range+0x2cd/0xec0 mmap_region+0x432/0x1f50 do_mmap+0x8e7/0x1020 vm_mmap_pgoff+0x178/0x2f0 __do_fast_syscall_32+0x86/0x110 do_fast_syscall_32+0x32/0x80 sysret32_from_system_call+0x0/0x4a The buggy address belongs to the object at ffff8885609b40f0 which belongs to the cache vm_area_struct of size 176 The buggy address is located 0 bytes inside of freed 176-byte region [ffff8885609b40f0, ffff8885609b41a0) The buggy address belongs to the physical page: page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5609b4 head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 memcg:ffff88814d36d001 flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) page_type: f5(slab) raw: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122 raw: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001 head: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122 head: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001 head: 0017ffffc0000001 ffffea0015826d01 ffffffffffffffff 0000000000000000 head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8885609b3f80: 00 00 00 00 00 00 00 00 00 00 00 00task_mmu 00 00 00 00 ffff8885609b4000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff8885609b4080: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fa fb ^ ffff8885609b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff8885609b4180: fb fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00 ================================================================== Disabling lock debugging due to kernel taint > sh /usr/src/kernels/(uname -r)/scripts/faddr2line /lib/debug/lib/modules/(uname -r)/vmlinux m_next+0x13b m_next+0x13b/0x170: proc_get_vma at fs/proc/task_mmu.c:136 (inlined 
by) m_next at fs/proc/task_mmu.c:187 > cat -n /usr/src/debug/kernel-6.11-8833-gde5cb0dcb74c/linux-6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64/fs/proc/task_mmu.c | sed -n '182,192 p' 182 { 183 if (*ppos == -2UL) { 184 *ppos = -1UL; 185 return NULL; 186 } 187 return proc_get_vma(m->private, ppos); 188 } 189 190 static void m_stop(struct seq_file *m, void *v) 191 { 192 struct proc_maps_private *priv = m->private; > git blame fs/proc/task_mmu.c -L 182,192 Blaming lines: 100% (11/11), done. a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 182) { c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 183) if (*ppos == -2UL) { c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 184) *ppos = -1UL; c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 185) return NULL; c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 186) } c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 187) return proc_get_vma(m->private, ppos); a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 188) } a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 189) a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 190) static void m_stop(struct seq_file *m, void *v) a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 191) { a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 192) struct proc_maps_private *priv = m->private; Hmm this line hasn't changed for two years. Machine spec: https://linux-hardware.org/?probe=323b76ce48 I attached below full kernel log and build config. Can anyone figure out what happened or should we wait for the second manifestation of this issue? -- Best Regards, Mike Gavrilov. [-- Attachment #2: 6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42-BUG-KASAN-slab-use-after-free-in-m_next.zip --] [-- Type: application/zip, Size: 90276 bytes --] [-- Attachment #3: .config.zip --] [-- Type: application/zip, Size: 67403 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
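For reference, the faddr2line output above resolves to proc_get_vma(), which is inlined into m_next(). A minimal sketch of that helper as it appears in the 6.12-rc sources is below; it is reconstructed from memory rather than copied from the exact Fedora snapshot, so treat it as approximate. The report's read of size 8 at offset 0 of a freed 176-byte vm_area_struct is consistent with the *ppos = vma->vm_start dereference on a VMA pointer handed back by the maple tree iterator after the VMA had already been freed:

    static struct vm_area_struct *proc_get_vma(struct proc_maps_private *priv,
                                               loff_t *ppos)
    {
            /* Next VMA from the maple tree iterator set up by m_start(). */
            struct vm_area_struct *vma = vma_next(&priv->iter);

            if (vma) {
                    /* fs/proc/task_mmu.c:136 -- the 8-byte read KASAN flags
                     * when vma points at an already-freed vm_area_struct. */
                    *ppos = vma->vm_start;
            } else {
                    *ppos = -2UL;
                    vma = get_gate_vma(priv->mm);
            }

            return vma;
    }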
* Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 2024-09-24 22:28 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 Mikhail Gavrilov @ 2024-10-02 17:34 ` Mikhail Gavrilov 2024-10-02 17:55 ` Lorenzo Stoakes 0 siblings, 1 reply; 8+ messages in thread From: Mikhail Gavrilov @ 2024-10-02 17:34 UTC (permalink / raw) To: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel, Liam.Howlett, lorenzo.stoakes, Andrew Morton, Linux Memory Management List On Wed, Sep 25, 2024 at 3:28 AM Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> wrote: > > Hi, > I am testing kernel snapshots on Fedora Rawhide and Today with build > on commit de5cb0dcb74c I saw for the first time "KASAN: > slab-use-after-free in m_next+0x13b". > Unfortunately it is not clear what triggered this problem because it > happened after 21 hour uptime. > > Full trace looks like: > input: Noble FoKus Mystique (AVRCP) as /devices/virtual/input/input26 > ================================================================== > BUG: KASAN: slab-use-after-free in m_next+0x13b/0x170 > Read of size 8 at addr ffff8885609b40f0 by task htop/3847 > > CPU: 14 UID: 1000 PID: 3847 Comm: htop Tainted: G W L > ------- --- 6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64+debug > #1 > Tainted: [W]=WARN, [L]=SOFTLOCKUP > Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, > BIOS 3040 09/12/2024 > Call Trace: > <TASK> > dump_stack_lvl+0x84/0xd0 > ? m_next+0x13b/0x170 > print_report+0x174/0x505 > ? m_next+0x13b/0x170 > ? __virt_addr_valid+0x231/0x420 > ? m_next+0x13b/0x170 > kasan_report+0xab/0x180 > ? m_next+0x13b/0x170 > m_next+0x13b/0x170 > seq_read_iter+0x8e5/0x1130 > seq_read+0x2b4/0x3c0 > ? __pfx_seq_read+0x10/0x10 > ? inode_security+0x54/0xf0 > ? rw_verify_area+0x3b2/0x5e0 > vfs_read+0x165/0xa20 > ? __pfx_vfs_read+0x10/0x10 > ? ktime_get_coarse_real_ts64+0x41/0xd0 > ? local_clock_noinstr+0xd/0x100 > ? __pfx_lock_release+0x10/0x10 > ksys_read+0xfb/0x1d0 > ? __pfx_ksys_read+0x10/0x10 > ? ktime_get_coarse_real_ts64+0x41/0xd0 > do_syscall_64+0x97/0x190 > ? __lock_acquire+0xdcd/0x62c0 > ? __pfx___lock_acquire+0x10/0x10 > ? __pfx___lock_acquire+0x10/0x10 > ? __pfx___lock_acquire+0x10/0x10 > ? audit_filter_inodes.part.0+0x12d/0x220 > ? local_clock_noinstr+0xd/0x100 > ? __pfx_lock_release+0x10/0x10 > ? rcu_is_watching+0x12/0xc0 > ? kfree+0x27c/0x4d0 > ? audit_reset_context+0x8c5/0xee0 > ? lockdep_hardirqs_on_prepare+0x171/0x400 > ? do_syscall_64+0xa3/0x190 > ? lockdep_hardirqs_on+0x7c/0x100 > ? do_syscall_64+0xa3/0x190 > ? 
do_syscall_64+0xa3/0x190 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > RIP: 0033:0x7f4190dcac36 > Code: 89 df e8 2d c1 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 15 > 83 e2 39 83 fa 08 75 0d e8 32 ff ff ff 66 90 48 8b 45 10 0f 05 <48> 8b > 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 > RSP: 002b:00007ffcde82b690 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 > RAX: ffffffffffffffda RBX: 00007f4190ce3740 RCX: 00007f4190dcac36 > RDX: 0000000000000400 RSI: 000055bf5e823a20 RDI: 0000000000000005 > RBP: 00007ffcde82b6a0 R08: 0000000000000000 R09: 0000000000000000 > R10: 0000000000000000 R11: 0000000000000202 R12: 00007f4190f44fd0 > R13: 00007f4190f44e80 R14: 000055bf5e823e20 R15: 000055bf5ecc9160 > </TASK> > > Allocated by task 176289: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > __kasan_slab_alloc+0x6e/0x70 > kmem_cache_alloc_noprof+0x15a/0x3d0 > vm_area_dup+0x23/0x190 > __split_vma+0x137/0xd40 > vms_gather_munmap_vmas+0x29d/0xfc0 > mmap_region+0x35a/0x1f50 > do_mmap+0x8e7/0x1020 > vm_mmap_pgoff+0x178/0x2f0 > __do_fast_syscall_32+0x86/0x110 > do_fast_syscall_32+0x32/0x80 > sysret32_from_system_call+0x0/0x4a > > Freed by task 0: > kasan_save_stack+0x30/0x50 > kasan_save_track+0x14/0x30 > kasan_save_free_info+0x3b/0x70 > __kasan_slab_free+0x37/0x50 > kmem_cache_free+0x1a7/0x5a0 > rcu_do_batch+0x3fd/0x1120 > rcu_core+0x636/0x9b0 > handle_softirqs+0x1e9/0x8d0 > __irq_exit_rcu+0xbb/0x1c0 > irq_exit_rcu+0xe/0x30 > sysvec_apic_timer_interrupt+0xa1/0xd0 > asm_sysvec_apic_timer_interrupt+0x1a/0x20 > > Last potentially related work creation: > kasan_save_stack+0x30/0x50 > __kasan_record_aux_stack+0x8e/0xa0 > __call_rcu_common.constprop.0+0xf4/0x10d0 > vma_complete+0x720/0x10b0 > commit_merge+0x42a/0x1310 > vma_expand+0x313/0xad0 > vma_merge_new_range+0x2cd/0xec0 > mmap_region+0x432/0x1f50 > do_mmap+0x8e7/0x1020 > vm_mmap_pgoff+0x178/0x2f0 > __do_fast_syscall_32+0x86/0x110 > do_fast_syscall_32+0x32/0x80 > sysret32_from_system_call+0x0/0x4a > > The buggy address belongs to the object at ffff8885609b40f0 > which belongs to the cache vm_area_struct of size 176 > The buggy address is located 0 bytes inside of > freed 176-byte region [ffff8885609b40f0, ffff8885609b41a0) > > The buggy address belongs to the physical page: > page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5609b4 > head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 > memcg:ffff88814d36d001 > flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) > page_type: f5(slab) > raw: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122 > raw: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001 > head: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122 > head: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001 > head: 0017ffffc0000001 ffffea0015826d01 ffffffffffffffff 0000000000000000 > head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000 > page dumped because: kasan: bad access detected > > Memory state around the buggy address: > ffff8885609b3f80: 00 00 00 00 00 00 00 00 00 00 00 00task_mmu 00 00 00 00 > ffff8885609b4000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > >ffff8885609b4080: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fa fb > ^ > ffff8885609b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > ffff8885609b4180: fb fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00 > ================================================================== > Disabling lock debugging due to kernel taint > > 
> > sh /usr/src/kernels/(uname -r)/scripts/faddr2line /lib/debug/lib/modules/(uname -r)/vmlinux m_next+0x13b > m_next+0x13b/0x170: > proc_get_vma at fs/proc/task_mmu.c:136 > (inlined by) m_next at fs/proc/task_mmu.c:187 > > > cat -n /usr/src/debug/kernel-6.11-8833-gde5cb0dcb74c/linux-6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64/fs/proc/task_mmu.c | sed -n '182,192 p' > 182 { > 183 if (*ppos == -2UL) { > 184 *ppos = -1UL; > 185 return NULL; > 186 } > 187 return proc_get_vma(m->private, ppos); > 188 } > 189 > 190 static void m_stop(struct seq_file *m, void *v) > 191 { > 192 struct proc_maps_private *priv = m->private; > > > git blame fs/proc/task_mmu.c -L 182,192 > Blaming lines: 100% (11/11), done. > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 182) { > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 183) > if (*ppos == -2UL) { > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 184) > *ppos = -1UL; > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 185) > return NULL; > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 186) } > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 187) > return proc_get_vma(m->private, ppos); > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 188) } > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 189) > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 190) > static void m_stop(struct seq_file *m, void *v) > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 191) { > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 192) > struct proc_maps_private *priv = m->private; > > Hmm this line hasn't changed for two years. > > Machine spec: https://linux-hardware.org/?probe=323b76ce48 > I attached below full kernel log and build config. > > Can anyone figure out what happened or should we wait for the second > manifestation of this issue? >

Finally I spotted that this issue is caused by the Steam client. It usually happens after downloading game updates. It looks like the Steam client runs some post-update scripts which trigger the slab-use-after-free in m_next.

Git bisect found the first bad commit:

commit f8d112a4e657c65c888e6b8a8435ef61a66e4ab8 (HEAD)
Author: Liam R. Howlett <Liam.Howlett@Oracle.com>
Date: Fri Aug 30 00:00:54 2024 -0400

mm/mmap: avoid zeroing vma tree in mmap_region()

Instead of zeroing the vma tree and then overwriting the area, let the area be overwritten and then clean up the gathered vmas using vms_complete_munmap_vmas().

To ensure locking is downgraded correctly, the mm is set regardless of MAP_FIXED or not (NULL vma).

If a driver is mapping over an existing vma, then clear the ptes before the call_mmap() invocation. This is done using the vms_clean_up_area() helper. If there is a close vm_ops, that must also be called to ensure any cleanup is done before mapping over the area. This also means that calling open has been added to the abort of an unmap operation, for now.

Since vm_ops->open() and vm_ops->close() are not always undo each other (state cleanup may exist in ->close() that is lost forever), the code cannot be left in this way, but that change has been isolated to another commit to make this point very obvious for traceability.

Temporarily keep track of the number of pages that will be removed and reduce the charged amount.

This also drops the validate_mm() call in the vma_expand() function.
It is necessary to drop the validate as it would fail since the mm map_count would be incorrect during a vma expansion, prior to the cleanup from vms_complete_munmap_vmas(). Clean up the error handing of the vms_gather_munmap_vmas() by calling the verification within the function. Link: https://lkml.kernel.org/r/20240830040101.822209-15-Liam.Howlett@oracle.com Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Bert Karwatzki <spasswolf@web.de> Cc: Jeff Xu <jeffxu@chromium.org> Cc: Jiri Olsa <olsajiri@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Brown <broonie@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Paul Moore <paul@paul-moore.com> Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> mm/mmap.c | 57 +++++++++++++++++++++++++++------------------------------ mm/vma.c | 54 ++++++++++++++++++++++++++++++++++++++++++------------ mm/vma.h | 22 ++++++++++++++++------ 3 files changed, 85 insertions(+), 48 deletions(-) -- Best Regards, Mike Gavrilov. ^ permalink raw reply [flat|nested] 8+ messages in thread
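The allocation and free stacks in the report both come from a 32-bit mmap() (note the __do_fast_syscall_32 frames) replacing existing VMAs via mmap_region() and vms_gather_munmap_vmas(), while the task that trips KASAN is htop walking /proc/<pid>/maps. The sketch below only illustrates that combination of code paths; it is written as plain 64-bit C for simplicity (the Steam binaries in the traces are 32-bit) and is not a confirmed reproducer from this thread:

    #include <err.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
            const size_t page = 4096;
            char *base = mmap(NULL, 16 * page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (base == MAP_FAILED)
                    err(1, "mmap");

            printf("pid %d: read /proc/%d/maps in a loop from another shell\n",
                   (int)getpid(), (int)getpid());

            for (;;) {
                    /* MAP_FIXED over the middle of an existing mapping makes
                     * mmap_region() split and replace the old VMAs -- the path
                     * touched by the bisected commit... */
                    if (mmap(base + page, page, PROT_READ,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
                            err(1, "mmap fixed (ro)");
                    /* ...and mapping it back read-write lets the new VMA merge
                     * with its neighbours again, so every iteration splits and
                     * merges VMAs while any /proc/<pid>/maps reader walks the
                     * same maple tree. */
                    if (mmap(base + page, page, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
                            err(1, "mmap fixed (rw)");
            }
    }

Reading the sketch's /proc/<pid>/maps in a loop from another shell (for example: while true; do cat /proc/<pid>/maps > /dev/null; done) exercises the same m_next()/proc_get_vma() path shown in the report, although, as the rest of the thread shows, the corruption itself has proven hard to trigger outside the Steam workload.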
* Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 2024-10-02 17:34 ` Mikhail Gavrilov @ 2024-10-02 17:55 ` Lorenzo Stoakes 2024-10-02 20:32 ` Lorenzo Stoakes 0 siblings, 1 reply; 8+ messages in thread From: Lorenzo Stoakes @ 2024-10-02 17:55 UTC (permalink / raw) To: Mikhail Gavrilov Cc: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel, Liam.Howlett, Andrew Morton, Linux Memory Management List Thanks for your report! On Wed, Oct 02, 2024 at 10:34:32PM GMT, Mikhail Gavrilov wrote: > On Wed, Sep 25, 2024 at 3:28 AM Mikhail Gavrilov > <mikhail.v.gavrilov@gmail.com> wrote: > > > > Hi, > > I am testing kernel snapshots on Fedora Rawhide and Today with build > > on commit de5cb0dcb74c I saw for the first time "KASAN: > > slab-use-after-free in m_next+0x13b". > > Unfortunately it is not clear what triggered this problem because it > > happened after 21 hour uptime. > > > > Full trace looks like: > > input: Noble FoKus Mystique (AVRCP) as /devices/virtual/input/input26 > > ================================================================== > > BUG: KASAN: slab-use-after-free in m_next+0x13b/0x170 > > Read of size 8 at addr ffff8885609b40f0 by task htop/3847 > > > > CPU: 14 UID: 1000 PID: 3847 Comm: htop Tainted: G W L > > ------- --- 6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64+debug > > #1 > > Tainted: [W]=WARN, [L]=SOFTLOCKUP > > Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, > > BIOS 3040 09/12/2024 > > Call Trace: > > <TASK> > > dump_stack_lvl+0x84/0xd0 > > ? m_next+0x13b/0x170 > > print_report+0x174/0x505 > > ? m_next+0x13b/0x170 > > ? __virt_addr_valid+0x231/0x420 > > ? m_next+0x13b/0x170 > > kasan_report+0xab/0x180 > > ? m_next+0x13b/0x170 > > m_next+0x13b/0x170 > > seq_read_iter+0x8e5/0x1130 > > seq_read+0x2b4/0x3c0 > > ? __pfx_seq_read+0x10/0x10 > > ? inode_security+0x54/0xf0 > > ? rw_verify_area+0x3b2/0x5e0 > > vfs_read+0x165/0xa20 > > ? __pfx_vfs_read+0x10/0x10 > > ? ktime_get_coarse_real_ts64+0x41/0xd0 > > ? local_clock_noinstr+0xd/0x100 > > ? __pfx_lock_release+0x10/0x10 > > ksys_read+0xfb/0x1d0 > > ? __pfx_ksys_read+0x10/0x10 > > ? ktime_get_coarse_real_ts64+0x41/0xd0 > > do_syscall_64+0x97/0x190 > > ? __lock_acquire+0xdcd/0x62c0 > > ? __pfx___lock_acquire+0x10/0x10 > > ? __pfx___lock_acquire+0x10/0x10 > > ? __pfx___lock_acquire+0x10/0x10 > > ? audit_filter_inodes.part.0+0x12d/0x220 > > ? local_clock_noinstr+0xd/0x100 > > ? __pfx_lock_release+0x10/0x10 > > ? rcu_is_watching+0x12/0xc0 > > ? kfree+0x27c/0x4d0 > > ? audit_reset_context+0x8c5/0xee0 > > ? lockdep_hardirqs_on_prepare+0x171/0x400 > > ? do_syscall_64+0xa3/0x190 > > ? lockdep_hardirqs_on+0x7c/0x100 > > ? do_syscall_64+0xa3/0x190 > > ? 
do_syscall_64+0xa3/0x190 > > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > RIP: 0033:0x7f4190dcac36 > > Code: 89 df e8 2d c1 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 75 15 > > 83 e2 39 83 fa 08 75 0d e8 32 ff ff ff 66 90 48 8b 45 10 0f 05 <48> 8b > > 5d f8 c9 c3 0f 1f 40 00 f3 0f 1e fa 55 48 89 e5 48 83 ec 08 > > RSP: 002b:00007ffcde82b690 EFLAGS: 00000202 ORIG_RAX: 0000000000000000 > > RAX: ffffffffffffffda RBX: 00007f4190ce3740 RCX: 00007f4190dcac36 > > RDX: 0000000000000400 RSI: 000055bf5e823a20 RDI: 0000000000000005 > > RBP: 00007ffcde82b6a0 R08: 0000000000000000 R09: 0000000000000000 > > R10: 0000000000000000 R11: 0000000000000202 R12: 00007f4190f44fd0 > > R13: 00007f4190f44e80 R14: 000055bf5e823e20 R15: 000055bf5ecc9160 > > </TASK> > > > > Allocated by task 176289: > > kasan_save_stack+0x30/0x50 > > kasan_save_track+0x14/0x30 > > __kasan_slab_alloc+0x6e/0x70 > > kmem_cache_alloc_noprof+0x15a/0x3d0 > > vm_area_dup+0x23/0x190 > > __split_vma+0x137/0xd40 > > vms_gather_munmap_vmas+0x29d/0xfc0 > > mmap_region+0x35a/0x1f50 > > do_mmap+0x8e7/0x1020 > > vm_mmap_pgoff+0x178/0x2f0 > > __do_fast_syscall_32+0x86/0x110 > > do_fast_syscall_32+0x32/0x80 > > sysret32_from_system_call+0x0/0x4a > > > > Freed by task 0: > > kasan_save_stack+0x30/0x50 > > kasan_save_track+0x14/0x30 > > kasan_save_free_info+0x3b/0x70 > > __kasan_slab_free+0x37/0x50 > > kmem_cache_free+0x1a7/0x5a0 > > rcu_do_batch+0x3fd/0x1120 > > rcu_core+0x636/0x9b0 > > handle_softirqs+0x1e9/0x8d0 > > __irq_exit_rcu+0xbb/0x1c0 > > irq_exit_rcu+0xe/0x30 > > sysvec_apic_timer_interrupt+0xa1/0xd0 > > asm_sysvec_apic_timer_interrupt+0x1a/0x20 > > > > Last potentially related work creation: > > kasan_save_stack+0x30/0x50 > > __kasan_record_aux_stack+0x8e/0xa0 > > __call_rcu_common.constprop.0+0xf4/0x10d0 > > vma_complete+0x720/0x10b0 > > commit_merge+0x42a/0x1310 > > vma_expand+0x313/0xad0 > > vma_merge_new_range+0x2cd/0xec0 > > mmap_region+0x432/0x1f50 > > do_mmap+0x8e7/0x1020 > > vm_mmap_pgoff+0x178/0x2f0 > > __do_fast_syscall_32+0x86/0x110 > > do_fast_syscall_32+0x32/0x80 > > sysret32_from_system_call+0x0/0x4a > > > > The buggy address belongs to the object at ffff8885609b40f0 > > which belongs to the cache vm_area_struct of size 176 > > The buggy address is located 0 bytes inside of > > freed 176-byte region [ffff8885609b40f0, ffff8885609b41a0) > > > > The buggy address belongs to the physical page: > > page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5609b4 > > head: order:1 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0 > > memcg:ffff88814d36d001 > > flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff) > > page_type: f5(slab) > > raw: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122 > > raw: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001 > > head: 0017ffffc0000040 ffff888108113d40 dead000000000100 dead000000000122 > > head: 0000000000000000 0000000000220022 00000001f5000000 ffff88814d36d001 > > head: 0017ffffc0000001 ffffea0015826d01 ffffffffffffffff 0000000000000000 > > head: 0000000000000002 0000000000000000 00000000ffffffff 0000000000000000 > > page dumped because: kasan: bad access detected > > > > Memory state around the buggy address: > > ffff8885609b3f80: 00 00 00 00 00 00 00 00 00 00 00 00task_mmu 00 00 00 00 > > ffff8885609b4000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 > > >ffff8885609b4080: 00 00 00 00 00 00 fc fc fc fc fc fc fc fc fa fb > > ^ > > ffff8885609b4100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb > > ffff8885609b4180: fb 
fb fb fb fc fc fc fc fc fc fc fc 00 00 00 00 > > ================================================================== > > Disabling lock debugging due to kernel taint > > > > > sh /usr/src/kernels/(uname -r)/scripts/faddr2line /lib/debug/lib/modules/(uname -r)/vmlinux m_next+0x13b > > m_next+0x13b/0x170: > > proc_get_vma at fs/proc/task_mmu.c:136 > > (inlined by) m_next at fs/proc/task_mmu.c:187 > > > > > cat -n /usr/src/debug/kernel-6.11-8833-gde5cb0dcb74c/linux-6.12.0-0.rc0.20240923gitde5cb0dcb74c.9.fc42.x86_64/fs/proc/task_mmu.c | sed -n '182,192 p' > > 182 { > > 183 if (*ppos == -2UL) { > > 184 *ppos = -1UL; > > 185 return NULL; > > 186 } > > 187 return proc_get_vma(m->private, ppos); > > 188 } > > 189 > > 190 static void m_stop(struct seq_file *m, void *v) > > 191 { > > 192 struct proc_maps_private *priv = m->private; > > > > > git blame fs/proc/task_mmu.c -L 182,192 > > Blaming lines: 100% (11/11), done. > > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 182) { > > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 183) > > if (*ppos == -2UL) { > > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 184) > > *ppos = -1UL; > > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 185) > > return NULL; > > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 186) } > > c4c84f06285e4 (Matthew Wilcox (Oracle) 2022-09-06 19:48:57 +0000 187) > > return proc_get_vma(m->private, ppos); > > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 188) } > > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 189) > > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 190) > > static void m_stop(struct seq_file *m, void *v) > > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 191) { > > a6198797cc3fd (Matt Mackall 2008-02-04 22:29:03 -0800 192) > > struct proc_maps_private *priv = m->private; > > > > Hmm this line hasn't changed for two years. > > > > Machine spec: https://linux-hardware.org/?probe=323b76ce48 > > I attached below full kernel log and build config. > > > > Can anyone figure out what happened or should we wait for the second > > manifestation of this issue? > > > > Finally I spotted that this issue is caused by the Steam client. > And usually happens after downloading game updates. > Looks like Steam client runs some post update scripts which cause > slab-use-after-free in m_next. Yeah similar issue being investigated elsewhere, See https://lore.kernel.org/all/c63a64a9-cdee-4586-85ba-800e8e1a8054@lucifer.local/ for latest update. This is ongoing, but also steam, also this commit and also related to steam update doing something strange, so strange I literally can't repro locally :) but Bert in that thread can. We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and CONFIG_DEBUG_MAPLE_TREE set, if you set these you should see a report more quickly (let us know if you do). Also note that there is a critical error handling fix in https://lore.kernel.org/linux-mm/20241002073932.13482-1-lorenzo.stoakes@oracle.com/ Which should get hotfixed soon. > > Git bisect found the first bad commit: > commit f8d112a4e657c65c888e6b8a8435ef61a66e4ab8 (HEAD) > Author: Liam R. Howlett <Liam.Howlett@Oracle.com> > Date: Fri Aug 30 00:00:54 2024 -0400 > > mm/mmap: avoid zeroing vma tree in mmap_region() > > Instead of zeroing the vma tree and then overwriting the area, let the > area be overwritten and then clean up the gathered vmas using > vms_complete_munmap_vmas(). 
> > To ensure locking is downgraded correctly, the mm is set regardless of > MAP_FIXED or not (NULL vma). > > If a driver is mapping over an existing vma, then clear the ptes before > the call_mmap() invocation. This is done using the vms_clean_up_area() > helper. If there is a close vm_ops, that must also be called to ensure > any cleanup is done before mapping over the area. This also means that > calling open has been added to the abort of an unmap operation, for now. > > Since vm_ops->open() and vm_ops->close() are not always undo each other > (state cleanup may exist in ->close() that is lost forever), the code > cannot be left in this way, but that change has been isolated to another > commit to make this point very obvious for traceability. > > Temporarily keep track of the number of pages that will be removed and > reduce the charged amount. > > This also drops the validate_mm() call in the vma_expand() function. It > is necessary to drop the validate as it would fail since the mm map_count > would be incorrect during a vma expansion, prior to the cleanup from > vms_complete_munmap_vmas(). > > Clean up the error handing of the vms_gather_munmap_vmas() by calling the > verification within the function. > > Link: https://lkml.kernel.org/r/20240830040101.822209-15-Liam.Howlett@oracle.com > Signed-off-by: Liam R. Howlett <Liam.Howlett@Oracle.com> > Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> > Cc: Bert Karwatzki <spasswolf@web.de> > Cc: Jeff Xu <jeffxu@chromium.org> > Cc: Jiri Olsa <olsajiri@gmail.com> > Cc: Kees Cook <kees@kernel.org> > Cc: Lorenzo Stoakes <lstoakes@gmail.com> > Cc: Mark Brown <broonie@kernel.org> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: "Paul E. McKenney" <paulmck@kernel.org> > Cc: Paul Moore <paul@paul-moore.com> > Cc: Sidhartha Kumar <sidhartha.kumar@oracle.com> > Cc: Suren Baghdasaryan <surenb@google.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > > mm/mmap.c | 57 +++++++++++++++++++++++++++------------------------------ > mm/vma.c | 54 ++++++++++++++++++++++++++++++++++++++++++------------ > mm/vma.h | 22 ++++++++++++++++------ > 3 files changed, 85 insertions(+), 48 deletions(-) > > -- > Best Regards, > Mike Gavrilov. ^ permalink raw reply [flat|nested] 8+ messages in thread
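For anyone who wants to follow Lorenzo's suggestion above to enable CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and CONFIG_DEBUG_MAPLE_TREE, the corresponding kernel .config fragment would be along these lines (option names as given in the thread; the exact set of dependencies may differ between trees):

    CONFIG_DEBUG_VM=y
    CONFIG_DEBUG_VM_MAPLE_TREE=y
    CONFIG_DEBUG_MAPLE_TREE=y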
* Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 2024-10-02 17:55 ` Lorenzo Stoakes @ 2024-10-02 20:32 ` Lorenzo Stoakes 2024-10-02 20:45 ` Mikhail Gavrilov 0 siblings, 1 reply; 8+ messages in thread From: Lorenzo Stoakes @ 2024-10-02 20:32 UTC (permalink / raw) To: Mikhail Gavrilov Cc: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel, Liam.Howlett, Andrew Morton, Linux Memory Management List On Wed, Oct 02, 2024 at 06:55:59PM GMT, Lorenzo Stoakes wrote: > Thanks for your report! Out of curiosity, what GPU are you using? :) [snip] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 2024-10-02 20:32 ` Lorenzo Stoakes @ 2024-10-02 20:45 ` Mikhail Gavrilov 2024-10-03 21:25 ` Mikhail Gavrilov 0 siblings, 1 reply; 8+ messages in thread From: Mikhail Gavrilov @ 2024-10-02 20:45 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel, Liam.Howlett, Andrew Morton, Linux Memory Management List

On Wed, Oct 2, 2024 at 10:56 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and
> CONFIG_DEBUG_MAPLE_TREE set, if you set these you should see a report more
> quickly (let us know if you do).

mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM_MAPLE_TREE'
# CONFIG_DEBUG_VM_MAPLE_TREE is not set
mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM'
CONFIG_DEBUG_VM_IRQSOFF=y
CONFIG_DEBUG_VM=y
# CONFIG_DEBUG_VM_MAPLE_TREE is not set
# CONFIG_DEBUG_VM_RB is not set
CONFIG_DEBUG_VM_PGFLAGS=y
CONFIG_DEBUG_VM_PGTABLE=y
mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_MAPLE_TREE'
# CONFIG_DEBUG_MAPLE_TREE is not set

Fedora's kernel build uses only CONFIG_DEBUG_VM, and that is enough to reproduce this issue. Anyway, I have enabled all three options. I'll try to live for a day without launching Steam, and in a day I'll report whether it reproduces without Steam or not.

On Thu, Oct 3, 2024 at 1:32 AM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote:
> Out of curiosity, what GPU are you using? :)

The issue reproduces on all my machines. One has an AMD Radeon 6900 XT and the other an AMD Radeon 7900 XTX.

--
Best Regards,
Mike Gavrilov.

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 2024-10-02 20:45 ` Mikhail Gavrilov @ 2024-10-03 21:25 ` Mikhail Gavrilov 2024-10-03 21:52 ` Lorenzo Stoakes 0 siblings, 1 reply; 8+ messages in thread From: Mikhail Gavrilov @ 2024-10-03 21:25 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel, Liam.Howlett, Andrew Morton, Linux Memory Management List [-- Attachment #1: Type: text/plain, Size: 9240 bytes --] On Thu, Oct 3, 2024 at 1:45 AM Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> wrote: > > On Wed, Oct 2, 2024 at 10:56 PM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and > > CONFIG_DEBUG_MAPLE_TREE set, if you set these you should see a report more > > quickly (let us know if you do). > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM_MAPLE_TREE' > # CONFIG_DEBUG_VM_MAPLE_TREE is not set > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM' > CONFIG_DEBUG_VM_IRQSOFF=y > CONFIG_DEBUG_VM=y > # CONFIG_DEBUG_VM_MAPLE_TREE is not set > # CONFIG_DEBUG_VM_RB is not set > CONFIG_DEBUG_VM_PGFLAGS=y > CONFIG_DEBUG_VM_PGTABLE=y > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_MAPLE_TREE' > # CONFIG_DEBUG_MAPLE_TREE is not set > > Fedora's kernel build uses only CONFIG_DEBUG_VM and it's enough for > reproducing this issue. > Anyway I enabled all three options. I'll try to live for a day without > steam launching. In a day I'll write whether it is reproducing without > steam or not. A day passed, and as expected, the problem did not occur until I launch Steam. But with suggested options the stacktrace looks different. Instead of "KASAN: slab-use-after-free in m_next+0x13b" I see this: [88841.586167] node00000000b4c54d84: data_end 9 != the last slot offset 8 [88841.586315] BUG at mas_validate_limits:7523 (1) [88841.586320] maple_tree(0000000067811125) flags 30F, height 3 root 0000000040e0c786 [88841.586324] 0-ffffffffffffffff: node 000000009b462d47 depth 0 type 3 parent 00000000db18456d contents: 10000 11400000 1e000 1f000 1f000 75e15000 0 0 0 ffffffff00283000 | 09 09| 000000005518cec0 67FFFFFF 0000000085840a0a 79970FFF 00000000975349aa 79F50FFF 00000000afe6ddd8 7B140FFF 0000000083c903b1 7BB96FFF 00000000335e109c F605AFFF 000000007e7333d1 F6570FFF 00000000d8e9900e F6C92FFF 00000000250ada8a F76E1FFF 00000000e567baed [88841.586357] 0-67ffffff: node 000000005c64e204 depth 1 type 3 parent 0000000069e1180e contents: 10000 0 0 0 0 0 0 0 0 0 | 05 00| 000000000cfac463 16FFFF 00000000f0522fec 400FFF 00000000cd8938b8 94FFFF 00000000d2bcb2e3 E9FFFF 00000000ed8d307e 173FFFF 0000000056285bf1 67FFFFFF 0000000000000000 0 0000000000000000 0 0000000000000000 0 0000000000000000 [88841.586388] 0-16ffff: node 0000000037648f62 depth 2 type 1 parent 00000000978387fd contents: 0000000000000000 FFFF 000000000bc2e123 10FFFF 0000000049345b43 11FFFF 000000008940e7cb 126FFF 000000007c2365c0 12FFFF 00000000cfc1c890 142FFF 00000000b64ae6ea 14FFFF 00000000f8f8f6c9 165FFF 000000008460c3ec 16FFFF 0000000000000000 0 0000000000000000 0 0000000000000000 0 0000000000000000 0 0000000000000000 0 0000000000000000 0 000000009d394510 [88841.586413] 0-ffff: 0000000000000000 [88841.586417] 10000-10ffff: 000000000bc2e123 [88841.586420] 110000-11ffff: 0000000049345b43 [88841.586424] 120000-126fff: 000000008940e7cb [88841.586428] 127000-12ffff: 000000007c2365c0 [88841.586431] 130000-142fff: 00000000cfc1c890 
[88841.586435] 143000-14ffff: 00000000b64ae6ea [88841.586438] 150000-165fff: 00000000f8f8f6c9 [88841.586442] 166000-16ffff: 000000008460c3ec [88841.586445] 170000-400fff: node 0000000030a5de34 depth 2 type 1 parent 00000000161b9281 contents: 0000000090f8ff7b 171FFF 00000000a90cdf09 17FFFF 00000000ad657f59 190FFF 0000000026397ca7 19FFFF 000000003413c0f4 1B0FFF 000000000ca6dd7d 1BFFFF 00000000cf83b99b 1CEFFF 0000000096a06890 1CFFFF 00000000ed96cdbd 1E5FFF 00000000e6e9d2cb 1EFFFF 00000000bc54b9f4 1FFFFF 000000006e42b324 3DFFFF 00000000afd4728b 3FFFFF 0000000082572c0c 400FFF 0000000000000000 0 00000000e89e29fc [88841.586471] 170000-171fff: 0000000090f8ff7b [88841.586474] 172000-17ffff: 00000000a90cdf09 [88841.586478] 180000-190fff: 00000000ad657f59 [88841.586481] 191000-19ffff: 0000000026397ca7 [88841.586485] 1a0000-1b0fff: 000000003413c0f4 [88841.586511] 1b1000-1bffff: 000000000ca6dd7d [88841.586515] 1c0000-1cefff: 00000000cf83b99b [88841.586519] 1cf000-1cffff: 0000000096a06890 [88841.586522] 1d0000-1e5fff: 00000000ed96cdbd [88841.586526] 1e6000-1effff: 00000000e6e9d2cb [88841.586529] 1f0000-1fffff: 00000000bc54b9f4 [88841.586533] 200000-3dffff: 000000006e42b324 [88841.586537] 3e0000-3fffff: 00000000afd4728b [88841.586540] 400000-400fff: 0000000082572c0c [88841.586544] 401000-94ffff: node 00000000f4ffb374 depth 2 type 1 parent 000000005fb58d4e contents: 000000004eafabe6 403FFF 00000000104e2e73 404FFF 000000004dbe1ca9 406FFF 00000000ffb92c1b 407FFF 00000000cffd3517 409FFF 000000009ef45250 40FFFF 00000000373dd145 410FFF 00000000eaff67b3 50FFFF 000000002e632fe1 511FFF 000000001839285f 60FFFF 0000000043d54299 611FFF 00000000da2961ba 80FFFF 00000000155e68ba 8C9FFF 0000000010bfe63e 8CFFFF 00000000a4834cd3 94FFFF 000000000e628eae [88841.586569] 401000-403fff: 000000004eafabe6 [88841.586572] 404000-404fff: 00000000104e2e73 [88841.586576] 405000-406fff: 000000004dbe1ca9 [88841.586579] 407000-407fff: 00000000ffb92c1b [88841.586583] 408000-409fff: 00000000cffd3517 [88841.586586] 40a000-40ffff: 000000009ef45250 [88841.586590] 410000-410fff: 00000000373dd145 [88841.586594] 411000-50ffff: 00000000eaff67b3 [88841.586597] 510000-511fff: 000000002e632fe1 [88841.586601] 512000-60ffff: 000000001839285f [88841.586604] 610000-611fff: 0000000043d54299 [88841.586608] 612000-80ffff: 00000000da2961ba [88841.586611] 810000-8c9fff: 00000000155e68ba [88841.586615] 8ca000-8cffff: 0000000010bfe63e [88841.586618] 8d0000-94ffff: 00000000a4834cd3 *** [88841.592355] Pass: 3886705433 Run:3886705434 [88841.592359] CPU: 22 UID: 1000 PID: 273842 Comm: rundll32.exe Tainted: G W L 6.11.0-rc6-13b-f8d112a4e657c65c888e6b8a8435ef61a66e4ab8+ #720 [88841.592364] Tainted: [W]=WARN, [L]=SOFTLOCKUP [88841.592366] Hardware name: ASUS System Product Name/ROG STRIX B650E-I GAMING WIFI, BIOS 3040 09/12/2024 [88841.592369] Call Trace: [88841.592372] <TASK> [88841.592376] dump_stack_lvl+0x84/0xd0 [88841.592384] mt_validate+0x2932/0x2980 [88841.592397] ? __pfx_mt_validate+0x10/0x10 [88841.592408] validate_mm+0xa5/0x310 [88841.592414] ? __pfx_validate_mm+0x10/0x10 [88841.592427] vms_complete_munmap_vmas+0x572/0x9b0 [88841.592431] ? __pfx_mas_prev+0x10/0x10 [88841.592438] mmap_region+0x10f9/0x24a0 [88841.592447] ? __pfx_mmap_region+0x10/0x10 [88841.592450] ? __pfx_mark_lock+0x10/0x10 [88841.592459] ? mark_lock+0xf5/0x16d0 [88841.592474] ? mm_get_unmapped_area_vmflags+0x48/0xc0 [88841.592482] ? security_mmap_addr+0x57/0x90 [88841.592487] ? __get_unmapped_area+0x191/0x2c0 [88841.592492] do_mmap+0x8cf/0xff0 [88841.592500] ? 
__pfx_do_mmap+0x10/0x10 [88841.592503] ? down_write_killable+0x19d/0x280 [88841.592506] ? __pfx_down_write_killable+0x10/0x10 [88841.592513] vm_mmap_pgoff+0x178/0x2f0 [88841.592521] ? __pfx_vm_mmap_pgoff+0x10/0x10 [88841.592524] ? lockdep_hardirqs_on+0x7c/0x100 [88841.592528] ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0 [88841.592537] __do_fast_syscall_32+0x86/0x110 [88841.592540] ? kfree+0x257/0x3a0 [88841.592547] ? audit_reset_context+0x8c5/0xee0 [88841.592555] ? lockdep_hardirqs_on_prepare+0x171/0x400 [88841.592558] ? __do_fast_syscall_32+0x92/0x110 [88841.592561] ? lockdep_hardirqs_on+0x7c/0x100 [88841.592564] ? __do_fast_syscall_32+0x92/0x110 [88841.592571] ? lockdep_hardirqs_on_prepare+0x171/0x400 [88841.592574] ? __do_fast_syscall_32+0x92/0x110 [88841.592577] ? lockdep_hardirqs_on+0x7c/0x100 [88841.592580] ? __do_fast_syscall_32+0x92/0x110 [88841.592583] ? audit_reset_context+0x8c5/0xee0 [88841.592590] ? lockdep_hardirqs_on_prepare+0x171/0x400 [88841.592593] ? __do_fast_syscall_32+0x92/0x110 [88841.592596] ? lockdep_hardirqs_on+0x7c/0x100 [88841.592600] ? rcu_is_watching+0x12/0xc0 [88841.592603] ? trace_irq_disable.constprop.0+0xce/0x110 [88841.592609] do_fast_syscall_32+0x32/0x80 [88841.592612] entry_SYSCALL_compat_after_hwframe+0x75/0x75 [88841.592616] RIP: 0023:0xf7f3e5a9 [88841.592632] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 cd 0f 05 cd 80 <5d> 5a 59 c3 cc 90 90 90 2e 8d b4 26 00 00 00 00 8d b4 26 00 00 00 [88841.592635] RSP: 002b:000000000050f450 EFLAGS: 00000256 ORIG_RAX: 00000000000000c0 [88841.592639] RAX: ffffffffffffffda RBX: 0000000001b90000 RCX: 000000000001f000 [88841.592641] RDX: 0000000000000000 RSI: 0000000000004032 RDI: 00000000ffffffff [88841.592644] RBP: 0000000000000000 R08: 000000000050f450 R09: 0000000000000000 [88841.592646] R10: 0000000000000000 R11: 0000000000000256 R12: 0000000000000000 [88841.592648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [88841.592658] </TASK> [88841.592668] 00000000b4c54d84[9] should not have entry 00000000f0273bd5 Full kernel log attached here below as archive. -- Best Regards, Mike Gavrilov. [-- Attachment #2: dmesg-6.11.0-rc6-13b-f8d112a4e657c65c888e6b8a8435ef61a66e4ab8.zip --] [-- Type: application/zip, Size: 169251 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 2024-10-03 21:25 ` Mikhail Gavrilov @ 2024-10-03 21:52 ` Lorenzo Stoakes 2024-10-05 6:45 ` Lorenzo Stoakes 0 siblings, 1 reply; 8+ messages in thread From: Lorenzo Stoakes @ 2024-10-03 21:52 UTC (permalink / raw) To: Mikhail Gavrilov Cc: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel, Liam.Howlett, Andrew Morton, Linux Memory Management List On Fri, Oct 04, 2024 at 02:25:07AM +0500, Mikhail Gavrilov wrote: > On Thu, Oct 3, 2024 at 1:45 AM Mikhail Gavrilov > <mikhail.v.gavrilov@gmail.com> wrote: > > > > On Wed, Oct 2, 2024 at 10:56 PM Lorenzo Stoakes > > <lorenzo.stoakes@oracle.com> wrote: > > > We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and > > > CONFIG_DEBUG_MAPLE_TREE set, if you set these you should see a report more > > > quickly (let us know if you do). > > > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM_MAPLE_TREE' > > # CONFIG_DEBUG_VM_MAPLE_TREE is not set > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM' > > CONFIG_DEBUG_VM_IRQSOFF=y > > CONFIG_DEBUG_VM=y > > # CONFIG_DEBUG_VM_MAPLE_TREE is not set > > # CONFIG_DEBUG_VM_RB is not set > > CONFIG_DEBUG_VM_PGFLAGS=y > > CONFIG_DEBUG_VM_PGTABLE=y > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_MAPLE_TREE' > > # CONFIG_DEBUG_MAPLE_TREE is not set > > > > Fedora's kernel build uses only CONFIG_DEBUG_VM and it's enough for > > reproducing this issue. > > Anyway I enabled all three options. I'll try to live for a day without > > steam launching. In a day I'll write whether it is reproducing without > > steam or not. > > A day passed, and as expected, the problem did not occur until I launch Steam. > But with suggested options the stacktrace looks different. > Instead of "KASAN: slab-use-after-free in m_next+0x13b" I see this: > > [88841.586167] node00000000b4c54d84: data_end 9 != the last slot offset 8

Thanks, looking into the attached dmesg, this looks to be identical to the issue that Bert reported in the other thread.

The nature of it is that once the corruption happens, 'weird stuff' will happen after this; luckily this debug mode lets us pick up on the original corruption.

Bert is somehow, luckily, able to reproduce very repeatably, so we have been able to get a lot more information, but it's taking time to truly narrow it down.

Am working flat out to try to resolve the issue; we have before/after maple trees and it seems like a certain operation is resulting in a corrupted maple tree (duplicate 0x67ffffff entry).

It is proving very, very stubborn to reproduce locally, even in a controlled environment where the maple tree is manually set up, but am continuing my efforts to try to do so as best I can! :)

Will respond here once we have a viable fix.

Thanks again for taking the time to report and to grab the debug maple tree, it's very useful!
Cheers, Lorenzo > [88841.586315] BUG at mas_validate_limits:7523 (1) > [88841.586320] maple_tree(0000000067811125) flags 30F, height 3 root > 0000000040e0c786 > [88841.586324] 0-ffffffffffffffff: node 000000009b462d47 depth 0 type > 3 parent 00000000db18456d contents: 10000 11400000 1e000 1f000 1f000 > 75e15000 0 0 0 ffffffff00283000 | 09 09| 000000005518cec0 67FFFFFF > 0000000085840a0a 79970FFF 00000000975349aa 79F50FFF 00000000afe6ddd8 > 7B140FFF 0000000083c903b1 7BB96FFF 00000000335e109c F605AFFF > 000000007e7333d1 F6570FFF 00000000d8e9900e F6C92FFF 00000000250ada8a > F76E1FFF 00000000e567baed > [88841.586357] 0-67ffffff: node 000000005c64e204 depth 1 type 3 > parent 0000000069e1180e contents: 10000 0 0 0 0 0 0 0 0 0 | 05 00| > 000000000cfac463 16FFFF 00000000f0522fec 400FFF 00000000cd8938b8 > 94FFFF 00000000d2bcb2e3 E9FFFF 00000000ed8d307e 173FFFF > 0000000056285bf1 67FFFFFF 0000000000000000 0 0000000000000000 0 > 0000000000000000 0 0000000000000000 > [88841.586388] 0-16ffff: node 0000000037648f62 depth 2 type 1 > parent 00000000978387fd contents: 0000000000000000 FFFF > 000000000bc2e123 10FFFF 0000000049345b43 11FFFF 000000008940e7cb > 126FFF 000000007c2365c0 12FFFF 00000000cfc1c890 142FFF > 00000000b64ae6ea 14FFFF 00000000f8f8f6c9 165FFF 000000008460c3ec > 16FFFF 0000000000000000 0 0000000000000000 0 0000000000000000 0 > 0000000000000000 0 0000000000000000 0 0000000000000000 0 > 000000009d394510 > [88841.586413] 0-ffff: 0000000000000000 > [88841.586417] 10000-10ffff: 000000000bc2e123 > [88841.586420] 110000-11ffff: 0000000049345b43 > [88841.586424] 120000-126fff: 000000008940e7cb > [88841.586428] 127000-12ffff: 000000007c2365c0 > [88841.586431] 130000-142fff: 00000000cfc1c890 > [88841.586435] 143000-14ffff: 00000000b64ae6ea > [88841.586438] 150000-165fff: 00000000f8f8f6c9 > [88841.586442] 166000-16ffff: 000000008460c3ec > [88841.586445] 170000-400fff: node 0000000030a5de34 depth 2 type 1 > parent 00000000161b9281 contents: 0000000090f8ff7b 171FFF > 00000000a90cdf09 17FFFF 00000000ad657f59 190FFF 0000000026397ca7 > 19FFFF 000000003413c0f4 1B0FFF 000000000ca6dd7d 1BFFFF > 00000000cf83b99b 1CEFFF 0000000096a06890 1CFFFF 00000000ed96cdbd > 1E5FFF 00000000e6e9d2cb 1EFFFF 00000000bc54b9f4 1FFFFF > 000000006e42b324 3DFFFF 00000000afd4728b 3FFFFF 0000000082572c0c > 400FFF 0000000000000000 0 00000000e89e29fc > [88841.586471] 170000-171fff: 0000000090f8ff7b > [88841.586474] 172000-17ffff: 00000000a90cdf09 > [88841.586478] 180000-190fff: 00000000ad657f59 > [88841.586481] 191000-19ffff: 0000000026397ca7 > [88841.586485] 1a0000-1b0fff: 000000003413c0f4 > [88841.586511] 1b1000-1bffff: 000000000ca6dd7d > [88841.586515] 1c0000-1cefff: 00000000cf83b99b > [88841.586519] 1cf000-1cffff: 0000000096a06890 > [88841.586522] 1d0000-1e5fff: 00000000ed96cdbd > [88841.586526] 1e6000-1effff: 00000000e6e9d2cb > [88841.586529] 1f0000-1fffff: 00000000bc54b9f4 > [88841.586533] 200000-3dffff: 000000006e42b324 > [88841.586537] 3e0000-3fffff: 00000000afd4728b > [88841.586540] 400000-400fff: 0000000082572c0c > [88841.586544] 401000-94ffff: node 00000000f4ffb374 depth 2 type 1 > parent 000000005fb58d4e contents: 000000004eafabe6 403FFF > 00000000104e2e73 404FFF 000000004dbe1ca9 406FFF 00000000ffb92c1b > 407FFF 00000000cffd3517 409FFF 000000009ef45250 40FFFF > 00000000373dd145 410FFF 00000000eaff67b3 50FFFF 000000002e632fe1 > 511FFF 000000001839285f 60FFFF 0000000043d54299 611FFF > 00000000da2961ba 80FFFF 00000000155e68ba 8C9FFF 0000000010bfe63e > 8CFFFF 00000000a4834cd3 94FFFF 000000000e628eae > [88841.586569] 
401000-403fff: 000000004eafabe6 > [88841.586572] 404000-404fff: 00000000104e2e73 > [88841.586576] 405000-406fff: 000000004dbe1ca9 > [88841.586579] 407000-407fff: 00000000ffb92c1b > [88841.586583] 408000-409fff: 00000000cffd3517 > [88841.586586] 40a000-40ffff: 000000009ef45250 > [88841.586590] 410000-410fff: 00000000373dd145 > [88841.586594] 411000-50ffff: 00000000eaff67b3 > [88841.586597] 510000-511fff: 000000002e632fe1 > [88841.586601] 512000-60ffff: 000000001839285f > [88841.586604] 610000-611fff: 0000000043d54299 > [88841.586608] 612000-80ffff: 00000000da2961ba > [88841.586611] 810000-8c9fff: 00000000155e68ba > [88841.586615] 8ca000-8cffff: 0000000010bfe63e > [88841.586618] 8d0000-94ffff: 00000000a4834cd3 > *** > [88841.592355] Pass: 3886705433 Run:3886705434 > [88841.592359] CPU: 22 UID: 1000 PID: 273842 Comm: rundll32.exe > Tainted: G W L > 6.11.0-rc6-13b-f8d112a4e657c65c888e6b8a8435ef61a66e4ab8+ #720 > [88841.592364] Tainted: [W]=WARN, [L]=SOFTLOCKUP > [88841.592366] Hardware name: ASUS System Product Name/ROG STRIX > B650E-I GAMING WIFI, BIOS 3040 09/12/2024 > [88841.592369] Call Trace: > [88841.592372] <TASK> > [88841.592376] dump_stack_lvl+0x84/0xd0 > [88841.592384] mt_validate+0x2932/0x2980 > [88841.592397] ? __pfx_mt_validate+0x10/0x10 > [88841.592408] validate_mm+0xa5/0x310 > [88841.592414] ? __pfx_validate_mm+0x10/0x10 > [88841.592427] vms_complete_munmap_vmas+0x572/0x9b0 > [88841.592431] ? __pfx_mas_prev+0x10/0x10 > [88841.592438] mmap_region+0x10f9/0x24a0 > [88841.592447] ? __pfx_mmap_region+0x10/0x10 > [88841.592450] ? __pfx_mark_lock+0x10/0x10 > [88841.592459] ? mark_lock+0xf5/0x16d0 > [88841.592474] ? mm_get_unmapped_area_vmflags+0x48/0xc0 > [88841.592482] ? security_mmap_addr+0x57/0x90 > [88841.592487] ? __get_unmapped_area+0x191/0x2c0 > [88841.592492] do_mmap+0x8cf/0xff0 > [88841.592500] ? __pfx_do_mmap+0x10/0x10 > [88841.592503] ? down_write_killable+0x19d/0x280 > [88841.592506] ? __pfx_down_write_killable+0x10/0x10 > [88841.592513] vm_mmap_pgoff+0x178/0x2f0 > [88841.592521] ? __pfx_vm_mmap_pgoff+0x10/0x10 > [88841.592524] ? lockdep_hardirqs_on+0x7c/0x100 > [88841.592528] ? seqcount_lockdep_reader_access.constprop.0+0xa5/0xb0 > [88841.592537] __do_fast_syscall_32+0x86/0x110 > [88841.592540] ? kfree+0x257/0x3a0 > [88841.592547] ? audit_reset_context+0x8c5/0xee0 > [88841.592555] ? lockdep_hardirqs_on_prepare+0x171/0x400 > [88841.592558] ? __do_fast_syscall_32+0x92/0x110 > [88841.592561] ? lockdep_hardirqs_on+0x7c/0x100 > [88841.592564] ? __do_fast_syscall_32+0x92/0x110 > [88841.592571] ? lockdep_hardirqs_on_prepare+0x171/0x400 > [88841.592574] ? __do_fast_syscall_32+0x92/0x110 > [88841.592577] ? lockdep_hardirqs_on+0x7c/0x100 > [88841.592580] ? __do_fast_syscall_32+0x92/0x110 > [88841.592583] ? audit_reset_context+0x8c5/0xee0 > [88841.592590] ? lockdep_hardirqs_on_prepare+0x171/0x400 > [88841.592593] ? __do_fast_syscall_32+0x92/0x110 > [88841.592596] ? lockdep_hardirqs_on+0x7c/0x100 > [88841.592600] ? rcu_is_watching+0x12/0xc0 > [88841.592603] ? 
trace_irq_disable.constprop.0+0xce/0x110 > [88841.592609] do_fast_syscall_32+0x32/0x80 > [88841.592612] entry_SYSCALL_compat_after_hwframe+0x75/0x75 > [88841.592616] RIP: 0023:0xf7f3e5a9 > [88841.592632] Code: b8 01 10 06 03 74 b4 01 10 07 03 74 b0 01 10 08 > 03 74 d8 01 00 00 00 00 00 00 00 00 00 00 00 00 00 51 52 55 89 cd 0f > 05 cd 80 <5d> 5a 59 c3 cc 90 90 90 2e 8d b4 26 00 00 00 00 8d b4 26 00 > 00 00 > [88841.592635] RSP: 002b:000000000050f450 EFLAGS: 00000256 ORIG_RAX: > 00000000000000c0 > [88841.592639] RAX: ffffffffffffffda RBX: 0000000001b90000 RCX: 000000000001f000 > [88841.592641] RDX: 0000000000000000 RSI: 0000000000004032 RDI: 00000000ffffffff > [88841.592644] RBP: 0000000000000000 R08: 000000000050f450 R09: 0000000000000000 > [88841.592646] R10: 0000000000000000 R11: 0000000000000256 R12: 0000000000000000 > [88841.592648] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 > [88841.592658] </TASK> > [88841.592668] 00000000b4c54d84[9] should not have entry 00000000f0273bd5 > > Full kernel log attached here below as archive. > > -- > Best Regards, > Mike Gavrilov. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: 6.12/BUG: KASAN: slab-use-after-free in m_next at fs/proc/task_mmu.c:187 2024-10-03 21:52 ` Lorenzo Stoakes @ 2024-10-05 6:45 ` Lorenzo Stoakes 0 siblings, 0 replies; 8+ messages in thread From: Lorenzo Stoakes @ 2024-10-05 6:45 UTC (permalink / raw) To: Mikhail Gavrilov Cc: Linux List Kernel Mailing, Linux regressions mailing list, linux-fsdevel, Liam.Howlett, Andrew Morton, Linux Memory Management List On Thu, Oct 03, 2024 at 10:52:03PM +0100, Lorenzo Stoakes wrote: > On Fri, Oct 04, 2024 at 02:25:07AM +0500, Mikhail Gavrilov wrote: > > On Thu, Oct 3, 2024 at 1:45 AM Mikhail Gavrilov > > <mikhail.v.gavrilov@gmail.com> wrote: > > > > > > On Wed, Oct 2, 2024 at 10:56 PM Lorenzo Stoakes > > > <lorenzo.stoakes@oracle.com> wrote: > > > > We can reliably repro it with CONFIG_DEBUG_VM_MAPLE_TREE, CONFIG_DEBUG_VM, and > > > > CONFIG_DEBUG_MAPLE_TREE set, if you set these you should see a report more > > > > quickly (let us know if you do). > > > > > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM_MAPLE_TREE' > > > # CONFIG_DEBUG_VM_MAPLE_TREE is not set > > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_VM' > > > CONFIG_DEBUG_VM_IRQSOFF=y > > > CONFIG_DEBUG_VM=y > > > # CONFIG_DEBUG_VM_MAPLE_TREE is not set > > > # CONFIG_DEBUG_VM_RB is not set > > > CONFIG_DEBUG_VM_PGFLAGS=y > > > CONFIG_DEBUG_VM_PGTABLE=y > > > mikhail@primary-ws ~/dmesg> cat .config | grep 'CONFIG_DEBUG_MAPLE_TREE' > > > # CONFIG_DEBUG_MAPLE_TREE is not set > > > > > > Fedora's kernel build uses only CONFIG_DEBUG_VM and it's enough for > > > reproducing this issue. > > > Anyway I enabled all three options. I'll try to live for a day without > > > steam launching. In a day I'll write whether it is reproducing without > > > steam or not. > > > > A day passed, and as expected, the problem did not occur until I launch Steam. > > But with suggested options the stacktrace looks different. > > Instead of "KASAN: slab-use-after-free in m_next+0x13b" I see this: > > > > [88841.586167] node00000000b4c54d84: data_end 9 != the last slot offset 8 > > Thanks, looking into the attached dmesg this looks to be identical to the > issue that Bert reported in the other thread. > > The nature of it is that once the corruption happens 'weird stuff' will > happen after this, luckily this debug mode lets us pick up on the original > corruption. > > Bert is somehow luckily is able to reproduce very repeatably, so we have > been able to get a lot more information, but it's taking time to truly > narrow it down. > > Am working flat out to try to resolve the issue, we have before/after maple > trees and it seems like a certain operation is resulting in a corrupted > maple tree (duplicate 0x67ffffff entry). > > It is proving very very stubborn to be able to reproduce locally even in a > controlled environment where the maple tree is manually set up, but am > continuing my efforts to try to do so as best I can! :) > > Will respond here once we have a viable fix. I cc'd (and tagged) you over there, but I have a fix for this problem, do give it a try! [0] [0]: https://lore.kernel.org/linux-mm/20241005064114.42770-1-lorenzo.stoakes@oracle.com/ [snip] Cheers, Lorenzo ^ permalink raw reply [flat|nested] 8+ messages in thread