* BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
@ 2024-12-06 21:50 Ben Greear
2024-12-06 22:03 ` Suren Baghdasaryan
0 siblings, 1 reply; 11+ messages in thread
From: Ben Greear @ 2024-12-06 21:50 UTC
To: Suren Baghdasaryan, LKML
Hello Suren,
My system crashes on bootup, and I bisected to this commit.
0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
commit 0f9b685626daa2f8e19a9788625c9b624c223e45
Author: Suren Baghdasaryan <surenb@google.com>
Date: Wed Oct 23 10:07:57 2024 -0700
alloc_tag: populate memory for module tags as needed
The memory reserved for module tags does not need to be backed by physical
pages until there are tags to store there. Change the way we reserve this
memory to allocate only virtual area for the tags and populate it with
physical pages as needed when we load a module.
The crash looks like this:
BUG: unable to handle page fault for address: fffffbfff4041000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
RIP: 0010:kasan_check_range+0xa5/0x190
Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die+0x1f/0x60
? page_fault_oops+0x258/0x910
? dump_pagetable+0x690/0x690
? search_bpf_extables+0x22/0x250
? trace_page_fault_kernel+0x120/0x120
? search_bpf_extables+0x164/0x250
? kasan_check_range+0xa5/0x190
? fixup_exception+0x4d/0xc70
? exc_page_fault+0xe1/0xf0
? asm_exc_page_fault+0x22/0x30
? load_module+0x3d7b/0x7560
? kasan_check_range+0xa5/0x190
__asan_memcpy+0x38/0x60
load_module+0x3d7b/0x7560
? module_frob_arch_sections+0x30/0x30
? lockdep_lock+0xbe/0x1b0
? rw_verify_area+0x18d/0x5e0
? kernel_read_file+0x246/0x870
? __x64_sys_fspick+0x290/0x290
? init_module_from_file+0xd1/0x130
init_module_from_file+0xd1/0x130
? __ia32_sys_init_module+0xa0/0xa0
? lock_acquire+0x2d/0xb0
? idempotent_init_module+0x116/0x790
? do_raw_spin_unlock+0x54/0x220
idempotent_init_module+0x226/0x790
? init_module_from_file+0x130/0x130
? vm_mmap_pgoff+0x203/0x2e0
__x64_sys_finit_module+0xba/0x130
do_syscall_64+0x69/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fe869de327d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
</TASK>
Modules linked in:
CR2: fffffbfff4041000
---[ end trace 0000000000000000 ]---
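For anyone decoding the "#PF" lines above: the error code is a bit field, and 0x0000 means none of its bits are set. Below is a minimal Python sketch of that decoding, using the architectural x86 bit positions. The constant and function names are illustrative only and are not the kernel's own identifiers.

```python
# Low five bits of the x86 page-fault error code.
# Names are illustrative; the bit positions are the architectural ones.
PF_PROT = 1 << 0    # 0: page not present, 1: protection violation
PF_WRITE = 1 << 1   # 0: read access, 1: write access
PF_USER = 1 << 2    # 0: supervisor mode, 1: user mode
PF_RSVD = 1 << 3    # reserved bit set in a paging-structure entry
PF_INSTR = 1 << 4   # fault was caused by an instruction fetch


def decode_pf_error_code(code):
    """Return a list of human-readable fields for a page-fault error code."""
    fields = [
        "protection violation" if code & PF_PROT else "not-present page",
        "write access" if code & PF_WRITE else "read access",
        "user mode" if code & PF_USER else "supervisor mode",
    ]
    if code & PF_RSVD:
        fields.append("reserved bit set")
    if code & PF_INSTR:
        fields.append("instruction fetch")
    return fields


# The error code reported in the oops above.
print(decode_pf_error_code(0x0000))
# prints ['not-present page', 'read access', 'supervisor mode']
```

This matches the "supervisor read access" and "not-present page" text in the oops, and the "PTE 0" at the end of the page-table dump line.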
I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
is found here:
http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
I will be happy to test fixes.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-06 21:50 BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot Ben Greear
@ 2024-12-06 22:03 ` Suren Baghdasaryan
2024-12-06 22:43 ` Ben Greear
0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-12-06 22:03 UTC
To: Ben Greear; +Cc: LKML
On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
>
> Hello Suren,
>
> My system crashes on bootup, and I bisected to this commit.
>
> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
> Author: Suren Baghdasaryan <surenb@google.com>
> Date: Wed Oct 23 10:07:57 2024 -0700
>
> alloc_tag: populate memory for module tags as needed
>
> The memory reserved for module tags does not need to be backed by physical
> pages until there are tags to store there. Change the way we reserve this
> memory to allocate only virtual area for the tags and populate it with
> physical pages as needed when we load a module.
>
> The crash looks like this:
>
> BUG: unable to handle page fault for address: fffffbfff4041000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> RIP: 0010:kasan_check_range+0xa5/0x190
> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __die+0x1f/0x60
> ? page_fault_oops+0x258/0x910
> ? dump_pagetable+0x690/0x690
> ? search_bpf_extables+0x22/0x250
> ? trace_page_fault_kernel+0x120/0x120
> ? search_bpf_extables+0x164/0x250
> ? kasan_check_range+0xa5/0x190
> ? fixup_exception+0x4d/0xc70
> ? exc_page_fault+0xe1/0xf0
> ? asm_exc_page_fault+0x22/0x30
> ? load_module+0x3d7b/0x7560
> ? kasan_check_range+0xa5/0x190
> __asan_memcpy+0x38/0x60
> load_module+0x3d7b/0x7560
> ? module_frob_arch_sections+0x30/0x30
> ? lockdep_lock+0xbe/0x1b0
> ? rw_verify_area+0x18d/0x5e0
> ? kernel_read_file+0x246/0x870
> ? __x64_sys_fspick+0x290/0x290
> ? init_module_from_file+0xd1/0x130
> init_module_from_file+0xd1/0x130
> ? __ia32_sys_init_module+0xa0/0xa0
> ? lock_acquire+0x2d/0xb0
> ? idempotent_init_module+0x116/0x790
> ? do_raw_spin_unlock+0x54/0x220
> idempotent_init_module+0x226/0x790
> ? init_module_from_file+0x130/0x130
> ? vm_mmap_pgoff+0x203/0x2e0
> __x64_sys_finit_module+0xba/0x130
> do_syscall_64+0x69/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7fe869de327d
> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
> </TASK>
> Modules linked in:
> CR2: fffffbfff4041000
> ---[ end trace 0000000000000000 ]---
>
> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
> is found here:
>
> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
>
> I will be happy to test fixes.
Hi Ben,
Thanks for reporting the issue. Do you have these recent fixes in your tree:
https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
If not, could you please apply them and see if the issue is still happening?
Thanks,
Suren.
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc http://www.candelatech.com
>
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-06 22:03 ` Suren Baghdasaryan
@ 2024-12-06 22:43 ` Ben Greear
2024-12-06 22:55 ` Suren Baghdasaryan
0 siblings, 1 reply; 11+ messages in thread
From: Ben Greear @ 2024-12-06 22:43 UTC
To: Suren Baghdasaryan; +Cc: LKML
On 12/6/24 14:03, Suren Baghdasaryan wrote:
> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
>>
>> Hello Suren,
>>
>> My system crashes on bootup, and I bisected to this commit.
>>
>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
>> Author: Suren Baghdasaryan <surenb@google.com>
>> Date: Wed Oct 23 10:07:57 2024 -0700
>>
>> alloc_tag: populate memory for module tags as needed
>>
>> The memory reserved for module tags does not need to be backed by physical
>> pages until there are tags to store there. Change the way we reserve this
>> memory to allocate only virtual area for the tags and populate it with
>> physical pages as needed when we load a module.
>>
>> The crash looks like this:
>>
>> BUG: unable to handle page fault for address: fffffbfff4041000
>> #PF: supervisor read access in kernel mode
>> #PF: error_code(0x0000) - not-present page
>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
>> RIP: 0010:kasan_check_range+0xa5/0x190
>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
>> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
>> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
>> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
>> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
>> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>> Call Trace:
>> <TASK>
>> ? __die+0x1f/0x60
>> ? page_fault_oops+0x258/0x910
>> ? dump_pagetable+0x690/0x690
>> ? search_bpf_extables+0x22/0x250
>> ? trace_page_fault_kernel+0x120/0x120
>> ? search_bpf_extables+0x164/0x250
>> ? kasan_check_range+0xa5/0x190
>> ? fixup_exception+0x4d/0xc70
>> ? exc_page_fault+0xe1/0xf0
>> ? asm_exc_page_fault+0x22/0x30
>> ? load_module+0x3d7b/0x7560
>> ? kasan_check_range+0xa5/0x190
>> __asan_memcpy+0x38/0x60
>> load_module+0x3d7b/0x7560
>> ? module_frob_arch_sections+0x30/0x30
>> ? lockdep_lock+0xbe/0x1b0
>> ? rw_verify_area+0x18d/0x5e0
>> ? kernel_read_file+0x246/0x870
>> ? __x64_sys_fspick+0x290/0x290
>> ? init_module_from_file+0xd1/0x130
>> init_module_from_file+0xd1/0x130
>> ? __ia32_sys_init_module+0xa0/0xa0
>> ? lock_acquire+0x2d/0xb0
>> ? idempotent_init_module+0x116/0x790
>> ? do_raw_spin_unlock+0x54/0x220
>> idempotent_init_module+0x226/0x790
>> ? init_module_from_file+0x130/0x130
>> ? vm_mmap_pgoff+0x203/0x2e0
>> __x64_sys_finit_module+0xba/0x130
>> do_syscall_64+0x69/0x160
>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>> RIP: 0033:0x7fe869de327d
>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
>> </TASK>
>> Modules linked in:
>> CR2: fffffbfff4041000
>> ---[ end trace 0000000000000000 ]---
>>
>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
>> is found here:
>>
>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
>>
>> I will be happy to test fixes.
>
> Hi Ben,
> Thanks for reporting the issue. Do you have these recent fixes in your tree:
>
> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
>
> If not, could you please apply them and see if the issue is still happening?
> Thanks,
> Suren.
Hello Suren,
Thanks for the quick response. The first patch is already in the latest Linus tree,
and I applied the second one manually. The kernel still crashes:
(gdb) l *(load_module+0x3c46)
0xffffffff814f30d6 is in load_module (/home/greearb/git/linux-2.6/kernel/module/main.c:2624).
2619 if (i == info->index.mod &&
2620 (WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
2621 ret = -ENOEXEC;
2622 goto out_err;
2623 }
2624 memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
2625 }
2626 /*
2627 * Update the userspace copy's ELF section address to point to
2628 * our newly allocated memory as a pure convenience so that
(gdb)
(gdb) l *(__asan_memcpy+0x38)
0xffffffff81b39268 is in __asan_memcpy (/home/greearb/git/linux-2.6/mm/kasan/shadow.c:105).
100 EXPORT_SYMBOL(__asan_memmove);
101 #endif
102
103 void *__asan_memcpy(void *dest, const void *src, ssize_t len)
104 {
105 if (!kasan_check_range(src, len, false, _RET_IP_) ||
106 !kasan_check_range(dest, len, true, _RET_IP_))
107 return NULL;
108
109 return __memcpy(dest, src, len);
(gdb)
(gdb) l *(kasan_check_range+0xa5)
0xffffffff81b386f5 is in kasan_check_range (/home/greearb/git/linux-2.6/mm/kasan/generic.c:116).
111 start += prefix;
112 }
113
114 words = (end - start) / 8;
115 while (words) {
116 if (unlikely(*(u64 *)start))
117 return bytes_is_nonzero(start, 8);
118 start += 8;
119 words--;
120 }
(gdb)
BUG: unable to handle page fault for address: fffffbfff4041000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb3d067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
CPU: 7 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.13.0-rc1+ #24
Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
RIP: 0010:kasan_check_range+0xa5/0x190
Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
RSP: 0018:ffff88812d3ef980 EFLAGS: 00010206
RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814f30d6
RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
R13: ffffc90000dac930 R14: ffffc90000dac950 R15: dffffc0000000000
FS: 00007fc25423bb40(0000) GS:ffff88841dd80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: fffffbfff4041000 CR3: 0000000128ef4003 CR4: 00000000003706f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? __die+0x1f/0x60
? page_fault_oops+0x258/0x910
? dump_pagetable+0x690/0x690
? search_bpf_extables+0x22/0x250
? show_ldttss+0x230/0x230
? search_bpf_extables+0x164/0x250
? kasan_check_range+0xa5/0x190
? fixup_exception+0x4d/0xc70
? exc_page_fault+0xe1/0xf0
? asm_exc_page_fault+0x22/0x30
? load_module+0x3c46/0x8680
? kasan_check_range+0xa5/0x190
__asan_memcpy+0x38/0x60
load_module+0x3c46/0x8680
? __kernel_read+0x270/0x9f0
? module_frob_arch_sections+0x30/0x30
? lockdep_lock+0xbe/0x1b0
? rw_verify_area+0x18d/0x5e0
? kernel_read_file+0x246/0x870
? __x64_sys_fspick+0x290/0x290
? init_module_from_file+0xd1/0x130
init_module_from_file+0xd1/0x130
? __ia32_sys_init_module+0xa0/0xa0
? lock_acquire+0x2d/0xb0
? idempotent_init_module+0x10d/0x780
? do_raw_spin_unlock+0x54/0x220
idempotent_init_module+0x21d/0x780
? init_module_from_file+0x130/0x130
? __fget_files+0x1a5/0x2d0
__x64_sys_finit_module+0xbe/0x130
do_syscall_64+0x69/0x160
entry_SYSCALL_64_after_hwframe+0x4b/0x53
RIP: 0033:0x7fc254e0827d
Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
RSP: 002b:00007ffe1750a718 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
RAX: ffffffffffffffda RBX: 000055da274c5500 RCX: 00007fc254e0827d
RDX: 0000000000000000 RSI: 00007fc254f6e43c RDI: 0000000000000006
RBP: 00007fc254f6e43c R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
R13: 000055da2749cbb0 R14: 0000000000000000 R15: 000055da274c44a0
</TASK>
Modules linked in:
CR2: fffffbfff4041000
---[ end trace 0000000000000000 ]---
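As a cross-check of the register dump, the faulting address in CR2 looks like the KASAN shadow address for the range being checked. The sketch below mirrors the arithmetic of the kernel's kasan_mem_to_shadow() in Python. It assumes the x86_64 generic-KASAN shadow offset of 0xdffffc0000000000 (the same value that appears in R15) and assumes RDI and RSI still hold the start and length that were passed to kasan_check_range(). Variable names are illustrative.

```python
KASAN_SHADOW_SCALE_SHIFT = 3               # one shadow byte covers 8 bytes of memory
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000   # assumed x86_64 generic-KASAN offset (also seen in R15)
MASK64 = (1 << 64) - 1                     # emulate 64-bit unsigned wraparound


def kasan_mem_to_shadow(addr):
    """Compute (addr >> shift) + offset, truncated to 64 bits."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


start = 0xFFFFFFFFA0208000   # RDI in the oops (assumed start of the checked range)
size = 0xF0                  # RSI in the oops (assumed length of the checked range)
last = start + size - 1      # equals R10, ffffffffa02080ef

print(hex(kasan_mem_to_shadow(start)))  # 0xfffffbfff4041000, equal to CR2
print(hex(kasan_mem_to_shadow(last)))   # 0xfffffbfff404101d, equal to R09
```

If those assumptions hold, the read that faulted at generic.c:116 was of the first shadow byte for a 0xf0-byte range starting at ffffffffa0208000, which lies in the x86_64 module mapping area. That would be consistent with the shadow page for that range not being mapped (the "PTE 0" above), but it is an inference from the numbers rather than a confirmed diagnosis.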
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-06 22:43 ` Ben Greear
@ 2024-12-06 22:55 ` Suren Baghdasaryan
2024-12-07 0:15 ` Suren Baghdasaryan
0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-12-06 22:55 UTC
To: Ben Greear; +Cc: LKML
On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
>
> On 12/6/24 14:03, Suren Baghdasaryan wrote:
> > On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
> >>
> >> Hello Suren,
> >>
> >> My system crashes on bootup, and I bisected to this commit.
> >>
> >> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
> >> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
> >> Author: Suren Baghdasaryan <surenb@google.com>
> >> Date: Wed Oct 23 10:07:57 2024 -0700
> >>
> >> alloc_tag: populate memory for module tags as needed
> >>
> >> The memory reserved for module tags does not need to be backed by physical
> >> pages until there are tags to store there. Change the way we reserve this
> >> memory to allocate only virtual area for the tags and populate it with
> >> physical pages as needed when we load a module.
> >>
> >> The crash looks like this:
> >>
> >> BUG: unable to handle page fault for address: fffffbfff4041000
> >> #PF: supervisor read access in kernel mode
> >> #PF: error_code(0x0000) - not-present page
> >> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
> >> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> >> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
> >> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> >> RIP: 0010:kasan_check_range+0xa5/0x190
> >> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> >> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
> >> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
> >> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> >> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> >> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> >> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
> >> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
> >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
> >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >> Call Trace:
> >> <TASK>
> >> ? __die+0x1f/0x60
> >> ? page_fault_oops+0x258/0x910
> >> ? dump_pagetable+0x690/0x690
> >> ? search_bpf_extables+0x22/0x250
> >> ? trace_page_fault_kernel+0x120/0x120
> >> ? search_bpf_extables+0x164/0x250
> >> ? kasan_check_range+0xa5/0x190
> >> ? fixup_exception+0x4d/0xc70
> >> ? exc_page_fault+0xe1/0xf0
> >> ? asm_exc_page_fault+0x22/0x30
> >> ? load_module+0x3d7b/0x7560
> >> ? kasan_check_range+0xa5/0x190
> >> __asan_memcpy+0x38/0x60
> >> load_module+0x3d7b/0x7560
> >> ? module_frob_arch_sections+0x30/0x30
> >> ? lockdep_lock+0xbe/0x1b0
> >> ? rw_verify_area+0x18d/0x5e0
> >> ? kernel_read_file+0x246/0x870
> >> ? __x64_sys_fspick+0x290/0x290
> >> ? init_module_from_file+0xd1/0x130
> >> init_module_from_file+0xd1/0x130
> >> ? __ia32_sys_init_module+0xa0/0xa0
> >> ? lock_acquire+0x2d/0xb0
> >> ? idempotent_init_module+0x116/0x790
> >> ? do_raw_spin_unlock+0x54/0x220
> >> idempotent_init_module+0x226/0x790
> >> ? init_module_from_file+0x130/0x130
> >> ? vm_mmap_pgoff+0x203/0x2e0
> >> __x64_sys_finit_module+0xba/0x130
> >> do_syscall_64+0x69/0x160
> >> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >> RIP: 0033:0x7fe869de327d
> >> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> >> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> >> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
> >> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
> >> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
> >> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> >> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
> >> </TASK>
> >> Modules linked in:
> >> CR2: fffffbfff4041000
> >> ---[ end trace 0000000000000000 ]---
> >>
> >> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
> >> is found here:
> >>
> >> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
> >>
> >> I will be happy to test fixes.
> >
> > Hi Ben,
> > Thanks for reporting the issue. Do you have these recent fixes in your tree:
> >
> > https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
> > https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
> >
> > If not, could you please apply them and see if the issue is still happening?
> > Thanks,
> > Suren.
>
> Hello Suren,
>
> Thanks for the quick response. The first patch is already in the latest Linus tree,
> and I applied the second one manually. The kernel still crashes:
Ok, I'll try to reproduce and troubleshoot it. Thanks!
>
> (gdb) l *(load_module+0x3c46)
> 0xffffffff814f30d6 is in load_module (/home/greearb/git/linux-2.6/kernel/module/main.c:2624).
> 2619 if (i == info->index.mod &&
> 2620 (WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
> 2621 ret = -ENOEXEC;
> 2622 goto out_err;
> 2623 }
> 2624 memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
> 2625 }
> 2626 /*
> 2627 * Update the userspace copy's ELF section address to point to
> 2628 * our newly allocated memory as a pure convenience so that
> (gdb)
>
> (gdb) l *(__asan_memcpy+0x38)
> 0xffffffff81b39268 is in __asan_memcpy (/home/greearb/git/linux-2.6/mm/kasan/shadow.c:105).
> 100 EXPORT_SYMBOL(__asan_memmove);
> 101 #endif
> 102
> 103 void *__asan_memcpy(void *dest, const void *src, ssize_t len)
> 104 {
> 105 if (!kasan_check_range(src, len, false, _RET_IP_) ||
> 106 !kasan_check_range(dest, len, true, _RET_IP_))
> 107 return NULL;
> 108
> 109 return __memcpy(dest, src, len);
> (gdb)
>
> (gdb) l *(kasan_check_range+0xa5)
> 0xffffffff81b386f5 is in kasan_check_range (/home/greearb/git/linux-2.6/mm/kasan/generic.c:116).
> 111 start += prefix;
> 112 }
> 113
> 114 words = (end - start) / 8;
> 115 while (words) {
> 116 if (unlikely(*(u64 *)start))
> 117 return bytes_is_nonzero(start, 8);
> 118 start += 8;
> 119 words--;
> 120 }
> (gdb)
>
>
> BUG: unable to handle page fault for address: fffffbfff4041000
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb3d067 PTE 0
> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> CPU: 7 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.13.0-rc1+ #24
> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> RIP: 0010:kasan_check_range+0xa5/0x190
> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> RSP: 0018:ffff88812d3ef980 EFLAGS: 00010206
> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814f30d6
> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> R13: ffffc90000dac930 R14: ffffc90000dac950 R15: dffffc0000000000
> FS: 00007fc25423bb40(0000) GS:ffff88841dd80000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: fffffbfff4041000 CR3: 0000000128ef4003 CR4: 00000000003706f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
> <TASK>
> ? __die+0x1f/0x60
> ? page_fault_oops+0x258/0x910
> ? dump_pagetable+0x690/0x690
> ? search_bpf_extables+0x22/0x250
> ? show_ldttss+0x230/0x230
> ? search_bpf_extables+0x164/0x250
> ? kasan_check_range+0xa5/0x190
> ? fixup_exception+0x4d/0xc70
> ? exc_page_fault+0xe1/0xf0
> ? asm_exc_page_fault+0x22/0x30
> ? load_module+0x3c46/0x8680
> ? kasan_check_range+0xa5/0x190
> __asan_memcpy+0x38/0x60
> load_module+0x3c46/0x8680
> ? __kernel_read+0x270/0x9f0
> ? module_frob_arch_sections+0x30/0x30
> ? lockdep_lock+0xbe/0x1b0
> ? rw_verify_area+0x18d/0x5e0
> ? kernel_read_file+0x246/0x870
> ? __x64_sys_fspick+0x290/0x290
> ? init_module_from_file+0xd1/0x130
> init_module_from_file+0xd1/0x130
> ? __ia32_sys_init_module+0xa0/0xa0
> ? lock_acquire+0x2d/0xb0
> ? idempotent_init_module+0x10d/0x780
> ? do_raw_spin_unlock+0x54/0x220
> idempotent_init_module+0x21d/0x780
> ? init_module_from_file+0x130/0x130
> ? __fget_files+0x1a5/0x2d0
> __x64_sys_finit_module+0xbe/0x130
> do_syscall_64+0x69/0x160
> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> RIP: 0033:0x7fc254e0827d
> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> RSP: 002b:00007ffe1750a718 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> RAX: ffffffffffffffda RBX: 000055da274c5500 RCX: 00007fc254e0827d
> RDX: 0000000000000000 RSI: 00007fc254f6e43c RDI: 0000000000000006
> RBP: 00007fc254f6e43c R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> R13: 000055da2749cbb0 R14: 0000000000000000 R15: 000055da274c44a0
> </TASK>
> Modules linked in:
> CR2: fffffbfff4041000
> ---[ end trace 0000000000000000 ]---
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc http://www.candelatech.com
>
>
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-06 22:55 ` Suren Baghdasaryan
@ 2024-12-07 0:15 ` Suren Baghdasaryan
2024-12-07 0:50 ` Ben Greear
0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-12-07 0:15 UTC
To: Ben Greear; +Cc: LKML
On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
> >
> > On 12/6/24 14:03, Suren Baghdasaryan wrote:
> > > On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
> > >>
> > >> Hello Suren,
> > >>
> > >> My system crashes on bootup, and I bisected to this commit.
> > >>
> > >> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
> > >> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
> > >> Author: Suren Baghdasaryan <surenb@google.com>
> > >> Date: Wed Oct 23 10:07:57 2024 -0700
> > >>
> > >> alloc_tag: populate memory for module tags as needed
> > >>
> > >> The memory reserved for module tags does not need to be backed by physical
> > >> pages until there are tags to store there. Change the way we reserve this
> > >> memory to allocate only virtual area for the tags and populate it with
> > >> physical pages as needed when we load a module.
> > >>
> > >> The crash looks like this:
> > >>
> > >> BUG: unable to handle page fault for address: fffffbfff4041000
> > >> #PF: supervisor read access in kernel mode
> > >> #PF: error_code(0x0000) - not-present page
> > >> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
> > >> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> > >> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
> > >> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> > >> RIP: 0010:kasan_check_range+0xa5/0x190
> > >> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> > >> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
> > >> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
> > >> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> > >> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> > >> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> > >> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
> > >> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
> > >> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
> > >> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >> Call Trace:
> > >> <TASK>
> > >> ? __die+0x1f/0x60
> > >> ? page_fault_oops+0x258/0x910
> > >> ? dump_pagetable+0x690/0x690
> > >> ? search_bpf_extables+0x22/0x250
> > >> ? trace_page_fault_kernel+0x120/0x120
> > >> ? search_bpf_extables+0x164/0x250
> > >> ? kasan_check_range+0xa5/0x190
> > >> ? fixup_exception+0x4d/0xc70
> > >> ? exc_page_fault+0xe1/0xf0
> > >> ? asm_exc_page_fault+0x22/0x30
> > >> ? load_module+0x3d7b/0x7560
> > >> ? kasan_check_range+0xa5/0x190
> > >> __asan_memcpy+0x38/0x60
> > >> load_module+0x3d7b/0x7560
> > >> ? module_frob_arch_sections+0x30/0x30
> > >> ? lockdep_lock+0xbe/0x1b0
> > >> ? rw_verify_area+0x18d/0x5e0
> > >> ? kernel_read_file+0x246/0x870
> > >> ? __x64_sys_fspick+0x290/0x290
> > >> ? init_module_from_file+0xd1/0x130
> > >> init_module_from_file+0xd1/0x130
> > >> ? __ia32_sys_init_module+0xa0/0xa0
> > >> ? lock_acquire+0x2d/0xb0
> > >> ? idempotent_init_module+0x116/0x790
> > >> ? do_raw_spin_unlock+0x54/0x220
> > >> idempotent_init_module+0x226/0x790
> > >> ? init_module_from_file+0x130/0x130
> > >> ? vm_mmap_pgoff+0x203/0x2e0
> > >> __x64_sys_finit_module+0xba/0x130
> > >> do_syscall_64+0x69/0x160
> > >> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > >> RIP: 0033:0x7fe869de327d
> > >> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> > >> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > >> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
> > >> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
> > >> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
> > >> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> > >> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
> > >> </TASK>
> > >> Modules linked in:
> > >> CR2: fffffbfff4041000
> > >> ---[ end trace 0000000000000000 ]---
> > >>
> > >> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
> > >> is found here:
> > >>
> > >> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
> > >>
> > >> I will be happy to test fixes.
> > >
> > > Hi Ben,
> > > Thanks for reporting the issue. Do you have these recent fixes in your tree:
> > >
> > > https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
> > > https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
> > >
> > > If not, could you please apply them and see if the issue is still happening?
> > > Thanks,
> > > Suren.
> >
> > Hello Suren,
> >
> > Thanks for the quick response. The first patch is already in latest Linus tree,
Hmm. Could you please double-check which tree you are using? I don't
see the first patch
(https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
in Linus' tree. Maybe you are using linux-next?
> > and I applied the second one manually. The kernel still crashes:
>
> Ok, I'll try to reproduce and troubleshoot it. Thanks!
>
> >
> > (gdb) l *(load_module+0x3c46)
> > 0xffffffff814f30d6 is in load_module (/home/greearb/git/linux-2.6/kernel/module/main.c:2624).
> > 2619 if (i == info->index.mod &&
> > 2620 (WARN_ON_ONCE(shdr->sh_size != sizeof(struct module)))) {
> > 2621 ret = -ENOEXEC;
> > 2622 goto out_err;
> > 2623 }
> > 2624 memcpy(dest, (void *)shdr->sh_addr, shdr->sh_size);
> > 2625 }
> > 2626 /*
> > 2627 * Update the userspace copy's ELF section address to point to
> > 2628 * our newly allocated memory as a pure convenience so that
> > (gdb)
> >
> > (gdb) l *(__asan_memcpy+0x38)
> > 0xffffffff81b39268 is in __asan_memcpy (/home/greearb/git/linux-2.6/mm/kasan/shadow.c:105).
> > 100 EXPORT_SYMBOL(__asan_memmove);
> > 101 #endif
> > 102
> > 103 void *__asan_memcpy(void *dest, const void *src, ssize_t len)
> > 104 {
> > 105 if (!kasan_check_range(src, len, false, _RET_IP_) ||
> > 106 !kasan_check_range(dest, len, true, _RET_IP_))
> > 107 return NULL;
> > 108
> > 109 return __memcpy(dest, src, len);
> > (gdb)
> >
> > (gdb) l *(kasan_check_range+0xa5)
> > 0xffffffff81b386f5 is in kasan_check_range (/home/greearb/git/linux-2.6/mm/kasan/generic.c:116).
> > 111 start += prefix;
> > 112 }
> > 113
> > 114 words = (end - start) / 8;
> > 115 while (words) {
> > 116 if (unlikely(*(u64 *)start))
> > 117 return bytes_is_nonzero(start, 8);
> > 118 start += 8;
> > 119 words--;
> > 120 }
> > (gdb)
> >
> >
> > BUG: unable to handle page fault for address: fffffbfff4041000
> > #PF: supervisor read access in kernel mode
> > #PF: error_code(0x0000) - not-present page
> > PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb3d067 PTE 0
> > Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> > CPU: 7 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.13.0-rc1+ #24
> > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> > RIP: 0010:kasan_check_range+0xa5/0x190
> > Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> > RSP: 0018:ffff88812d3ef980 EFLAGS: 00010206
> > RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814f30d6
> > RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> > RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> > R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> > R13: ffffc90000dac930 R14: ffffc90000dac950 R15: dffffc0000000000
> > FS: 00007fc25423bb40(0000) GS:ffff88841dd80000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: fffffbfff4041000 CR3: 0000000128ef4003 CR4: 00000000003706f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > ? __die+0x1f/0x60
> > ? page_fault_oops+0x258/0x910
> > ? dump_pagetable+0x690/0x690
> > ? search_bpf_extables+0x22/0x250
> > ? show_ldttss+0x230/0x230
> > ? search_bpf_extables+0x164/0x250
> > ? kasan_check_range+0xa5/0x190
> > ? fixup_exception+0x4d/0xc70
> > ? exc_page_fault+0xe1/0xf0
> > ? asm_exc_page_fault+0x22/0x30
> > ? load_module+0x3c46/0x8680
> > ? kasan_check_range+0xa5/0x190
> > __asan_memcpy+0x38/0x60
> > load_module+0x3c46/0x8680
> > ? __kernel_read+0x270/0x9f0
> > ? module_frob_arch_sections+0x30/0x30
> > ? lockdep_lock+0xbe/0x1b0
> > ? rw_verify_area+0x18d/0x5e0
> > ? kernel_read_file+0x246/0x870
> > ? __x64_sys_fspick+0x290/0x290
> > ? init_module_from_file+0xd1/0x130
> > init_module_from_file+0xd1/0x130
> > ? __ia32_sys_init_module+0xa0/0xa0
> > ? lock_acquire+0x2d/0xb0
> > ? idempotent_init_module+0x10d/0x780
> > ? do_raw_spin_unlock+0x54/0x220
> > idempotent_init_module+0x21d/0x780
> > ? init_module_from_file+0x130/0x130
> > ? __fget_files+0x1a5/0x2d0
> > __x64_sys_finit_module+0xbe/0x130
> > do_syscall_64+0x69/0x160
> > entry_SYSCALL_64_after_hwframe+0x4b/0x53
> > RIP: 0033:0x7fc254e0827d
> > Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> > RSP: 002b:00007ffe1750a718 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > RAX: ffffffffffffffda RBX: 000055da274c5500 RCX: 00007fc254e0827d
> > RDX: 0000000000000000 RSI: 00007fc254f6e43c RDI: 0000000000000006
> > RBP: 00007fc254f6e43c R08: 0000000000000000 R09: 0000000000000000
> > R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> > R13: 000055da2749cbb0 R14: 0000000000000000 R15: 000055da274c44a0
> > </TASK>
> > Modules linked in:
> > CR2: fffffbfff4041000
> > ---[ end trace 0000000000000000 ]---
> >
> > Thanks,
> > Ben
> >
> > --
> > Ben Greear <greearb@candelatech.com>
> > Candela Technologies Inc http://www.candelatech.com
> >
> >
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-07 0:15 ` Suren Baghdasaryan
@ 2024-12-07 0:50 ` Ben Greear
2024-12-07 1:27 ` Suren Baghdasaryan
0 siblings, 1 reply; 11+ messages in thread
From: Ben Greear @ 2024-12-07 0:50 UTC (permalink / raw)
To: Suren Baghdasaryan; +Cc: LKML
On 12/6/24 16:15, Suren Baghdasaryan wrote:
> On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>
>> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
>>>
>>> On 12/6/24 14:03, Suren Baghdasaryan wrote:
>>>> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>>
>>>>> Hello Suren,
>>>>>
>>>>> My system crashes on bootup, and I bisected to this commit.
>>>>>
>>>>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
>>>>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
>>>>> Author: Suren Baghdasaryan <surenb@google.com>
>>>>> Date: Wed Oct 23 10:07:57 2024 -0700
>>>>>
>>>>> alloc_tag: populate memory for module tags as needed
>>>>>
>>>>> The memory reserved for module tags does not need to be backed by physical
>>>>> pages until there are tags to store there. Change the way we reserve this
>>>>> memory to allocate only virtual area for the tags and populate it with
>>>>> physical pages as needed when we load a module.
>>>>>
>>>>> The crash looks like this:
>>>>>
>>>>> BUG: unable to handle page fault for address: fffffbfff4041000
>>>>> #PF: supervisor read access in kernel mode
>>>>> #PF: error_code(0x0000) - not-present page
>>>>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
>>>>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
>>>>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
>>>>> RIP: 0010:kasan_check_range+0xa5/0x190
>>>>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
>>>>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
>>>>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
>>>>> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
>>>>> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
>>>>> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
>>>>> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
>>>>> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>> Call Trace:
>>>>> <TASK>
>>>>> ? __die+0x1f/0x60
>>>>> ? page_fault_oops+0x258/0x910
>>>>> ? dump_pagetable+0x690/0x690
>>>>> ? search_bpf_extables+0x22/0x250
>>>>> ? trace_page_fault_kernel+0x120/0x120
>>>>> ? search_bpf_extables+0x164/0x250
>>>>> ? kasan_check_range+0xa5/0x190
>>>>> ? fixup_exception+0x4d/0xc70
>>>>> ? exc_page_fault+0xe1/0xf0
>>>>> ? asm_exc_page_fault+0x22/0x30
>>>>> ? load_module+0x3d7b/0x7560
>>>>> ? kasan_check_range+0xa5/0x190
>>>>> __asan_memcpy+0x38/0x60
>>>>> load_module+0x3d7b/0x7560
>>>>> ? module_frob_arch_sections+0x30/0x30
>>>>> ? lockdep_lock+0xbe/0x1b0
>>>>> ? rw_verify_area+0x18d/0x5e0
>>>>> ? kernel_read_file+0x246/0x870
>>>>> ? __x64_sys_fspick+0x290/0x290
>>>>> ? init_module_from_file+0xd1/0x130
>>>>> init_module_from_file+0xd1/0x130
>>>>> ? __ia32_sys_init_module+0xa0/0xa0
>>>>> ? lock_acquire+0x2d/0xb0
>>>>> ? idempotent_init_module+0x116/0x790
>>>>> ? do_raw_spin_unlock+0x54/0x220
>>>>> idempotent_init_module+0x226/0x790
>>>>> ? init_module_from_file+0x130/0x130
>>>>> ? vm_mmap_pgoff+0x203/0x2e0
>>>>> __x64_sys_finit_module+0xba/0x130
>>>>> do_syscall_64+0x69/0x160
>>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>> RIP: 0033:0x7fe869de327d
>>>>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
>>>>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
>>>>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
>>>>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
>>>>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
>>>>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
>>>>> </TASK>
>>>>> Modules linked in:
>>>>> CR2: fffffbfff4041000
>>>>> ---[ end trace 0000000000000000 ]---
>>>>>
>>>>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
>>>>> is found here:
>>>>>
>>>>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
>>>>>
>>>>> I will be happy to test fixes.
>>>>
>>>> Hi Ben,
>>>> Thanks for reporting the issue. Do you have these recent fixes in your tree:
>>>>
>>>> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
>>>> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
>>>>
>>>> If not, could you please apply them and see if the issue is still happening?
>>>> Thanks,
>>>> Suren.
>>>
>>> Hello Suren,
>>>
>>> Thanks for the quick response. The first patch is already in latest Linus tree,
>
> Hmm. Could you please double-check which tree you are using? I don't
> see the first patch
> (https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
> in Linus' tree. Maybe you are using linux-next?
Sorry, you are correct. I must have mangled something when trying to apply
the patch and I didn't look hard enough when patch said changes were already applied.
I can re-test this next week...and for reference, kernel boots fine when you disable
KASAN and other debugging.
Thanks,
Ben
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-07 0:50 ` Ben Greear
@ 2024-12-07 1:27 ` Suren Baghdasaryan
2024-12-09 12:47 ` Hao Ge
0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-12-07 1:27 UTC (permalink / raw)
To: Ben Greear; +Cc: LKML
On Fri, Dec 6, 2024 at 4:50 PM Ben Greear <greearb@candelatech.com> wrote:
>
> On 12/6/24 16:15, Suren Baghdasaryan wrote:
> > On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>
> >> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>
> >>> On 12/6/24 14:03, Suren Baghdasaryan wrote:
> >>>> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>
> >>>>> Hello Suren,
> >>>>>
> >>>>> My system crashes on bootup, and I bisected to this commit.
> >>>>>
> >>>>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
> >>>>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
> >>>>> Author: Suren Baghdasaryan <surenb@google.com>
> >>>>> Date: Wed Oct 23 10:07:57 2024 -0700
> >>>>>
> >>>>> alloc_tag: populate memory for module tags as needed
> >>>>>
> >>>>> The memory reserved for module tags does not need to be backed by physical
> >>>>> pages until there are tags to store there. Change the way we reserve this
> >>>>> memory to allocate only virtual area for the tags and populate it with
> >>>>> physical pages as needed when we load a module.
> >>>>>
> >>>>> The crash looks like this:
> >>>>>
> >>>>> BUG: unable to handle page fault for address: fffffbfff4041000
> >>>>> #PF: supervisor read access in kernel mode
> >>>>> #PF: error_code(0x0000) - not-present page
> >>>>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
> >>>>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> >>>>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
> >>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> >>>>> RIP: 0010:kasan_check_range+0xa5/0x190
> >>>>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> >>>>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
> >>>>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
> > >>>>> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> > >>>>> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> > >>>>> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> > >>>>> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
> > >>>>> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
> >>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
> >>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>> Call Trace:
> >>>>> <TASK>
> > >>>>> ? __die+0x1f/0x60
> > >>>>> ? page_fault_oops+0x258/0x910
> > >>>>> ? dump_pagetable+0x690/0x690
> > >>>>> ? search_bpf_extables+0x22/0x250
> > >>>>> ? trace_page_fault_kernel+0x120/0x120
> > >>>>> ? search_bpf_extables+0x164/0x250
> > >>>>> ? kasan_check_range+0xa5/0x190
> > >>>>> ? fixup_exception+0x4d/0xc70
> > >>>>> ? exc_page_fault+0xe1/0xf0
> > >>>>> ? asm_exc_page_fault+0x22/0x30
> > >>>>> ? load_module+0x3d7b/0x7560
> > >>>>> ? kasan_check_range+0xa5/0x190
> > >>>>> __asan_memcpy+0x38/0x60
> >>>>> load_module+0x3d7b/0x7560
> >>>>> ? module_frob_arch_sections+0x30/0x30
> >>>>> ? lockdep_lock+0xbe/0x1b0
> >>>>> ? rw_verify_area+0x18d/0x5e0
> >>>>> ? kernel_read_file+0x246/0x870
> >>>>> ? __x64_sys_fspick+0x290/0x290
> >>>>> ? init_module_from_file+0xd1/0x130
> >>>>> init_module_from_file+0xd1/0x130
> >>>>> ? __ia32_sys_init_module+0xa0/0xa0
> >>>>> ? lock_acquire+0x2d/0xb0
> >>>>> ? idempotent_init_module+0x116/0x790
> >>>>> ? do_raw_spin_unlock+0x54/0x220
> >>>>> idempotent_init_module+0x226/0x790
> >>>>> ? init_module_from_file+0x130/0x130
> >>>>> ? vm_mmap_pgoff+0x203/0x2e0
> >>>>> __x64_sys_finit_module+0xba/0x130
> >>>>> do_syscall_64+0x69/0x160
> >>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >>>>> RIP: 0033:0x7fe869de327d
> >>>>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> >>>>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> >>>>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
> >>>>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
> >>>>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
> >>>>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> >>>>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
> >>>>> </TASK>
> >>>>> Modules linked in:
> >>>>> CR2: fffffbfff4041000
> >>>>> ---[ end trace 0000000000000000 ]---
> >>>>>
> >>>>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
> >>>>> is found here:
> >>>>>
> >>>>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
> >>>>>
> >>>>> I will be happy to test fixes.
> >>>>
> >>>> Hi Ben,
> >>>> Thanks for reporting the issue. Do you have these recent fixes in your tree:
> >>>>
> >>>> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
> >>>> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
> >>>>
> > >>>> If not, could you please apply them and see if the issue is still happening?
> >>>> Thanks,
> >>>> Suren.
> >>>
> >>> Hello Suren,
> >>>
> >>> Thanks for the quick response. The first patch is already in latest Linus tree,
> >
> > Hmm. Could you please double-check which tree you are using? I don't
> > see the first patch
> > (https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
> > in Linus' tree. Maybe you are using linux-next?
>
> Sorry, you are correct. I must have mangled something when trying to apply
> the patch and I didn't look hard enough when patch said changes were already applied.
>
> I can re-test this next week...and for reference, kernel boots fine when you disable
> KASAN and other debugging.
Thanks! Please retest with this patch and let me know if you are still
having issues.
Suren.
>
> Thanks,
> Ben
>
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc http://www.candelatech.com
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-07 1:27 ` Suren Baghdasaryan
@ 2024-12-09 12:47 ` Hao Ge
2024-12-09 22:33 ` Suren Baghdasaryan
0 siblings, 1 reply; 11+ messages in thread
From: Hao Ge @ 2024-12-09 12:47 UTC (permalink / raw)
To: Suren Baghdasaryan, Ben Greear; +Cc: LKML
Hi Suren
On 12/7/24 09:27, Suren Baghdasaryan wrote:
> On Fri, Dec 6, 2024 at 4:50 PM Ben Greear <greearb@candelatech.com> wrote:
>> On 12/6/24 16:15, Suren Baghdasaryan wrote:
>>> On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>> On 12/6/24 14:03, Suren Baghdasaryan wrote:
>>>>>> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>> Hello Suren,
>>>>>>>
>>>>>>> My system crashes on bootup, and I bisected to this commit.
>>>>>>>
>>>>>>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
>>>>>>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
>>>>>>> Author: Suren Baghdasaryan <surenb@google.com>
>>>>>>> Date: Wed Oct 23 10:07:57 2024 -0700
>>>>>>>
>>>>>>> alloc_tag: populate memory for module tags as needed
>>>>>>>
>>>>>>> The memory reserved for module tags does not need to be backed by physical
>>>>>>> pages until there are tags to store there. Change the way we reserve this
>>>>>>> memory to allocate only virtual area for the tags and populate it with
>>>>>>> physical pages as needed when we load a module.
>>>>>>>
>>>>>>> The crash looks like this:
>>>>>>>
>>>>>>> BUG: unable to handle page fault for address: fffffbfff4041000
>>>>>>> #PF: supervisor read access in kernel mode
>>>>>>> #PF: error_code(0x0000) - not-present page
>>>>>>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
>>>>>>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
>>>>>>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
>>>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
>>>>>>> RIP: 0010:kasan_check_range+0xa5/0x190
>>>>>>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
>>>>>>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
>>>>>>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
>>>>>>> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
>>>>>>> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
>>>>>>> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
>>>>>>> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
>>>>>>> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>> Call Trace:
>>>>>>> <TASK>
>>>>>>> ? __die+0x1f/0x60
>>>>>>> ? page_fault_oops+0x258/0x910
>>>>>>> ? dump_pagetable+0x690/0x690
>>>>>>> ? search_bpf_extables+0x22/0x250
>>>>>>> ? trace_page_fault_kernel+0x120/0x120
>>>>>>> ? search_bpf_extables+0x164/0x250
>>>>>>> ? kasan_check_range+0xa5/0x190
>>>>>>> ? fixup_exception+0x4d/0xc70
>>>>>>> ? exc_page_fault+0xe1/0xf0
>>>>>>> ? asm_exc_page_fault+0x22/0x30
>>>>>>> ? load_module+0x3d7b/0x7560
>>>>>>> ? kasan_check_range+0xa5/0x190
>>>>>>> __asan_memcpy+0x38/0x60
>>>>>>> load_module+0x3d7b/0x7560
>>>>>>> ? module_frob_arch_sections+0x30/0x30
>>>>>>> ? lockdep_lock+0xbe/0x1b0
>>>>>>> ? rw_verify_area+0x18d/0x5e0
>>>>>>> ? kernel_read_file+0x246/0x870
>>>>>>> ? __x64_sys_fspick+0x290/0x290
>>>>>>> ? init_module_from_file+0xd1/0x130
>>>>>>> init_module_from_file+0xd1/0x130
>>>>>>> ? __ia32_sys_init_module+0xa0/0xa0
>>>>>>> ? lock_acquire+0x2d/0xb0
>>>>>>> ? idempotent_init_module+0x116/0x790
>>>>>>> ? do_raw_spin_unlock+0x54/0x220
>>>>>>> idempotent_init_module+0x226/0x790
>>>>>>> ? init_module_from_file+0x130/0x130
>>>>>>> ? vm_mmap_pgoff+0x203/0x2e0
>>>>>>> __x64_sys_finit_module+0xba/0x130
>>>>>>> do_syscall_64+0x69/0x160
>>>>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>>>> RIP: 0033:0x7fe869de327d
>>>>>>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
>>>>>>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>>>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
>>>>>>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
>>>>>>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
>>>>>>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
>>>>>>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
>>>>>>> </TASK>
>>>>>>> Modules linked in:
>>>>>>> CR2: fffffbfff4041000
>>>>>>> ---[ end trace 0000000000000000 ]---
>>>>>>>
>>>>>>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
>>>>>>> is found here:
>>>>>>>
>>>>>>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
>>>>>>>
>>>>>>> I will be happy to test fixes.
>>>>>> Hi Ben,
>>>>>> Thanks for reporting the issue. Do you have these recent fixes in your tree:
>>>>>>
>>>>>> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
>>>>>> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
>>>>>>
>>>>>> If not, could you please apply them and see if the issue is still happening?
>>>>>> Thanks,
>>>>>> Suren.
>>>>> Hello Suren,
>>>>>
>>>>> Thanks for the quick response. The first patch is already in latest Linus tree,
>>> Hmm. Could you please double-check which tree you are using? I don't
>>> see the first patch
>>> (https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
>>> in Linus' tree. Maybe you are using linux-next?
>> Sorry, you are correct. I must have mangled something when trying to apply
>> the patch and I didn't look hard enough when patch said changes were already applied.
>>
>> I can re-test this next week...and for reference, kernel boots fine when you disable
>> KASAN and other debugging.
> Thanks! Please retest with this patch and let me know if you are still
> having issues.
> Suren.
Indeed, this is a bug that still exists in another context, namely when
CONFIG_KASAN_VMALLOC is not enabled.
We may need to look into this scenario next.
Thanks
Best Regards
Hao
>> Thanks,
>> Ben
>>
>>
>> --
>> Ben Greear <greearb@candelatech.com>
>> Candela Technologies Inc http://www.candelatech.com
>>
>>
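For reference, the faulting address in the oops quoted above is consistent with the generic KASAN shadow mapping on x86_64. The sketch below is a userspace illustration, not kernel code: mem_to_shadow() and shadow_end() are hypothetical helper names, and the shift and offset constants are assumed to be the usual x86_64 generic-KASAN values (the offset matches R15 in the register dump). It maps the address being checked (RDI, ffffffffa0208000, which appears to lie in the module area) to its shadow bytes.

```c
#include <stdint.h>

/* Assumption: generic KASAN, where one shadow byte covers 8 bytes of memory. */
#define KASAN_SHADOW_SCALE_SHIFT 3
/* Assumption: usual x86_64 shadow offset; the same value appears as R15 in the oops. */
#define KASAN_SHADOW_OFFSET 0xdffffc0000000000ULL

/* Hypothetical helper: address of the shadow byte that describes 'addr'. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Hypothetical helper: one past the last shadow byte covering [addr, addr + size). */
static uint64_t shadow_end(uint64_t addr, uint64_t size)
{
	return mem_to_shadow(addr + size - 1) + 1;
}
```

With the values from the oops, mem_to_shadow(0xffffffffa0208000) yields fffffbfff4041000, which is the CR2 value, and shadow_end(0xffffffffa0208000, 0xf0) yields fffffbfff404101e, which is RBX. That suggests the read that faulted was of the shadow memory for the checked range, and that this shadow page was not populated (PTE 0).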
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-09 12:47 ` Hao Ge
@ 2024-12-09 22:33 ` Suren Baghdasaryan
2024-12-10 4:20 ` Hao Ge
0 siblings, 1 reply; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-12-09 22:33 UTC (permalink / raw)
To: Hao Ge; +Cc: Ben Greear, LKML
On Mon, Dec 9, 2024 at 4:48 AM Hao Ge <hao.ge@linux.dev> wrote:
>
> Hi Suren
>
>
> On 12/7/24 09:27, Suren Baghdasaryan wrote:
> > On Fri, Dec 6, 2024 at 4:50 PM Ben Greear <greearb@candelatech.com> wrote:
> >> On 12/6/24 16:15, Suren Baghdasaryan wrote:
> >>> On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>>> On 12/6/24 14:03, Suren Baghdasaryan wrote:
> >>>>>> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>> Hello Suren,
> >>>>>>>
> >>>>>>> My system crashes on bootup, and I bisected to this commit.
> >>>>>>>
> >>>>>>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
> >>>>>>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
> >>>>>>> Author: Suren Baghdasaryan <surenb@google.com>
> >>>>>>> Date: Wed Oct 23 10:07:57 2024 -0700
> >>>>>>>
> >>>>>>> alloc_tag: populate memory for module tags as needed
> >>>>>>>
> >>>>>>> The memory reserved for module tags does not need to be backed by physical
> >>>>>>> pages until there are tags to store there. Change the way we reserve this
> >>>>>>> memory to allocate only virtual area for the tags and populate it with
> >>>>>>> physical pages as needed when we load a module.
> >>>>>>>
> >>>>>>> The crash looks like this:
> >>>>>>>
> >>>>>>> BUG: unable to handle page fault for address: fffffbfff4041000
> >>>>>>> #PF: supervisor read access in kernel mode
> >>>>>>> #PF: error_code(0x0000) - not-present page
> >>>>>>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
> >>>>>>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> >>>>>>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
> >>>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> >>>>>>> RIP: 0010:kasan_check_range+0xa5/0x190
> >>>>>>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> >>>>>>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
> >>>>>>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
> > >>>>>>> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> > >>>>>>> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> > >>>>>>> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> > >>>>>>> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
> > >>>>>>> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
> >>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
> >>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>> Call Trace:
> >>>>>>> <TASK>
> > >>>>>>> ? __die+0x1f/0x60
> > >>>>>>> ? page_fault_oops+0x258/0x910
> > >>>>>>> ? dump_pagetable+0x690/0x690
> > >>>>>>> ? search_bpf_extables+0x22/0x250
> > >>>>>>> ? trace_page_fault_kernel+0x120/0x120
> > >>>>>>> ? search_bpf_extables+0x164/0x250
> > >>>>>>> ? kasan_check_range+0xa5/0x190
> > >>>>>>> ? fixup_exception+0x4d/0xc70
> > >>>>>>> ? exc_page_fault+0xe1/0xf0
> > >>>>>>> ? asm_exc_page_fault+0x22/0x30
> > >>>>>>> ? load_module+0x3d7b/0x7560
> > >>>>>>> ? kasan_check_range+0xa5/0x190
> > >>>>>>> __asan_memcpy+0x38/0x60
> >>>>>>> load_module+0x3d7b/0x7560
> >>>>>>> ? module_frob_arch_sections+0x30/0x30
> >>>>>>> ? lockdep_lock+0xbe/0x1b0
> >>>>>>> ? rw_verify_area+0x18d/0x5e0
> >>>>>>> ? kernel_read_file+0x246/0x870
> >>>>>>> ? __x64_sys_fspick+0x290/0x290
> >>>>>>> ? init_module_from_file+0xd1/0x130
> >>>>>>> init_module_from_file+0xd1/0x130
> >>>>>>> ? __ia32_sys_init_module+0xa0/0xa0
> >>>>>>> ? lock_acquire+0x2d/0xb0
> >>>>>>> ? idempotent_init_module+0x116/0x790
> >>>>>>> ? do_raw_spin_unlock+0x54/0x220
> >>>>>>> idempotent_init_module+0x226/0x790
> >>>>>>> ? init_module_from_file+0x130/0x130
> >>>>>>> ? vm_mmap_pgoff+0x203/0x2e0
> >>>>>>> __x64_sys_finit_module+0xba/0x130
> >>>>>>> do_syscall_64+0x69/0x160
> >>>>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >>>>>>> RIP: 0033:0x7fe869de327d
> >>>>>>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> >>>>>>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> >>>>>>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
> >>>>>>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
> >>>>>>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
> >>>>>>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> >>>>>>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
> >>>>>>> </TASK>
> >>>>>>> Modules linked in:
> >>>>>>> CR2: fffffbfff4041000
> >>>>>>> ---[ end trace 0000000000000000 ]---
> >>>>>>>
> >>>>>>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
> >>>>>>> is found here:
> >>>>>>>
> >>>>>>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
> >>>>>>>
> >>>>>>> I will be happy to test fixes.
> >>>>>> Hi Ben,
> >>>>>> Thanks for reporting the issue. Do you have these recent fixes in your tree:
> >>>>>>
> >>>>>> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
> >>>>>> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
> >>>>>>
> >>>>>> If not, could you please apply them and see if the issue is still happening?
> >>>>>> Thanks,
> >>>>>> Suren.
> >>>>> Hello Suren,
> >>>>>
> >>>>> Thanks for the quick response. The first patch is already in latest Linus tree,
> >>> Hmm. Could you please double-check which tree you are using? I don't
> >>> see the first patch
> >>> (https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
> >>> in Linus' tree. Maybe you are using linux-next?
> >> Sorry, you are correct. I must have mangled something when trying to apply
> >> the patch and I didn't look hard enough when patch said changes were already applied.
> >>
> >> I can re-test this next week...and for reference, kernel boots fine when you disable
> >> KASAN and other debugging.
> > Thanks! Please retest with this patch and let me know if you are still
> > having issues.
> > Suren.
>
> Indeed, this is a bug that still exists in another context, namely when
> CONFIG_KASAN_VMALLOC is not enabled.
Hmm. Are you able to reproduce this issue with all the fixes we had?
>
> We may need to look into this scenario next.
>
> Thanks
>
> Best Regards
>
> Hao
>
> >> Thanks,
> >> Ben
> >>
> >>
> >> --
> >> Ben Greear <greearb@candelatech.com>
> >> Candela Technologies Inc http://www.candelatech.com
> >>
> >>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-09 22:33 ` Suren Baghdasaryan
@ 2024-12-10 4:20 ` Hao Ge
2024-12-10 17:36 ` Suren Baghdasaryan
0 siblings, 1 reply; 11+ messages in thread
From: Hao Ge @ 2024-12-10 4:20 UTC (permalink / raw)
To: Suren Baghdasaryan, Ben Greear; +Cc: LKML
Hi Suren and Ben
On 12/10/24 06:33, Suren Baghdasaryan wrote:
> On Mon, Dec 9, 2024 at 4:48 AM Hao Ge <hao.ge@linux.dev> wrote:
>> Hi Suren
>>
>>
>> On 12/7/24 09:27, Suren Baghdasaryan wrote:
>>> On Fri, Dec 6, 2024 at 4:50 PM Ben Greear <greearb@candelatech.com> wrote:
>>>> On 12/6/24 16:15, Suren Baghdasaryan wrote:
>>>>> On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
>>>>>> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>> On 12/6/24 14:03, Suren Baghdasaryan wrote:
>>>>>>>> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
>>>>>>>>> Hello Suren,
>>>>>>>>>
>>>>>>>>> My system crashes on bootup, and I bisected to this commit.
>>>>>>>>>
>>>>>>>>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
>>>>>>>>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
>>>>>>>>> Author: Suren Baghdasaryan <surenb@google.com>
>>>>>>>>> Date: Wed Oct 23 10:07:57 2024 -0700
>>>>>>>>>
>>>>>>>>> alloc_tag: populate memory for module tags as needed
>>>>>>>>>
>>>>>>>>> The memory reserved for module tags does not need to be backed by physical
>>>>>>>>> pages until there are tags to store there. Change the way we reserve this
>>>>>>>>> memory to allocate only virtual area for the tags and populate it with
>>>>>>>>> physical pages as needed when we load a module.
>>>>>>>>>
>>>>>>>>> The crash looks like this:
>>>>>>>>>
>>>>>>>>> BUG: unable to handle page fault for address: fffffbfff4041000
>>>>>>>>> #PF: supervisor read access in kernel mode
>>>>>>>>> #PF: error_code(0x0000) - not-present page
>>>>>>>>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
>>>>>>>>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
>>>>>>>>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
>>>>>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
>>>>>>>>> RIP: 0010:kasan_check_range+0xa5/0x190
>>>>>>>>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
>>>>>>>>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
>>>>>>>>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
>>>>>>>>> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
>>>>>>>>> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
>>>>>>>>> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
>>>>>>>>> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
>>>>>>>>> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
>>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>>>>>>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
>>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>>>>>>>>> Call Trace:
>>>>>>>>> <TASK>
>>>>>>>>> ? __die+0x1f/0x60
>>>>>>>>> ? page_fault_oops+0x258/0x910
>>>>>>>>> ? dump_pagetable+0x690/0x690
>>>>>>>>> ? search_bpf_extables+0x22/0x250
>>>>>>>>> ? trace_page_fault_kernel+0x120/0x120
>>>>>>>>> ? search_bpf_extables+0x164/0x250
>>>>>>>>> ? kasan_check_range+0xa5/0x190
>>>>>>>>> ? fixup_exception+0x4d/0xc70
>>>>>>>>> ? exc_page_fault+0xe1/0xf0
>>>>>>>>> ? asm_exc_page_fault+0x22/0x30
>>>>>>>>> ? load_module+0x3d7b/0x7560
>>>>>>>>> ? kasan_check_range+0xa5/0x190
>>>>>>>>> __asan_memcpy+0x38/0x60
>>>>>>>>> load_module+0x3d7b/0x7560
>>>>>>>>> ? module_frob_arch_sections+0x30/0x30
>>>>>>>>> ? lockdep_lock+0xbe/0x1b0
>>>>>>>>> ? rw_verify_area+0x18d/0x5e0
>>>>>>>>> ? kernel_read_file+0x246/0x870
>>>>>>>>> ? __x64_sys_fspick+0x290/0x290
>>>>>>>>> ? init_module_from_file+0xd1/0x130
>>>>>>>>> init_module_from_file+0xd1/0x130
>>>>>>>>> ? __ia32_sys_init_module+0xa0/0xa0
>>>>>>>>> ? lock_acquire+0x2d/0xb0
>>>>>>>>> ? idempotent_init_module+0x116/0x790
>>>>>>>>> ? do_raw_spin_unlock+0x54/0x220
>>>>>>>>> idempotent_init_module+0x226/0x790
>>>>>>>>> ? init_module_from_file+0x130/0x130
>>>>>>>>> ? vm_mmap_pgoff+0x203/0x2e0
>>>>>>>>> __x64_sys_finit_module+0xba/0x130
>>>>>>>>> do_syscall_64+0x69/0x160
>>>>>>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
>>>>>>>>> RIP: 0033:0x7fe869de327d
>>>>>>>>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
>>>>>>>>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
>>>>>>>>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
>>>>>>>>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
>>>>>>>>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
>>>>>>>>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
>>>>>>>>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
>>>>>>>>> </TASK>
>>>>>>>>> Modules linked in:
>>>>>>>>> CR2: fffffbfff4041000
>>>>>>>>> ---[ end trace 0000000000000000 ]---
>>>>>>>>>
>>>>>>>>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
>>>>>>>>> is found here:
>>>>>>>>>
>>>>>>>>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
>>>>>>>>>
>>>>>>>>> I will be happy to test fixes.
>>>>>>>> Hi Ben,
>>>>>>>> Thanks for reporting the issue. Do you have these recent fixes in your tree:
>>>>>>>>
>>>>>>>> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
>>>>>>>> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
>>>>>>>>
>>>>>>>> If not, could you please apply them and see if the issue is still happening?
>>>>>>>> Thanks,
>>>>>>>> Suren.
>>>>>>> Hello Suren,
>>>>>>>
>>>>>>> Thanks for the quick response. The first patch is already in latest Linus tree,
>>>>> Hmm. Could you please double-check which tree you are using? I don't
>>>>> see the first patch
>>>>> (https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
>>>>> in Linus' tree. Maybe you are using linux-next?
>>>> Sorry, you are correct. I must have mangled something when trying to apply
>>>> the patch and I didn't look hard enough when patch said changes were already applied.
>>>>
>>>> I can re-test this next week...and for reference, kernel boots fine when you disable
>>>> KASAN and other debugging.
>>> Thanks! Please retest with this patch and let me know if you are still
>>> having issues.
>>> Suren.
>> Indeed, this is a bug that still exists in another context, namely when
>> CONFIG_KASAN_VMALLOC is not enabled.
> Hmm. Are you able to reproduce this issue with all the fixes we had?
Yes. I set up an x86 virtual machine, applied both of our patches,
and was still able to reproduce the issue.
I have also submitted a patch to fix this issue.
https://lore.kernel.org/all/20241210041515.765569-1-hao.ge@linux.dev/
I verified it locally.
Hi Ben
Can you test with this patch?
Thanks
Best Regards
Hao
>
>> We may need to look into this scenario next.
>>
>> Thanks
>>
>> Best Regards
>>
>> Hao
>>
>>>> Thanks,
>>>> Ben
>>>>
>>>>
>>>> --
>>>> Ben Greear <greearb@candelatech.com>
>>>> Candela Technologies Inc http://www.candelatech.com
>>>>
>>>>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot.
2024-12-10 4:20 ` Hao Ge
@ 2024-12-10 17:36 ` Suren Baghdasaryan
0 siblings, 0 replies; 11+ messages in thread
From: Suren Baghdasaryan @ 2024-12-10 17:36 UTC (permalink / raw)
To: Hao Ge; +Cc: Ben Greear, LKML
On Mon, Dec 9, 2024 at 8:21 PM Hao Ge <hao.ge@linux.dev> wrote:
>
> Hi Suren and Ben
>
>
> On 12/10/24 06:33, Suren Baghdasaryan wrote:
> > On Mon, Dec 9, 2024 at 4:48 AM Hao Ge <hao.ge@linux.dev> wrote:
> >> Hi Suren
> >>
> >>
> >> On 12/7/24 09:27, Suren Baghdasaryan wrote:
> >>> On Fri, Dec 6, 2024 at 4:50 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>> On 12/6/24 16:15, Suren Baghdasaryan wrote:
> >>>>> On Fri, Dec 6, 2024 at 2:55 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >>>>>> On Fri, Dec 6, 2024 at 2:43 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>> On 12/6/24 14:03, Suren Baghdasaryan wrote:
> >>>>>>>> On Fri, Dec 6, 2024 at 1:50 PM Ben Greear <greearb@candelatech.com> wrote:
> >>>>>>>>> Hello Suren,
> >>>>>>>>>
> >>>>>>>>> My system crashes on bootup, and I bisected to this commit.
> >>>>>>>>>
> >>>>>>>>> 0f9b685626daa2f8e19a9788625c9b624c223e45 is the first bad commit
> >>>>>>>>> commit 0f9b685626daa2f8e19a9788625c9b624c223e45
> >>>>>>>>> Author: Suren Baghdasaryan <surenb@google.com>
> >>>>>>>>> Date: Wed Oct 23 10:07:57 2024 -0700
> >>>>>>>>>
> >>>>>>>>> alloc_tag: populate memory for module tags as needed
> >>>>>>>>>
> >>>>>>>>> The memory reserved for module tags does not need to be backed by physical
> >>>>>>>>> pages until there are tags to store there. Change the way we reserve this
> >>>>>>>>> memory to allocate only virtual area for the tags and populate it with
> >>>>>>>>> physical pages as needed when we load a module.
> >>>>>>>>>
> >>>>>>>>> The crash looks like this:
> >>>>>>>>>
> >>>>>>>>> BUG: unable to handle page fault for address: fffffbfff4041000
> >>>>>>>>> #PF: supervisor read access in kernel mode
> >>>>>>>>> #PF: error_code(0x0000) - not-present page
> >>>>>>>>> PGD 44d0e7067 P4D 44d0e7067 PUD 44d0e3067 PMD 10bb38067 PTE 0
> >>>>>>>>> Oops: Oops: 0000 [#1] PREEMPT SMP KASAN
> >>>>>>>>> CPU: 0 UID: 0 PID: 319 Comm: systemd-udevd Not tainted 6.12.0-rc6+ #21
> >>>>>>>>> Hardware name: Default string Default string/SKYBAY, BIOS 5.12 02/15/2023
> >>>>>>>>> RIP: 0010:kasan_check_range+0xa5/0x190
> >>>>>>>>> Code: 8d 5a 07 4c 0f 49 da 49 c1 fb 03 45 85 db 0f 84 ce 00 00 00 45 89 db 4a 8d 14 d8 eb 0d 48 83 c0 08 48 39 d0 0f 84 b29
> >>>>>>>>> RSP: 0018:ffff88812c26f980 EFLAGS: 00010206
> >>>>>>>>> RAX: fffffbfff4041000 RBX: fffffbfff404101e RCX: ffffffff814ec29b
> >>>>>>>>> RDX: fffffbfff4041018 RSI: 00000000000000f0 RDI: ffffffffa0208000
> >>>>>>>>> RBP: fffffbfff4041000 R08: 0000000000000001 R09: fffffbfff404101d
> >>>>>>>>> R10: ffffffffa02080ef R11: 0000000000000003 R12: ffffffffa0208000
> >>>>>>>>> R13: ffffc90000dac7c8 R14: ffffc90000dac7e8 R15: dffffc0000000000
> >>>>>>>>> FS: 00007fe869216b40(0000) GS:ffff88841da00000(0000) knlGS:0000000000000000
> >>>>>>>>> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> >>>>>>>>> CR2: fffffbfff4041000 CR3: 0000000121e86002 CR4: 00000000003706f0
> >>>>>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> >>>>>>>>> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> >>>>>>>>> Call Trace:
> >>>>>>>>> <TASK>
> >>>>>>>>> ? __die+0x1f/0x60
> >>>>>>>>> ? page_fault_oops+0x258/0x910
> >>>>>>>>> ? dump_pagetable+0x690/0x690
> >>>>>>>>> ? search_bpf_extables+0x22/0x250
> >>>>>>>>> ? trace_page_fault_kernel+0x120/0x120
> >>>>>>>>> ? search_bpf_extables+0x164/0x250
> >>>>>>>>> ? kasan_check_range+0xa5/0x190
> >>>>>>>>> ? fixup_exception+0x4d/0xc70
> >>>>>>>>> ? exc_page_fault+0xe1/0xf0
> >>>>>>>>> ? asm_exc_page_fault+0x22/0x30
> >>>>>>>>> ? load_module+0x3d7b/0x7560
> >>>>>>>>> ? kasan_check_range+0xa5/0x190
> >>>>>>>>> __asan_memcpy+0x38/0x60
> >>>>>>>>> load_module+0x3d7b/0x7560
> >>>>>>>>> ? module_frob_arch_sections+0x30/0x30
> >>>>>>>>> ? lockdep_lock+0xbe/0x1b0
> >>>>>>>>> ? rw_verify_area+0x18d/0x5e0
> >>>>>>>>> ? kernel_read_file+0x246/0x870
> >>>>>>>>> ? __x64_sys_fspick+0x290/0x290
> >>>>>>>>> ? init_module_from_file+0xd1/0x130
> >>>>>>>>> init_module_from_file+0xd1/0x130
> >>>>>>>>> ? __ia32_sys_init_module+0xa0/0xa0
> >>>>>>>>> ? lock_acquire+0x2d/0xb0
> >>>>>>>>> ? idempotent_init_module+0x116/0x790
> >>>>>>>>> ? do_raw_spin_unlock+0x54/0x220
> >>>>>>>>> idempotent_init_module+0x226/0x790
> >>>>>>>>> ? init_module_from_file+0x130/0x130
> >>>>>>>>> ? vm_mmap_pgoff+0x203/0x2e0
> >>>>>>>>> __x64_sys_finit_module+0xba/0x130
> >>>>>>>>> do_syscall_64+0x69/0x160
> >>>>>>>>> entry_SYSCALL_64_after_hwframe+0x4b/0x53
> >>>>>>>>> RIP: 0033:0x7fe869de327d
> >>>>>>>>> Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 248
> >>>>>>>>> RSP: 002b:00007ffe34a828d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> >>>>>>>>> RAX: ffffffffffffffda RBX: 0000557fa8f3f3f0 RCX: 00007fe869de327d
> >>>>>>>>> RDX: 0000000000000000 RSI: 00007fe869f4943c RDI: 0000000000000006
> >>>>>>>>> RBP: 00007fe869f4943c R08: 0000000000000000 R09: 0000000000000000
> >>>>>>>>> R10: 0000000000000006 R11: 0000000000000246 R12: 0000000000020000
> >>>>>>>>> R13: 0000557fa8f3f030 R14: 0000000000000000 R15: 0000557fa8f3d110
> >>>>>>>>> </TASK>
> >>>>>>>>> Modules linked in:
> >>>>>>>>> CR2: fffffbfff4041000
> >>>>>>>>> ---[ end trace 0000000000000000 ]---
> >>>>>>>>>
> >>>>>>>>> I suspect you only hit this with an unlucky amount of debugging enabled. The kernel config I used
> >>>>>>>>> is found here:
> >>>>>>>>>
> >>>>>>>>> http://www.candelatech.com/downloads/cfg-kasan-crash-regression.config
> >>>>>>>>>
> >>>>>>>>> I will be happy to test fixes.
> >>>>>>>> Hi Ben,
> >>>>>>>> Thanks for reporting the issue. Do you have these recent fixes in your tree:
> >>>>>>>>
> >>>>>>>> https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/
> >>>>>>>> https://lore.kernel.org/all/20241205170528.81000-1-hao.ge@linux.dev/
> >>>>>>>>
> >>>>>>>> If not, could you please apply them and see if the issue is still happening?
> >>>>>>>> Thanks,
> >>>>>>>> Suren.
> >>>>>>> Hello Suren,
> >>>>>>>
> >>>>>>> Thanks for the quick response. The first patch is already in latest Linus tree,
> >>>>> Hmm. Could you please double-check which tree you are using? I don't
> >>>>> see the first patch
> >>>>> (https://lore.kernel.org/all/20241130001423.1114965-1-surenb@google.com/)
> >>>>> in Linus' tree. Maybe you are using linux-next?
> >>>> Sorry, you are correct. I must have mangled something when trying to apply
> >>>> the patch and I didn't look hard enough when patch said changes were already applied.
> >>>>
> >>>> I can re-test this next week...and for reference, kernel boots fine when you disable
> >>>> KASAN and other debugging.
> >>> Thanks! Please retest with this patch and let me know if you are still
> >>> having issues.
> >>> Suren.
> >> Indeed, this is a bug that still exists in another context, namely when
> >> CONFIG_KASAN_VMALLOC is not enabled.
> > Hmm. Are you able to reproduce this issue with all the fixes we had?
>
>
> Yes. I set up an x86 virtual machine, applied both of our patches,
> and was still able to reproduce the issue.
>
> I have also submitted a patch to fix this issue.
>
> https://lore.kernel.org/all/20241210041515.765569-1-hao.ge@linux.dev/
Thanks! I'll post my comments there.
>
> I verified it locally.
>
>
> Hi Ben
>
> Can you test with this patch?
>
>
> Thanks
>
> Best Regards
>
> Hao
>
> >
> >> We may need to look into this scenario next.
> >>
> >> Thanks
> >>
> >> Best Regards
> >>
> >> Hao
> >>
> >>>> Thanks,
> >>>> Ben
> >>>>
> >>>>
> >>>> --
> >>>> Ben Greear <greearb@candelatech.com>
> >>>> Candela Technologies Inc http://www.candelatech.com
> >>>>
> >>>>
^ permalink raw reply [flat|nested] 11+ messages in thread
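A note on reading the oops quoted throughout this thread: the faulting address in CR2 (fffffbfff4041000) is consistent with a KASAN shadow-memory lookup for the address held in RDI (ffffffffa0208000, which falls in the x86-64 module mapping range). The sketch below reproduces that arithmetic. It assumes the generic KASAN mapping with a shadow scale shift of 3 and the x86-64 shadow offset dffffc0000000000 (the same value that appears in R15 in the trace); the function and variable names are illustrative only, and treating RDI and RSI as the checked address and length is a reading of the register dump, not something the thread states.

```python
# Minimal sketch of the generic KASAN address-to-shadow mapping.
# Assumptions (not stated in the thread): scale shift 3 and the x86-64
# shadow offset below, which matches the value seen in R15 in the oops.
KASAN_SHADOW_SCALE_SHIFT = 3
KASAN_SHADOW_OFFSET = 0xDFFFFC0000000000
MASK64 = (1 << 64) - 1  # emulate 64-bit wraparound arithmetic


def kasan_mem_to_shadow(addr: int) -> int:
    """Return the shadow byte address covering `addr` (one byte per 8 bytes)."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64


# Values taken from the register dump in the report.
checked_addr = 0xFFFFFFFFA0208000  # RDI
checked_size = 0xF0                # RSI

first_shadow = kasan_mem_to_shadow(checked_addr)
last_shadow = kasan_mem_to_shadow(checked_addr + checked_size - 1)

print(f"first shadow byte: {first_shadow:016x}")
print(f"last shadow byte:  {last_shadow:016x}")
```

Under these assumptions the first shadow byte equals the CR2 value and the last one equals the R09 value from the dump, which fits the "not-present page" error: the page fault is on the shadow memory for the module-tag area rather than on the area itself.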
end of thread, other threads:[~2024-12-10 17:36 UTC | newest]
Thread overview: 11+ messages
2024-12-06 21:50 BISECTED: 'alloc_tag: populate memory for module tags as needed' crashes on boot Ben Greear
2024-12-06 22:03 ` Suren Baghdasaryan
2024-12-06 22:43 ` Ben Greear
2024-12-06 22:55 ` Suren Baghdasaryan
2024-12-07 0:15 ` Suren Baghdasaryan
2024-12-07 0:50 ` Ben Greear
2024-12-07 1:27 ` Suren Baghdasaryan
2024-12-09 12:47 ` Hao Ge
2024-12-09 22:33 ` Suren Baghdasaryan
2024-12-10 4:20 ` Hao Ge
2024-12-10 17:36 ` Suren Baghdasaryan