From: David Hildenbrand <david@redhat.com>
To: Alexander Potapenko <glider@google.com>,
"Uladzislau Rezki (Sony)" <urezki@gmail.com>,
Andrey Konovalov <andreyknvl@gmail.com>
Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
kasan-dev@googlegroups.com
Subject: KASAN-related VMAP allocation errors in debug kernels with many logical CPUs
Date: Thu, 6 Oct 2022 15:46:59 +0200 [thread overview]
Message-ID: <8aaaeec8-14a1-cdc4-4c77-4878f4979f3e@redhat.com> (raw)
Hi,
we're currently hitting a weird vmap issue in debug kernels with KASAN enabled
on fairly large VMs. I reproduced it on v5.19 (I did not get a chance to
try 6.0 yet because I don't have access to the machine right now, but
I suspect the issue persists there).
It seems to trigger while the system is booting up, when udev probes a massive
number of devices in parallel. Once the system has booted, I no longer see
any such issues.
[ 165.818200] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.836622] vmap allocation for size 315392 failed: use vmalloc=<size> to increase size
[ 165.837461] vmap allocation for size 315392 failed: use vmalloc=<size> to increase size
[ 165.840573] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.841059] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.841428] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.841819] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.842123] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.843359] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.844894] vmap allocation for size 2498560 failed: use vmalloc=<size> to increase size
[ 165.847028] CPU: 253 PID: 4995 Comm: systemd-udevd Not tainted 5.19.0 #2
[ 165.935689] Hardware name: Lenovo ThinkSystem SR950 -[7X12ABC1WW]-/-[7X12ABC1WW]-, BIOS -[PSE130O-1.81]- 05/20/2020
[ 165.947343] Call Trace:
[ 165.950075] <TASK>
[ 165.952425] dump_stack_lvl+0x57/0x81
[ 165.956532] warn_alloc.cold+0x95/0x18a
[ 165.960836] ? zone_watermark_ok_safe+0x240/0x240
[ 165.966100] ? slab_free_freelist_hook+0x11d/0x1d0
[ 165.971461] ? __get_vm_area_node+0x2af/0x360
[ 165.976341] ? __get_vm_area_node+0x2af/0x360
[ 165.981219] __vmalloc_node_range+0x291/0x560
[ 165.986087] ? __mutex_unlock_slowpath+0x161/0x5e0
[ 165.991447] ? move_module+0x4c/0x630
[ 165.995547] ? vfree_atomic+0xa0/0xa0
[ 165.999647] ? move_module+0x4c/0x630
[ 166.003741] module_alloc+0xe7/0x170
[ 166.007747] ? move_module+0x4c/0x630
[ 166.011840] move_module+0x4c/0x630
[ 166.015751] layout_and_allocate+0x32c/0x560
[ 166.020519] load_module+0x8e0/0x25c0
[ 166.024623] ? layout_and_allocate+0x560/0x560
[ 166.029586] ? kernel_read_file+0x286/0x6b0
[ 166.034269] ? __x64_sys_fspick+0x290/0x290
[ 166.038946] ? userfaultfd_unmap_prep+0x430/0x430
[ 166.044203] ? lock_downgrade+0x130/0x130
[ 166.048698] ? __do_sys_finit_module+0x11a/0x1c0
[ 166.053854] __do_sys_finit_module+0x11a/0x1c0
[ 166.058818] ? __ia32_sys_init_module+0xa0/0xa0
[ 166.063882] ? __seccomp_filter+0x92/0x930
[ 166.068494] do_syscall_64+0x59/0x90
[ 166.072492] ? do_syscall_64+0x69/0x90
[ 166.076679] ? do_syscall_64+0x69/0x90
[ 166.080864] ? do_syscall_64+0x69/0x90
[ 166.085047] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 166.090984] ? lockdep_hardirqs_on+0x79/0x100
[ 166.095855] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Some facts:
1. The number of CPUs seems to matter more than the amount of memory
Initially we thought the memory size would be the relevant trigger,
because we had only seen it on 8 TiB machines. But I was also able to
reproduce it on a "small" machine with ~450 GiB.
We've only seen this issue on machines with a lot (~448) of logical CPUs.
On such systems, I have so far not been able to reproduce it when booting
the kernel with "nosmt", which could indicate some kind of concurrency problem.
2. CONFIG_KASAN_INLINE seems to be relevant
This issue only seems to trigger with KASAN enabled, and as far as I can
tell, only with CONFIG_KASAN_INLINE=y.
The help text for CONFIG_KASAN_INLINE warns that it "makes the kernel's
.text size much bigger", which should apply to loaded kernel modules as well.
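For reference, the inline-KASAN setup on these kernels corresponds to a
Kconfig fragment roughly like the following (schematic; CONFIG_KASAN_GENERIC
is my assumption about the KASAN mode in use on x86-64 here):

```
CONFIG_KASAN=y
CONFIG_KASAN_GENERIC=y
CONFIG_KASAN_INLINE=y
```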
3. All systems have 8 equally sized NUMA nodes
... which implies that at least one node is practically completely filled
with KASAN data. I remember adjusting the system size with "mem=" such that
some nodes were memory-less but node 0 still had some free memory, and it
still triggered.
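A back-of-the-envelope calculation (mine, assuming generic KASAN with its
one-shadow-byte-per-8-bytes scaling) shows why one full node's worth of
memory can end up consumed by shadow on the 8 TiB machines:

```python
# Generic KASAN maps one shadow byte per 8 bytes of memory
# (KASAN_SHADOW_SCALE_SHIFT == 3), so the shadow is 1/8 of RAM.
TIB = 1 << 40

def kasan_shadow_bytes(mem_bytes: int) -> int:
    return mem_bytes >> 3  # mem / 8

mem = 8 * TIB                      # the large reproducer machines
node = mem // 8                    # 8 equally sized NUMA nodes
shadow = kasan_shadow_bytes(mem)

print(shadow == node)              # True: the shadow equals one whole node
```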
My current best guess is that this is a combination of large VMAP demands
(e.g., kernel modules that are quite large due to CONFIG_KASAN_INLINE) and
a concurrency issue at large CPU counts. But I might be wrong, and this
might be something zone/node related.
Does any of that ring a bell -- especially why it would fail with 448 logical
CPUs but succeed with 224 logical CPUs (nosmt)?
My best guess would be that the purge_vmap_area_lazy() logic in
alloc_vmap_area() might not be sufficient when there is a lot of concurrency:
simply purging once and then failing might be problematic in corner cases
with a lot of concurrent vmap activity going on. But that's just my best guess.
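To make that hypothesis concrete, here is a deterministic toy model (all names
are mine, not the kernel's) of a "purge once, retry once, then fail" fallback,
showing how a concurrent allocator can consume the just-purged space in the
window between the purge and the retry:

```python
# Toy model of the suspected race: this is an illustration of the
# "purge once, retry once, then fail" pattern, not kernel code.

class VmapArena:
    def __init__(self, size):
        self.free = size        # immediately allocatable space
        self.lazy = 0           # freed space waiting for a purge

    def vfree(self, size):
        self.lazy += size       # freeing only queues space for lazy purge

    def purge(self):
        self.free += self.lazy  # the purge reclaims lazily freed space
        self.lazy = 0

    def try_alloc(self, size):
        if self.free >= size:
            self.free -= size
            return True
        return False

def alloc(arena, size, steal=None):
    """Allocate with a single purge-and-retry fallback.

    `steal`, if given, runs between the purge and the retry, standing in
    for other CPUs grabbing the just-reclaimed space.
    """
    if arena.try_alloc(size):
        return True
    arena.purge()                 # purge once...
    if steal:
        steal(arena)              # ...other CPUs race us here...
    return arena.try_alloc(size)  # ...retry once, then give up

# Arena is exhausted, but the space is reclaimable -- just not purged yet.
solo = VmapArena(size=4); solo.try_alloc(4); solo.vfree(4)
print(alloc(solo, 4))           # True: alone, the retry succeeds

raced = VmapArena(size=4); raced.try_alloc(4); raced.vfree(4)
print(alloc(raced, 4, steal=lambda a: a.try_alloc(4)))  # False: raced away
```

In the real code, the window would sit between purge_vmap_area_lazy() and the
retried search, multiplied by hundreds of CPUs loading modules simultaneously.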
Cheers!
--
Thanks,
David / dhildenb
Thread overview: 10+ messages
2022-10-06 13:46 David Hildenbrand [this message]
2022-10-06 15:35 ` KASAN-related VMAP allocation errors in debug kernels with many logical CPUS Uladzislau Rezki
2022-10-06 16:12 ` David Hildenbrand
2022-10-07 15:34 ` Uladzislau Rezki
2022-10-10 6:56 ` David Hildenbrand
2022-10-10 12:19 ` Uladzislau Rezki
2022-10-11 19:52 ` David Hildenbrand
2022-10-12 16:36 ` Uladzislau Rezki
2022-10-13 16:21 ` David Hildenbrand
2022-10-15 9:23 ` Uladzislau Rezki