From: Uladzislau Rezki <urezki@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: Uladzislau Rezki <urezki@gmail.com>,
Alexander Potapenko <glider@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
kasan-dev@googlegroups.com
Subject: Re: KASAN-related VMAP allocation errors in debug kernels with many logical CPUS
Date: Wed, 12 Oct 2022 18:36:07 +0200 [thread overview]
Message-ID: <Y0bs97aVCH7SOqwX@pc638.lan> (raw)
In-Reply-To: <478c93f5-3f06-e426-9266-2c043c3658da@redhat.com>
>
> Was lucky to grab that system again. Compiled a custom 6.0 kernel, whereby I printk all vmap allocation errors, including the range similarly to what you suggested above (but printk only on the failure path).
>
> So these are the failing allocations:
>
> # dmesg | grep " -> alloc"
> [ 168.862511] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.863020] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.863841] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.864562] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.864646] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.865688] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.865718] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.866098] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.866551] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.866752] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867147] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867210] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867312] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867650] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867767] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867815] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867815] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868059] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868463] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868822] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868919] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.869843] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.869854] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870174] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870611] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870806] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870982] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.879000] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.449101] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.449834] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.450667] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.451539] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.452326] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.453239] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.454052] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.454697] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.454811] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.455575] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.455754] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.461450] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.805223] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.805507] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.929577] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.930389] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.931244] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.932035] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.932796] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.933592] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.934470] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.935344] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.970641] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.191600] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.191875] -> alloc 40960 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.241901] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.242708] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.243465] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.244211] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.245060] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.245868] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.246433] -> alloc 40960 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.246657] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.247451] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.248226] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.248902] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.249704] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.250497] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.251244] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.252076] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.587168] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.598995] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 171.865721] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 172.138557] -> alloc 917504 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
>
OK. It is related to module vmap space allocation when a module is
inserted. I wonder why a single module requires 2.5 MB? That seems like
a lot to me.
>
> Really looks like only module vmap space. ~ 1 GiB of vmap module space ...
>
If an allocation request for a module is 2.5 MB, we can load ~400 modules
into a 1 GB address space.
What does "lsmod | wc -l" report? How many modules does your system have?
> I did try:
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd6cdb201195..199154a2228a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -72,6 +72,8 @@ early_param("nohugevmalloc", set_nohugevmalloc);
> static const bool vmap_allow_huge = false;
> #endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
> +static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0);
> +
> bool is_vmalloc_addr(const void *x)
> {
> unsigned long addr = (unsigned long)kasan_reset_tag(x);
> @@ -1574,7 +1576,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> struct vmap_area *va;
> unsigned long freed;
> unsigned long addr;
> - int purged = 0;
> int ret;
> BUG_ON(!size);
> @@ -1631,23 +1632,22 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> return va;
> overflow:
> - if (!purged) {
> + if (atomic_long_read(&vmap_lazy_nr)) {
> purge_vmap_area_lazy();
> - purged = 1;
> goto retry;
> }
> freed = 0;
> blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
> - if (freed > 0) {
> - purged = 0;
> + if (freed > 0)
> goto retry;
> - }
> - if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
> + if (!(gfp_mask & __GFP_NOWARN)) {
> pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
> size);
> + printk("-> alloc %lu size, align: %lu, vstart: %lu, vend: %lu\n", size, align, vstart, vend);
> + }
> kmem_cache_free(vmap_area_cachep, va);
> return ERR_PTR(-EBUSY);
> @@ -1690,8 +1690,6 @@ static unsigned long lazy_max_pages(void)
> return log * (32UL * 1024 * 1024 / PAGE_SIZE);
> }
> -static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0);
> -
>
>
> But that didn't help at all. That system is crazy:
>
If an allocation fails, the next step is to drain the outstanding
lazily-freed vmap areas. The caller does that from its own context,
retries one more time, and only after that is a failure message printed.
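For reference, the stock v6.0 fallback path, reconstructed here from the
'-' lines and context of the diff quoted above, is roughly:

<snip>
overflow:
	/* First failure: flush the lazily-freed areas once and retry. */
	if (!purged) {
		purge_vmap_area_lazy();
		purged = 1;
		goto retry;
	}

	/* Second failure: ask the registered notifiers to free something. */
	freed = 0;
	blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
	if (freed > 0) {
		purged = 0;
		goto retry;
	}

	/* Only if both retries fail is the warning printed. */
	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
		pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
			size);
<snip>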
>
> # lspci | wc -l
> 1117
>
So you probably need a lot of modules to make your HW fully functional :)
>
> What I find interesting is that we have these recurring allocations of similar sizes failing.
> I wonder if user space is capable of loading the same kernel module concurrently to
> trigger a massive amount of allocations, and module loading code only figures out
> later that it has already been loaded and backs off.
>
If there is a request to allocate memory, it should succeed unless there
is a real error, for example no space left or no memory.
>
> My best guess would be that module loading is serialized completely, but for some reason,
> something seems to go wrong with a lot of concurrency ...
>
lazy_max_pages() depends on the number of online CPUs. Probably something
related...
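From memory, the v6.0 helper looks like this (the return line matches the
hunk quoted above), so the dependence on the CPU count is only logarithmic:

<snip>
static unsigned long lazy_max_pages(void)
{
	unsigned int log;

	log = fls(num_online_cpus());

	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
}
<snip>

With 4 KiB pages, a machine with a few hundred CPUs gets fls() == 9, i.e.
roughly 9 * 32 MiB = 288 MiB of lazily freed VA may accumulate before an
automatic drain, which is not negligible against a ~1 GB module area if
much of that VA happens to be module space.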
I wrote a small patch to dump the modules address space when a failure
occurs:
<snip v6.0>
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 83b54beb12fa..88d323310df5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1580,6 +1580,37 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
kmem_cache_free(vmap_area_cachep, va);
}
+static void
+dump_modules_free_space(unsigned long vstart, unsigned long vend)
+{
+ unsigned long va_start, va_end;
+ unsigned int total = 0;
+ struct vmap_area *va;
+
+ if (vend != MODULES_END)
+ return;
+
+ trace_printk("--- Dump a modules address space: 0x%lx - 0x%lx\n", vstart, vend);
+
+ spin_lock(&free_vmap_area_lock);
+ list_for_each_entry(va, &free_vmap_area_list, list) {
+ va_start = (va->va_start > vstart) ? va->va_start:vstart;
+ va_end = (va->va_end < vend) ? va->va_end:vend;
+
+ if (va_start >= va_end)
+ continue;
+
+ if (va_start >= vstart && va_end <= vend) {
+ trace_printk(" va_free: 0x%lx - 0x%lx size=%lu\n",
+ va_start, va_end, va_end - va_start);
+ total += (va_end - va_start);
+ }
+ }
+
+ spin_unlock(&free_vmap_area_lock);
+ trace_printk("--- Total free: %u ---\n", total);
+}
+
/*
* Allocate a region of KVA of the specified size and alignment, within the
* vstart and vend.
@@ -1663,10 +1694,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
goto retry;
}
- if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
+ if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
size);
+ dump_modules_free_space(vstart, vend);
+ }
+
kmem_cache_free(vmap_area_cachep, va);
return ERR_PTR(-EBUSY);
}
<snip>
It would be good to understand whether we really run out of space.
Adding a print of lazy_max_pages() and vmap_lazy_nr would also be good.
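Something along these lines at the end of dump_modules_free_space() would
do (untested sketch; as in the diff further up, vmap_lazy_nr and
lazy_max_pages() are defined later in the file, so they would need to be
moved or forward-declared):

<snip>
	/* Report how much lazily-freed VA is still waiting to be purged. */
	trace_printk("--- lazy_max_pages: %lu, vmap_lazy_nr: %ld ---\n",
		     lazy_max_pages(), atomic_long_read(&vmap_lazy_nr));
<snip>

The trace_printk() output then shows up together with the free-space dump
in /sys/kernel/debug/tracing/trace.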
--
Uladzislau Rezki
Thread overview: 10+ messages
2022-10-06 13:46 KASAN-related VMAP allocation errors in debug kernels with many logical CPUS David Hildenbrand
2022-10-06 15:35 ` Uladzislau Rezki
2022-10-06 16:12 ` David Hildenbrand
2022-10-07 15:34 ` Uladzislau Rezki
2022-10-10 6:56 ` David Hildenbrand
2022-10-10 12:19 ` Uladzislau Rezki
2022-10-11 19:52 ` David Hildenbrand
2022-10-12 16:36 ` Uladzislau Rezki [this message]
2022-10-13 16:21 ` David Hildenbrand
2022-10-15 9:23 ` Uladzislau Rezki