From: David Hildenbrand <david@redhat.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Alexander Potapenko <glider@google.com>,
Andrey Konovalov <andreyknvl@gmail.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Andrey Ryabinin <ryabinin.a.a@gmail.com>,
Dmitry Vyukov <dvyukov@google.com>,
Vincenzo Frascino <vincenzo.frascino@arm.com>,
kasan-dev@googlegroups.com
Subject: Re: KASAN-related VMAP allocation errors in debug kernels with many logical CPUs
Date: Thu, 13 Oct 2022 18:21:17 +0200
Message-ID: <e397d8aa-17a5-299b-2383-cfb01bd7197e@redhat.com>
In-Reply-To: <Y0bs97aVCH7SOqwX@pc638.lan>
>>
> OK. It is related to a module vmap space allocation when a module is
> inserted. I wonder why it requires 2.5MB for a module? That seems like
> a lot to me.
>
Indeed. I assume KASAN can go wild when it instruments each and every
memory access.
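Just to illustrate where those 2.5MB come from -- a rough sketch only, not
actual compiler output, and the exact emitted sequence is an assumption
(the __asan_load*()/__asan_store*() hooks are the generic KASAN runtime
entry points, though):

	/* uninstrumented */
	counter++;

	/* with generic KASAN, conceptually */
	__asan_load8((unsigned long)&counter);   /* validate the read */
	tmp = counter;
	__asan_store8((unsigned long)&counter);  /* validate the write */
	counter = tmp + 1;

Add redzones around globals and stack objects on top, and it's not too
surprising that an instrumented module ends up a lot bigger.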
>>
>> Really looks like only module vmap space. ~ 1 GiB of vmap module space ...
>>
> If an allocation request for a module is 2.5MB, we can load ~400 modules
> into 1GB of address space.
>
> "lsmod | wc -l"? How many modules your system has?
>
~71, so not even close to 400.
>> What I find interesting is that we have these recurring allocations of similar sizes failing.
>> I wonder if user space is capable of loading the same kernel module concurrently to
>> trigger a massive number of allocations, and the module loading code only figures out
>> later that it has already been loaded and backs off.
>>
> If there is a request to allocate memory, it has to succeed unless
> something goes wrong, like running out of space or out of memory.
Yes. But as I found out, we're really out of space, because the module
loading code allocates module VMAP space first, before verifying whether
the module was already loaded or is concurrently being loaded.
See below.
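To make that ordering explicit, here is a condensed sketch of load_module()
in kernel/module/main.c (around v6.0). Heavily simplified, most steps and
error handling omitted, so treat it as an approximation of the real code:

static int load_module_sketch(struct load_info *info, int flags)
{
	struct module *mod;
	int err;

	/* Reserves the final module layout, i.e. module VMAP space. */
	mod = layout_and_allocate(info, flags);
	if (IS_ERR(mod))
		return PTR_ERR(mod);

	/*
	 * Only now do we check whether a module with the same name is
	 * already loaded or currently being loaded; if so, we fail with
	 * -EEXIST and throw away the module VMAP space we just reserved.
	 */
	err = add_unformed_module(mod);
	if (err)
		goto free_module;

	/* ... the rest of module loading ... */
	return 0;

free_module:
	module_deallocate(mod, info);
	return err;
}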
[...]
> I wrote a small patch to dump the module address space when a failure occurs:
>
> <snip v6.0>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 83b54beb12fa..88d323310df5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1580,6 +1580,37 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
> kmem_cache_free(vmap_area_cachep, va);
> }
>
> +static void
> +dump_modules_free_space(unsigned long vstart, unsigned long vend)
> +{
> + unsigned long va_start, va_end;
> + unsigned int total = 0;
> + struct vmap_area *va;
> +
> + if (vend != MODULES_END)
> + return;
> +
> + trace_printk("--- Dump a modules address space: 0x%lx - 0x%lx\n", vstart, vend);
> +
> + spin_lock(&free_vmap_area_lock);
> + list_for_each_entry(va, &free_vmap_area_list, list) {
> + va_start = (va->va_start > vstart) ? va->va_start : vstart;
> + va_end = (va->va_end < vend) ? va->va_end : vend;
> +
> + if (va_start >= va_end)
> + continue;
> +
> + if (va_start >= vstart && va_end <= vend) {
> + trace_printk(" va_free: 0x%lx - 0x%lx size=%lu\n",
> + va_start, va_end, va_end - va_start);
> + total += (va_end - va_start);
> + }
> + }
> +
> + spin_unlock(&free_vmap_area_lock);
> + trace_printk("--- Total free: %u ---\n", total);
> +}
> +
> /*
> * Allocate a region of KVA of the specified size and alignment, within the
> * vstart and vend.
> @@ -1663,10 +1694,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
> goto retry;
> }
>
> - if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
> + if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
> pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
> size);
>
> + dump_modules_free_space(vstart, vend);
> + }
> +
> kmem_cache_free(vmap_area_cachep, va);
> return ERR_PTR(-EBUSY);
> }
Thanks!
I can spot the same module getting loaded over and over again
concurrently from user space, only failing after all the allocations,
once add_unformed_module() realizes that the module is in fact already
loaded and bails out with -EEXIST.
That looks quite inefficient. Here is how often user space tried to load
each module on that system. Note that I print *after* allocating
module VMAP space.
# dmesg | grep Loading | cut -d" " -f5 | sort | uniq -c
896 acpi_cpufreq
1 acpi_pad
1 acpi_power_meter
2 ahci
1 cdrom
2 compiled-in
1 coretemp
15 crc32c_intel
307 crc32_pclmul
1 crc64
1 crc64_rocksoft
1 crc64_rocksoft_generic
12 crct10dif_pclmul
16 dca
1 dm_log
1 dm_mirror
1 dm_mod
1 dm_region_hash
1 drm
1 drm_kms_helper
1 drm_shmem_helper
1 fat
1 fb_sys_fops
14 fjes
1 fuse
205 ghash_clmulni_intel
1 i2c_algo_bit
1 i2c_i801
1 i2c_smbus
4 i40e
4 ib_core
1 ib_uverbs
4 ice
403 intel_cstate
1 intel_pch_thermal
1 intel_powerclamp
1 intel_rapl_common
1 intel_rapl_msr
399 intel_uncore
1 intel_uncore_frequency
1 intel_uncore_frequency_common
64 ioatdma
1 ipmi_devintf
1 ipmi_msghandler
1 ipmi_si
1 ipmi_ssif
4 irdma
406 irqbypass
1 isst_if_common
165 isst_if_mbox_msr
300 kvm
408 kvm_intel
1 libahci
2 libata
1 libcrc32c
409 libnvdimm
8 Loading
1 lpc_ich
1 megaraid_sas
1 mei
1 mei_me
1 mgag200
1 nfit
1 pcspkr
1 qrtr
405 rapl
1 rfkill
1 sd_mod
2 sg
409 skx_edac
1 sr_mod
1 syscopyarea
1 sysfillrect
1 sysimgblt
1 t10_pi
1 uas
1 usb_storage
1 vfat
1 wmi
1 x86_pkg_temp_thermal
1 xfs
For each of these loading requests, we'll reserve module VMAP space and
only free it once we realize later that the module was already loaded.
So with a lot of CPUs, we might end up trying to load the same module so
often at the same time that we actually run out of module VMAP space:
at ~2.5MB per attempt, ~400 in-flight attempts for a single module (the
counts we see for kvm_intel, libnvdimm or skx_edac) would already be
enough to eat up roughly the whole 1 GiB of module VMAP space.
I have a prototype patch that seems to fix this in module loading code.
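The direction is roughly the following -- a sketch only, not the actual
prototype, and the helper name and exact hook point (before
layout_and_allocate()) are assumptions: do a cheap "is this module already
there?" check before reserving any module VMAP space, so the hundreds of
concurrent requests for the same module back off early:

/*
 * Sketch only: bail out with -EEXIST before layout_and_allocate() if a
 * module with this name is already loaded or currently being loaded,
 * instead of discovering that only in add_unformed_module() after the
 * module VMAP space has been reserved.
 */
static int early_mod_check_exists(const char *name)
{
	struct module *old;
	int err = 0;

	mutex_lock(&module_mutex);
	old = find_module_all(name, strlen(name), true);
	if (old)
		err = -EEXIST;
	mutex_unlock(&module_mutex);

	return err;
}

The corner cases (a module that is still initializing, or one that is just
going away) need a bit more care than this sketch suggests, but it already
avoids the pointless VMAP allocations in the common case.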
Thanks!
--
Thanks,
David / dhildenb