From: David Hildenbrand <david@redhat.com>
To: Uladzislau Rezki <urezki@gmail.com>
Cc: Alexander Potapenko <glider@google.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Andrey Ryabinin <ryabinin.a.a@gmail.com>,
	Dmitry Vyukov <dvyukov@google.com>,
	Vincenzo Frascino <vincenzo.frascino@arm.com>,
	kasan-dev@googlegroups.com
Subject: Re: KASAN-related VMAP allocation errors in debug kernels with many logical CPUS
Date: Thu, 13 Oct 2022 18:21:17 +0200	[thread overview]
Message-ID: <e397d8aa-17a5-299b-2383-cfb01bd7197e@redhat.com> (raw)
In-Reply-To: <Y0bs97aVCH7SOqwX@pc638.lan>

>>
> OK. It is related to a module vmap space allocation when a module is
> inserted. I wonder why it requires 2.5 MB for a module? That seems like a
> lot to me.
> 

Indeed. I assume KASAN can go wild when it instruments each and every 
memory access.
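
Just to illustrate where that size comes from (a hand-written sketch, not
actual compiler output): with generic KASAN, the compiler inserts a
shadow-memory check in front of basically every load and store, roughly:

	/* Original code: */
	void set_flag(unsigned long *p)
	{
		*p = 1;
	}

	/* Conceptually becomes (outline instrumentation; __asan_store8() is
	 * provided by mm/kasan/generic.c): */
	void set_flag(unsigned long *p)
	{
		__asan_store8((unsigned long)p);	/* check shadow memory */
		*p = 1;
	}

With every single access getting such a check (and CONFIG_KASAN_INLINE
expanding the check inline instead), module text can easily grow by a
factor of a few.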

>>
>> Really looks like only module vmap space. ~ 1 GiB of vmap module space ...
>>
> If an allocation request for a module is 2.5 MB, we can load ~400 modules
> into 1 GB of address space.
> 
> "lsmod | wc -l"? How many modules your system has?
> 

~71, so not even close to 400.

>> What I find interesting is that we have these recurring allocations of similar sizes failing.
>> I wonder if user space is capable of loading the same kernel module concurrently to
>> trigger a massive amount of allocations, and module loading code only figures out
>> later that it has already been loaded and backs off.
>>
> If there is a request to allocate memory, it should succeed unless there
> is an error like running out of space or memory.

Yes. But as I found out, we really are out of space, because module loading 
code allocates module VMAP space first, before verifying whether the module 
was already loaded or is concurrently being loaded.
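
Roughly, the ordering in kernel/module/main.c looks like this (heavily
simplified sketch of v6.0, not the literal code):

	static int load_module(struct load_info *info, const char __user *uargs,
			       int flags)
	{
		struct module *mod;
		int err;

		/* ELF validity check, signature check, ... */

		/*
		 * Allocates the final module image via module_alloc(), i.e.,
		 * this already consumes module VMAP space.
		 */
		mod = layout_and_allocate(info, flags);
		if (IS_ERR(mod))
			return PTR_ERR(mod);

		/*
		 * Only now do we check whether a module with the same name is
		 * already loaded or currently being loaded.
		 */
		err = add_unformed_module(mod);
		if (err)
			goto free_module;	/* frees the VMAP space again */
		[...]
	}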

See below.

[...]

> I wrote a small patch to dump the modules address space when a failure occurs:
> 
> <snip v6.0>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 83b54beb12fa..88d323310df5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1580,6 +1580,37 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
>   		kmem_cache_free(vmap_area_cachep, va);
>   }
>   
> +static void
> +dump_modules_free_space(unsigned long vstart, unsigned long vend)
> +{
> +	unsigned long va_start, va_end;
> +	unsigned int total = 0;
> +	struct vmap_area *va;
> +
> +	if (vend != MODULES_END)
> +		return;
> +
> +	trace_printk("--- Dump a modules address space: 0x%lx - 0x%lx\n", vstart, vend);
> +
> +	spin_lock(&free_vmap_area_lock);
> +	list_for_each_entry(va, &free_vmap_area_list, list) {
> +		va_start = (va->va_start > vstart) ? va->va_start:vstart;
> +		va_end = (va->va_end < vend) ? va->va_end:vend;
> +
> +		if (va_start >= va_end)
> +			continue;
> +
> +		if (va_start >= vstart && va_end <= vend) {
> +			trace_printk(" va_free: 0x%lx - 0x%lx size=%lu\n",
> +				va_start, va_end, va_end - va_start);
> +			total += (va_end - va_start);
> +		}
> +	}
> +
> +	spin_unlock(&free_vmap_area_lock);
> +	trace_printk("--- Total free: %u ---\n", total);
> +}
> +
>   /*
>    * Allocate a region of KVA of the specified size and alignment, within the
>    * vstart and vend.
> @@ -1663,10 +1694,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>   		goto retry;
>   	}
>   
> -	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
> +	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
>   		pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
>   			size);
>   
> +		dump_modules_free_space(vstart, vend);
> +	}
> +
>   	kmem_cache_free(vmap_area_cachep, va);
>   	return ERR_PTR(-EBUSY);
>   }

Thanks!

I can spot the same module getting loaded over and over again concurrently 
from user space; each load only fails after all the allocations, once 
add_unformed_module() (sketched roughly below) realizes that the module is 
in fact already loaded and returns -EEXIST.
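
For reference, add_unformed_module() roughly does the following (condensed
sketch, not the exact code):

	static int add_unformed_module(struct module *mod)
	{
		struct module *old;
		int err = 0;

		mod->state = MODULE_STATE_UNFORMED;

		mutex_lock(&module_mutex);
		old = find_module_all(mod->name, strlen(mod->name), true);
		if (old) {
			/*
			 * Either wait for a concurrent loader to finish and
			 * then report -EEXIST, or report -EEXIST right away.
			 * Our own VMAP space was already allocated either way.
			 */
			err = -EEXIST;
		} else {
			list_add_rcu(&mod->list, &modules);
		}
		mutex_unlock(&module_mutex);

		return err;
	}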

That looks quite inefficient. Here is how often user space tries to load 
the same module on that system. Note that I print *after* allocating 
module VMAP space.

# dmesg | grep Loading | cut -d" " -f5 | sort | uniq -c
     896 acpi_cpufreq
       1 acpi_pad
       1 acpi_power_meter
       2 ahci
       1 cdrom
       2 compiled-in
       1 coretemp
      15 crc32c_intel
     307 crc32_pclmul
       1 crc64
       1 crc64_rocksoft
       1 crc64_rocksoft_generic
      12 crct10dif_pclmul
      16 dca
       1 dm_log
       1 dm_mirror
       1 dm_mod
       1 dm_region_hash
       1 drm
       1 drm_kms_helper
       1 drm_shmem_helper
       1 fat
       1 fb_sys_fops
      14 fjes
       1 fuse
     205 ghash_clmulni_intel
       1 i2c_algo_bit
       1 i2c_i801
       1 i2c_smbus
       4 i40e
       4 ib_core
       1 ib_uverbs
       4 ice
     403 intel_cstate
       1 intel_pch_thermal
       1 intel_powerclamp
       1 intel_rapl_common
       1 intel_rapl_msr
     399 intel_uncore
       1 intel_uncore_frequency
       1 intel_uncore_frequency_common
      64 ioatdma
       1 ipmi_devintf
       1 ipmi_msghandler
       1 ipmi_si
       1 ipmi_ssif
       4 irdma
     406 irqbypass
       1 isst_if_common
     165 isst_if_mbox_msr
     300 kvm
     408 kvm_intel
       1 libahci
       2 libata
       1 libcrc32c
     409 libnvdimm
       8 Loading
       1 lpc_ich
       1 megaraid_sas
       1 mei
       1 mei_me
       1 mgag200
       1 nfit
       1 pcspkr
       1 qrtr
     405 rapl
       1 rfkill
       1 sd_mod
       2 sg
     409 skx_edac
       1 sr_mod
       1 syscopyarea
       1 sysfillrect
       1 sysimgblt
       1 t10_pi
       1 uas
       1 usb_storage
       1 vfat
       1 wmi
       1 x86_pkg_temp_thermal
       1 xfs


For each of these loading requests, we'll reserve module VMAP space, and 
only free it once we realize later that the module was already loaded.

So with a lot of CPUs, we might end up trying to load the same module so 
often, and so concurrently, that we actually run out of module VMAP space. 
Back-of-the-envelope: at ~2.5 MB per attempt, roughly 1 GiB / 2.5 MiB ≈ 400 
in-flight allocations are enough to exhaust the module area, which lines up 
with the ~400-409 load attempts seen above for kvm_intel, libnvdimm, 
skx_edac and friends.

I have a prototype patch that seems to fix this in module loading code.
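
The idea (sketched very roughly below; this is not the actual prototype) is
to look for an already-loaded or concurrently-loading module of the same name
before reserving any module VMAP space, so duplicate loads bail out early
with -EEXIST instead of each grabbing ~2.5 MB first:

	/*
	 * Hypothetical early check in load_module(), before
	 * layout_and_allocate(). Assumes the module name has already been
	 * parsed from the ELF section headers at this point.
	 */
	mutex_lock(&module_mutex);
	if (find_module_all(info->name, strlen(info->name), true)) {
		mutex_unlock(&module_mutex);
		return -EEXIST;
	}
	mutex_unlock(&module_mutex);

	mod = layout_and_allocate(info, flags);
	[...]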

Thanks!

-- 
Thanks,

David / dhildenb



Thread overview: 10+ messages
2022-10-06 13:46 KASAN-related VMAP allocation errors in debug kernels with many logical CPUS David Hildenbrand
2022-10-06 15:35 ` Uladzislau Rezki
2022-10-06 16:12   ` David Hildenbrand
2022-10-07 15:34     ` Uladzislau Rezki
2022-10-10  6:56       ` David Hildenbrand
2022-10-10 12:19         ` Uladzislau Rezki
2022-10-11 19:52           ` David Hildenbrand
2022-10-12 16:36             ` Uladzislau Rezki
2022-10-13 16:21               ` David Hildenbrand [this message]
2022-10-15  9:23                 ` Uladzislau Rezki
