From: David Hildenbrand
Organization: Red Hat
Date: Thu, 13 Oct 2022 18:21:17 +0200
To: Uladzislau Rezki
Cc: Alexander Potapenko, Andrey Konovalov, "linux-mm@kvack.org", Andrey Ryabinin, Dmitry Vyukov, Vincenzo Frascino, kasan-dev@googlegroups.com
Subject: Re: KASAN-related VMAP allocation errors in debug kernels with many logical CPUS

>>
> OK. It is related to a module vmap space allocation when a module is
> inserted. I wonder why it requires 2.5MB for a module? It seems a lot
> to me.
>

Indeed. I assume KASAN can go wild when it instruments each and every
memory access.

>>
>> Really looks like only module vmap space. ~ 1 GiB of vmap module space ...
>>
> If an allocation request for a module is 2.5MB we can load ~400 modules
> having 1GB address space.
>
> "lsmod | wc -l"? How many modules your system has?
>

~71, so not even close to 400.

>> What I find interesting is that we have these recurring allocations of similar sizes failing.
>> I wonder if user space is capable of loading the same kernel module concurrently to
>> trigger a massive amount of allocations, and module loading code only figures out
>> later that it has already been loaded and backs off.
>>
> If there is a request about allocating memory it has to be succeeded
> unless there are some errors like no space no memory.

Yes.
But as I found out, we're really out of space because module loading code
allocates module VMAP space first, before verifying if the module was
already loaded or is concurrently getting loaded.

See below.

[...]

> I wrote a small patch to dump a modules address space when a fail occurs:
>
> 
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 83b54beb12fa..88d323310df5 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1580,6 +1580,37 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
>  		kmem_cache_free(vmap_area_cachep, va);
>  }
>  
> +static void
> +dump_modules_free_space(unsigned long vstart, unsigned long vend)
> +{
> +	unsigned long va_start, va_end;
> +	unsigned int total = 0;
> +	struct vmap_area *va;
> +
> +	if (vend != MODULES_END)
> +		return;
> +
> +	trace_printk("--- Dump a modules address space: 0x%lx - 0x%lx\n", vstart, vend);
> +
> +	spin_lock(&free_vmap_area_lock);
> +	list_for_each_entry(va, &free_vmap_area_list, list) {
> +		va_start = (va->va_start > vstart) ? va->va_start:vstart;
> +		va_end = (va->va_end < vend) ? va->va_end:vend;
> +
> +		if (va_start >= va_end)
> +			continue;
> +
> +		if (va_start >= vstart && va_end <= vend) {
> +			trace_printk("  va_free: 0x%lx - 0x%lx size=%lu\n",
> +				va_start, va_end, va_end - va_start);
> +			total += (va_end - va_start);
> +		}
> +	}
> +
> +	spin_unlock(&free_vmap_area_lock);
> +	trace_printk("--- Total free: %u ---\n", total);
> +}
> +
>  /*
>   * Allocate a region of KVA of the specified size and alignment, within the
>   * vstart and vend.
> @@ -1663,10 +1694,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  		goto retry;
>  	}
>  
> -	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
> +	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
>  		pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
>  			size);
>  
> +		dump_modules_free_space(vstart, vend);
> +	}
> +
>  	kmem_cache_free(vmap_area_cachep, va);
>  	return ERR_PTR(-EBUSY);
>  }

Thanks!
I can spot the same module getting loaded over and over again
concurrently from user space, only failing after all the allocations
when realizing that the module is in fact already loaded in
add_unformed_module(), failing with -EEXIST.

That looks quite inefficient. Here is how often user space tries to load
the same module on that system. Note that I print *after* allocating
module VMAP space.

# dmesg | grep Loading | cut -d" " -f5 | sort | uniq -c
    896 acpi_cpufreq
      1 acpi_pad
      1 acpi_power_meter
      2 ahci
      1 cdrom
      2 compiled-in
      1 coretemp
     15 crc32c_intel
    307 crc32_pclmul
      1 crc64
      1 crc64_rocksoft
      1 crc64_rocksoft_generic
     12 crct10dif_pclmul
     16 dca
      1 dm_log
      1 dm_mirror
      1 dm_mod
      1 dm_region_hash
      1 drm
      1 drm_kms_helper
      1 drm_shmem_helper
      1 fat
      1 fb_sys_fops
     14 fjes
      1 fuse
    205 ghash_clmulni_intel
      1 i2c_algo_bit
      1 i2c_i801
      1 i2c_smbus
      4 i40e
      4 ib_core
      1 ib_uverbs
      4 ice
    403 intel_cstate
      1 intel_pch_thermal
      1 intel_powerclamp
      1 intel_rapl_common
      1 intel_rapl_msr
    399 intel_uncore
      1 intel_uncore_frequency
      1 intel_uncore_frequency_common
     64 ioatdma
      1 ipmi_devintf
      1 ipmi_msghandler
      1 ipmi_si
      1 ipmi_ssif
      4 irdma
    406 irqbypass
      1 isst_if_common
    165 isst_if_mbox_msr
    300 kvm
    408 kvm_intel
      1 libahci
      2 libata
      1 libcrc32c
    409 libnvdimm
      8 Loading
      1 lpc_ich
      1 megaraid_sas
      1 mei
      1 mei_me
      1 mgag200
      1 nfit
      1 pcspkr
      1 qrtr
    405 rapl
      1 rfkill
      1 sd_mod
      2 sg
    409 skx_edac
      1 sr_mod
      1 syscopyarea
      1 sysfillrect
      1 sysimgblt
      1 t10_pi
      1 uas
      1 usb_storage
      1 vfat
      1 wmi
      1 x86_pkg_temp_thermal
      1 xfs

For each of these loading requests, we'll reserve module VMAP space, and
free it once we realize later that the module was already previously
loaded.

So with a lot of CPUs we might end up trying to load the same module so
often at the same time that we actually run out of module VMAP space.

I have a prototype patch that seems to fix this in module loading code.

Thanks!

-- 
Thanks,

David / dhildenb
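
To make the "~400 modules" estimate from earlier in the thread concrete, here is a minimal userspace sketch (not kernel code). It assumes the figures quoted above, read as binary units: roughly 1 GiB of module VMAP space and roughly 2.5 MiB per load request. It ignores alignment, guard pages and fragmentation, so the real limit is probably lower. The helper names requests_that_fit() and exhausts_space() and both macros are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed figures from the thread: ~1 GiB module VMAP space, ~2.5 MiB per request. */
#define MODULE_SPACE_BYTES	(1024UL * 1024 * 1024)
#define REQUEST_BYTES		(5UL * 1024 * 1024 / 2)

/*
 * Hypothetical helper: how many equally sized requests fit into a range
 * of space_bytes, ignoring alignment, guard pages and fragmentation.
 */
unsigned long requests_that_fit(unsigned long space_bytes,
				unsigned long request_bytes)
{
	assert(request_bytes != 0);
	return space_bytes / request_bytes;
}

/*
 * Hypothetical helper: true if in_flight requests that each hold an
 * allocation at the same time would need more than the range provides.
 */
bool exhausts_space(unsigned long space_bytes, unsigned long request_bytes,
		    unsigned long in_flight)
{
	return in_flight > requests_that_fit(space_bytes, request_bytes);
}
```

Under these assumptions requests_that_fit(MODULE_SPACE_BYTES, REQUEST_BYTES) evaluates to 409. The ~71 modules actually loaded stay far below that, while a few hundred load attempts for a single module, each holding its allocation until the -EEXIST check, would be enough to cross it if they overlap in time.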