From: Uladzislau Rezki <urezki@gmail.com>
Date: Wed, 12 Oct 2022 18:36:07 +0200
To: David Hildenbrand
Cc: Uladzislau Rezki, Alexander Potapenko, Andrey Konovalov, linux-mm@kvack.org, Andrey Ryabinin, Dmitry Vyukov, Vincenzo Frascino, kasan-dev@googlegroups.com
Subject: Re: KASAN-related VMAP allocation errors in debug kernels with many logical CPUS
References: <8aaaeec8-14a1-cdc4-4c77-4878f4979f3e@redhat.com> <9ce8a3a3-8305-31a4-a097-3719861c234e@redhat.com> <6d75325f-a630-5ae3-5162-65f5bb51caf7@redhat.com> <478c93f5-3f06-e426-9266-2c043c3658da@redhat.com>
In-Reply-To: <478c93f5-3f06-e426-9266-2c043c3658da@redhat.com>

> Was lucky to grab that system again.
> Compiled a custom 6.0 kernel, whereby I printk all vmap allocation errors,
> including the range similarly to what you suggested above (but printk only
> on the failure path).
>
> So these are the failing allocations:
>
> # dmesg | grep " -> alloc"
> [ 168.862511] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.863020] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.863841] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.864562] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.864646] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.865688] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.865718] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.866098] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.866551] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.866752] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867147] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867210] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867312] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867650] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867767] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867815] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.867815] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868059] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868463] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868822] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.868919] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.869843] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.869854] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870174] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870611] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870806] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.870982] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 168.879000] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.449101] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.449834] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.450667] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.451539] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.452326] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.453239] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.454052] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.454697] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.454811] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.455575] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.455754] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.461450] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.805223] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.805507] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.929577] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.930389] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.931244] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.932035] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.932796] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.933592] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.934470] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.935344] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 169.970641] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.191600] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.191875] -> alloc 40960 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.241901] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.242708] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.243465] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.244211] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.245060] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.245868] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.246433] -> alloc 40960 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.246657] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.247451] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.248226] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.248902] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.249704] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.250497] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.251244] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.252076] -> alloc 319488 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.587168] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 170.598995] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 171.865721] -> alloc 2506752 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
> [ 172.138557] -> alloc 917504 size, align: 4096, vstart: 18446744072639352832, vend: 18446744073692774400
>
OK. It is related to a module vmap space allocation when a module is inserted.
I wonder why it requires 2.5 MB for a module? It seems a lot to me.

> Really looks like only module vmap space. ~ 1 GiB of vmap module space ...
>
If an allocation request for a module is 2.5 MB, we can load ~400 modules in a
1 GB address space. "lsmod | wc -l"? How many modules does your system have?

> I did try:
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index dd6cdb201195..199154a2228a 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -72,6 +72,8 @@ early_param("nohugevmalloc", set_nohugevmalloc);
>  static const bool vmap_allow_huge = false;
>  #endif /* CONFIG_HAVE_ARCH_HUGE_VMALLOC */
> +static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0);
> +
>  bool is_vmalloc_addr(const void *x)
>  {
>  	unsigned long addr = (unsigned long)kasan_reset_tag(x);
> @@ -1574,7 +1576,6 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	struct vmap_area *va;
>  	unsigned long freed;
>  	unsigned long addr;
> -	int purged = 0;
>  	int ret;
>
>  	BUG_ON(!size);
> @@ -1631,23 +1632,22 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
>  	return va;
>
>  overflow:
> -	if (!purged) {
> +	if (atomic_long_read(&vmap_lazy_nr)) {
>  		purge_vmap_area_lazy();
> -		purged = 1;
>  		goto retry;
>  	}
>
>  	freed = 0;
>  	blocking_notifier_call_chain(&vmap_notify_list, 0, &freed);
> -	if (freed > 0) {
> -		purged = 0;
> +	if (freed > 0)
>  		goto retry;
> -	}
>
> -	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
> +	if (!(gfp_mask & __GFP_NOWARN)) {
>  		pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
>  			size);
> +		printk("-> alloc %lu size, align: %lu, vstart: %lu, vend: %lu\n", size, align, vstart, vend);
> +	}
>
>  	kmem_cache_free(vmap_area_cachep, va);
>  	return ERR_PTR(-EBUSY);
> @@ -1690,8 +1690,6 @@ static unsigned long lazy_max_pages(void)
>  	return log * (32UL * 1024 * 1024 / PAGE_SIZE);
>  }
> -static atomic_long_t vmap_lazy_nr = ATOMIC_LONG_INIT(0);
> -
>
> But that didn't help at all. That system is crazy:
>
If an allocation fails, the next step is to drain outstanding vmap areas. So a
caller does it from its context, then retries one more time, and only after
that is a failure message printed.

> # lspci | wc -l
> 1117
>
So you probably need a lot of modules in order to make your HW fully
functional :)

> What I find interesting is that we have these recurring allocations of
> similar sizes failing. I wonder if user space is capable of loading the same
> kernel module concurrently to trigger a massive amount of allocations, and
> module loading code only figures out later that it has already been loaded
> and backs off.
>
If there is a request to allocate memory, it has to succeed unless there is an
error such as no space or no memory.

> My best guess would be that module loading is serialized completely, but for
> some reason, something seems to go wrong with a lot of concurrency ...
>
lazy_max_pages() depends on the number of online CPUs. Probably something
related...

I wrote a small patch to dump the modules address space when a failure occurs:

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 83b54beb12fa..88d323310df5 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1580,6 +1580,37 @@ preload_this_cpu_lock(spinlock_t *lock, gfp_t gfp_mask, int node)
 		kmem_cache_free(vmap_area_cachep, va);
 }
 
+static void
+dump_modules_free_space(unsigned long vstart, unsigned long vend)
+{
+	unsigned long va_start, va_end;
+	unsigned int total = 0;
+	struct vmap_area *va;
+
+	if (vend != MODULES_END)
+		return;
+
+	trace_printk("--- Dump a modules address space: 0x%lx - 0x%lx\n", vstart, vend);
+
+	spin_lock(&free_vmap_area_lock);
+	list_for_each_entry(va, &free_vmap_area_list, list) {
+		va_start = (va->va_start > vstart) ? va->va_start : vstart;
+		va_end = (va->va_end < vend) ? va->va_end : vend;
+
+		if (va_start >= va_end)
+			continue;
+
+		if (va_start >= vstart && va_end <= vend) {
+			trace_printk("  va_free: 0x%lx - 0x%lx size=%lu\n",
+				va_start, va_end, va_end - va_start);
+			total += (va_end - va_start);
+		}
+	}
+
+	spin_unlock(&free_vmap_area_lock);
+	trace_printk("--- Total free: %u ---\n", total);
+}
+
 /*
  * Allocate a region of KVA of the specified size and alignment, within the
  * vstart and vend.
@@ -1663,10 +1694,13 @@ static struct vmap_area *alloc_vmap_area(unsigned long size,
 		goto retry;
 	}
 
-	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit())
+	if (!(gfp_mask & __GFP_NOWARN) && printk_ratelimit()) {
 		pr_warn("vmap allocation for size %lu failed: use vmalloc=<size> to increase size\n",
 			size);
+		dump_modules_free_space(vstart, vend);
+	}
+
 	kmem_cache_free(vmap_area_cachep, va);
 	return ERR_PTR(-EBUSY);
 }

It would be good to understand whether we have really run out of space. Adding
a print of lazy_max_pages() and vmap_lazy_nr would also be good.

--
Uladzislau Rezki