From: David Hildenbrand
Organization: Red Hat
To: Alexander Potapenko, "Uladzislau Rezki (Sony)", Andrey Konovalov
Cc: "linux-mm@kvack.org", Andrey Ryabinin, Dmitry Vyukov, Vincenzo Frascino, kasan-dev@googlegroups.com
Subject: KASAN-related VMAP allocation errors in debug kernels with many logical CPUS
Date: Thu, 6 Oct 2022 15:46:59 +0200
Message-ID: <8aaaeec8-14a1-cdc4-4c77-4878f4979f3e@redhat.com>

Hi,

we're currently hitting a weird vmap issue in debug kernels with KASAN enabled on fairly large VMs. I reproduced it on v5.19 (I did not get the chance to try 6.0 yet because I don't have access to the machine right now, but I suspect it persists).

It seems to trigger when udev probes a massive number of devices in parallel while the system is booting up. Once the system has booted, I no longer see any such issues.

[ 165.818200] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.836622] vmap allocation for size 315392 failed: use vmalloc= to increase size
[ 165.837461] vmap allocation for size 315392 failed: use vmalloc= to increase size
[ 165.840573] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.841059] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.841428] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.841819] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.842123] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.843359] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.844894] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.847028] CPU: 253 PID: 4995 Comm: systemd-udevd Not tainted 5.19.0 #2
[ 165.935689] Hardware name: Lenovo ThinkSystem SR950 -[7X12ABC1WW]-/-[7X12ABC1WW]-, BIOS -[PSE130O-1.81]- 05/20/2020
[ 165.947343] Call Trace:
[ 165.950075]
[ 165.952425] dump_stack_lvl+0x57/0x81
[ 165.956532] warn_alloc.cold+0x95/0x18a
[ 165.960836] ? zone_watermark_ok_safe+0x240/0x240
[ 165.966100] ? slab_free_freelist_hook+0x11d/0x1d0
[ 165.971461] ? __get_vm_area_node+0x2af/0x360
[ 165.976341] ? __get_vm_area_node+0x2af/0x360
[ 165.981219] __vmalloc_node_range+0x291/0x560
[ 165.986087] ? __mutex_unlock_slowpath+0x161/0x5e0
[ 165.991447] ? move_module+0x4c/0x630
[ 165.995547] ? vfree_atomic+0xa0/0xa0
[ 165.999647] ? move_module+0x4c/0x630
[ 166.003741] module_alloc+0xe7/0x170
[ 166.007747] ? move_module+0x4c/0x630
[ 166.011840] move_module+0x4c/0x630
[ 166.015751] layout_and_allocate+0x32c/0x560
[ 166.020519] load_module+0x8e0/0x25c0
[ 166.024623] ? layout_and_allocate+0x560/0x560
[ 166.029586] ? kernel_read_file+0x286/0x6b0
[ 166.034269] ? __x64_sys_fspick+0x290/0x290
[ 166.038946] ? userfaultfd_unmap_prep+0x430/0x430
[ 166.044203] ? lock_downgrade+0x130/0x130
[ 166.048698] ? __do_sys_finit_module+0x11a/0x1c0
[ 166.053854] __do_sys_finit_module+0x11a/0x1c0
[ 166.058818] ? __ia32_sys_init_module+0xa0/0xa0
[ 166.063882] ? __seccomp_filter+0x92/0x930
[ 166.068494] do_syscall_64+0x59/0x90
[ 166.072492] ? do_syscall_64+0x69/0x90
[ 166.076679] ? do_syscall_64+0x69/0x90
[ 166.080864] ? do_syscall_64+0x69/0x90
[ 166.085047] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 166.090984] ? lockdep_hardirqs_on+0x79/0x100
[ 166.095855] entry_SYSCALL_64_after_hwframe+0x63/0xcd

Some facts:

1. The #CPUs seems to be more important than the #MEM

Initially we thought the memory size would be the relevant trigger, because we've only seen it on 8TiB machines. But I was also able to reproduce it on a "small" machine with ~450GiB.

We've seen this issue only on machines with a lot of (~448) logical CPUs.

On such systems, I was not able to reproduce it when booting the kernel with "nosmt" so far, which could indicate some kind of concurrency problem.

2. CONFIG_KASAN_INLINE seems to be relevant

This issue only seems to trigger with KASAN enabled and, from what I can tell, only with CONFIG_KASAN_INLINE=y:

CONFIG_KASAN_INLINE: "but makes the kernel's .text size much bigger.", which should include the kernel modules to be loaded.

3. All systems have 8 equally sized NUMA nodes

... which implies that at least one node is practically completely filled with KASAN data.

I remember adjusting the system size with "mem=", such that some nodes were memory-less but NODE 0 would still have some free memory. I remember that it still triggered.

My current best guess is that this is a combination of large VMAP demands (e.g., kernel modules that are quite large due to CONFIG_KASAN_INLINE) and possibly a concurrency issue with large #CPUs. But I might be wrong and this might be something zone/node related.

Does any of that ring a bell -- especially why it would fail with 448 logical CPUs but succeed with 224 logical CPUs (nosmt)?

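For scale, the request sizes in the failure messages above can be converted to pages. The following is only a small sketch: it assumes 4 KiB base pages (the usual x86-64 configuration, not stated in the report), and the helper name tally_failures is made up for illustration.

```python
import re
from collections import Counter

PAGE_SIZE = 4096  # assumption: 4 KiB base pages (not stated in the report)

# The ten failure messages, copied from the log in this mail.
LOG = """\
[ 165.818200] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.836622] vmap allocation for size 315392 failed: use vmalloc= to increase size
[ 165.837461] vmap allocation for size 315392 failed: use vmalloc= to increase size
[ 165.840573] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.841059] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.841428] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.841819] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.842123] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.843359] vmap allocation for size 2498560 failed: use vmalloc= to increase size
[ 165.844894] vmap allocation for size 2498560 failed: use vmalloc= to increase size
"""


def tally_failures(log_text):
    """Count failed vmap requests per requested size (in bytes)."""
    pattern = re.compile(r"vmap allocation for size (\d+) failed")
    return Counter(int(m.group(1)) for m in pattern.finditer(log_text))


counts = tally_failures(LOG)
for size, seen in sorted(counts.items(), reverse=True):
    pages, rest = divmod(size, PAGE_SIZE)
    print(f"{size} bytes = {pages} pages (+{rest} bytes), seen {seen}x")
```

Under that page-size assumption, the two sizes come out as 610 and 77 whole pages, i.e. roughly 2.4 MiB and 308 KiB per request. The per-size counts only cover the excerpt quoted here, so they should not be read as the total number of failed allocations during boot.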
My best guess would be that the purge_vmap_area_lazy() logic in alloc_vmap_area() might not be sufficient when there is a lot of concurrency: simply purging once and then failing might be problematic in corner cases where there is a lot of concurrent vmap action going on. But that's just my best guess.

Cheers!

-- 
Thanks,

David / dhildenb
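The "purge once, then fail" concern can be illustrated with a toy model. This is a sketch of one possible reading of the hypothesis, not the kernel's alloc_vmap_area() code: ToyVmapSpace, alloc_with_purges, make_competitor and the max_purges knob are all hypothetical names, and the interleaving is scripted by hand rather than produced by real concurrency.

```python
class ToyVmapSpace:
    """Toy address space counted in pages.

    Lazily freed pages only become allocatable again after purge().
    This is a simplification, not the kernel's data structures.
    """

    def __init__(self, free_pages, lazy_pages):
        self.free = free_pages  # immediately allocatable pages
        self.lazy = lazy_pages  # freed, but not yet purged

    def try_alloc(self, pages):
        if pages <= self.free:
            self.free -= pages
            return True
        return False

    def lazy_free(self, pages):
        self.lazy += pages

    def purge(self):
        self.free += self.lazy
        self.lazy = 0


def alloc_with_purges(space, pages, other_cpus, max_purges):
    """Try to allocate; on failure purge and retry, at most max_purges times.

    other_cpus is a hypothetical hook that stands in for whatever happens
    on other CPUs in the window between the purge and the retry.
    """
    purges = 0
    while True:
        if space.try_alloc(pages):
            return True
        if purges >= max_purges:
            return False  # give up: this is where the warning would be printed
        space.purge()
        purges += 1
        other_cpus(space)


def make_competitor(pages):
    """Scripted interference that fires exactly once."""
    fired = False

    def competitor(space):
        nonlocal fired
        if fired:
            return
        fired = True
        space.try_alloc(pages)  # another CPU grabs the freshly purged space
        space.lazy_free(pages)  # a third CPU lazily frees the same amount

    return competitor


def run(max_purges, pages=610):
    # Start with nothing free and just enough lazily freed space.
    space = ToyVmapSpace(free_pages=0, lazy_pages=pages)
    return alloc_with_purges(space, pages, make_competitor(pages), max_purges)


print("purge once :", run(max_purges=1))
print("purge twice:", run(max_purges=2))
```

In this scripted interleaving the request fails when only a single purge is allowed, although enough lazily freed space exists again at the moment of failure, and it succeeds when a second purge is permitted. Whether the real allocator can actually get into this state with ~448 logical CPUs is exactly the open question above; the model only shows that the failure mode is plausible in principle.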