Date: Fri, 30 Jun 2023 10:28:05 +0200
From: Jiri Slaby
To: Suren Baghdasaryan
Cc: akpm@linux-foundation.org, michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net, willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org, ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org, luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com, dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de, kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com, peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com, axelrasmussen@google.com, joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com, edumazet@google.com, gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com, michalechner92@googlemail.com, linux-mm@kvack.org, linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com
References: <20230227173632.3292573-1-surenb@google.com> <20230227173632.3292573-30-surenb@google.com> <9a8d788c-b8ba-1b8a-fd79-0e25b1b60bed@kernel.org>
Subject: Re: [PATCH v4 29/33] x86/mm: try VMA lock-based page fault handling first
In-Reply-To: <9a8d788c-b8ba-1b8a-fd79-0e25b1b60bed@kernel.org>

On 30. 06. 23, 8:35, Jiri Slaby wrote:
> On 29. 06. 
23, 17:30, Suren Baghdasaryan wrote:
>> On Thu, Jun 29, 2023 at 7:40 AM Jiri Slaby wrote:
>>>
>>> Hi,
>>>
>>> On 27. 02. 23, 18:36, Suren Baghdasaryan wrote:
>>>> Attempt VMA lock-based page fault handling first, and fall back to the
>>>> existing mmap_lock-based handling if that fails.
>>>>
>>>> Signed-off-by: Suren Baghdasaryan
>>>> ---
>>>>    arch/x86/Kconfig    |  1 +
>>>>    arch/x86/mm/fault.c | 36 ++++++++++++++++++++++++++++++++++++
>>>>    2 files changed, 37 insertions(+)
>>>>
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index a825bf031f49..df21fba77db1 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -27,6 +27,7 @@ config X86_64
>>>>        # Options that are inherently 64-bit kernel only:
>>>>        select ARCH_HAS_GIGANTIC_PAGE
>>>>        select ARCH_SUPPORTS_INT128 if CC_HAS_INT128
>>>> +     select ARCH_SUPPORTS_PER_VMA_LOCK
>>>>        select ARCH_USE_CMPXCHG_LOCKREF
>>>>        select HAVE_ARCH_SOFT_DIRTY
>>>>        select MODULES_USE_ELF_RELA
>>>> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
>>>> index a498ae1fbe66..e4399983c50c 100644
>>>> --- a/arch/x86/mm/fault.c
>>>> +++ b/arch/x86/mm/fault.c
>>>> @@ -19,6 +19,7 @@
>>>>    #include           /*
>>>> faulthandler_disabled()      */
>>>>    #include                       /*
>>>> efi_crash_gracefully_on_page_fault()*/
>>>>    #include
>>>> +#include                         /* find_and_lock_vma() */
>>>>
>>>>    #include          /* boot_cpu_has,
>>>> ...            */
>>>>    #include                       /* dotraplinkage,
>>>> ...           
*/
>>>> @@ -1333,6 +1334,38 @@ void do_user_addr_fault(struct pt_regs *regs,
>>>>        }
>>>>    #endif
>>>>
>>>> +#ifdef CONFIG_PER_VMA_LOCK
>>>> +     if (!(flags & FAULT_FLAG_USER))
>>>> +             goto lock_mmap;
>>>> +
>>>> +     vma = lock_vma_under_rcu(mm, address);
>>>> +     if (!vma)
>>>> +             goto lock_mmap;
>>>> +
>>>> +     if (unlikely(access_error(error_code, vma))) {
>>>> +             vma_end_read(vma);
>>>> +             goto lock_mmap;
>>>> +     }
>>>> +     fault = handle_mm_fault(vma, address, flags |
>>>> FAULT_FLAG_VMA_LOCK, regs);
>>>> +     vma_end_read(vma);
>>>> +
>>>> +     if (!(fault & VM_FAULT_RETRY)) {
>>>> +             count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>>>> +             goto done;
>>>> +     }
>>>> +     count_vm_vma_lock_event(VMA_LOCK_RETRY);
>>>
>>> This is apparently not strong enough as it causes go build failures
>>> like:
>>>
>>> [  409s] strconv
>>> [  409s] releasep: m=0x579e2000 m->p=0x5781c600 p->m=0x0 p->status=2
>>> [  409s] fatal error: releasep: invalid p state
>>> [  409s]
>>>
>>> [  325s] hash/adler32
>>> [  325s] hash/crc32
>>> [  325s] cmd/internal/codesign
>>> [  336s] fatal error: runtime: out of memory
>>
>> Hi Jiri,
>> Thanks for reporting! I'm not familiar with go builds. Could you
>> please explain the error to me or point me to some documentation to
>> decipher that error?
>
> Sorry, we are on the same boat -- me neither. It only popped up in our
> (openSUSE) build system and I only tracked it down by bisection. Let me
> know if I can try something (like a patch or gathering some debug info).

FWIW, a failed build log:
https://decibel.fi.muni.cz/~xslaby/n/vma/log.txt

and a strace for it:
https://decibel.fi.muni.cz/~xslaby/n/vma/strace.txt

An excerpt from the log:

[ 55s] runtime: marked free object in span 0x7fca6824bec8, elemsize=192 freeindex=0 (bad use of unsafe.Pointer? 
try -d=checkptr)
[ 55s] 0xc0002f2000 alloc marked
[ 55s] 0xc0002f20c0 alloc marked
[ 55s] 0xc0002f2180 alloc marked
[ 55s] 0xc0002f2240 free unmarked
[ 55s] 0xc0002f2300 alloc marked
[ 55s] 0xc0002f23c0 alloc marked
[ 55s] 0xc0002f2480 alloc marked
[ 55s] 0xc0002f2540 alloc marked
[ 55s] 0xc0002f2600 alloc marked
[ 55s] 0xc0002f26c0 alloc marked
[ 55s] 0xc0002f2780 alloc marked
[ 55s] 0xc0002f2840 alloc marked
[ 55s] 0xc0002f2900 alloc marked
[ 55s] 0xc0002f29c0 free unmarked
[ 55s] 0xc0002f2a80 alloc marked
[ 55s] 0xc0002f2b40 alloc marked
[ 55s] 0xc0002f2c00 alloc marked
[ 55s] 0xc0002f2cc0 alloc marked
[ 55s] 0xc0002f2d80 alloc marked
[ 55s] 0xc0002f2e40 alloc marked
[ 55s] 0xc0002f2f00 alloc marked
[ 55s] 0xc0002f2fc0 alloc marked
[ 55s] 0xc0002f3080 alloc marked
[ 55s] 0xc0002f3140 alloc marked
[ 55s] 0xc0002f3200 alloc marked
[ 55s] 0xc0002f32c0 alloc marked
[ 55s] 0xc0002f3380 free unmarked
[ 55s] 0xc0002f3440 free marked zombie

An excerpt from strace:
> 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa6a1b990, parent_tid=0x7fcaa6a1b990, exit_signal=0, stack=0x7fcaa621b000, stack_size=0x7ffe00, tls=0x7fcaa6a1b6c0} => {parent_tid=[2350]}, 88) = 2350
> 2348 clone3({flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, child_tid=0x7fcaa5882990, parent_tid=0x7fcaa5882990, exit_signal=0, stack=0x7fcaa5082000, stack_size=0x7ffe00, tls=0x7fcaa58826c0} => {parent_tid=[2351]}, 88) = 2351
> 2350 <... clone3 resumed> => {parent_tid=[2372]}, 88) = 2372
> 2351 <... clone3 resumed> => {parent_tid=[2354]}, 88) = 2354
> 2351 <... clone3 resumed> => {parent_tid=[2357]}, 88) = 2357
> 2354 <... clone3 resumed> => {parent_tid=[2355]}, 88) = 2355
> 2355 <... 
clone3 resumed> => {parent_tid=[2370]}, 88) = 2370
> 2370 mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0
> 2370 <... mmap resumed>) = 0x7fca68249000
> 2372 <... clone3 resumed> => {parent_tid=[2384]}, 88) = 2384
> 2384 <... clone3 resumed> => {parent_tid=[2388]}, 88) = 2388
> 2388 <... clone3 resumed> => {parent_tid=[2392]}, 88) = 2392
> 2392 <... clone3 resumed> => {parent_tid=[2395]}, 88) = 2395
> 2395 write(2, "runtime: marked free object in s"..., 36

I.e. IIUC, all of these are threads (CLONE_VM). Thread 2370 mapped an
anonymous region of 262144 bytes at 0x7fca68249000 - 0x7fca68288fff, and
Go in thread 2395 thinks, for some reason, that 0x7fca6824bec8 in that
region is "bad".

> thanks,--
-- 
js
suse labs
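
For anyone who wants to double-check the address arithmetic, here is a minimal userspace sketch in C. It only takes the mmap length and return value from the strace excerpt and the span address from the Go log. The helper names mapping_last_byte() and addr_in_mapping() are made up for this illustration; they are not kernel or Go runtime functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: last byte (inclusive) of a mapping that starts
 * at `start` and is `len` bytes long.
 */
uint64_t mapping_last_byte(uint64_t start, uint64_t len)
{
	assert(len > 0);	/* a zero-length mapping has no last byte */
	return start + len - 1;
}

/*
 * Hypothetical helper: true if `addr` lies inside the mapping
 * [start, start + len).
 */
bool addr_in_mapping(uint64_t addr, uint64_t start, uint64_t len)
{
	return addr >= start && addr <= mapping_last_byte(start, len);
}
```

Calling addr_in_mapping(0x7fca6824bec8ULL, 0x7fca68249000ULL, 262144) tells whether the span address reported by the Go runtime falls inside the region returned by the mmap in thread 2370. This says nothing about why the runtime considers the object state inconsistent; it only confirms which mapping the address belongs to.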