Date: Thu, 7 Apr 2022 15:36:43 +0100
From: Chris Down <chris@chrisdown.name>
To: Omar Sandoval
Cc: linux-mm@kvack.org, kexec@lists.infradead.org, Andrew Morton, Uladzislau Rezki, Christoph Hellwig, Baoquan He, x86@kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2] mm/vmalloc: fix spinning drain_vmap_work after reading from /proc/vmcore
References: <52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com>
In-Reply-To: <52f819991051f9b865e9ce25605509bfdbacadcd.1649277321.git.osandov@fb.com>
User-Agent: Mutt/2.2.2 (aa28abe8) (2022-03-25)

Omar Sandoval writes:
>From: Omar Sandoval
>
>Commit 3ee48b6af49c ("mm, x86: Saving vmcore with non-lazy freeing of
>vmas") introduced set_iounmap_nonlazy(), which sets vmap_lazy_nr to
>lazy_max_pages() + 1, ensuring that any future vunmaps() immediately
>purge the vmap areas instead of doing it lazily.
>
>Commit 690467c81b1a ("mm/vmalloc: Move draining areas out of caller
>context") moved the purging from the vunmap() caller to a worker thread.
>Unfortunately, set_iounmap_nonlazy() can cause the worker thread to spin
>(possibly forever). For example, consider the following scenario:
>
>1. Thread reads from /proc/vmcore. This eventually calls
>   __copy_oldmem_page() -> set_iounmap_nonlazy(), which sets
>   vmap_lazy_nr to lazy_max_pages() + 1.
>2. Then it calls free_vmap_area_noflush() (via iounmap()), which adds 2
>   pages (one page plus the guard page) to the purge list and
>   vmap_lazy_nr. vmap_lazy_nr is now lazy_max_pages() + 3, so the
>   drain_vmap_work is scheduled.
>3. Thread returns from the kernel and is scheduled out.
>4. Worker thread is scheduled in and calls drain_vmap_area_work(). It
>   frees the 2 pages on the purge list. vmap_lazy_nr is now
>   lazy_max_pages() + 1.
>5. This is still over the threshold, so it tries to purge areas again,
>   but doesn't find anything.
>6. Repeat 5.
>
>If the system is running with only one CPU (which is typical for kdump)
>and preemption is disabled, then this will never make forward progress:
>there aren't any more pages to purge, so it hangs. If there is more than
>one CPU or preemption is enabled, then the worker thread will spin
>forever in the background.
>(Note that if there were already pages to be purged at the time that
>set_iounmap_nonlazy() was called, this bug is avoided.)
>
>This can be reproduced with anything that reads from /proc/vmcore
>multiple times. E.g., vmcore-dmesg /proc/vmcore.
>
>It turns out that improvements to vmap() over the years have obsoleted
>the need for this "optimization". I benchmarked
>`dd if=/proc/vmcore of=/dev/null` with 4k and 1M read sizes on a system
>with a 32GB vmcore. The test was run on 5.17, 5.18-rc1 with a fix that
>avoided the hang, and 5.18-rc1 with set_iounmap_nonlazy() removed
>entirely:
>
>  |  5.17|5.18+fix|5.18+removal
>4k|40.86s|  40.09s|      26.73s
>1M|24.47s|  23.98s|      21.84s
>
>The removal was the fastest (by a wide margin with 4k reads). This patch
>removes set_iounmap_nonlazy().
>
>Signed-off-by: Omar Sandoval

It probably doesn't matter, but maybe worth adding a Fixes tag just to
make sure anyone getting this without context understands that
690467c81b1a ("mm/vmalloc: Move draining areas out of caller context")
shouldn't reach further rcs without this. Unlikely that would happen
anyway, though.

Nice use of a bug as an impetus to clean things up :-) Thanks!

Acked-by: Chris Down

>---
>Changes from v1:
>
>- Remove set_iounmap_nonlazy() entirely instead of fixing it.
>
> arch/x86/include/asm/io.h       |  2 --
> arch/x86/kernel/crash_dump_64.c |  1 -
> mm/vmalloc.c                    | 11 -----------
> 3 files changed, 14 deletions(-)
>
>diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
>index f6d91ecb8026..e9736af126b2 100644
>--- a/arch/x86/include/asm/io.h
>+++ b/arch/x86/include/asm/io.h
>@@ -210,8 +210,6 @@ void __iomem *ioremap(resource_size_t offset, unsigned long size);
> extern void iounmap(volatile void __iomem *addr);
> #define iounmap iounmap
> 
>-extern void set_iounmap_nonlazy(void);
>-
> #ifdef __KERNEL__
> 
> void memcpy_fromio(void *, const volatile void __iomem *, size_t);
>diff --git a/arch/x86/kernel/crash_dump_64.c b/arch/x86/kernel/crash_dump_64.c
>index a7f617a3981d..97529552dd24 100644
>--- a/arch/x86/kernel/crash_dump_64.c
>+++ b/arch/x86/kernel/crash_dump_64.c
>@@ -37,7 +37,6 @@ static ssize_t __copy_oldmem_page(unsigned long pfn, char *buf, size_t csize,
> 	} else
> 		memcpy(buf, vaddr + offset, csize);
> 
>-	set_iounmap_nonlazy();
> 	iounmap((void __iomem *)vaddr);
> 	return csize;
> }
>diff --git a/mm/vmalloc.c b/mm/vmalloc.c
>index e163372d3967..0b17498a34f1 100644
>--- a/mm/vmalloc.c
>+++ b/mm/vmalloc.c
>@@ -1671,17 +1671,6 @@ static DEFINE_MUTEX(vmap_purge_lock);
> /* for per-CPU blocks */
> static void purge_fragmented_blocks_allcpus(void);
> 
>-#ifdef CONFIG_X86_64
>-/*
>- * called before a call to iounmap() if the caller wants vm_area_struct's
>- * immediately freed.
>- */
>-void set_iounmap_nonlazy(void)
>-{
>-	atomic_long_set(&vmap_lazy_nr, lazy_max_pages()+1);
>-}
>-#endif	/* CONFIG_X86_64 */
>-
> /*
>  * Purges all lazily-freed vmap areas.
>  */
>-- 
>2.35.1
>