From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 24 Jul 2023 11:24:37 +0530
Subject: Re: [RFC PATCH] arm64: mm: Fix kernel page tables incorrectly
 deleted during memory removal
To: mawupeng, will@kernel.org, David Hildenbrand
Cc: catalin.marinas@arm.com, akpm@linux-foundation.org,
 sudaraja@codeaurora.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 wangkefeng.wang@huawei.com, linux-arm-kernel@lists.infradead.org,
 mark.rutland@arm.com
References: <20230717115150.1806954-1-mawupeng1@huawei.com>
 <20230721103628.GA12601@willie-the-truck>
 <35a0dad6-4f3b-f2c3-f835-b13c1e899f8d@huawei.com>
From: Anshuman Khandual <anshuman.khandual@arm.com>
In-Reply-To: <35a0dad6-4f3b-f2c3-f835-b13c1e899f8d@huawei.com>

On 7/24/23 06:55, mawupeng wrote:
>
> On 2023/7/21 18:36, Will Deacon wrote:
>> On Mon, Jul 17, 2023 at 07:51:50PM +0800, Wupeng Ma wrote:
>>> From: Ma Wupeng
>>>
>>> During our test, we found that kernel page table entries may be
>>> unexpectedly cleared with rodata off. The root cause is that the kernel
>>> page table is created with PUD size (1G block mapping), while offlining
>>> works at memory block size (MIN_MEMORY_BLOCK_SIZE, 128M). For example,
>>> if 2G of memory is hot-added and a memory block is then offlined, the
>>> call trace is as shown below:
>>>
>>> offline_and_remove_memory
>>>   try_remove_memory
>>>     arch_remove_memory
>>>       __remove_pgd_mapping
>>>         unmap_hotplug_range
>>>           unmap_hotplug_p4d_range
>>>             unmap_hotplug_pud_range
>>>               if (pud_sect(pud))
>>>                 pud_clear(pudp);
>>
>> Sorry, but I'm struggling to understand the problem here. If we're adding
>> and removing a 2G memory region, why _wouldn't_ we want to use large 1GiB
>> mappings?
>
>> Or are you saying that only a subset of the memory is removed,
>> but we then accidentally unmap the whole thing?
> Yes, we unmap a subset, but the whole page table entry is removed.
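To make that size mismatch concrete, here is a tiny standalone sketch. It
is illustration only, not arm64 kernel code: the 1G and 128M figures are
taken from the report above (a 4K page size configuration), and the
locally defined macros merely stand in for the kernel ones. Offlining a
single 128M memory block ends up clearing an entire 1G PUD block mapping,
leaving 896M of still-online memory unmapped.

/*
 * Illustration only -- not arm64 kernel code. It only models the
 * granularity mismatch described above: a 128M memory block removal
 * versus a 1G PUD block mapping.
 */
#include <stdio.h>

#define SZ_1M                   (1UL << 20)
#define PUD_SIZE                (1024 * SZ_1M)  /* 1G block mapping  */
#define MIN_MEMORY_BLOCK_SIZE   (128 * SZ_1M)   /* 128M memory block */

int main(void)
{
        unsigned long requested = MIN_MEMORY_BLOCK_SIZE;
        /* The unmap walk finds a section mapping and clears the whole PUD */
        unsigned long unmapped = PUD_SIZE;

        printf("requested removal : %4lu MiB\n", requested / SZ_1M);
        printf("actually unmapped : %4lu MiB\n", unmapped / SZ_1M);
        printf("still-online memory left unmapped: %lu MiB\n",
               (unmapped - requested) / SZ_1M);
        return 0;
}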
>
>>> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
>>> index 95d360805f8a..44c724ce4f70 100644
>>> --- a/arch/arm64/mm/mmu.c
>>> +++ b/arch/arm64/mm/mmu.c
>>> @@ -44,6 +44,7 @@
>>>  #define NO_BLOCK_MAPPINGS	BIT(0)
>>>  #define NO_CONT_MAPPINGS	BIT(1)
>>>  #define NO_EXEC_MAPPINGS	BIT(2)	/* assumes FEAT_HPDS is not used */
>>> +#define NO_PUD_MAPPINGS		BIT(3)
>>>
>>>  int idmap_t0sz __ro_after_init;
>>>
>>> @@ -344,7 +345,7 @@ static void alloc_init_pud(pgd_t *pgdp, unsigned long addr, unsigned long end,
>>>  	 */
>>>  	if (pud_sect_supported() &&
>>>  	   ((addr | next | phys) & ~PUD_MASK) == 0 &&
>>> -	    (flags & NO_BLOCK_MAPPINGS) == 0) {
>>> +	    (flags & (NO_BLOCK_MAPPINGS | NO_PUD_MAPPINGS)) == 0) {
>>>  		pud_set_huge(pudp, phys, prot);
>>>
>>>  		/*
>>> @@ -1305,7 +1306,7 @@ struct range arch_get_mappable_range(void)
>>>  int arch_add_memory(int nid, u64 start, u64 size,
>>>  		    struct mhp_params *params)
>>>  {
>>> -	int ret, flags = NO_EXEC_MAPPINGS;
>>> +	int ret, flags = NO_EXEC_MAPPINGS | NO_PUD_MAPPINGS;
>> I think we should allow large mappings here and instead prevent partial
>> removal of the block, if that's what is causing the issue.
> This could solve the problem.
> Or we could prevent partial removal? Or rebuild the page table entries for
> the areas that were not removed?

+ David Hildenbrand

Splitting the block mapping and rebuilding the page table entries to
reflect the non-removed areas would require additional information, such
as the flags and the pgtable alloc function used in __create_pgd_mapping(),
to be passed along, depending on whether it is the vmemmap (which would not
have PUD block mappings) or the linear mapping that is being torn down. But
I am just wondering whether we have to go in that direction at all, or can
simply prevent partial memory block removal as suggested by Will.

- arch_remove_memory() does not have a return type, so core MM hotremove
  would not fail because arch_remove_memory() failed or warned

- Core MM hotremove does check_hotplug_memory_range(), which ensures that
  the range and start address are memory_block_size_bytes() aligned

- The default memory_block_size_bytes() depends on SECTION_SIZE_BITS, which
  on arm64 can now be less than PUD_SIZE, triggering this problem

#define MIN_MEMORY_BLOCK_SIZE     (1UL << SECTION_SIZE_BITS)

unsigned long __weak memory_block_size_bytes(void)
{
	return MIN_MEMORY_BLOCK_SIZE;
}
EXPORT_SYMBOL_GPL(memory_block_size_bytes);

- We would need to override memory_block_size_bytes() on arm64 to
  accommodate such scenarios

Something like this might work (built but not tested)

commit 2eb8dc0d08dfe0b2a3bb71df93b12f7bf74a2ca6 (HEAD)
Author: Anshuman Khandual <anshuman.khandual@arm.com>
Date:   Mon Jul 24 06:45:34 2023 +0100

    arm64/mm: Define memory_block_size_bytes()

    Define memory_block_size_bytes() on arm64 platforms to set the minimum
    hot plug and remove granularity to PUD_SIZE, in case MIN_MEMORY_BLOCK_SIZE
    falls below PUD_SIZE. Otherwise a complete PUD block mapping will be torn
    down while unmapping a MIN_MEMORY_BLOCK_SIZE range.

    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 95d360805f8a..1918459b3460 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1157,6 +1157,17 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
+unsigned long memory_block_size_bytes(void)
+{
+	/*
+	 * Linear mappings might include PUD based block mappings which
+	 * cannot be torn down in part during memory hotremove. Hence
+	 * PUD_SIZE needs to be the minimum granularity for memory hot
+	 * removal, in case MIN_MEMORY_BLOCK_SIZE falls below it.
+	 */
+	return max_t(unsigned long, MIN_MEMORY_BLOCK_SIZE, PUD_SIZE);
+}
+
 void vmemmap_free(unsigned long start, unsigned long end,
 		  struct vmem_altmap *altmap)
 {
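
For what it's worth, here is a quick sketch of what the proposed override
evaluates to on a configuration like the one in the report. It is
illustration only, not part of the patch above: it assumes a 4K page size
setup where MIN_MEMORY_BLOCK_SIZE is 128M and PUD_SIZE is 1G, and the
locally defined macros just stand in for the kernel ones.

/*
 * Illustration only -- not part of the patch. Values assume 4K pages:
 * MIN_MEMORY_BLOCK_SIZE = 128M (as in the report), PUD_SIZE = 1G.
 */
#include <stdio.h>

#define SZ_1M                   (1UL << 20)
#define MIN_MEMORY_BLOCK_SIZE   (128 * SZ_1M)
#define PUD_SIZE                (1024 * SZ_1M)
#define max_t(type, a, b)       ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

static unsigned long memory_block_size_bytes(void)
{
        /* Same expression as the proposed arm64 override */
        return max_t(unsigned long, MIN_MEMORY_BLOCK_SIZE, PUD_SIZE);
}

int main(void)
{
        /* Memory blocks become 1024 MiB, so hot add/remove covers whole PUDs */
        printf("memory block size: %lu MiB\n",
               memory_block_size_bytes() / SZ_1M);
        return 0;
}

The trade-off, as the thread suggests, is that memory hot add/remove then
only happens at 1G granularity on such configurations, in exchange for
never tearing down a PUD block mapping partially.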