Date: Wed, 29 Jul 2020 09:30:31 +0100
From: Jonathan Cameron
To: Mike Rapoport
Subject: Re: [PATCH 04/15] arm64: numa: simplify dummy_numa_init()
Message-ID: <20200729093031.0000316b@Huawei.com>
In-Reply-To: <20200728051153.1590-5-rppt@kernel.org>
References: <20200728051153.1590-1-rppt@kernel.org> <20200728051153.1590-5-rppt@kernel.org>
Organization: Huawei Technologies Research and Development (UK) Ltd.
Cc: linux-sh@vger.kernel.org, Peter Zijlstra, Catalin Marinas, Dave Hansen,
 linux-kernel@vger.kernel.org, Max Filippov, Paul Mackerras,
 sparclinux@vger.kernel.org, linux-riscv@lists.infradead.org, Will Deacon,
 Thomas Gleixner, linux-s390@vger.kernel.org, linux-c6x-dev@linux-c6x.org,
 Yoshinori Sato, Michael Ellerman, x86@kernel.org, Russell King,
 Mike Rapoport, clang-built-linux@googlegroups.com, Ingo Molnar,
 Christoph Hellwig, Benjamin Herrenschmidt,
 uclinux-h8-devel@lists.sourceforge.jp, linux-xtensa@linux-xtensa.org,
 openrisc@lists.librecores.org, Borislav Petkov, Andy Lutomirski,
 Paul Walmsley, Stafford Horne, linux-arm-kernel@lists.infradead.org,
 Michal Simek, linux-mm@kvack.org, linux-mips@vger.kernel.org,
 iommu@lists.linux-foundation.org, Palmer Dabbelt, Andrew Morton,
 linuxppc-dev@lists.ozlabs.org

On Tue, 28 Jul 2020 08:11:42 +0300
Mike Rapoport wrote:

> From: Mike Rapoport
>
> dummy_numa_init() loops over memblock.memory and passes nid=0 to
> numa_add_memblk() which essentially wraps memblock_set_node(). However,
> memblock_set_node() can cope with entire memory span itself, so the loop
> over memblock.memory regions is redundant.
>
> Replace the loop with a single call to memblock_set_node() to the entire
> memory.

Hi Mike,

I had a similar patch I was going to post shortly so can add a bit more
on the advantages of this one.

Beyond cleaning up, it also fixes an issue with a buggy ACPI firmware in
which the SRAT table covers some but not all of the memory in the EFI
memory map. Stealing bits from the draft cover letter I had for that...

> This issue can be easily triggered by having an SRAT table which fails
> to cover all elements of the EFI memory map.
>
> This firmware error is detected and a warning printed. e.g.
> "NUMA: Warning: invalid memblk node 64 [mem 0x240000000-0x27fffffff]"
> At that point we fall back to dummy_numa_init().
>
> However, the failed ACPI init has left us with our memblocks all broken
> up as we split them when trying to assign them to NUMA nodes.
>
> We then iterate over the memblocks and add them to node 0.
>
> for_each_memblock(memory, mblk) {
> 	ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
> 	if (!ret)
> 		continue;
> 	pr_err("NUMA init failed\n");
> 	return ret;
> }
>
> numa_add_memblk() calls memblock_set_node() which merges regions that
> were previously split up during the earlier attempt to add them to different
> nodes during parsing of SRAT.
>
> This means elements are moved in the memblock array and we can end up
> in a different memblock after the call to numa_add_memblk().
> Result is:
>
> Unable to handle kernel paging request at virtual address 0000000000003a40
> Mem abort info:
>   ESR = 0x96000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000004
>   CM = 0, WnR = 0
> [0000000000003a40] user address but active_mm is swapper
> Internal error: Oops: 96000004 [#1] PREEMPT SMP
>
> ...
>
> Call trace:
>   sparse_init_nid+0x5c/0x2b0
>   sparse_init+0x138/0x170
>   bootmem_init+0x80/0xe0
>   setup_arch+0x2a0/0x5fc
>   start_kernel+0x8c/0x648
>
> As an illustrative example:
> EFI table has one block of memory.
> memblks[0] = [0x00...0x2f] so we start with a single memblock.
>
> SRAT has
> [0x00...0x0f] in node 0
> [0x10...0x1f] in node 1
> but no entry covering
> [0x20...0x2f].
>
> Whilst parsing SRAT the single memblock is broken into 3.
> memblks[0] = [0x00...0x0f] in node 0
> memblks[1] = [0x10...0x1f] in node 1
> memblks[2] = [0x20...0x2f] in node MAX_NUMNODES (invalid value)
>
> A sanity check parse then detects the invalid section and acpi_numa_init
> fails. We then fall back to the dummy path.
>
> That iterates over the memblocks. We'll use i as an index into the array
> of memblocks.
>
> i = 0
> memblks[0] = [0x00...0x0f] set to node 0.
> merge doesn't do anything because the neighbouring memblock is still in node 1.
>
> i = 1
> memblks[1] = [0x10...0x1f] set to node 0.
> merge combines memblock 0 and 1 to give a new set of memblocks.
>
> memblks[0] = [0x00...0x1f] in node 0
> memblks[1] = [0x20...0x2f] in node MAX_NUMNODES.
>
> i = 2 is off the end of the now reduced array of memblocks, so exit the loop.
> (if we restart the loop here everything will be fine).
>
> Later sparse_init_nid tries to use the node of the second memblock to index
> something and boom.
>
> Signed-off-by: Mike Rapoport

Acked-by: Jonathan Cameron

> ---
>  arch/arm64/mm/numa.c | 13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index aafcee3e3f7e..0cbdbcc885fb 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -423,19 +423,16 @@ static int __init numa_init(int (*init_func)(void))
>   */
>  static int __init dummy_numa_init(void)
>  {
> +	phys_addr_t start = memblock_start_of_DRAM();
> +	phys_addr_t end = memblock_end_of_DRAM();
>  	int ret;
> -	struct memblock_region *mblk;
>
>  	if (numa_off)
>  		pr_info("NUMA disabled\n"); /* Forced off on command line. */
> -	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> -		memblock_start_of_DRAM(), memblock_end_of_DRAM() - 1);
> -
> -	for_each_memblock(memory, mblk) {
> -		ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
> -		if (!ret)
> -			continue;
> +	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n", start, end - 1);
>
> +	ret = numa_add_memblk(0, start, end);
> +	if (ret) {
>  		pr_err("NUMA init failed\n");
>  		return ret;
>  	}