From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from us-smtp-delivery-74.mimecast.com ([63.128.21.74]:25425
        "EHLO us-smtp-delivery-74.mimecast.com" rhost-flags-OK-OK-OK-OK)
        by vger.kernel.org with ESMTP id S1729589AbgC3I2V (ORCPT );
        Mon, 30 Mar 2020 04:28:21 -0400
Date: Mon, 30 Mar 2020 16:28:09 +0800
From: Baoquan He
Subject: Re: [PATCH v3 0/5] mm: Enable CONFIG_NODES_SPAN_OTHER_NODES by default for NUMA
Message-ID: <20200330082809.GB6352@MiWiFi-R3L-srv>
References: <1585420282-25630-1-git-send-email-Hoan@os.amperecomputing.com>
        <20200330074246.GA14243@dhcp22.suse.cz>
        <20200330081659.GA6352@MiWiFi-R3L-srv>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20200330081659.GA6352@MiWiFi-R3L-srv>
Sender: linux-s390-owner@vger.kernel.org
List-ID:
To: Michal Hocko
Cc: Hoan Tran, Catalin Marinas, Will Deacon, Andrew Morton,
        Vlastimil Babka, Oscar Salvador, Pavel Tatashin, Mike Rapoport,
        Alexander Duyck, Benjamin Herrenschmidt, Paul Mackerras,
        Michael Ellerman, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
        "H. Peter Anvin", "David S. Miller", Heiko Carstens, Vasily Gorbik,
        Christian Borntraeger, "open list:MEMORY MANAGEMENT",
        linux-arm-kernel@lists.infradead.org, linux-s390@vger.kernel.org,
        sparclinux@vger.kernel.org, x86@kernel.org,
        linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
        lho@amperecomputing.com, mmorana@amperecomputing.com

On 03/30/20 at 04:16pm, Baoquan He wrote:
> On 03/30/20 at 09:42am, Michal Hocko wrote:
> > On Sat 28-03-20 11:31:17, Hoan Tran wrote:
> > > In a NUMA layout in which nodes have memory ranges that span across
> > > other nodes, the mm driver can detect the memory node id incorrectly.
> > >
> > > For example, with the layout below:
> > > Node 0 address: 0000 xxxx 0000 xxxx
> > > Node 1 address: xxxx 1111 xxxx 1111
> > >
> > > Note:
> > >  - Memory from low to high
> > >  - 0/1: Node id
> > >  - x: Invalid memory of a node
> > >
> > > When mm probes the memory map without the CONFIG_NODES_SPAN_OTHER_NODES
> > > config, mm only checks the memory validity but not the node id.
> > > Because of that, Node 1 also detects the memory from node 0, as below,
> > > when it scans from the start address to the end address of node 1.
> > >
> > > Node 0 address: 0000 xxxx xxxx xxxx
> > > Node 1 address: xxxx 1111 1111 1111
> > >
> > > This layout could occur on any architecture. Most of them enable
> > > this config by default with CONFIG_NUMA. This patch, by default, enables
> > > CONFIG_NODES_SPAN_OTHER_NODES or uses early_pfn_in_nid() for NUMA.
> >
> > I am not opposed to this at all. It reduces the config space and that is
> > a good thing on its own. History has shown that memory layout might
> > be really wild wrt NUMA. The config is only used for early_pfn_in_nid,
> > which is clearly overkill.
> >
> > Your description doesn't really explain why this is safe, though. The
> > history of this config is somewhat messy. Mike has tried
> > to remove it in a94b3ab7eab4 ("[PATCH] mm: remove arch independent
> > NODES_SPAN_OTHER_NODES"), only for it to be reintroduced by 7516795739bd
> > ("[PATCH] Reintroduce NODES_SPAN_OTHER_NODES for powerpc") without any
> > reasoning whatsoever. This doesn't make it really easy to see whether
> > the reasons for reintroduction are still there. Maybe there are some
> > subtle dependencies. I do not see any TBH, but that might be buried
> > deep in arch-specific code.
>
> Yeah, since early_pfnnid_cache was added, we do not need to worry about
> performance. But when I read the mem init code on x86 again, I do see there
> is code to handle node overlapping, e.g. in numa_cleanup_meminfo(),
> when storing the node id into memblock.
> But the thing is, if we have
> encountered node overlapping, we just return ahead of time and leave
> something uninitialized. I am wondering whether a system with node
> overlapping can still run healthily.

OK, I didn't read the code carefully. That code handles the case where
memblocks with different node ids overlap, and there it needs to return.
In the example Hoan gave, there is no problem and the system can run well.
Please ignore the above comment.
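
For readers who want to see the effect Hoan describes in the quoted cover
letter, here is a minimal user-space sketch in Python. It is a toy model, not
kernel code: TRUE_NID, NODE_SPANS, pfn_in_nid() and init_memmap() are all
hypothetical names invented for illustration, and the sketch assumes that
nodes are initialized in id order and that every page frame is valid memory.
It only shows why scanning a node's whole span without a node id check lets
node 1 claim pages that belong to node 0.

```python
# Toy model only: none of these names exist in the kernel.
# Hypothetical ground truth: which node owns each page frame,
# matching the layout "0000 1111 0000 1111" from the cover letter.
TRUE_NID = [0] * 4 + [1] * 4 + [0] * 4 + [1] * 4

# Each node's span runs from its first to its last page frame: [start, end).
# The two spans overlap, which is the "nodes span other nodes" situation.
NODE_SPANS = {0: (0, 12), 1: (4, 16)}


def pfn_in_nid(pfn, nid):
    """Toy stand-in for an early_pfn_in_nid()-style ownership check."""
    return TRUE_NID[pfn] == nid


def init_memmap(check_nid):
    """Return the node id each page frame ends up assigned to."""
    owner = [None] * len(TRUE_NID)
    for nid in sorted(NODE_SPANS):
        start, end = NODE_SPANS[nid]
        for pfn in range(start, end):
            if check_nid and not pfn_in_nid(pfn, nid):
                continue  # page belongs to another node; leave it alone
            owner[pfn] = nid
    return owner


def render(owner):
    """Format the owner list in groups of four, like the quoted diagrams."""
    digits = "".join(str(n) for n in owner)
    return " ".join(digits[i:i + 4] for i in range(0, len(digits), 4))


if __name__ == "__main__":
    print("without node id check:", render(init_memmap(check_nid=False)))
    print("with node id check:   ", render(init_memmap(check_nid=True)))
```

In this model the unchecked scan ends with node 1 owning the third group of
pages, which mirrors the second diagram in the cover letter, while the checked
scan reproduces the true layout. How the real kernel answers the ownership
question is outside the scope of this sketch.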