From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 955DAC433DF for ; Tue, 18 Aug 2020 18:48:08 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 7D90E20888 for ; Tue, 18 Aug 2020 18:48:08 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726779AbgHRSsH (ORCPT ); Tue, 18 Aug 2020 14:48:07 -0400 Received: from mga11.intel.com ([192.55.52.93]:55906 "EHLO mga11.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726576AbgHRSsE (ORCPT ); Tue, 18 Aug 2020 14:48:04 -0400 IronPort-SDR: bblNWjOQNmnN5QAdWgN6ywOuwII7fg1uDQ2cyxfWIxhMOXMonAxB+aTeSwnLsemy+eNkGyXsh9 xzwKFjlMq90Q== X-IronPort-AV: E=McAfee;i="6000,8403,9717"; a="152610600" X-IronPort-AV: E=Sophos;i="5.76,328,1592895600"; d="scan'208";a="152610600" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 18 Aug 2020 11:48:03 -0700 IronPort-SDR: /pdwXXntItm9kxQ5j43iTpw+7WR+wKQ/J0WmAEJGua4yA4qY458tGlR0Q48yZa2fPoaHgqWZ0q 5jnzzLmQLMog== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.76,328,1592895600"; d="scan'208";a="279475927" Received: from viggo.jf.intel.com (HELO localhost.localdomain) ([10.54.77.144]) by fmsmga008.fm.intel.com with ESMTP; 18 Aug 2020 11:48:03 -0700 Subject: [RFC][PATCH 1/9] mm/numa: node demotion data structure and lookup To: linux-kernel@vger.kernel.org Cc: Dave Hansen , yang.shi@linux.alibaba.com, rientjes@google.com, ying.huang@intel.com, dan.j.williams@intel.com From: Dave Hansen Date: Tue, 18 Aug 2020 11:41:24 -0700 References: <20200818184122.29C415DF@viggo.jf.intel.com> In-Reply-To: <20200818184122.29C415DF@viggo.jf.intel.com> Message-Id: <20200818184124.0951CF98@viggo.jf.intel.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Dave Hansen Prepare for the kernel to auto-migrate pages to other memory nodes with a user defined node migration table. This allows creating single migration target for each NUMA node to enable the kernel to do NUMA page migrations instead of simply reclaiming colder pages. A node with no target is a "terminal node", so reclaim acts normally there. The migration target does not fundamentally _need_ to be a single node, but this implementation starts there to limit complexity. If you consider the migration path as a graph, cycles (loops) in the graph are disallowed. This avoids wasting resources by constantly migrating (A->B, B->A, A->B ...). The expectation is that cycles will never be allowed. Signed-off-by: Dave Hansen Cc: Yang Shi Cc: David Rientjes Cc: Huang Ying Cc: Dan Williams -- changes in July 2020: - Remove loop from next_demotion_node() and get_online_mems(). This means that the node returned by next_demotion_node() might now be offline, but the worst case is that the allocation fails. That's fine since it is transient. --- b/mm/migrate.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff -puN mm/migrate.c~0006-node-Define-and-export-memory-migration-path mm/migrate.c --- a/mm/migrate.c~0006-node-Define-and-export-memory-migration-path 2020-08-18 11:36:47.910583191 -0700 +++ b/mm/migrate.c 2020-08-18 11:36:47.914583191 -0700 @@ -1160,6 +1160,22 @@ out: return rc; } +static int node_demotion[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE}; + +/** + * next_demotion_node() - Get the next node in the demotion path + * @node: The starting node to lookup the next node + * + * @returns: node id for next memory node in the demotion path hierarchy + * from @node; NUMA_NO_NODE if @node is terminal. This does not keep + * @node online or guarantee that it *continues* to be the next demotion + * target. + */ +int next_demotion_node(int node) +{ + return node_demotion[node]; +} + /* * Obtain the lock on page, remove all ptes and migrate the page * to the newly allocated page in newpage. _