From: "Aneesh Kumar K.V"
To: Wei Xu
Cc:
 "ying.huang@intel.com", Andrew Morton, Greg Thelen, Yang Shi,
 Linux Kernel Mailing List, Jagdish Gediya, Michal Hocko, Tim C Chen,
 Dave Hansen, Alistair Popple, Baolin Wang, Feng Tang, Jonathan Cameron,
 Davidlohr Bueso, Dan Williams, David Rientjes, Linux MM, Brice Goglin,
 Hesham Almatary
Subject: Re: RFC: Memory Tiering Kernel Interfaces (v2)
In-Reply-To:
References: <56b41ce6922ed5f640d9bd46a603fa27576532a9.camel@intel.com>
Date: Thu, 12 May 2022 13:06:10 +0530
Message-ID: <87y1z7jj85.fsf@linux.ibm.com>

Wei Xu writes:

> On Thu, May 12, 2022 at 12:12 AM Aneesh Kumar K V wrote:
>>
>> On 5/12/22 12:33 PM, ying.huang@intel.com wrote:
>> > On Wed, 2022-05-11 at 23:22 -0700, Wei Xu wrote:
>> >> Sysfs Interfaces
>> >> ================
>> >>
>> >> * /sys/devices/system/memtier/memtierN/nodelist
>> >>
>> >>   where N = 0, 1, 2 (the kernel supports only 3 tiers for now).
>> >>
>> >>   Format: node_list
>> >>
>> >>   Read-only. When read, list the memory nodes in the specified tier.
>> >>
>> >>   Tier 0 is the highest tier, while tier 2 is the lowest tier.
>> >>
>> >>   The absolute value of a tier id number has no specific meaning.
>> >>   What matters is the relative order of the tier id numbers.
>> >>
>> >>   When a memory tier has no nodes, the kernel can hide its memtier
>> >>   sysfs files.
>> >>
>> >> * /sys/devices/system/node/nodeN/memtier
>> >>
>> >>   where N = 0, 1, ...
>> >>
>> >>   Format: int or empty
>> >>
>> >>   When read, list the memory tier that the node belongs to. Its value
>> >>   is empty for a CPU-only NUMA node.
>> >>
>> >>   When written, the kernel moves the node into the specified memory
>> >>   tier if the move is allowed. The tier assignment of all other nodes
>> >>   are not affected.
>> >>
>> >>   Initially, we can make this interface read-only.
>> >
>> > It seems that "/sys/devices/system/node/nodeN/memtier" has all
>> > information we needed. Do we really need
>> > "/sys/devices/system/memtier/memtierN/nodelist"?
>> >
>> > That can be gotten via a simple shell command line,
>> >
>> > $ grep . /sys/devices/system/node/nodeN/memtier | sort -n -k 2 -t ':'
>> >
>>
>> It will be really useful to fetch the memory tier node list in an easy
>> fashion rather than reading multiple sysfs directories. If we don't have
>> other attributes for memorytier, we could keep
>> "/sys/devices/system/memtier/memtierN" a NUMA node list there by
>> avoiding /sys/devices/system/memtier/memtierN/nodelist
>>
>> -aneesh
>
> It is harder to implement memtierN as just a file and doesn't follow
> the existing sysfs pattern, either. Besides, it is extensible to have
> memtierN as a directory.
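
To make the trade-off above concrete: the per-tier node list can be
derived from the per-node memtier values, which is what the grep | sort
pipeline does by hand. Below is a minimal userspace sketch of that
grouping. It is only an illustration, not part of any patch: the
function name group_nodes_by_tier, the MAX_NODES limit and the
in-memory node_tier[] table (standing in for values read from the
proposed nodeN/memtier files) are all assumptions made for the example.

```c
#include <stdio.h>

#define MAX_TIER  3   /* the RFC supports only three tiers for now */
#define MAX_NODES 64  /* hypothetical limit chosen for this sketch */

/*
 * Build one node bitmask per tier from a per-node tier table.
 * node_tier[n] is the tier of node n, or -1 for a node without memory
 * (the RFC reports an empty memtier file for a CPU-only node).
 * Out-of-range tier values are ignored.
 */
static void group_nodes_by_tier(const int *node_tier, int nr_nodes,
				unsigned long long tier_mask[MAX_TIER])
{
	for (int t = 0; t < MAX_TIER; t++)
		tier_mask[t] = 0;

	for (int n = 0; n < nr_nodes && n < MAX_NODES; n++) {
		int t = node_tier[n];

		if (t >= 0 && t < MAX_TIER)
			tier_mask[t] |= 1ULL << n;
	}
}

/* Print the nodes of one tier, one line per tier, for inspection. */
static void print_tier(int tier, unsigned long long mask)
{
	printf("memtier%d:", tier);
	for (int n = 0; n < MAX_NODES; n++)
		if (mask & (1ULL << n))
			printf(" %d", n);
	printf("\n");
}
```

With a hypothetical table { 1, 1, -1, 2 } (nodes 0 and 1 in tier 1,
node 2 CPU-only, node 3 in tier 2), tier 1 ends up with nodes 0 and 1
and tier 2 with node 3. Every consumer would have to repeat this walk
over all nodes, which is the cost a memtierN/nodelist file avoids.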
diff --git a/drivers/base/node.c b/drivers/base/node.c
index 6248326f944d..251f38ec3816 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -1097,12 +1097,49 @@ static struct attribute *node_state_attrs[] = {
 	NULL
 };
 
+#define MAX_TIER 3
+nodemask_t memory_tier[MAX_TIER];
+
+#define _TIER_ATTR_RO(name, tier_index) \
+	{ __ATTR(name, 0444, show_tier, NULL), tier_index, NULL }
+
+struct memory_tier_attr {
+	struct device_attribute attr;
+	int tier_index;
+	int (*write)(nodemask_t nodes);
+};
+
+static ssize_t show_tier(struct device *dev,
+			 struct device_attribute *attr, char *buf)
+{
+	struct memory_tier_attr *mt = container_of(attr, struct memory_tier_attr, attr);
+
+	return sysfs_emit(buf, "%*pbl\n",
+			  nodemask_pr_args(&memory_tier[mt->tier_index]));
+}
+
 static const struct attribute_group memory_root_attr_group = {
 	.attrs = node_state_attrs,
 };
 
+
+#define TOP_TIER 0
+static struct memory_tier_attr memory_tiers[] = {
+	[0] = _TIER_ATTR_RO(memory_top_tier, TOP_TIER),
+};
+
+static struct attribute *memory_tier_attrs[] = {
+	&memory_tiers[0].attr.attr,
+	NULL
+};
+
+static const struct attribute_group memory_tier_attr_group = {
+	.attrs = memory_tier_attrs,
+};
+
 static const struct attribute_group *cpu_root_attr_groups[] = {
 	&memory_root_attr_group,
+	&memory_tier_attr_group,
 	NULL,
 };
 

As long as we have the ability to see the nodelist, I am good with the
proposal.

-aneesh
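
For readers who have not met the "%*pbl" format used by show_tier() in
the patch: it prints a bitmap as a comma-separated list of ranges, the
same style the existing node sysfs files use (for example "0-1,3"). The
userspace sketch below mimics that output for a 64-bit mask. It is an
approximation for illustration only; format_node_list is a hypothetical
helper, not a kernel function, and a plain unsigned long long stands in
for nodemask_t.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format the set bits of mask as a comma-separated list of ranges,
 * e.g. bits {0, 1, 3} become "0-1,3". Returns the number of characters
 * written (excluding the terminating NUL), or -1 if buf is too small.
 */
static int format_node_list(unsigned long long mask, char *buf, size_t len)
{
	size_t pos = 0;
	int bit = 0;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	while (bit < 64) {
		int start, n;

		if (!(mask & (1ULL << bit))) {
			bit++;
			continue;
		}

		/* Extend the run while the next bit is also set. */
		start = bit;
		while (bit + 1 < 64 && (mask & (1ULL << (bit + 1))))
			bit++;

		if (start == bit)
			n = snprintf(buf + pos, len - pos, "%s%d",
				     pos ? "," : "", start);
		else
			n = snprintf(buf + pos, len - pos, "%s%d-%d",
				     pos ? "," : "", start, bit);
		if (n < 0 || (size_t)n >= len - pos)
			return -1;

		pos += (size_t)n;
		bit++;
	}

	return (int)pos;
}
```

A mask of 0xB (nodes 0, 1 and 3) formats as "0-1,3", and an empty mask
formats as an empty string. show_tier() additionally appends a newline.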