Date: Mon, 16 Oct 2023 21:28:33 -0400
From: Gregory Price
To: "Huang, Ying"
Cc: Gregory Price, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-cxl@vger.kernel.org, akpm@linux-foundation.org,
 sthanneeru@micron.com
Subject: Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving
References: <20231009204259.875232-1-gregory.price@memverge.com>
 <87o7gzm22n.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87o7gzm22n.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Mon, Oct 16, 2023 at 03:57:52PM +0800, Huang, Ying wrote:
> Gregory Price writes:
>
> > == Mutex to Semaphore change:
> >
> > Since it is expected that many threads will be accessing this data
> > during allocations, a mutex is not appropriate.
>
> IIUC, this is a change for performance.  If so, please show some
> performance data.
>

This change will be dropped in v3 in favor of the existing RCU
mechanism in memory-tiers.c, as pointed out by Matthew.

> > == Source-node relative weighting:
> >
> > 1. Set weights for DDR (tier4) and CXL (tier22) tiers.
> >    echo source_node:weight > /path/to/interleave_weight
>
> If source_node is considered, why not consider target_node too?  On a
> system with only 1 tier (DRAM), do you want weighted interleaving among
> NUMA nodes?  If so, why tie weighted interleaving with memory tiers?
> Why not just introduce weighted interleaving for NUMA nodes?
>

The short answer: Practicality and ease-of-use.

The long answer: We have been discussing how to make this more flexible.

Personally, I agree with you.  If Task A is on Socket 0, the weight on
Socket 0 DRAM should not be the same as the weight on Socket 1 DRAM.
However, right now, DRAM nodes are lumped together into the same tier,
so they all end up with the same weight.

If you scroll back through the list, you'll find an RFC I posted for
set_mempolicy2 which implements weighted interleave in mm/mempolicy.
However, mm/mempolicy is extremely `current-centric` at the moment,
which makes changing weights at runtime (in response to a hotplug
event, for example) very difficult.

I still think there is room to extend set_mempolicy to allow
task-defined weights to take precedence over tier-defined weights.

We have discussed adding the following features to memory-tiers:

1) breaking up tiers to allow 1 tier per node, as opposed to defaulting
   to lumping all nodes of a similar quality into the same tier

2) enabling movement of nodes between tiers (for the purpose of
   reconfiguring due to hotplug and other situations)

For users that require fine-grained control over each individual node,
this would allow weights to be applied per-node, because a node=tier.

For the majority of use cases, it would allow clumping of nodes into
tiers based on physical topology and performance class, and then allow
the general weighting to apply.  This seems like the most obvious
use-case for the majority of users, and also the easiest to set up in
the short term.

That said, there are probably several different ways/places to
implement this feature.  The question is: what is the clear and obvious
way?  I don't have a definitive answer for that, hence the RFC.

There are at least 5 proposals that I know of at the moment:

1) mempolicy
2) memory-tiers
3) memory-block interleaving? (weighting among blocks inside a node)
   Maybe relevant if Dynamic Capacity devices arrive, but it seems
   like the wrong place to do this.
4) multi-device nodes (e.g. cxl create-region ... mem0 mem1...)
5) "just do it in hardware"

> >     # Set tier4 weight from node 0 to 85
> >     echo 0:85 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
> >     # Set tier4 weight from node 1 to 65
> >     echo 1:65 > /sys/devices/virtual/memory_tiering/memory_tier4/interleave_weight
> >     # Set tier22 weight from node 0 to 15
> >     echo 0:15 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
> >     # Set tier22 weight from node 1 to 10
> >     echo 1:10 > /sys/devices/virtual/memory_tiering/memory_tier22/interleave_weight
>
> --
> Best Regards,
> Huang, Ying
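
For illustration only, here is a minimal userspace sketch of what the
quoted source-node relative weights would imply.  It is not the patch's
code: the names WEIGHTS, tier_shares and weighted_round_robin are
hypothetical, and the "weight consecutive pages per tier, then move on"
ordering is an assumption made for the example.  How pages are spread
among the nodes inside a tier is not modeled.  With the values quoted
above, a task on node 0 would get 85% of its interleaved pages from
tier4 and 15% from tier22; a task on node 1 would get 65/75 (about
86.7%) from tier4 and 10/75 (about 13.3%) from tier22.

```python
from itertools import islice

# Weights copied from the quoted sysfs example: WEIGHTS[tier][source_node].
WEIGHTS = {
    "memory_tier4": {0: 85, 1: 65},   # DDR
    "memory_tier22": {0: 15, 1: 10},  # CXL
}


def tier_shares(source_node):
    """Return each tier's fraction of interleaved pages for one source node."""
    per_tier = {tier: w[source_node] for tier, w in WEIGHTS.items()}
    total = sum(per_tier.values())
    return {tier: weight / total for tier, weight in per_tier.items()}


def weighted_round_robin(source_node):
    """Yield tier names forever: `weight` pages from one tier, then the next.

    The ordering is an illustrative assumption, not the patch's algorithm.
    """
    while True:
        for tier, w in WEIGHTS.items():
            for _ in range(w[source_node]):
                yield tier


if __name__ == "__main__":
    for node in (0, 1):
        print(node, tier_shares(node))
    # One full cycle for a task running on node 0 is 85 + 15 = 100 pages.
    cycle = list(islice(weighted_round_robin(0), 100))
    print(cycle.count("memory_tier4"), cycle.count("memory_tier22"))
```

Keying the table by (tier, source node) mirrors the source_node:weight
format written to interleave_weight; a per-node variant would only need
the outer key changed from a tier to a node.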