Subject: Re: [LSF/MM TOPIC] Tiered memory accounting and management
To: Shakeel Butt
Cc: Yang Shi, lsf-pc@lists.linux-foundation.org, Linux MM, Michal Hocko,
 Dan Williams, Dave Hansen, David Rientjes, Wei Xu, Greg Thelen
References: <475cbc62-a430-2c60-34cc-72ea8baebf2c@linux.intel.com>
 <82ffac56-e3fb-2d2d-1601-64130310bfc1@linux.intel.com>
From: Tim Chen
Date: Fri, 18 Jun 2021 17:56:04 -0700

On 6/18/21 4:59 PM, Shakeel Butt wrote:
> On Fri, Jun 18, 2021 at 3:11 PM Tim Chen wrote:
>>
>>
>>
>> On 6/17/21 11:48 AM, Shakeel Butt wrote:
> [...]
>>>
>>> At the moment "personally" I am more inclined towards a passive
>>> approach towards the memcg accounting of memory tiers. By that I mean,
>>> let's start by providing a 'usage' interface and get more
>>> production/real-world data to motivate the 'limit' interfaces. (One
>>> minor reason is that defining the 'limit' interface will force us to
>>> make the decision on defining tiers i.e. numa or a set of numa or
>>> others).
>>
>> Probably we could first start with accounting the memory used in each
>> NUMA node for a cgroup and exposing this information to user space.
>> I think that is useful regardless.
>>
>
> Is memory.numa_stat not good enough?

Yeah, forgot numa_stat is already there. Thanks for reminding me.

> This interface does miss
> __GFP_ACCOUNT non-slab allocations, percpu and sock.

numa_stat should be good enough for now.

>
>> There is still a question of whether we want to define a set of
>> numa node or tier and extend the accounting and management at that
>> memory tier abstraction level.
>>
> [...]
>>>
>>> To give a more concrete example: Let's say we have a system with two
>>> memory tiers and multiple low and high priority jobs. For high
>>> priority jobs, set the allocation try list from high to low tier and
>>> for low priority jobs the reverse of that (I am not sure if we can do
>>> that out of the box with today's kernel). In the background we migrate
>>> cold memory down the tiers and hot memory in the reverse direction.
>>>
>>> In this background mechanism we can enforce all different limiting
>>> policies like Yang's original high and low tier percentage or
>>> something like X% of accesses of high priority jobs should be from
>>> high tier.
>>
>> If I understand what you are saying is you desire the kernel to provide
>> the interface to expose performance information like
>> "X% of accesses of high priority jobs is from high tier",
>
> I think we can estimate "X% of accesses to high tier" using existing
> perf/PMU counters. So, no new interface.

Using a perf counter would be okay for a user space daemon, but I think
people will object if the kernel takes away a perf counter to collect
perf data in the kernel.

Tim
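
For illustration only, here is a minimal sketch of how a user space tool
might turn the existing memory.numa_stat output into per-tier usage, in
line with the "start with usage accounting" idea above. It assumes the
cgroup v2 text format ("anon N0=<bytes> N1=<bytes> ..."). The tier map
TIERS, the helper names, and the sample data are hypothetical; the thread
leaves the definition of a tier (one node or a set of nodes) open.

```python
from collections import defaultdict

# Hypothetical tier map: which NUMA nodes form each tier. This is an
# assumption for the example; the thread does not settle the definition.
TIERS = {"high": {0}, "low": {1}}


def parse_numa_stat(text):
    """Return {stat_name: {node_id: bytes}} from memory.numa_stat content."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        per_node = {}
        for field in fields[1:]:
            node, _, value = field.partition("=")
            # Per-node entries look like "N0=4096"; skip anything else.
            if node.startswith("N") and node[1:].isdigit() and value.isdigit():
                per_node[int(node[1:])] = int(value)
        stats[fields[0]] = per_node
    return stats


def usage_by_tier(stats, tiers, names=("anon", "file")):
    """Sum the selected stats over the nodes that belong to each tier."""
    totals = defaultdict(int)
    for name in names:
        for node, value in stats.get(name, {}).items():
            for tier, nodes in tiers.items():
                if node in nodes:
                    totals[tier] += value
    return dict(totals)


# Made-up sample in the numa_stat format, not real measurement data.
SAMPLE = """\
anon N0=8192 N1=4096
file N0=16384 N1=0
"""

if __name__ == "__main__":
    # A real tool would read <cgroup mount>/<group>/memory.numa_stat instead.
    print(usage_by_tier(parse_numa_stat(SAMPLE), TIERS))
```

As noted in the thread, numbers derived this way would still miss
__GFP_ACCOUNT non-slab allocations, percpu and sock memory, so they are
an approximation of a cgroup's footprint in each tier.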