From: Shakeel Butt
Date: Mon, 12 Apr 2021 12:20:22 -0700
Subject: Re: [RFC PATCH v1 00/11] Manage the top tier memory in a tiered memory
To: Tim Chen
Cc: Michal Hocko, Johannes Weiner, Andrew Morton, Dave Hansen, Ying Huang,
 Dan Williams, David Rientjes, Linux MM, Cgroups, LKML, Greg Thelen, Wei Xu
In-Reply-To: <58e5dcc9-c134-78de-6965-7980f8596b57@linux.intel.com>

On Fri, Apr 9, 2021 at 4:26 PM Tim Chen wrote:
>
>
> On 4/8/21 4:52 AM, Michal Hocko wrote:
> >
> >> The top tier memory used is reported in
> >>
> >> memory.toptier_usage_in_bytes
> >>
> >> The amount of top tier memory usable by each cgroup without
> >> triggering page reclaim is controlled by the
> >>
> >> memory.toptier_soft_limit_in_bytes
> >
>
> Michal,
>
> Thanks for your comments. I will like to take a step back and
> look at the eventual goal we envision: a mechanism to partition the
> tiered memory between the cgroups.
>
> A typical use case may be a system with two set of tasks.
> One set of task is very latency sensitive and we desire instantaneous
> response from them. Another set of tasks will be running batch jobs
> were latency and performance is not critical. In this case,
> we want to carve out enough top tier memory such that the working set
> of the latency sensitive tasks can fit entirely in the top tier memory.
> The rest of the top tier memory can be assigned to the background tasks.
>
> To achieve such cgroup based tiered memory management, we probably want
> something like the following.
>
> For generalization let's say that there are N tiers of memory t_0, t_1 ... t_N-1,
> where tier t_0 sits at the top and demotes to the lower tier.
> We envision for this top tier memory t0 the following knobs and counters
> in the cgroup memory controller
>
> memory_t0.current   Current usage of tier 0 memory by the cgroup.
>
> memory_t0.min       If tier 0 memory used by the cgroup falls below this low
>                     boundary, the memory will not be subjected to demotion
>                     to lower tiers to free up memory at tier 0.
>
> memory_t0.low       Above this boundary, the tier 0 memory will be subjected
>                     to demotion. The demotion pressure will be proportional
>                     to the overage.
>
> memory_t0.high      If tier 0 memory used by the cgroup exceeds this high
>                     boundary, allocation of tier 0 memory by the cgroup will
>                     be throttled. The tier 0 memory used by this cgroup
>                     will also be subjected to heavy demotion.
>
> memory_t0.max       This will be a hard usage limit of tier 0 memory on the cgroup.
>
> If needed, memory_t[12...].current/min/low/high for additional tiers can be added.
> This follows closely with the design of the general memory controller interface.
>
> Will such an interface looks sane and acceptable with everyone?
>

I have a couple of questions. Let's suppose we have a two-socket
system: Node 0 (DRAM+CPUs), Node 1 (DRAM+CPUs), Node 2 (PMEM on socket 0
along with Node 0) and Node 3 (PMEM on socket 1 along with Node 1).
Based on the tier definition of this patch series, tier_0: {node_0,
node_1} and tier_1: {node_2, node_3}.

My questions are:

1) Can we assume that the cost of access within a tier will always be
less than the cost of access across tiers? (node_0 <-> node_1 vs
node_0 <-> node_2)

2) If yes to (1), is that assumption future proof? Will future systems
with DRAM over CXL support have the same characteristics?

3) Will the cost of access from tier_0 to tier_1 be uniform? (node_0
<-> node_2 vs node_0 <-> node_3). For jobs running on node_0, node_3
might be a third tier, and similarly for jobs running on node_1,
node_2 might be a third tier.

The reason I am asking these questions is that statically partitioning
memory nodes into tiers will inherently add platform-specific
assumptions to the user API. Assumptions like:

1) Access within a tier is always cheaper than access across tiers.
2) Access from tier_i to tier_i+1 has uniform cost.

The reason I am more inclined towards NUMA-centric control is that we
don't have to make these assumptions, though usability will be more
difficult. Greg (CCed) has some ideas on making it better, and we will
share our proposal after polishing it a bit more.
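
To make the two assumptions concrete, here is a minimal sketch that checks them against a cost table for the four-node example in this message. The cost numbers, the `COST` and `TIERS` tables and all function names are hypothetical and chosen only for illustration; they are not measurements and not part of the patch series.

```python
# Hypothetical access costs in arbitrary units, indexed as COST[cpu_node][mem_node].
# Nodes 0 and 1 are DRAM+CPUs; node 2 is PMEM on socket 0, node 3 is PMEM on socket 1.
# These values are illustrative assumptions only.
COST = {
    0: {0: 10, 1: 21, 2: 17, 3: 28},
    1: {0: 21, 1: 10, 2: 28, 3: 17},
}

# Static partition of nodes into tiers, as described for the patch series.
TIERS = {0: [0, 1], 1: [2, 3]}


def within_tier_always_cheaper(cost, tiers):
    """Assumption 1: any access inside tier_0 is cheaper than any access to tier_1."""
    worst_within = max(cost[cpu][node] for cpu in cost for node in tiers[0])
    best_across = min(cost[cpu][node] for cpu in cost for node in tiers[1])
    return worst_within < best_across


def cross_tier_cost_uniform(cost, tiers):
    """Assumption 2: from each CPU node, every tier_1 node has the same cost."""
    return all(len({cost[cpu][node] for node in tiers[1]}) == 1 for cpu in cost)


def per_node_ranking(cost, cpu):
    """Memory nodes ordered from cheapest to most expensive as seen from one CPU node."""
    return sorted(cost[cpu], key=lambda node: cost[cpu][node])


if __name__ == "__main__":
    print("assumption 1 holds:", within_tier_always_cheaper(COST, TIERS))
    print("assumption 2 holds:", cross_tier_cost_uniform(COST, TIERS))
    for cpu in COST:
        print(f"ranking from node_{cpu}:", per_node_ranking(COST, cpu))
```

With these made-up numbers neither assumption holds, and the ranking seen from node_0 differs from the one seen from node_1. This is the case where a single static tier list does not describe every job's view of memory.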