From: Jonathan Cameron <Jonathan.Cameron-aYUidmrrA3LQT0dZR+AlfA@public.gmane.org>
To: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Tim Chen <tim.c.chen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
	Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
	Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
	Andrew Morton
	<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
	Dave Hansen <dave.hansen-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Ying Huang <ying.huang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	Dan Williams
	<dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
	David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Linux MM <linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
	Cgroups <cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
	Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
	Wei Xu <weixugc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC PATCH v1 00/11] Manage the top tier memory in a tiered memory
Date: Wed, 14 Apr 2021 09:59:58 +0100
Message-ID: <20210414095958.000008c4@Huawei.com>
In-Reply-To: <CALvZod4zXB6-3Mshu_TnTsQaDErfYkPTw9REYNRptSvPSRmKVA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Mon, 12 Apr 2021 12:20:22 -0700
Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:

> On Fri, Apr 9, 2021 at 4:26 PM Tim Chen <tim.c.chen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> >
> >
> > On 4/8/21 4:52 AM, Michal Hocko wrote:
> >  
> > >> The top tier memory used is reported in
> > >>
> > >> memory.toptier_usage_in_bytes
> > >>
> > >> The amount of top tier memory usable by each cgroup without
> > >> triggering page reclaim is controlled by the
> > >>
> > >> memory.toptier_soft_limit_in_bytes  
> > >  
> >
> > Michal,
> >
> > Thanks for your comments.  I would like to take a step back and
> > look at the eventual goal we envision: a mechanism to partition the
> > tiered memory between the cgroups.
> >
> > A typical use case may be a system with two sets of tasks.
> > One set of tasks is very latency sensitive and we desire instantaneous
> > response from them. Another set of tasks will be running batch jobs
> > where latency and performance are not critical.  In this case,
> > we want to carve out enough top tier memory such that the working set
> > of the latency sensitive tasks can fit entirely in the top tier memory.
> > The rest of the top tier memory can be assigned to the background tasks.
> >
> > To achieve such cgroup based tiered memory management, we probably want
> > something like the following.
> >
> > To generalize, let's say that there are N tiers of memory t_0, t_1 ... t_(N-1),
> > where tier t_0 sits at the top and demotes to the lower tier.
> > We envision for this top tier memory t_0 the following knobs and counters
> > in the cgroup memory controller:
> >
> > memory_t0.current       Current usage of tier 0 memory by the cgroup.
> >
> > memory_t0.min           If tier 0 memory used by the cgroup falls below this low
> >                         boundary, the memory will not be subjected to demotion
> >                         to lower tiers to free up memory at tier 0.
> >
> > memory_t0.low           Above this boundary, the tier 0 memory will be subjected
> >                         to demotion.  The demotion pressure will be proportional
> >                         to the overage.
> >
> > memory_t0.high          If tier 0 memory used by the cgroup exceeds this high
> >                         boundary, allocation of tier 0 memory by the cgroup will
> >                         be throttled. The tier 0 memory used by this cgroup
> >                         will also be subjected to heavy demotion.
> >
> > memory_t0.max           This will be a hard usage limit of tier 0 memory on the cgroup.
> >
> > If needed, memory_t[12...].current/min/low/high for additional tiers can be added.
> > This closely follows the design of the general memory controller interface.
> >
> > Does such an interface look sane and acceptable to everyone?
> >  
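
As a rough illustration of how the proposed thresholds would partition a
cgroup's tier 0 usage, here is a minimal sketch. It is not code from the patch
series. The class and function names are hypothetical, and the behaviour
between min and low is not spelled out in the description above, so treating
that band as best-effort protection (by analogy with memory.low) is an
assumption.

```python
from dataclasses import dataclass

GIB = 1 << 30


@dataclass
class Tier0Limits:
    """Hypothetical container for the proposed memory_t0.* knobs, in bytes."""
    t0_min: int
    t0_low: int
    t0_high: int
    t0_max: int


def tier0_state(usage: int, lim: Tier0Limits) -> str:
    """Classify tier 0 usage against the proposed boundaries."""
    if usage >= lim.t0_max:
        return "at hard limit"
    if usage > lim.t0_high:
        return "throttled, heavy demotion"
    if usage > lim.t0_low:
        return "demotion proportional to overage"
    if usage < lim.t0_min:
        return "protected from demotion"
    # Assumption: the band between min and low is not defined in the proposal.
    return "best-effort protection (assumed)"


def demotion_overage(usage: int, lim: Tier0Limits) -> int:
    """Bytes above memory_t0.low; the proposal scales demotion pressure with this."""
    return max(0, usage - lim.t0_low)


limits = Tier0Limits(t0_min=1 * GIB, t0_low=2 * GIB, t0_high=4 * GIB, t0_max=5 * GIB)
print(tier0_state(3 * GIB, limits), demotion_overage(3 * GIB, limits) // GIB)
```

With these example limits, a cgroup using 3 GiB of tier 0 memory sits 1 GiB
above memory_t0.low and would see proportional demotion pressure.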
> 
> I have a couple of questions. Let's suppose we have a two socket
> system. Node 0 (DRAM+CPUs), Node 1 (DRAM+CPUs), Node 2 (PMEM on socket
> 0 along with Node 0) and Node 3 (PMEM on socket 1 along with Node 1).
> Based on the tier definition of this patch series, tier_0: {node_0,
> node_1} and tier_1: {node_2, node_3}.
> 
> My questions are:
> 
> 1) Can we assume that the cost of access within a tier will always be
> less than the cost of access across tiers? (node_0 <-> node_1 vs
> node_0 <-> node_2)

No, not in large systems, even if we can make this assumption in 2 socket ones.
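
A minimal sketch of why this can break down as socket count grows. The
distance values and the ring topology below are invented for illustration and
are not measurements from any real platform; the helper names are hypothetical.
DRAM node i and PMEM node (sockets + i) are assumed to sit on socket i.

```python
def ring_distances(sockets):
    """Build made-up access costs from each CPU (DRAM) node to every memory node.

    Sockets are assumed to form a ring; each extra hop adds a fixed cost.
    """
    dist = {}
    for src in range(sockets):
        row = {}
        for s in range(sockets):
            hops = min((src - s) % sockets, (s - src) % sockets)
            row[s] = 10 + 12 * hops            # DRAM on socket s
            row[sockets + s] = 30 + 12 * hops  # PMEM on socket s
        dist[src] = row
    return dist


def within_tier_always_cheaper(dist, tiers):
    """True if, from every CPU node, each same-tier node is cheaper to reach
    than every node in a different tier."""
    tier_of = {node: t for t, nodes in enumerate(tiers) for node in nodes}
    for src, row in dist.items():
        same = [d for dst, d in row.items()
                if dst != src and tier_of[dst] == tier_of[src]]
        other = [d for dst, d in row.items() if tier_of[dst] != tier_of[src]]
        if same and other and max(same) >= min(other):
            return False
    return True


print(within_tier_always_cheaper(ring_distances(2), [{0, 1}, {2, 3}]))
print(within_tier_always_cheaper(ring_distances(4), [{0, 1, 2, 3}, {4, 5, 6, 7}]))
```

With these made-up numbers the two socket case satisfies the assumption, while
in the four socket ring the DRAM two hops away (cost 34) is more expensive than
the local PMEM (cost 30), so the assumption fails.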

> 2) If yes to (1), is that assumption future proof? Will the future
> systems with DRAM over CXL support have the same characteristics?
> 3) Will the cost of access from tier_0 to tier_1 be uniform? (node_0
> <-> node_2 vs node_0 <-> node_3). For jobs running on node_0, node_3
> might be third tier and similarly for jobs running on node_1, node_2
> might be third tier.
> 
> The reason I am asking these questions is that statically
> partitioning memory nodes into tiers will inherently add platform
> specific assumptions to the user API.

Absolutely agree.
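
To make the point in question 3 concrete, a small sketch using the two socket
example from above. The distance values are again invented, and
fallback_order() is a hypothetical helper rather than an existing kernel or
library interface. It shows that the ordering of the PMEM nodes depends on
which node the job runs on, which a single static tier list cannot express.

```python
# Made-up access costs: outer key is the CPU node, inner key the memory node.
# Nodes 0 and 1 are DRAM, node 2 is PMEM on socket 0, node 3 is PMEM on socket 1.
distance = {
    0: {0: 10, 1: 22, 2: 30, 3: 42},
    1: {0: 22, 1: 10, 2: 42, 3: 30},
}


def fallback_order(src):
    """Memory nodes sorted from cheapest to most expensive as seen from src."""
    return sorted(distance[src], key=lambda dst: distance[src][dst])


for src in distance:
    print(f"node_{src}: {fallback_order(src)}")
```

From node_0 the order is 0, 1, 2, 3, while from node_1 it is 1, 0, 3, 2, so
node_2 and node_3 swap places depending on where the job runs.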

> 
> Assumptions like:
> 1) Access within tier is always cheaper than across tier.
> 2) Access from tier_i to tier_i+1 has uniform cost.
> 
> The reason I am more inclined towards having NUMA-centric control is
> that we don't have to make these assumptions, though it will be
> harder to use. Greg (CCed) has some ideas on making it better
> and we will share our proposal after polishing it a bit more.
> 

Sounds good, will look out for that.

Jonathan

