From: Jonathan Cameron <Jonathan.Cameron-aYUidmrrA3LQT0dZR+AlfA@public.gmane.org>
To: Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Tim Chen <tim.c.chen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>,
Michal Hocko <mhocko-IBi9RG/b67k@public.gmane.org>,
Johannes Weiner <hannes-druUgvl0LCNAfugRpC6u6w@public.gmane.org>,
Andrew Morton
<akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>,
Dave Hansen <dave.hansen-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Ying Huang <ying.huang-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
Dan Williams
<dan.j.williams-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>,
David Rientjes <rientjes-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Linux MM <linux-mm-Bw31MaZKKs3YtjvyW6yDsg@public.gmane.org>,
Cgroups <cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
LKML <linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
Greg Thelen <gthelen-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>,
Wei Xu <weixugc-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Subject: Re: [RFC PATCH v1 00/11] Manage the top tier memory in a tiered memory
Date: Wed, 14 Apr 2021 09:59:58 +0100
Message-ID: <20210414095958.000008c4@Huawei.com>
In-Reply-To: <CALvZod4zXB6-3Mshu_TnTsQaDErfYkPTw9REYNRptSvPSRmKVA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
On Mon, 12 Apr 2021 12:20:22 -0700
Shakeel Butt <shakeelb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org> wrote:
> On Fri, Apr 9, 2021 at 4:26 PM Tim Chen <tim.c.chen-VuQAYsv1563Yd54FQh9/CA@public.gmane.org> wrote:
> >
> >
> > On 4/8/21 4:52 AM, Michal Hocko wrote:
> >
> > >> The top tier memory used is reported in
> > >>
> > >> memory.toptier_usage_in_bytes
> > >>
> > >> The amount of top tier memory usable by each cgroup without
> > >> triggering page reclaim is controlled by the
> > >>
> > >> memory.toptier_soft_limit_in_bytes
> > >
> >
> > Michal,
> >
> > Thanks for your comments. I would like to take a step back and
> > look at the eventual goal we envision: a mechanism to partition the
> > tiered memory between the cgroups.
> >
> > A typical use case may be a system with two sets of tasks.
> > One set of tasks is very latency sensitive and we desire instantaneous
> > response from them. Another set of tasks will be running batch jobs
> > where latency and performance are not critical. In this case,
> > we want to carve out enough top tier memory such that the working set
> > of the latency sensitive tasks can fit entirely in the top tier memory.
> > The rest of the top tier memory can be assigned to the background tasks.
> >
> > To achieve such cgroup based tiered memory management, we probably want
> > something like the following.
> >
> > To generalize, let's say that there are N tiers of memory t_0, t_1 ... t_N-1,
> > where tier t_0 sits at the top and demotes to the lower tiers.
> > We envision for this top tier memory t_0 the following knobs and counters
> > in the cgroup memory controller:
> >
> > memory_t0.current   Current usage of tier 0 memory by the cgroup.
> >
> > memory_t0.min       If tier 0 memory used by the cgroup falls below this low
> >                     boundary, the memory will not be subjected to demotion
> >                     to lower tiers to free up memory at tier 0.
> >
> > memory_t0.low       Above this boundary, the tier 0 memory will be subjected
> >                     to demotion. The demotion pressure will be proportional
> >                     to the overage.
> >
> > memory_t0.high      If tier 0 memory used by the cgroup exceeds this high
> >                     boundary, allocation of tier 0 memory by the cgroup will
> >                     be throttled. The tier 0 memory used by this cgroup
> >                     will also be subjected to heavy demotion.
> >
> > memory_t0.max       This will be a hard usage limit of tier 0 memory on the cgroup.
> >
> > If needed, memory_t[12...].current/min/low/high for additional tiers can be added.
> > This closely follows the design of the general memory controller interface.
> >
> > Would such an interface look sane and acceptable to everyone?
> >
>
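The knob semantics quoted above can be modelled in a few lines. This is only
a sketch under stated assumptions, not code from the patch series:
`Tier0Limits`, `tier0_state` and `demotion_overage` are hypothetical names,
the zone labels are invented for illustration, and the treatment of usage
between min and low (best-effort protection, by analogy with the existing
memory.low) is an assumption that the proposal does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier0Limits:
    """Hypothetical per-cgroup tier 0 boundaries, in bytes."""
    min: int
    low: int
    high: int
    max: int


def tier0_state(usage: int, limits: Tier0Limits) -> str:
    """Classify tier 0 usage into the zones described in the proposal."""
    if usage >= limits.max:
        return "at-hard-limit"   # memory_t0.max: hard usage limit reached
    if usage > limits.high:
        return "throttled"       # allocation throttled, heavy demotion
    if usage > limits.low:
        return "demotable"       # demotion pressure grows with the overage
    if usage < limits.min:
        return "protected"       # not subjected to demotion
    return "best-effort"         # ASSUMPTION: zone between min and low


def demotion_overage(usage: int, limits: Tier0Limits) -> int:
    """Bytes above memory_t0.low; pressure is said to be proportional to this."""
    return max(0, usage - limits.low)


GIB = 1 << 30
limits = Tier0Limits(min=1 * GIB, low=2 * GIB, high=3 * GIB, max=4 * GIB)
print(tier0_state(GIB // 2, limits))
print(tier0_state(5 * GIB // 2, limits), demotion_overage(5 * GIB // 2, limits))
```

The proportionality constant for demotion pressure is not given in the
proposal, so the sketch reports only the raw overage in bytes.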
> I have a couple of questions. Let's suppose we have a two-socket
> system: Node 0 (DRAM+CPUs), Node 1 (DRAM+CPUs), Node 2 (PMEM on socket
> 0 along with Node 0) and Node 3 (PMEM on socket 1 along with Node 1).
> Based on the tier definition of this patch series, tier_0: {node_0,
> node_1} and tier_1: {node_2, node_3}.
>
> My questions are:
>
> 1) Can we assume that the cost of access within a tier will always be
> less than the cost of access across tiers? (node_0 <-> node_1 vs
> node_0 <-> node_2)
No, not in large systems, even if we can make this assumption in 2-socket ones.
> 2) If yes to (1), is that assumption future proof? Will the future
> systems with DRAM over CXL support have the same characteristics?
> 3) Will the cost of access from tier_0 to tier_1 be uniform? (node_0
> <-> node_2 vs node_0 <-> node_3). For jobs running on node_0, node_3
> might be third tier and similarly for jobs running on node_1, node_2
> might be third tier.
>
> The reason I am asking these questions is that statically
> partitioning memory nodes into tiers will inherently add
> platform-specific assumptions to the user API.
Absolutely agree.
>
> Assumptions like:
> 1) Access within tier is always cheaper than across tier.
> 2) Access from tier_i to tier_i+1 has uniform cost.
>
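The two assumptions listed above can be checked mechanically against a node
distance matrix. The sketch below is illustrative only: the distance values
are invented for the four-node example (they are not measurements from any
real platform), and the helper names `within_cheaper_than_across` and
`across_is_uniform` are hypothetical.

```python
from itertools import product

# ASSUMPTION: made-up SLIT-style distances for the four-node example.
# Nodes 0,1 are DRAM+CPUs; node 2 is PMEM on socket 0; node 3 is PMEM on socket 1.
DIST = [
    [10, 21, 17, 28],
    [21, 10, 28, 17],
    [17, 28, 10, 28],
    [28, 17, 28, 10],
]
TIER_0 = [0, 1]
TIER_1 = [2, 3]


def within_cheaper_than_across(dist, tier_a, tier_b):
    """Assumption 1: every access inside tier_a beats every access to tier_b."""
    worst_within = max(dist[i][j] for i, j in product(tier_a, repeat=2))
    best_across = min(dist[i][j] for i, j in product(tier_a, tier_b))
    return worst_within < best_across


def across_is_uniform(dist, tier_a, tier_b):
    """Assumption 2: all accesses from tier_a to tier_b have the same cost."""
    return len({dist[i][j] for i, j in product(tier_a, tier_b)}) == 1


print(within_cheaper_than_across(DIST, TIER_0, TIER_1))  # False
print(across_is_uniform(DIST, TIER_0, TIER_1))           # False
```

With these invented numbers both assumptions fail: node_0 <-> node_1 (21)
costs more than node_0 <-> node_2 (17), and node_0 reaches node_2 and node_3
at different costs (17 vs 28).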
> The reason I am more inclined towards having NUMA-centric control is
> that we don't have to make these assumptions, though usability
> will be harder. Greg (CCed) has some ideas on making it better
> and we will share our proposal after polishing it a bit more.
>
Sounds good, will look out for that.
Jonathan