Re: [PATCH] mm: mempolicy: N:M interleave policy for tiered memory nodes

From: Tim Chen <tim.c.chen@linux.intel.com>
To: Johannes Weiner <hannes@cmpxchg.org>
Cc: linux-mm@kvack.org, Hao Wang <haowang3@fb.com>,
	Abhishek Dhanotia <abhishekd@fb.com>,
	"Huang, Ying" <ying.huang@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Yang Shi <yang.shi@linux.alibaba.com>,
	 Davidlohr Bueso <dave@stgolabs.net>,
	Adam Manzanares <a.manzanares@samsung.com>,
	linux-kernel@vger.kernel.org,  kernel-team@fb.com,
	Hasan Al Maruf <hasanalmaruf@fb.com>
Subject: Re: [PATCH] mm: mempolicy: N:M interleave policy for tiered memory nodes
Date: Wed, 08 Jun 2022 16:40:53 -0700
Message-ID: <aabc9a7645ce50f706ac117e6e8fc0f15a967c6c.camel@linux.intel.com>
In-Reply-To: <YqD0/tzFwXvJ1gK6@cmpxchg.org>

On Wed, 2022-06-08 at 15:14 -0400, Johannes Weiner wrote:
> Hi Tim,
> 
> On Wed, Jun 08, 2022 at 11:15:27AM -0700, Tim Chen wrote:
> > On Tue, 2022-06-07 at 13:19 -0400, Johannes Weiner wrote:
> > >  /* Do dynamic interleaving for a process */
> > >  static unsigned interleave_nodes(struct mempolicy *policy)
> > >  {
> > >  	unsigned next;
> > >  	struct task_struct *me = current;
> > >  
> > > -	next = next_node_in(me->il_prev, policy->nodes);
> > > +	if (numa_tier_interleave[0] > 1 || numa_tier_interleave[1] > 1) {
> > 
> > When we have three memory tiers, do we expect an N:M:K policy?
> > Like interleaving between DDR5, DDR4 and PMEM memory.
> > Or we expect an N:M policy still by interleaving between two specific tiers?
> 
> In the context of the proposed 'explicit tiers' interface, I think it
> would make sense to have a per-tier 'interleave_ratio' knob. Because
> the ratio is configured based on hardware properties, it can be
> configured meaningfully for the entire tier hierarchy, even if
> individual tasks or vmas interleave over only a subset of nodes.

I think that makes sense.  So if we have 3 tiers of memory whose bandwidth ratio is
4:2:1, then it makes sense to interleave according to this ratio, even if we choose
to interleave over only a subset of nodes.  Say between tier 1 and tier 3, the
interleave ratio will be 4:1, as I can read 4 lines of data from tier 1 while
I get 1 line of data from tier 3.
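
To make the subset case concrete, here is a rough userspace sketch (not
kernel code, and not what the patch implements) of a weighted round-robin
that keeps the global per-tier weights and simply skips tiers outside the
allowed mask. The names tier_weight, struct il_state and next_tier() are
made up for illustration, and tiers are 0-indexed here, so "tier 1 and
tier 3" above become indices 0 and 2.

```c
#include <stddef.h>

#define NR_TIERS 3

/* Assumed global per-tier weights, e.g. a 4:2:1 bandwidth ratio. */
static const unsigned int tier_weight[NR_TIERS] = { 4, 2, 1 };

/* Hypothetical per-task interleave state. */
struct il_state {
	unsigned int tier;	/* tier currently being allocated from */
	unsigned int count;	/* pages taken from that tier in this round */
};

/*
 * Pick the tier for the next page. Only tiers whose bit is set in
 * 'allowed' are visited; each one gets tier_weight[tier] consecutive
 * pages before the walk moves on. Returns -1 if no tier is allowed.
 */
static int next_tier(struct il_state *st, unsigned int allowed)
{
	unsigned int i;

	if (!(allowed & ((1u << NR_TIERS) - 1)))
		return -1;

	/* Stay on the current tier while it is allowed and has quota left. */
	if ((allowed & (1u << st->tier)) && st->count < tier_weight[st->tier]) {
		st->count++;
		return (int)st->tier;
	}

	/* Otherwise advance to the next allowed tier, wrapping around. */
	for (i = 1; i <= NR_TIERS; i++) {
		unsigned int t = (st->tier + i) % NR_TIERS;

		if (allowed & (1u << t)) {
			st->tier = t;
			st->count = 1;
			return (int)t;
		}
	}
	return -1;
}
```

With a zero-initialized state and allowed = 0x5 (indices 0 and 2), ten
calls yield four pages from index 0, one from index 2, and then the same
pattern again, which is the 4:1 split described above.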

> 
> > The other question is whether we will need multiple interleave policies depending
> > on cgroup?
> > One policy could be interleave between tier1, tier2, tier3.
> > Another could be interleave between tier1 and tier2.
> 
> This is a good question.
> 
> One thing that has defined cgroup development in recent years is the
> concept of "work conservation". Moving away from fixed limits and hard
> partitioning, cgroups are increasingly configured with weights,
> priorities, and guarantees (cpu.weight, io.latency/io.cost.qos,
> memory.low). These weights and priorities are enforced when cgroups
> are directly competing over a resource; but if there is no contention,
> any active cgroup, regardless of priority, has full access to the
> surplus (which could be the entire host if the main load is idle).
> 
> With that background, yes, we likely want some way of prioritizing
> tier access when multiple cgroups are competing. But we ALSO want the
> ability to say that if resources are NOT contended, a cgroup should
> interleave memory over all tiers according to optimal bandwidth.
> 
> That means that regardless of what the competitive cgroup rules for
> tier access end up looking like, it makes sense to have global
> interleaving weights based on hardware properties as proposed here.
> 
> The effective cgroup IL ratio for each tier could then be something
> like cgroup.tier_weight[tier] * tier/interleave_weight.

Thanks. I agree that an interleave ratio that's proportional to hardware
properties of each tier will suffice.
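
For reference, the product suggested above (cgroup.tier_weight[tier] *
tier/interleave_weight) could be computed along these lines. This is only
a sketch under assumptions: neither knob exists as a settled interface,
the function and parameter names are invented here, and reducing the
result by the greatest common divisor is my own addition to keep the
ratio small.

```c
#define NR_TIERS 3

/* Euclid's algorithm; gcd(0, x) == x, so 0 works as a starting value. */
static unsigned int gcd(unsigned int a, unsigned int b)
{
	while (b) {
		unsigned int t = a % b;

		a = b;
		b = t;
	}
	return a;
}

/*
 * Multiply the (hypothetical) per-cgroup tier weights by the
 * (hypothetical) global per-tier interleave weights, then divide
 * through by the common factor so the ratio is in lowest terms.
 */
static void effective_il_ratio(const unsigned int cgroup_tier_weight[NR_TIERS],
			       const unsigned int tier_il_weight[NR_TIERS],
			       unsigned int out[NR_TIERS])
{
	unsigned int g = 0;
	int i;

	for (i = 0; i < NR_TIERS; i++) {
		out[i] = cgroup_tier_weight[i] * tier_il_weight[i];
		g = gcd(g, out[i]);
	}

	if (g > 1) {
		for (i = 0; i < NR_TIERS; i++)
			out[i] /= g;
	}
}
```

For example, cgroup weights of 1:1:2 combined with global weights of
4:2:1 give products 4:2:2, i.e. an effective ratio of 2:1:1.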

Tim

Thread overview: 7+ messages
2022-06-07 17:19 [PATCH] mm: mempolicy: N:M interleave policy for tiered memory nodes Johannes Weiner
2022-06-08  4:19 ` Ying Huang
2022-06-08 14:16   ` Johannes Weiner
2022-06-08 18:15 ` Tim Chen
2022-06-08 19:14   ` Johannes Weiner
2022-06-08 23:40     ` Tim Chen [this message]
2022-06-08 23:44 ` kernel test robot
