From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Alistair Popple <apopple@nvidia.com>,
Bharata B Rao <bharata@amd.com>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Hesham Almatary <hesham.almatary@huawei.com>,
Jagdish Gediya <jvgediya.oss@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Michal Hocko <mhocko@kernel.org>, Tim Chen <tim.c.chen@intel.com>,
Wei Xu <weixugc@google.com>, Yang Shi <shy828301@gmail.com>
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
Date: Fri, 28 Oct 2022 10:35:44 +0530 [thread overview]
Message-ID: <59291b98-6907-0acf-df11-6d87681027cc@linux.ibm.com> (raw)
In-Reply-To: <877d0kk5uf.fsf@yhuang6-desk2.ccr.corp.intel.com>
On 10/28/22 8:33 AM, Huang, Ying wrote:
> Hi, Aneesh,
>
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>
>> On 10/27/22 12:29 PM, Huang Ying wrote:
>>> We need some way to override the system default memory tiers.  For
>>> the following example system,
>>>
>>> type abstract distance
>>> ---- -----------------
>>> HBM 300
>>> DRAM 1000
>>> CXL_MEM 5000
>>> PMEM 5100
>>>
>>> Given the memory tier chunk size is 100, the default memory tiers
>>> could be,
>>>
>>> tier abstract distance types
>>> range
>>> ---- ----------------- -----
>>> 3 300-400 HBM
>>> 10 1000-1100 DRAM
>>> 50 5000-5100 CXL_MEM
>>> 51 5100-5200 PMEM
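The tier IDs in the table above follow from integer division of the abstract distance by the chunk size; a minimal sketch (the function name and the type table are illustrative, not the kernel's actual implementation):

```python
def tier_id(abstract_distance, chunk_size=100):
    # Tier N covers abstract distances [N * chunk_size, (N + 1) * chunk_size).
    return abstract_distance // chunk_size

# Abstract distances from the example system above.
types = {"HBM": 300, "DRAM": 1000, "CXL_MEM": 5000, "PMEM": 5100}
tiers = {name: tier_id(dist) for name, dist in types.items()}
print(tiers)  # {'HBM': 3, 'DRAM': 10, 'CXL_MEM': 50, 'PMEM': 51}
```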
>>>
>>> If we want to group CXL MEM and PMEM into one tier, we have 2 choices.
>>>
>>> 1) Override the abstract distance of CXL_MEM or PMEM. For example, if
>>> we change the abstract distance of PMEM to 5050, the memory tiers
>>> become,
>>>
>>> tier abstract distance types
>>> range
>>> ---- ----------------- -----
>>> 3 300-400 HBM
>>> 10 1000-1100 DRAM
>>> 50 5000-5100 CXL_MEM, PMEM
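Modeling the tier ID as integer division of abstract distance by the chunk size (an illustrative sketch, not the kernel's code), the override works because 5050 falls inside the 5000-5100 range:

```python
def tier_id(abstract_distance, chunk_size=100):
    # Illustrative sketch of the tier mapping, not kernel code.
    return abstract_distance // chunk_size

# Default: PMEM (5100) sits in tier 51, apart from CXL_MEM (5000) in tier 50.
assert tier_id(5100) == 51 and tier_id(5000) == 50

# Overriding PMEM's abstract distance to 5050 places both types in tier 50.
assert tier_id(5050) == tier_id(5000) == 50
```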
>>>
>>> 2) Override the memory tier chunk size. For example, if we change the
>>> memory tier chunk size to 200, the memory tiers become,
>>>
>>> tier abstract distance types
>>> range
>>> ---- ----------------- -----
>>> 1 200-400 HBM
>>> 5 1000-1200 DRAM
>>> 25 5000-5200 CXL_MEM, PMEM
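Under the same integer-division model (an illustrative sketch, not kernel code), a chunk size of 200 merges the two types but also renumbers every tier:

```python
def tier_id(abstract_distance, chunk_size):
    # Illustrative sketch of the tier mapping, not kernel code.
    return abstract_distance // chunk_size

# Chunk size 200 folds CXL_MEM (5000) and PMEM (5100) into tier 25 ...
assert tier_id(5000, 200) == tier_id(5100, 200) == 25
# ... while HBM and DRAM move from tiers 3 and 10 to tiers 1 and 5.
assert tier_id(300, 200) == 1 and tier_id(1000, 200) == 5
```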
>>>
>>> But after some thought, I think choice 2) may not be good.  The
>>> problem is that even if 2 abstract distances are almost the same,
>>> they may be put in 2 tiers if they sit on different sides of a tier
>>> boundary.  For example, suppose the abstract distance of CXL_MEM is
>>> 4990, while the abstract distance of PMEM is 5010.  Although the
>>> difference between the abstract distances is only 20, CXL_MEM and
>>> PMEM will be put in different tiers if the tier chunk size is 50,
>>> 100, 200, 250, 500, ....  This makes choice 2) hard to use; it may
>>> become tricky to find an appropriate tier chunk size that satisfies
>>> all requirements.
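Modeling the tier ID as integer division of abstract distance by the chunk size (an illustrative sketch, not kernel code), the boundary effect is easy to verify: 4990 and 5010 straddle 5000, which is a tier boundary for every chunk size listed:

```python
def tier_id(abstract_distance, chunk_size):
    # Illustrative sketch of the tier mapping, not kernel code.
    return abstract_distance // chunk_size

# 4990 and 5010 differ by only 20, yet any chunk size that divides 5000
# puts a tier boundary between them, splitting CXL_MEM and PMEM.
for chunk in (50, 100, 200, 250, 500):
    assert tier_id(4990, chunk) != tier_id(5010, chunk)
```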
>>>
>>
>> Shouldn't we wait to gain experience with how we end up mapping
>> devices with different latencies and bandwidths before tuning these values?
>
> Just want to discuss the overall design.
>
>>> So I suggest abandoning choice 2) and using choice 1) only.  This
>>> makes the overall design and user space interface simpler and easier
>>> to use.  The overall design of the abstract distance could be,
>>>
>>> 1. Use decimal for abstract distance and its chunk size. This makes
>>> them more user friendly.
>>>
>>> 2. Make the tier chunk size as small as possible.  For example, 10.
>>> By default, this will put different memory types in one memory tier
>>> only if their performance is almost the same.  And we will not
>>> provide an interface to override the chunk size.
>>>
>>
>> This could also mean we end up with lots of memory tiers with
>> relatively small performance differences between them.  Again, it
>> depends on how HMAT attributes will be mapped to abstract distance.
>
> Per my understanding, there will not be many memory types in a system,
> so there will not be many memory tiers either.  Most systems have only
> 2 or 3 memory tiers, for example, HBM, DRAM, CXL,
> etc.
So we don't need the chunk size to be 10, because we don't foresee
needing to group devices into that many tiers.
> Do you know of systems with many memory types?  The basic idea is to
> put different memory types in different memory tiers by default.  If
> users want to group them, they can do that by overriding the abstract
> distance of some memory type.
>
With a small chunk size, and depending on how we derive abstract
distance, I am wondering whether we would end up with lots of memory
tiers with no real value.  Hence my suggestion to wait on a change like
this until we have code that maps HMAT/CDAT attributes to abstract
distance.
>>
>>> 3. Make the abstract distance of normal DRAM large enough.  For
>>> example, 1000; then 100 tiers can be defined below DRAM, which is
>>> more than enough in practice.
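A quick sanity check of the "100 tiers below DRAM" arithmetic (an illustrative sketch, not kernel code):

```python
chunk_size = 10
dram_distance = 1000
# DRAM itself lands in tier 100, leaving tier IDs 0..99 available for
# memory types faster than DRAM.
assert dram_distance // chunk_size == 100
```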
>>
>> Why 100? Will we really have that many tiers below/faster than DRAM? As of now
>> I see only HBM below it.
>
> Yes.  100 is more than enough.  We just want to avoid grouping
> different memory types by default.
>
> Best Regards,
> Huang, Ying
>