From: Aneesh Kumar K V <aneesh.kumar@linux.ibm.com>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Alistair Popple <apopple@nvidia.com>,
Bharata B Rao <bharata@amd.com>,
Dan Williams <dan.j.williams@intel.com>,
Dave Hansen <dave.hansen@intel.com>,
Davidlohr Bueso <dave@stgolabs.net>,
Hesham Almatary <hesham.almatary@huawei.com>,
Jagdish Gediya <jvgediya.oss@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
Michal Hocko <mhocko@kernel.org>, Tim Chen <tim.c.chen@intel.com>,
Wei Xu <weixugc@google.com>, Yang Shi <shy828301@gmail.com>
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
Date: Fri, 28 Oct 2022 10:35:44 +0530 [thread overview]
Message-ID: <59291b98-6907-0acf-df11-6d87681027cc@linux.ibm.com> (raw)
In-Reply-To: <877d0kk5uf.fsf@yhuang6-desk2.ccr.corp.intel.com>
On 10/28/22 8:33 AM, Huang, Ying wrote:
> Hi, Aneesh,
>
> Aneesh Kumar K V <aneesh.kumar@linux.ibm.com> writes:
>
>> On 10/27/22 12:29 PM, Huang Ying wrote:
>>> We need some way to override the system default memory tiers.  For
>>> the following example system,
>>>
>>> type abstract distance
>>> ---- -----------------
>>> HBM 300
>>> DRAM 1000
>>> CXL_MEM 5000
>>> PMEM 5100
>>>
>>> Given the memory tier chunk size is 100, the default memory tiers
>>> could be,
>>>
>>> tier abstract distance types
>>> range
>>> ---- ----------------- -----
>>> 3 300-400 HBM
>>> 10 1000-1100 DRAM
>>> 50 5000-5100 CXL_MEM
>>> 51 5100-5200 PMEM
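The tier IDs in the table above follow from integer division of the abstract distance by the chunk size; a minimal sketch (the function name and the type table are illustrative, not the kernel's actual implementation):

```python
def tier_id(abstract_distance, chunk_size=100):
    # Tier N covers abstract distances [N * chunk_size, (N + 1) * chunk_size).
    return abstract_distance // chunk_size

# Abstract distances from the example system above.
types = {"HBM": 300, "DRAM": 1000, "CXL_MEM": 5000, "PMEM": 5100}
tiers = {name: tier_id(dist) for name, dist in types.items()}
print(tiers)  # {'HBM': 3, 'DRAM': 10, 'CXL_MEM': 50, 'PMEM': 51}
```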
>>>
>>> If we want to group CXL MEM and PMEM into one tier, we have 2 choices.
>>>
>>> 1) Override the abstract distance of CXL_MEM or PMEM. For example, if
>>> we change the abstract distance of PMEM to 5050, the memory tiers
>>> become,
>>>
>>> tier abstract distance types
>>> range
>>> ---- ----------------- -----
>>> 3 300-400 HBM
>>> 10 1000-1100 DRAM
>>> 50 5000-5100 CXL_MEM, PMEM
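Modeling the tier ID as integer division of abstract distance by the chunk size (an illustrative sketch, not the kernel's code), the override works because 5050 falls inside the 5000-5100 range:

```python
def tier_id(abstract_distance, chunk_size=100):
    # Illustrative sketch of the tier mapping, not kernel code.
    return abstract_distance // chunk_size

# Default: PMEM (5100) sits in tier 51, apart from CXL_MEM (5000) in tier 50.
assert tier_id(5100) == 51 and tier_id(5000) == 50

# Overriding PMEM's abstract distance to 5050 places both types in tier 50.
assert tier_id(5050) == tier_id(5000) == 50
```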
>>>
>>> 2) Override the memory tier chunk size. For example, if we change the
>>> memory tier chunk size to 200, the memory tiers become,
>>>
>>> tier abstract distance types
>>> range
>>> ---- ----------------- -----
>>> 1 200-400 HBM
>>> 5 1000-1200 DRAM
>>> 25 5000-5200 CXL_MEM, PMEM
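Under the same integer-division model (an illustrative sketch, not kernel code), a chunk size of 200 merges the two types but also renumbers every tier:

```python
def tier_id(abstract_distance, chunk_size):
    # Illustrative sketch of the tier mapping, not kernel code.
    return abstract_distance // chunk_size

# Chunk size 200 folds CXL_MEM (5000) and PMEM (5100) into tier 25 ...
assert tier_id(5000, 200) == tier_id(5100, 200) == 25
# ... while HBM and DRAM move from tiers 3 and 10 to tiers 1 and 5.
assert tier_id(300, 200) == 1 and tier_id(1000, 200) == 5
```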
>>>
>>> But after some thought, I think choice 2) may not be good.  The
>>> problem is that even if 2 abstract distances are almost the same,
>>> they may be put in 2 tiers if they sit on different sides of a tier
>>> boundary.  For example, suppose the abstract distance of CXL_MEM is
>>> 4990, while the abstract distance of PMEM is 5010.  Although the
>>> difference between the abstract distances is only 20, CXL_MEM and
>>> PMEM will be put in different tiers if the tier chunk size is 50,
>>> 100, 200, 250, 500, ....  This makes choice 2) hard to use; it may
>>> become tricky to find an appropriate tier chunk size that satisfies
>>> all requirements.
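Modeling the tier ID as integer division of abstract distance by the chunk size (an illustrative sketch, not kernel code), the boundary effect is easy to verify: 4990 and 5010 straddle 5000, which is a tier boundary for every chunk size listed:

```python
def tier_id(abstract_distance, chunk_size):
    # Illustrative sketch of the tier mapping, not kernel code.
    return abstract_distance // chunk_size

# 4990 and 5010 differ by only 20, yet any chunk size that divides 5000
# puts a tier boundary between them, splitting CXL_MEM and PMEM.
for chunk in (50, 100, 200, 250, 500):
    assert tier_id(4990, chunk) != tier_id(5010, chunk)
```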
>>>
>>
>> Shouldn't we wait to gain experience with how we end up mapping
>> devices with different latencies and bandwidths before tuning these values?
>
> Just want to discuss the overall design.
>
>>> So I suggest abandoning choice 2) and using choice 1) only.  This
>>> makes the overall design and user space interface simpler and easier
>>> to use.  The overall design of the abstract distance could be,
>>>
>>> 1. Use decimal for abstract distance and its chunk size. This makes
>>> them more user friendly.
>>>
>>> 2. Make the tier chunk size as small as possible.  For example, 10.
>>> By default, this will put different memory types in one memory tier
>>> only if their performance is almost the same.  And we will not
>>> provide an interface to override the chunk size.
>>>
>>
>> This could also mean we end up with lots of memory tiers with
>> relatively small performance differences between them.  Again, it
>> depends on how HMAT attributes will be mapped to abstract distance.
>
> Per my understanding, there will not be many memory types in a system,
> so there will not be many memory tiers either.  Most systems have only
> 2 or 3 memory tiers, for example, HBM, DRAM, CXL,
> etc.
So we don't need the chunk size to be 10, because we don't foresee
needing to group devices into that many tiers.
> Do you know of systems with many memory types?  The basic idea is to
> put different memory types in different memory tiers by default.  If
> users want to group them, they can do that by overriding the abstract
> distance of some memory type.
>
With a small chunk size, and depending on how we derive abstract
distance, I am wondering whether we would end up with lots of memory
tiers with no real value.  Hence my suggestion to wait on a change like
this until we have code that maps HMAT/CDAT attributes to abstract
distance.
>>
>>> 3. Make the abstract distance of normal DRAM large enough.  For
>>> example, 1000; then 100 tiers can be defined below DRAM, which is
>>> more than enough in practice.
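A quick sanity check of the "100 tiers below DRAM" arithmetic (an illustrative sketch, not kernel code):

```python
chunk_size = 10
dram_distance = 1000
# DRAM itself lands in tier 100, leaving tier IDs 0..99 available for
# memory types faster than DRAM.
assert dram_distance // chunk_size == 100
```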
>>
>> Why 100? Will we really have that many tiers below/faster than DRAM? As of now
>> I see only HBM below it.
>
> Yes.  100 is more than enough.  We just want to avoid grouping
> different memory types by default.
>
> Best Regards,
> Huang, Ying
>