Re: [PATCH v2 1/1] memory tier: consolidate the initialization of memory tiers

From: "Ho-Ren (Jack) Chuang" <horen.chuang@linux.dev>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: "Jonathan Cameron" <Jonathan.Cameron@huawei.com>,
	"Gregory Price" <gourry.memverge@gmail.com>,
	aneesh.kumar@linux.ibm.com, mhocko@suse.com, tj@kernel.org,
	john@jagalactic.com, "Eishan Mirakhur" <emirakhur@micron.com>,
	"Vinicius Tavares Petrucci" <vtavarespetr@micron.com>,
	"Ravis OpenSrc" <Ravis.OpenSrc@micron.com>,
	"Alistair Popple" <apopple@nvidia.com>,
	"Srinivasulu Thanneeru" <sthanneeru@micron.com>,
	"SeongJae Park" <sj@kernel.org>,
	"Rafael J.  Wysocki" <rafael@kernel.org>,
	"Len Brown" <lenb@kernel.org>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	"Dave Jiang" <dave.jiang@intel.com>,
	"Dan  Williams" <dan.j.williams@intel.com>,
	linux-acpi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, "Ho-Ren (Jack)  Chuang" <horenc@vt.edu>,
	"Ho-Ren (Jack) Chuang" <horenchuang@bytedance.com>,
	"Ho-Ren (Jack) Chuang" <horenchuang@gmail.com>,
	linux-cxl@vger.kernel.org, qemu-devel@nongnu.org
Subject: Re: [PATCH v2 1/1] memory tier: consolidate the initialization of memory tiers
Date: Tue, 02 Jul 2024 05:37:47 +0000
Message-ID: <970686ea8a7aba6f5daa752a39609f5ab7a48a06@linux.dev>
In-Reply-To: <87tth9ofsi.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi Huang, Ying,

Thanks for your feedback and helpful suggestions. Replies inlined.


June 30, 2024 at 10:13 PM, "Huang, Ying" <ying.huang@intel.com> wrote:
>
> Hi, Jack,
>
> "Ho-Ren (Jack) Chuang" <horen.chuang@linux.dev> writes:
>
> I suggest you to merge the [0/1] with the change log here. [0/1]
> describes why do we need the patch. The below text describes some
> details. Just don't use "---" to separate them. We need both parts in
> the final commit message.
>

Sounds good! I will merge the [0/1] cover letter into the commit message of this patch in v3.

> >
> > If we simply move the set_node_memory_tier() from memory_tier_init()
> > to late_initcall(), it will result in HMAT not registering
> > the mt_adistance_algorithm callback function, because
> > set_node_memory_tier() is not performed during the memory tiering
> > initialization phase, leading to a lack of correct default_dram
> > information.
> >
> > Therefore, we introduced a nodemask to pass the information of the
> > default DRAM nodes. The reason for not choosing to reuse
> > default_dram_type->nodes is that it is not clean enough. So in the end,
> > we use a __initdata variable, which is a variable that is released once
> > initialization is complete, including both CPU and memory nodes for HMAT
> > to iterate through.
> >
> > Besides, since default_dram_type may be checked/used during the
> > initialization process of HMAT and drivers, it is better to keep the
> > allocation of default_dram_type in memory_tier_init().
> >
> 
> Why do we need it? IIRC, we have deleted its usage in hmat.c.
> 

default_dram_type is still used in set_node_memory_tier(), but since
hmat.c no longer uses it, I will drop this paragraph in v3 to avoid confusion.
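
For reference, here is a minimal userspace sketch of what the nodemask is meant to carry: the set of nodes that have both memory and CPUs. It only models the idea behind nodes_and(); the nodemask_model_t type and the model_* helpers are hypothetical stand-ins of mine, not kernel APIs, and a 64-bit word stands in for nodemask_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef uint64_t nodemask_model_t;

#define MODEL_MAX_NODES 64

/* Mark node 'nid' as present in 'mask' (models node_set()). */
static inline void model_node_set(int nid, nodemask_model_t *mask)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	*mask |= (nodemask_model_t)1 << nid;
}

/* Return nonzero if node 'nid' is present in 'mask' (models node_isset()). */
static inline int model_node_isset(int nid, nodemask_model_t mask)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);
	return (int)((mask >> nid) & 1);
}

/*
 * Intersection of two masks (models nodes_and()). With 'mem' as the
 * N_MEMORY set and 'cpu' as the N_CPU set, the result is the set of
 * default DRAM nodes that HMAT would iterate over.
 */
static inline nodemask_model_t model_nodes_and(nodemask_model_t mem,
					       nodemask_model_t cpu)
{
	return mem & cpu;
}
```

With nodes 0-2 having memory and only nodes 0-1 having CPUs, the intersection contains nodes 0 and 1, so a CPU-less node 2 is left out of the default DRAM set.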

> >
> > Signed-off-by: Ho-Ren (Jack) Chuang <horenchuang@bytedance.com>
> > Suggested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > ---
> >  drivers/acpi/numa/hmat.c     |  5 +--
> >  include/linux/memory-tiers.h |  2 ++
> >  mm/memory-tiers.c            | 59 +++++++++++++++---------------------
> >  3 files changed, 28 insertions(+), 38 deletions(-)
> > diff --git a/drivers/acpi/numa/hmat.c b/drivers/acpi/numa/hmat.c
> > index 2c8ccc91ebe6..a2f9e7a4b479 100644
> > --- a/drivers/acpi/numa/hmat.c
> > +++ b/drivers/acpi/numa/hmat.c
> > @@ -940,10 +940,7 @@ static int hmat_set_default_dram_perf(void)
> >  	struct memory_target *target;
> >  	struct access_coordinate *attrs;
> >  
> > -	if (!default_dram_type)
> > -		return -EIO;
> > -
> > -	for_each_node_mask(nid, default_dram_type->nodes) {
> > +	for_each_node_mask(nid, default_dram_nodes) {
> >  		pxm = node_to_pxm(nid);
> >  		target = find_mem_target(pxm);
> >  		if (!target)
> > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h
> > index 0d70788558f4..fa61ad9c4d75 100644
> > --- a/include/linux/memory-tiers.h
> > +++ b/include/linux/memory-tiers.h
> > @@ -38,6 +38,7 @@ struct access_coordinate;
> >  #ifdef CONFIG_NUMA
> >  extern bool numa_demotion_enabled;
> >  extern struct memory_dev_type *default_dram_type;
> >
> 
> Can we remove the above line?
> 

Yes, you are right. Good catch, thanks! I will remove it in v3.

> > 
> > +extern nodemask_t default_dram_nodes __initdata;
> > 
> 
> We don't need to use __initdata in variable declaration.
> 

Thank you for your guidance! I will remove __initdata from the declaration in v3.
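
To make sure I understand the point, here is a small userspace sketch of the pattern: the section annotation goes on the definition only, and the extern declaration stays plain. MODEL_INITDATA, the "model_init_data" section name, and the model_* identifiers are hypothetical stand-ins of mine; in the kernel the annotation would be __initdata, and this sketch does not model the freeing of init memory.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the kernel's __initdata. On ELF targets it
 * places the variable in a named section; elsewhere it expands to nothing.
 */
#ifdef __ELF__
#define MODEL_INITDATA __attribute__((section("model_init_data")))
#else
#define MODEL_INITDATA
#endif

/* What the header would carry: a plain declaration, no section annotation. */
extern unsigned long model_default_dram_nodes;

/* What the .c file would carry: the definition, annotated exactly once. */
unsigned long model_default_dram_nodes MODEL_INITDATA = 0;

/* Record node 'nid' in the mask; users only need the plain declaration. */
static inline void model_record_dram_node(int nid)
{
	assert(nid >= 0 && nid < (int)(8 * sizeof(unsigned long)));
	model_default_dram_nodes |= 1UL << nid;
}
```

The section placement is a property of the definition, so the declaration in the header does not need to repeat it.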

> >
> >  struct memory_dev_type *alloc_memory_type(int adistance);
> >  void put_memory_type(struct memory_dev_type *memtype);
> >  void init_node_memory_type(int node, struct memory_dev_type *default_type);
> > @@ -76,6 +77,7 @@ static inline bool node_is_toptier(int node)
> >  
> >  #define numa_demotion_enabled	false
> >  #define default_dram_type	NULL
> > +#define default_dram_nodes NODE_MASK_NONE
> >
> 
> Should we use <tab> after "default_dram_nodes"?
> 

Yes, thanks for the reminder. I will fix it in v3.

> >
> >  /*
> >   * CONFIG_NUMA implementation returns non NULL error.
> >   */
> > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c
> > index 6632102bd5c9..a19a90c3ad36 100644
> > --- a/mm/memory-tiers.c
> > +++ b/mm/memory-tiers.c
> > @@ -43,6 +43,7 @@ static LIST_HEAD(memory_tiers);
> >  static LIST_HEAD(default_memory_types);
> >  static struct node_memory_type_map node_memory_types[MAX_NUMNODES];
> >  struct memory_dev_type *default_dram_type;
> > +nodemask_t default_dram_nodes __initdata = NODE_MASK_NONE;
> >  
> >  static const struct bus_type memory_tier_subsys = {
> >  	.name = "memory_tiering",
> > @@ -671,28 +672,38 @@ EXPORT_SYMBOL_GPL(mt_put_memory_types);
> >  
> >  /*
> >   * This is invoked via `late_initcall()` to initialize memory tiers for
> > - * CPU-less memory nodes after driver initialization, which is
> > - * expected to provide `adistance` algorithms.
> > + * memory nodes, both with and without CPUs. After the initialization of
> > + * firmware and devices, adistance algorithms are expected to be provided.
> >   */
> >  static int __init memory_tier_late_init(void)
> >  {
> >  	int nid;
> > +	struct memory_tier *memtier;
> >  
> > +	get_online_mems();
> >  	guard(mutex)(&memory_tier_lock);
> > +	/*
> > +	 * Look at all the existing and uninitialized N_MEMORY nodes and
> > +	 * add them to default memory tier or to a tier if we already have
> > +	 * memory types assigned.
> > +	 */
> >
> 
> If the memory type of the node has been assigned, we will skip it in the
> following code. So, I think that we need to revise the comments.
> 

You are right. In v3 the comment will read:
/* Assign each uninitialized N_MEMORY node to a memory tier. */
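
To double-check that the revised comment matches the loop, here is a minimal userspace model of the control flow: nodes without memory are not visited, nodes whose memory type was already set by a driver are skipped, and the loop stops at the first failure while keeping what was already set up. The struct model_node type, model_set_tier(), model_late_init(), and the fail_nid parameter are hypothetical names for illustration only; they are not part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-node state, modelling what the kernel tracks per nid. */
struct model_node {
	int has_memory;   /* models membership in N_MEMORY */
	int memtype_set;  /* models node_memory_types[nid].memtype != NULL */
	int tier;         /* assigned tier id, or -1 if none */
};

/*
 * Models set_node_memory_tier(): returns tier 0 on success, or a
 * negative value for the node chosen to fail (fail_nid < 0: never fail).
 */
static inline int model_set_tier(int nid, int fail_nid)
{
	return nid == fail_nid ? -1 : 0;
}

/* Assign each uninitialized memory node to a tier; return how many were set. */
static inline int model_late_init(struct model_node *nodes, size_t n,
				  int fail_nid)
{
	int assigned = 0;

	assert(nodes != NULL);
	for (size_t nid = 0; nid < n; nid++) {
		if (!nodes[nid].has_memory)
			continue;	/* not part of the N_MEMORY walk */
		if (nodes[nid].memtype_set)
			continue;	/* already configured by a driver */

		int tier = model_set_tier((int)nid, fail_nid);
		if (tier < 0)
			break;		/* keep the tiers set up so far */

		nodes[nid].tier = tier;
		nodes[nid].memtype_set = 1;
		assigned++;
	}
	return assigned;
}
```

In this model a node that a driver already placed in a tier keeps that tier, which is why the comment should only mention uninitialized nodes.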

> >
> >  	for_each_node_state(nid, N_MEMORY) {
> >  		/*
> > -		 * Some device drivers may have initialized memory tiers
> > -		 * between `memory_tier_init()` and `memory_tier_late_init()`,
> > -		 * potentially bringing online memory nodes and
> > -		 * configuring memory tiers. Exclude them here.
> > +		 * Some device drivers may have initialized
> > +		 * memory tiers, potentially bringing memory nodes
> > +		 * online and configuring memory tiers.
> > +		 * Exclude them here.
> >  		 */
> >  		if (node_memory_types[nid].memtype)
> >  			continue;
> >  
> > -		set_node_memory_tier(nid);
> > +		memtier = set_node_memory_tier(nid);
> > +		if (IS_ERR(memtier))
> > +			/* Continue with memtiers we are able to setup. */
> > +			break;
> >  	}
> > -
> >  	establish_demotion_targets();
> > +	put_online_mems();
> >  
> >  	return 0;
> >  }
> > @@ -875,8 +886,7 @@ static int __meminit memtier_hotplug_callback(struct notifier_block *self,
> >  
> >  static int __init memory_tier_init(void)
> >  {
> > -	int ret, node;
> > -	struct memory_tier *memtier;
> > +	int ret;
> >  
> >  	ret = subsys_virtual_register(&memory_tier_subsys, NULL);
> >  	if (ret)
> > @@ -887,7 +897,8 @@ static int __init memory_tier_init(void)
> >  				GFP_KERNEL);
> >  	WARN_ON(!node_demotion);
> >  #endif
> > -	mutex_lock(&memory_tier_lock);
> > +
> > +	guard(mutex)(&memory_tier_lock);
> >  	/*
> >  	 * For now we can have 4 faster memory tiers with smaller adistance
> >  	 * than default DRAM tier.
> > @@ -897,29 +908,9 @@ static int __init memory_tier_init(void)
> >  	if (IS_ERR(default_dram_type))
> >  		panic("%s() failed to allocate default DRAM tier\n", __func__);
> >  
> > -	/*
> > -	 * Look at all the existing N_MEMORY nodes and add them to
> > -	 * default memory tier or to a tier if we already have memory
> > -	 * types assigned.
> > -	 */
> > -	for_each_node_state(node, N_MEMORY) {
> > -		if (!node_state(node, N_CPU))
> > -			/*
> > -			 * Defer memory tier initialization on
> > -			 * CPUless numa nodes. These will be initialized
> > -			 * after firmware and devices are initialized.
> > -			 */
> > -			continue;
> > -
> > -		memtier = set_node_memory_tier(node);
> > -		if (IS_ERR(memtier))
> > -			/*
> > -			 * Continue with memtiers we are able to setup
> > -			 */
> > -			break;
> > -	}
> > -	establish_demotion_targets();
> > -	mutex_unlock(&memory_tier_lock);
> > +	/* Record nodes with memory and CPU to set default DRAM performance. */
> > +	nodes_and(default_dram_nodes, node_states[N_MEMORY],
> > +		  node_states[N_CPU]);
> >  
> >  	hotplug_memory_notifier(memtier_hotplug_callback, MEMTIER_HOTPLUG_PRI);
> >  	return 0;
> >
> 
> --
> Best Regards,
> Huang, Ying
>

--
Best Regards,
Ho-Ren (Jack) Chuang

Thread overview: 9+ messages
2024-06-28  6:09 [PATCH v2 0/1] memory tier: consolidate the initialization of memory tiers Ho-Ren (Jack) Chuang
2024-06-28  6:09 ` [PATCH v2 1/1] " Ho-Ren (Jack) Chuang
2024-07-01  5:13   ` Huang, Ying
2024-07-02  5:37     ` Ho-Ren (Jack) Chuang [this message]
2024-07-02 13:25   ` Jonathan Cameron via
2024-07-03  8:33     ` Ho-Ren (Jack) Chuang
2024-07-04 17:08       ` Jonathan Cameron via
2024-07-03  8:51     ` Huang, Ying
2024-07-04 17:09       ` Jonathan Cameron via
