linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Wei Yang <richard.weiyang@gmail.com>,
	Miaohe Lin <linmiaohe@huawei.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline
Date: Thu, 28 Apr 2022 14:30:20 +0200	[thread overview]
Message-ID: <0d2853f6-66e5-251a-2d9e-c229f0ebcd5e@redhat.com> (raw)
In-Reply-To: <20220307150725.6810-3-osalvador@suse.de>

On 07.03.22 16:07, Oscar Salvador wrote:
> All possible nodes are now pre-allocated at boot time by free_area_init()->
> free_area_init_node(), and those which are to be hot-plugged are initialized
> later on by hotadd_init_pgdat()->free_area_init_core_hotplug() when they
> become online.
> 
> free_area_init_core_hotplug() calls pgdat_init_internals() and
> zone_init_internals() to initialize some internal data structures
> and zeroes a few pgdat fields.
> 
> But we do already call pgdat_init_internals() and zone_init_internals()
> for all possible nodes back in free_area_init_core(), and pgdat fields
> are already zeroed because the pre-allocation memsets with 0s the
> structure, meaning we do not need to repeat the process when
> the node becomes online.
> 
> So initialize it only once when booting, and make sure to reset
> the fields we care about to 0 when the node goes empty.
> The only thing we need to check for is to allocate per_cpu_nodestats
> struct the very first time this node goes online.
> 
> node_reset_state() is the function in charge of resetting pgdat's fields,
> and it is called when offline_pages() detects that the node becomes empty
> worth of memory.
> 
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> ---
>  include/linux/memory_hotplug.h |  2 +-
>  mm/memory_hotplug.c            | 58 +++++++++++++++++++++-------------
>  mm/page_alloc.c                | 49 +++++-----------------------
>  3 files changed, 45 insertions(+), 64 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 76bf2de86def..fcf4c9a023cc 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -319,7 +319,7 @@ extern void set_zone_contiguous(struct zone *zone);
>  extern void clear_zone_contiguous(struct zone *zone);
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
> -extern void __ref free_area_init_core_hotplug(struct pglist_data *pgdat);
> +extern bool pgdat_has_boot_nodestats(pg_data_t *pgdat);
>  extern int __add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory(int nid, u64 start, u64 size, mhp_t mhp_flags);
>  extern int add_memory_resource(int nid, struct resource *resource,
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index ddc62f8b591f..07cece9e22e4 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1164,18 +1164,18 @@ static void reset_node_present_pages(pg_data_t *pgdat)
>  /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
>  static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  {
> -	struct pglist_data *pgdat;
> +	struct pglist_data *pgdat = NODE_DATA(nid);
>  
>  	/*
> -	 * NODE_DATA is preallocated (free_area_init) but its internal
> -	 * state is not allocated completely. Add missing pieces.
> -	 * Completely offline nodes stay around and they just need
> -	 * reintialization.
> +	 * NODE_DATA is preallocated (free_area_init), the only thing missing
> +	 * is to allocate its per_cpu_nodestats struct and to build node's
> +	 * zonelists. The allocation of per_cpu_nodestats only needs to be done
> +	 * the very first time this node is brought up, as we reset its state
> +	 * when all node's memory goes offline.
>  	 */
> -	pgdat = NODE_DATA(nid);
> -
> -	/* init node's zones as empty zones, we don't have any present pages.*/
> -	free_area_init_core_hotplug(pgdat);
> +	if (pgdat_has_boot_nodestats(pgdat))
> +		pgdat->per_cpu_nodestats = alloc_percpu_gfp(struct per_cpu_nodestat,
> +							    __GFP_ZERO);
>  
>  	/*
>  	 * The node we allocated has no zone fallback lists. For avoiding
> @@ -1183,15 +1183,6 @@ static pg_data_t __ref *hotadd_init_pgdat(int nid)
>  	 */
>  	build_all_zonelists(pgdat);
>  
> -	/*
> -	 * When memory is hot-added, all the memory is in offline state. So
> -	 * clear all zones' present_pages because they will be updated in
> -	 * online_pages() and offline_pages().
> -	 * TODO: should be in free_area_init_core_hotplug?
> -	 */
> -	reset_node_managed_pages(pgdat);
> -	reset_node_present_pages(pgdat);
> -
>  	return pgdat;
>  }
>  
> @@ -1799,6 +1790,30 @@ static void node_states_clear_node(int node, struct memory_notify *arg)
>  		node_clear_state(node, N_MEMORY);
>  }
>  
> +static void node_reset_state(int node)
> +{
> +	pg_data_t *pgdat = NODE_DATA(node);
> +	int cpu;
> +
> +	kswapd_stop(node);
> +	kcompactd_stop(node);
> +
> +	reset_node_managed_pages(pgdat);
> +	reset_node_present_pages(pgdat);
> +
> +	pgdat->nr_zones = 0;
> +	pgdat->kswapd_order = 0;
> +	pgdat->kswapd_highest_zoneidx = 0;
> +	pgdat->node_start_pfn = 0;


I'm confused why we have to mess with
* present pages
* managed pages
* node_start_pfn

here at all.

1) If there would be any present page left, calling node_reset_state()
would be a BUG.
2) If there would be any manged page left, calling node_reset_state()
would be a BUG.
3) node_start_pfn will be properly updated by
remove_pfn_range_from_zone()->update_pgdat_span()


To make it clearer, I *think* touching node_start_pfn is very wrong.

What if the node still has ZONE_DEVICE? They don't account towards
present pages but only towards spanned pages, and we're messing with the
start range.

remove_pfn_range_from_zone()->update_pgdat_span() should be the only
place that modifies the spanned range when offlining.

-- 
Thanks,

David / dhildenb



  reply	other threads:[~2022-04-28 12:30 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-07 15:07 [PATCH 0/3] A minor hotplug refactoring Oscar Salvador
2022-03-07 15:07 ` [PATCH 1/3] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
2022-04-28 12:13   ` David Hildenbrand
2022-05-05  9:09     ` Oscar Salvador
2022-03-07 15:07 ` [PATCH 2/3] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
2022-04-28 12:30   ` David Hildenbrand [this message]
2022-05-05 10:37     ` Oscar Salvador
2022-03-07 15:07 ` [PATCH 3/3] mm/memory_hotplug: Refactor hotadd_init_pgdat and try_online_node Oscar Salvador
2022-04-28 12:35   ` David Hildenbrand
2022-05-05  9:17     ` Oscar Salvador
2022-04-27 21:05 ` [PATCH 0/3] A minor hotplug refactoring Andrew Morton

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0d2853f6-66e5-251a-2d9e-c229f0ebcd5e@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linmiaohe@huawei.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=osalvador@suse.de \
    --cc=richard.weiyang@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).