All of lore.kernel.org
 help / color / mirror / Atom feed
From: David Hildenbrand <david@redhat.com>
To: Oscar Salvador <osalvador@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline
Date: Wed, 22 Jun 2022 10:44:57 +0200	[thread overview]
Message-ID: <4dccc61d-d8fa-c2b2-d744-398b48ea8a89@redhat.com> (raw)
In-Reply-To: <YrKZsYNUAbfcC1CF@localhost.localdomain>

On 22.06.22 06:25, Oscar Salvador wrote:
> On Tue, Jun 21, 2022 at 09:59:07AM +0200, David Hildenbrand wrote:
>>> +static void node_reset_state(int node)
>>> +{
>>> +	pg_data_t *pgdat = NODE_DATA(node);
>>> +	int cpu;
>>> +
>>> +	kswapd_stop(node);
>>> +	kcompactd_stop(node);
>>> +
>>> +	pgdat->nr_zones = 0;
>>
>> ^ what is that? it should be "highest_zone_idx" and I don't see any
>> reason that we really need this.
> 
> Uhm, I thought we need to reset this, otherwise init_currently_empty_zone()
> might not set it to a right value:
> 
> ...
>  if (zone_idx > pgdat->nr_zones)
>     pgdat->nr_zones = zone_idx
> ...
> 
> At least we set it to 0 in free_area_init_core_hotplug() (before this patch).

Yeah, I don't think we need to, the nodes+zones should be empty either
way. Maybe a micro-optimization that doesn't really optimize anything
that much?

free_area_init_node() warns if it isn't reset, but my gut feeling is
this is completely unnecessary. We're not dealing with 200 zones ...

> 
>> To detect if a node is empty we can use pgdat_is_empty(). To detect if a
>> zone is empty we can use zone_is_empty().
>>
>> The usage of "pgdat->nr_zones" as an optimization is questionable,
>> especially when iterating over our handful of zones where most nodes
>> miss the *lower* zones like ZONE_DMA* in practice and have ZONE_NORMAL.
>>
>> Can we get rid of that and just check pgdat_is_empty() and
>> zone_is_empty() and iterate all applicable zones from 0..X?
> 
> So, lemme see if I get you.
> You mean to e.g: replace the following (code snippet from set_pgdat_percpu_threshold)
> 
>   for (i = 0; i < pgdat->nr_zones; i++) {
>            zone = &pgdat->node_zones[i];
> 
> 		    [some code]
>   }
> 
> with this:
> 
>   for (zid = 0; zid < MAX_NR_ZONES; i++) {
>             struct zone *zone = pgdat->node_zones + i;
> 
>             if (zone_is_empty(zone))
>                     continue; 
>   }

Yes, some places that are not interested in ZONE_DEVICE might want to
skip that completely. See below.

> 
> I guess we can, and I can see that we have a mix of both usages, so it might be
> good to consolidate one.
> And actually, I think we do the same amount of work, right? So not really an
> optimization in those pieces of code.

That's my understanding.

> 
> The only thing that unsettles me is the compaction part.
> We set pgdat->kcompactd_highest_zoneidx by checking pgdat->nr_zones, and use
> that as our compact_control->highest_zoneidx. (kcompactd->kcompactd_do_work)

I wonder why we would want to use ZONE_DEVICE there ...

move_pfn_range_to_zone()->init_currently_empty_zone() would set
pgdat->nr_zones = ZONE_DEVICE.

Which looks unnecessary.

> 
> Now, I do not really see any reason we could not adapt that code to not
> realy on pgdat->nr_zones, but I would have to check further how this
> interacts with highest_zoneidx down the road, and where else should
> we rewrite code.

I wonder if all we want is:


diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index aab70355d64f..905919683025 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -497,8 +497,9 @@ enum zone_type {
         * there can be false negatives).
         */
        ZONE_MOVABLE,
+       __MAX_NR_BUDDY_ZONES,
 #ifdef CONFIG_ZONE_DEVICE
-       ZONE_DEVICE,
+       ZONE_DEVICE = __MAX_NR_BUDDY_ZONES,
 #endif
        __MAX_NR_ZONES

diff --git a/tools/testing/memblock/linux/mmzone.h
b/tools/testing/memblock/linux/mmzone.h
index 7c2eb5c9bb54..2c3492239e45 100644
--- a/tools/testing/memblock/linux/mmzone.h
+++ b/tools/testing/memblock/linux/mmzone.h
@@ -17,6 +17,7 @@ enum zone_type {
 };

 #define MAX_NR_ZONES __MAX_NR_ZONES
+#define MAX_NR_BUDDY_ZONES __MAX_NR_BUDDY_ZONES
 #define MAX_ORDER 11
 #define MAX_ORDER_NR_PAGES (1 << (MAX_ORDER - 1))



Then, all relevant buddy-related code (compaction ...) can simply run

for (i = 0; i < MAX_NR_BUDDY_ZONES; i++)

or

for (i = MAX_NR_BUDDY_ZONES - 1; i >= 0; i--)

and check if the relevant zone is empty.


There will still be users of MAX_NR_ZONES, though, that have to consider
ZONE_DEVICE as well.

-- 
Thanks,

David / dhildenb



      reply	other threads:[~2022-06-22  8:45 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-21  4:17 [PATCH v2 0/2] Minor memoryhotplug refactoring Oscar Salvador
2022-06-21  4:17 ` [PATCH v2 1/2] mm/page_alloc: Do not calculate node's total pages and memmap pages when empty Oscar Salvador
2022-06-21  7:44   ` David Hildenbrand
2022-06-22  3:47     ` Oscar Salvador
2022-06-22  3:56       ` Muchun Song
2022-06-22  8:31         ` David Hildenbrand
2022-06-22  8:54           ` Muchun Song
2022-06-22 10:49             ` David Hildenbrand
2022-06-21  4:17 ` [PATCH v2 2/2] mm/memory_hotplug: Reset node's state when empty during offline Oscar Salvador
2022-06-21  7:59   ` David Hildenbrand
2022-06-22  4:25     ` Oscar Salvador
2022-06-22  8:44       ` David Hildenbrand [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4dccc61d-d8fa-c2b2-d744-398b48ea8a89@redhat.com \
    --to=david@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=osalvador@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.