From: Oscar Salvador <osalvador@suse.de>
To: Michal Hocko <mhocko@suse.com>
Cc: david@redhat.com, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, vbabka@suse.cz, pasha.tatashin@soleen.com
Subject: Re: [RFC PATCH v2 2/4] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Mon, 30 Nov 2020 10:12:37 +0100
Message-ID: <20201130091236.GB3825@linux>
In-Reply-To: <20201127151536.GV31550@dhcp22.suse.cz>
On Fri, Nov 27, 2020 at 04:15:36PM +0100, Michal Hocko wrote:
> > Vmemmap page tables can map arbitrary memory.
> > That means that we can simply use the beginning of each memory section and
> > map struct pages there.
>
> Did you mean each memory block rather than section?
Yes, sorry, I did not update that part.
> > struct pages which back the allocated space then just need to be treated
> > carefully.
> >
> > Implementation wise we will reuse vmem_altmap infrastructure to override
> > the default allocator used by __populate_section_memmap. Once the memmap is
> > allocated, we are going to need a way to mark altmap pfns used for the allocation.
> > If MHP_MEMMAP_ON_MEMORY flag was passed, we will set up the layout of the
> > altmap structure in add_memory_resource(), and then we will call
> > mhp_mark_vmemmap_pages() to properly mark those pages.
> >
> > Online/Offline:
> >
> > In the memory_block structure, a new field is created in order to
> > store the number of vmemmap_pages.
>
> Is this really needed? We know how many pfns are required for a block of
> a specific size, right?
>
> I have only glanced through the patch so I might be missing something
> but I am really wondering why you haven't chosen to use altmap directly
> here.
Well, this is my bad; I did not update the changelog wrt. the previous version,
so it might be confusing.
I will make sure to update it for the next submission, but let me explain it
here to shed some light.
We no longer need mhp_mark_vmemmap_pages() to mark the vmemmap pages.
Prior to online_pages(), the whole range is offline, so no one should be
messing with any pages within that range.
The initialization of the pages takes place in online_pages(), where we have:

	start_pfn       = first_pfn_of_the_range
	buddy_start_pfn = first_pfn_of_the_range + nr_vmemmap_pages

and then we do:
+	if (nr_vmemmap_pages)
+		move_pfn_range_to_zone(zone, pfn, nr_vmemmap_pages, NULL,
+				       MIGRATE_UNMOVABLE);
+	move_pfn_range_to_zone(zone, buddy_start_pfn, buddy_nr_pages, NULL,
+			       MIGRATE_ISOLATE);
Now, the whole range is initialized and marked PageReserved, but we only
release [buddy_start_pfn, end_pfn) to the buddy allocator.
And so, [start_pfn, buddy_start_pfn) remains PageReserved.
And we know that pfn walkers skip Reserved pages.
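Just to illustrate (a minimal sketch, not code from the patchset; the helper
name is made up), a generic pfn walker gets the skipping for free:

	static bool pfn_is_interesting(unsigned long pfn)
	{
		struct page *page = pfn_to_page(pfn);

		/*
		 * Vmemmap pages sit at the start of the hot-added range and
		 * are never released to the buddy allocator, so they remain
		 * PageReserved for the whole lifetime of the memory block.
		 */
		if (PageReserved(page))
			return false;

		return true;
	}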
About the altmap part: the altmap is used in the hot-add phase, in
add_memory_resource().
The thing is, we could avoid adding the nr_vmemmap_pages field to the
memory_block structure, but then we would have to mark the vmemmap pages as we
used to do in previous implementations (see [1]).
I find this way cleaner, and it adds much less code (a previous implementation
can be seen in [2]), so as a starting point I find it much better.
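For clarity, the hot-add side boils down to something like this (a sketch of
the idea, not the exact hunk; res, params and mhp_flags are the locals and
parameters of add_memory_resource()):

	struct vmem_altmap mhp_altmap = {
		/* vmemmap pages are taken from the beginning of the range */
		.base_pfn = PHYS_PFN(res->start),
		/* pages the altmap allocator is allowed to hand out */
		.free = PHYS_PFN(resource_size(res)),
	};

	if (mhp_flags & MHP_MEMMAP_ON_MEMORY)
		params.altmap = &mhp_altmap;

With that in place, __populate_section_memmap() allocates the memmap from the
range itself instead of calling into the page allocator.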
> It would be also good to describe how does a pfn walker recognize such a
> page? Most of them will simply ignore it but e.g. hotplug walker will
> need to skip over those because they are not preventing offlining as
> they will go away with the memory block together.
Wrt. the hotplug walker: same as above, we only care about migrating
[buddy_start_pfn, end_pfn), so the first pfn to isolate and migrate is set to
buddy_start_pfn.
Other pfn walkers should merely skip the vmemmap pages because they are
Reserved.
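In offline_pages() that amounts to something like (again a sketch, not the
literal hunk):

	/*
	 * Only [buddy_start_pfn, end_pfn) was handed to the buddy allocator,
	 * so start isolating/migrating there and leave the vmemmap pages
	 * alone.
	 */
	buddy_start_pfn = start_pfn + nr_vmemmap_pages;
	ret = start_isolate_page_range(buddy_start_pfn, end_pfn,
				       MIGRATE_MOVABLE,
				       MEMORY_OFFLINE | REPORT_FAILURE);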
> Some basic description of testing done would be suitable as well.
Well, that is:
- Hot-add memory to a specific numa node
- Online the memory
- Check that numactl -H and /proc/zoneinfo reflect the new memory, and that
  nr_vmemmap_pages is subtracted where it has to be
- Start a memory stress program and bind it to the numa node the memory was
  added to, so we make sure it gets exercised
- Wait for a while, and once the node's free pages have decreased
  considerably, offline the memory
- Check that the memory went offline, and check /proc/zoneinfo and numactl -H
  again
- Hot-remove the range
[1] https://patchwork.kernel.org/project/linux-mm/patch/20201022125835.26396-3-osalvador@suse.de/
[2] https://patchwork.kernel.org/project/linux-mm/cover/20201022125835.26396-1-osalvador@suse.de/
--
Oscar Salvador
SUSE L3