Re: [RFC PATCH v3 2/4] mm,memory_hotplug: Allocate memmap from the added memory range

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: mhocko@kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, vbabka@suse.cz, pasha.tatashin@soleen.com
Subject: Re: [RFC PATCH v3 2/4] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Wed, 9 Dec 2020 11:32:26 +0100	[thread overview]
Message-ID: <20201209103226.GC30892@linux> (raw)
In-Reply-To: <0e53ae49-6b0a-0714-df0f-fd6183061752@redhat.com>

On Wed, Dec 02, 2020 at 11:05:53AM +0100, David Hildenbrand wrote:
> If you take a look at generic_online_page() there are some things that
> won't be done for our vmemmap pages
> 
> 1. kernel_map_pages(page, 1 << order, 1);
> 
> We're accessing these pages already when initializing the memmap. We
> might have to explicitly map these vmemmap pages at some point. Might
> require some thought. Did you test with debug pagealloc?

I always try to run with all debug stuff enabled, but I definitely
did not enable debug_pagealloc.
I will have a look at it.

> 2. totalram_pages_add(1UL << order);
> 
> We should add/remove the vmemmap pages manually from totalram I guess.

Yes, we should. That was a clear oversight.

> Also, I wonder if we should add these pages to the zone managed page
> count. Not quite sure yet what the benefit would be and if we care at
> all. (it's a change in behavior when hotplugging memory, though, that's
> why I am poining it out)

I would rather not since they are not "manageable" per se.
The same we do not account the pages used for bootmem stuff into zone->managed.

> > Hot-remove:
> > 
> >  If the range was using memmap on memory (aka vmemmap pages),
> >  we construct an altmap structure so free_hugepage_table does
> >  the right thing and calls vmem_altmap_free instead of
> >  free_pagetable.
> > 
> 
> You might want to update the comment in include/linux/mmzone.h for
> ZONE_MOVABLE, adding this special case when onlining such memory to the
> MOVABLE zone. The pages are movable for offlining only, not for allocations.

Sure, let us document that.

> > -int create_memory_block_devices(unsigned long start, unsigned long size)
> > +int create_memory_block_devices(unsigned long start, unsigned long size,
> > +				unsigned long vmemmap_pages)
> >  {
> >  	const unsigned long start_block_id = pfn_to_block_id(PFN_DOWN(start));
> >  	unsigned long end_block_id = pfn_to_block_id(PFN_DOWN(start + size));
> > @@ -645,9 +649,10 @@ int create_memory_block_devices(unsigned long start, unsigned long size)
> >  		return -EINVAL;
> >  
> >  	for (block_id = start_block_id; block_id != end_block_id; block_id++) {
> > -		ret = init_memory_block(block_id, MEM_OFFLINE);
> > +		ret = init_memory_block(block_id, MEM_OFFLINE, vmemmap_pages);
> >  		if (ret)
> >  			break;
> > +		vmemmap_pages = 0;
> 
> We should even bail out if someone would have vmemmap_pages set when
> creating more than one memory block device.

That cannot really happen as we would have bailed out earlier prior to
call create_memory_block_devices, by the checks in mhp_supports_memmap_on_memory.

> > -	for (pfn = start_pfn; pfn < end_pfn; pfn += MAX_ORDER_NR_PAGES)
> > -		(*online_page_callback)(pfn_to_page(pfn), MAX_ORDER - 1);
> > +	for (pfn = buddy_start_pfn; pfn < end_pfn; pfn += (1 << order)) {
> > +		order = MAX_ORDER - 1;
> > +		while (pfn & ((1 << order) - 1))
> > +			order--;
> > +		(*online_page_callback)(pfn_to_page(pfn), order);
> > +	}
> 
> I think it would be nicer to handle the unaligned part in a separate loop.

No strong opinion here, I can certainly do it.

> >  	/* mark all involved sections as online */
> >  	online_mem_sections(start_pfn, end_pfn);
> > @@ -678,6 +684,7 @@ static void __meminit resize_pgdat_range(struct pglist_data *pgdat, unsigned lon
> >  	pgdat->node_spanned_pages = max(start_pfn + nr_pages, old_end_pfn) - pgdat->node_start_pfn;
> >  
> >  }
> > +
> 
> Unrelated change.

Ups ^^

> > +static int get_memblock_vmemmap_pages_cb(struct memory_block *mem, void *arg)
> > +{
> > +	unsigned long *nr_vmemmap_pages = (unsigned long *)arg;
> > +	int ret = !mem->nr_vmemmap_pages;
> > +
> > +	if (!ret)
> > +		*nr_vmemmap_pages += mem->nr_vmemmap_pages;
> > +	return ret;
> 
> When removing in the same granularity than adding, all we care about it
> the first memory block.
> 
> 
> unsigned long *nr_vmemmap_pages = (unsigned long *)arg;
> 
> *nr_vmemmap_pages = mem->nr_vmemmap_pages;
> /* We only care about the first memory block. */
> return 1;

Yeah, cleaner, thanks

> > +	/*
> > +	 * Prepare a vmem_altmap struct if we used it at hot-add, so
> > +	 * remove_pmd_table->free_hugepage_table does the right thing.
> > +	 */
> > +	(void)walk_memory_blocks(start, size, &nr_vmemmap_pages,
> > +	     get_memblock_vmemmap_pages_cb);
> > +	if (nr_vmemmap_pages) {
> > +		mhp_altmap.alloc = nr_vmemmap_pages;
> > +		altmap = &mhp_altmap;
> > +	}
> > +
> 
> If someone would remove_memory() in a different granularity than
> add_memory(), this would no longer work. How can we catch that
> efficiently? Or at least document that this is not supported with
> memmap_on_memory).

Well, we can check whether the size spans more than a single memory block.
And if it does, we can print a warning with pr_warn and refuse to remove
the memory.

We could document that "memory_hotplug.memmap_on_memory" is meant to operate
with ranges that span a single memory block.
And I guess that we could place that documentation under [1].

[1] Documentation/admin-guide/kernel-parameters.txt

Thanks for the review David

-- 
Oscar Salvador
SUSE L3

next prev parent reply	other threads:[~2020-12-09 10:32 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-01 11:51 [RFC PATCH v3 0/4] Allocate memmap from hotadded memory (per device) Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 1/4] mm,memory_hotplug: Add mhp_supports_memmap_on_memory Oscar Salvador
2020-12-02  9:36   ` David Hildenbrand
2020-12-09  9:36     ` Oscar Salvador
2020-12-09  9:40       ` David Hildenbrand
2020-12-09  9:43         ` Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 2/4] mm,memory_hotplug: Allocate memmap from the added memory range Oscar Salvador
2020-12-02 10:05   ` David Hildenbrand
2020-12-09 10:32     ` Oscar Salvador [this message]
2020-12-09 11:51       ` Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 3/4] mm,memory_hotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2020-12-02  9:37   ` David Hildenbrand
2020-12-09  9:38     ` Oscar Salvador
2020-12-01 11:51 ` [RFC PATCH v3 4/4] mm,memory_hotplug: Add mhp_memmap_on_memory boot option Oscar Salvador
2020-12-02  9:42   ` David Hildenbrand
2020-12-09 10:02     ` Oscar Salvador
2020-12-09 10:04       ` David Hildenbrand

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201209103226.GC30892@linux \
    --to=osalvador@suse.de \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.