Re: [PATCH v2] x86/vmemmap: Handle unpopulated sub-pmd ranges

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H . Peter Anvin" <hpa@zytor.com>,
	Michal Hocko <mhocko@kernel.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] x86/vmemmap: Handle unpopulated sub-pmd ranges
Date: Tue, 2 Feb 2021 08:52:48 +0100	[thread overview]
Message-ID: <20210202075243.GA7037@linux> (raw)
In-Reply-To: <b9a2f80e-a90f-62bf-4197-66cdb315cb84@redhat.com>

On Fri, Jan 29, 2021 at 01:46:33PM +0100, David Hildenbrand wrote:
> >   static void __meminit free_pagetable(struct page *page, int order)
> >   {
> > @@ -1008,10 +1073,10 @@ remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
> >   			 * with 0xFD, and remove the page when it is wholly
> >   			 * filled with 0xFD.
> >   			 */
> > -			memset((void *)addr, PAGE_INUSE, next - addr);
> > +			memset((void *)addr, PAGE_UNUSED, next - addr);
> >   			page_addr = page_address(pte_page(*pte));
> > -			if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
> > +			if (!memchr_inv(page_addr, PAGE_UNUSED, PAGE_SIZE)) {
> >   				free_pagetable(pte_page(*pte), 0);
> 
> I remember already raising this, in the context of other cleanups, but let's
> start anew:
> 
> How could we ever even end up in "!PAGE_ALIGNED(addr) &&
> PAGE_ALIGNED(next)"? As the comment correctly indicates, it would only make
> sense for "freeing vmemmap pages".
> 
> This would mean we are removing parts of a vmemmap page (4k), calling
> vmemmap_free()->remove_pagetable() on sub-page granularity.
> 
> Even sub-sections (2MB - 512 pages) have a memmap size with base pages:
> - 56 bytes: 7 pages
> - 64 bytes: 8 pages
> - 72 bytes: 9 pages
> 
> sizeof(struct page) is always multiples of 8 bytes, so that will hold.
> 
> E.g., in __populate_section_memmap(), we already enforce proper subsection
> alignment.
> 
> IMHO, we should rip out that code here and enforce page alignment in
> vmemmap_populate()/vmemmap_free().
> 
> Am I missing something?

Thanks David for bringing this up, I must say I was not aware that this
topic was ever discussed.

Ok, I've been having a look into this.
At first I was concerced because of a pure SPARSEMEM configuration, but I
see that those allocations are done in a very diferent way so it does not
bother us.

So we have the following enforcements during hotplug:

add_memory_resource
 check_hotplug_memory_range : Checks range aligned to memory_block_size_bytes,
                            : which means it must be section-size aligned

populate_section_memmap
 __populate_section_memmap  : Checks range aligned to sub-section size

So, IIRC we have two cases during hotplug:
 1) the ones that want memory blocks
 2) the ones that do not want them (pmem stuff)

For #1, we always enforce section alignment in add_memory_resource, and for
#2 we always make sure the range is at least sub-section aligned.

And the important stuff is that boot memory is no longer to be hot-removed
(boot memory had some strange layout sometimes).

So, given the above, I think it should be safe to drop that check in
remote_pte_table.
But do we really need to force page alignment in vmemmap_populate/vmemmap_free?
vmemmap_populate should already receive a page-aligned chunk because 
 __populate_section_memmap made sure of that, and vmemmap_free() should be ok
as we already filtered out at hot-adding stage.

Of course, this will hold as long as struct page size of multiple of 8.
Should that change we might get trouble, but I do not think that can ever
happened (tm).

But anyway, I am fine with placing a couple of checks in vmemmap_{populate,free}
just to double check.

What do you think?

-- 
Oscar Salvador
SUSE L3

next prev parent reply	other threads:[~2021-02-02  7:53 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-29  6:40 [PATCH v2] x86/vmemmap: Handle unpopulated sub-pmd ranges Oscar Salvador
     [not found] ` <b9a2f80e-a90f-62bf-4197-66cdb315cb84@redhat.com>
2021-02-02  7:52   ` Oscar Salvador [this message]
2021-02-02  8:35     ` David Hildenbrand
2021-02-02  8:51       ` Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210202075243.GA7037@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox