All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, mhocko@kernel.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org, vbabka@suse.cz,
	pasha.tatashin@soleen.com
Subject: Re: [PATCH 2/5] mm,memory_hotplug: Allocate memmap from the added memory range
Date: Mon, 25 Jan 2021 11:56:02 +0100	[thread overview]
Message-ID: <20210125105557.GA28363@linux> (raw)
In-Reply-To: <20210125103951.GA27851@linux>

On Mon, Jan 25, 2021 at 11:39:55AM +0100, Oscar Salvador wrote:
> > Interresting, so we automatically support differeing sizeof(struct
> > page). I guess it will be problematic in case of sizeof(struct page) !=
> > 64, because then, we might not have multiples of 2MB for the memmap of a
> > memory block.
> > 
> > IIRC, it could happen that if we add two consecutive memory blocks, that
> > the second one might reuse parts of the vmemmap residing on the first
> > memory block. If you remove the first one, you might be in trouble.
> > 
> > E.g., on x86-64
> >  vmemmap_populate()->vmemmap_populate_hugepages()->vmemmap_alloc_block_buf():
> > - Populate a huge page
> > 
> > vmemmap_free()->remove_pagetable()...->remove_pmd_table():
> > - memchr_inv() will leave the hugepage populated.
> > 
> > Do we want to fence that off, maybe in mhp_supports_memmap_on_memory()?
> > Or do we somehow want to fix that? We should never populate partial huge
> > pages from an altmap ...
> > 
> > But maybe I am missing something.
> 
> No, you are not missing anything.
> 
> I think that remove_pmd_table() should be fixed.
> vmemmap_populate_hugepages() allocates PMD_SIZE chunk and marks the PMD as
> PAGE_KERNEL_LARGE, so when remove_pmd_table() sees that 1) the PMD
> is large and 2) the chunk is not aligned, the memset() should be writing
> PAGE_INUSE for a PMD chunk.
> 
> I do not really a see a reason why this should not happen.
> Actually, we will be leaving pagetables behind as we never get to free
> the PMD chunk when sizeof(struct page) is not a multiple of 2MB.
> 
> I plan to fix remove_pmd_table(), but for now let us not allow to use this feature
> if the size of a struct page is not 64.
> Let us keep it simply for the time being, shall we?

My first intention was:

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index b5a3fa4033d3..0c9756a2eb24 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1044,32 +1044,14 @@ remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
 			continue;
 
 		if (pmd_large(*pmd)) {
-			if (IS_ALIGNED(addr, PMD_SIZE) &&
-			    IS_ALIGNED(next, PMD_SIZE)) {
-				if (!direct)
-					free_hugepage_table(pmd_page(*pmd),
-							    altmap);
-
-				spin_lock(&init_mm.page_table_lock);
-				pmd_clear(pmd);
-				spin_unlock(&init_mm.page_table_lock);
-				pages++;
-			} else {
-				/* If here, we are freeing vmemmap pages. */
-				memset((void *)addr, PAGE_INUSE, next - addr);
-
-				page_addr = page_address(pmd_page(*pmd));
-				if (!memchr_inv(page_addr, PAGE_INUSE,
-						PMD_SIZE)) {
-					free_hugepage_table(pmd_page(*pmd),
-							    altmap);
-
-					spin_lock(&init_mm.page_table_lock);
-					pmd_clear(pmd);
-					spin_unlock(&init_mm.page_table_lock);
-				}
-			}
+			if (!direct)
+				free_hugepage_table(pmd_page(*pmd),
+						    altmap);
 
+			spin_lock(&init_mm.page_table_lock);
+			pmd_clear(pmd);
+			spin_unlock(&init_mm.page_table_lock);
+			pages++;
 			continue;
 		}

I did not try it out yet and this might explode badly, but AFAICS, a PMD size
chunk is always allocated even when the section does not spand 2MB.
E.g: sizeof(struct page) = 56.

PAGES_PER_SECTION * 64 = 2097152
PAGES_PER_SECTION * 56 = 1835008

Even in the latter case, vmemmap_populate_hugepages will allocate a PMD size chunk
to satisfy the unaligned range.
So, unless I am missing something, it should not be a problem to free that 2MB in
remove_pmd_table when 1) the PMD is large and 2) the range is not aligned.


-- 
Oscar Salvador
SUSE L3


  reply	other threads:[~2021-01-25 10:56 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 13:07 [PATCH 0/5] Allocate memmap from hotadded memory (per device) Oscar Salvador
2020-12-17 13:07 ` [PATCH 1/5] mm: Introduce ARCH_MHP_MEMMAP_ON_MEMORY_ENABLE Oscar Salvador
2021-01-11 16:52   ` David Hildenbrand
2021-01-12  7:26     ` Oscar Salvador
2021-01-12 10:12       ` David Hildenbrand
2021-01-12 11:17         ` Oscar Salvador
2021-01-12 11:17           ` David Hildenbrand
2020-12-17 13:07 ` [PATCH 2/5] mm,memory_hotplug: Allocate memmap from the added memory range Oscar Salvador
2021-01-18 10:11   ` Oscar Salvador
2021-01-19 13:58   ` David Hildenbrand
2021-01-25 10:39     ` Oscar Salvador
2021-01-25 10:56       ` Oscar Salvador [this message]
2021-01-25 11:02         ` David Hildenbrand
2021-01-25 13:36           ` Oscar Salvador
2021-01-25 10:57       ` David Hildenbrand
2021-01-25 11:18         ` Oscar Salvador
2021-01-25 11:23           ` David Hildenbrand
2020-12-17 13:07 ` [PATCH 3/5] acpi,memhotplug: Enable MHP_MEMMAP_ON_MEMORY when supported Oscar Salvador
2021-01-11 16:53   ` David Hildenbrand
2020-12-17 13:07 ` [PATCH 4/5] powerpc/memhotplug: " Oscar Salvador
2021-01-11 16:55   ` David Hildenbrand
2021-01-12 13:31     ` Oscar Salvador
2020-12-17 13:07 ` [PATCH 5/5] mm,memory_hotplug: Add kernel boot option to enable memmap_on_memory Oscar Salvador
2021-01-11  9:05 ` [PATCH 0/5] Allocate memmap from hotadded memory (per device) Oscar Salvador

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210125105557.GA28363@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@kernel.org \
    --cc=pasha.tatashin@soleen.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.