From: Oscar Salvador <osalvador@suse.de>
To: Muchun Song <songmuchun@bytedance.com>
Cc: corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
dave.hansen@linux.intel.com, luto@kernel.org,
peterz@infradead.org, viro@zeniv.linux.org.uk,
akpm@linux-foundation.org, paulmck@kernel.org,
mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
rdunlap@infradead.org, oneukum@suse.com,
anshuman.khandual@arm.com, jroedel@suse.de,
almasrymina@google.com, rientjes@google.com, willy@infradead.org,
mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
naoya.horiguchi@nec.com, duanxiongchun@bytedance.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page
Date: Tue, 26 Jan 2021 10:29:46 +0100 [thread overview]
Message-ID: <20210126092942.GA10602@linux> (raw)
In-Reply-To: <20210117151053.24600-6-songmuchun@bytedance.com>
On Sun, Jan 17, 2021 at 11:10:46PM +0800, Muchun Song wrote:
> When we free a HugeTLB page to the buddy allocator, we should allocate the
> vmemmap pages associated with it. We can do that in the __free_hugepage()
> before freeing it to buddy.
>
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
This series has grown a certain grade of madurity and improvment, but it seems
to me that we have been stuck in this patch (and patch#4) for quite some time.
Would it be acceptable for a first implementation to not let hugetlb pages to
be freed when this feature is in use?
This would simplify things for now, as we could get rid of patch#4 and patch#5.
We can always extend functionality once this has been merged, right?
Of course, this means that e.g: memory-hotplug (hot-remove) will not fully work
when this in place, but well.
I would like to hear what others think, but in my opinion it would be a big step
to move on.
> ---
> include/linux/mm.h | 2 ++
> mm/hugetlb.c | 2 ++
> mm/hugetlb_vmemmap.c | 15 ++++++++++
> mm/hugetlb_vmemmap.h | 5 ++++
> mm/sparse-vmemmap.c | 77 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> 5 files changed, 100 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f928994ed273..16b55d13b0ab 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3007,6 +3007,8 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
>
> void vmemmap_remap_free(unsigned long start, unsigned long end,
> unsigned long reuse);
> +void vmemmap_remap_alloc(unsigned long start, unsigned long end,
> + unsigned long reuse);
>
> void *sparse_buffer_alloc(unsigned long size);
> struct page * __populate_section_memmap(unsigned long pfn,
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c165186ec2cf..d11c32fcdb38 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1326,6 +1326,8 @@ static void update_hpage_vmemmap_workfn(struct work_struct *work)
> page->mapping = NULL;
> h = page_hstate(page);
>
> + alloc_huge_page_vmemmap(h, page);
> +
> spin_lock(&hugetlb_lock);
> __free_hugepage(h, page);
> spin_unlock(&hugetlb_lock);
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 19f1898aaede..6108ae80314f 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -183,6 +183,21 @@ static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
> return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
> }
>
> +void alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
> +{
> + unsigned long vmemmap_addr = (unsigned long)head;
> + unsigned long vmemmap_end, vmemmap_reuse;
> +
> + if (!free_vmemmap_pages_per_hpage(h))
> + return;
> +
> + vmemmap_addr += RESERVE_VMEMMAP_SIZE;
> + vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
> + vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
> +
> + vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse);
> +}
> +
> void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> {
> unsigned long vmemmap_addr = (unsigned long)head;
> diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
> index 01f8637adbe0..b2c8d2f11d48 100644
> --- a/mm/hugetlb_vmemmap.h
> +++ b/mm/hugetlb_vmemmap.h
> @@ -11,6 +11,7 @@
> #include <linux/hugetlb.h>
>
> #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
> +void alloc_huge_page_vmemmap(struct hstate *h, struct page *head);
> void free_huge_page_vmemmap(struct hstate *h, struct page *head);
>
> /*
> @@ -25,6 +26,10 @@ static inline unsigned int free_vmemmap_pages_per_hpage(struct hstate *h)
> return 0;
> }
> #else
> +static inline void alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
> +{
> +}
> +
> static inline void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> {
> }
> diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
> index ce4be1fa93c2..3b146d5949f3 100644
> --- a/mm/sparse-vmemmap.c
> +++ b/mm/sparse-vmemmap.c
> @@ -29,6 +29,7 @@
> #include <linux/sched.h>
> #include <linux/pgtable.h>
> #include <linux/bootmem_info.h>
> +#include <linux/delay.h>
>
> #include <asm/dma.h>
> #include <asm/pgalloc.h>
> @@ -40,7 +41,8 @@
> * @remap_pte: called for each non-empty PTE (lowest-level) entry.
> * @reuse_page: the page which is reused for the tail vmemmap pages.
> * @reuse_addr: the virtual address of the @reuse_page page.
> - * @vmemmap_pages: the list head of the vmemmap pages that can be freed.
> + * @vmemmap_pages: the list head of the vmemmap pages that can be freed
> + * or is mapped from.
> */
> struct vmemmap_remap_walk {
> void (*remap_pte)(pte_t *pte, unsigned long addr,
> @@ -50,6 +52,10 @@ struct vmemmap_remap_walk {
> struct list_head *vmemmap_pages;
> };
>
> +/* The gfp mask of allocating vmemmap page */
> +#define GFP_VMEMMAP_PAGE \
> + (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN | __GFP_THISNODE)
> +
> static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> unsigned long end,
> struct vmemmap_remap_walk *walk)
> @@ -228,6 +234,75 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
> free_vmemmap_page_list(&vmemmap_pages);
> }
>
> +static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
> + struct vmemmap_remap_walk *walk)
> +{
> + pgprot_t pgprot = PAGE_KERNEL;
> + struct page *page;
> + void *to;
> +
> + BUG_ON(pte_page(*pte) != walk->reuse_page);
> +
> + page = list_first_entry(walk->vmemmap_pages, struct page, lru);
> + list_del(&page->lru);
> + to = page_to_virt(page);
> + copy_page(to, (void *)walk->reuse_addr);
> +
> + set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
> +}
> +
> +static void alloc_vmemmap_page_list(struct list_head *list,
> + unsigned long start, unsigned long end)
> +{
> + unsigned long addr;
> +
> + for (addr = start; addr < end; addr += PAGE_SIZE) {
> + struct page *page;
> + int nid = page_to_nid((const void *)addr);
> +
> +retry:
> + page = alloc_pages_node(nid, GFP_VMEMMAP_PAGE, 0);
> + if (unlikely(!page)) {
> + msleep(100);
> + /*
> + * We should retry infinitely, because we cannot
> + * handle allocation failures. Once we allocate
> + * vmemmap pages successfully, then we can free
> + * a HugeTLB page.
> + */
> + goto retry;
> + }
> + list_add_tail(&page->lru, list);
> + }
> +}
> +
> +/**
> + * vmemmap_remap_alloc - remap the vmemmap virtual address range [@start, end)
> + * to the page which is from the @vmemmap_pages
> + * respectively.
> + * @start: start address of the vmemmap virtual address range.
> + * @end: end address of the vmemmap virtual address range.
> + * @reuse: reuse address.
> + */
> +void vmemmap_remap_alloc(unsigned long start, unsigned long end,
> + unsigned long reuse)
> +{
> + LIST_HEAD(vmemmap_pages);
> + struct vmemmap_remap_walk walk = {
> + .remap_pte = vmemmap_restore_pte,
> + .reuse_addr = reuse,
> + .vmemmap_pages = &vmemmap_pages,
> + };
> +
> + might_sleep();
> +
> + /* See the comment in the vmemmap_remap_free(). */
> + BUG_ON(start - reuse != PAGE_SIZE);
> +
> + alloc_vmemmap_page_list(&vmemmap_pages, start, end);
> + vmemmap_remap_range(reuse, end, &walk);
> +}
> +
> /*
> * Allocate a block of memory to be used to back the virtual memory map
> * or to back the page tables that are used to create the mapping.
> --
> 2.11.0
>
>
--
Oscar Salvador
SUSE L3
next prev parent reply other threads:[~2021-01-26 15:29 UTC|newest]
Thread overview: 62+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-17 15:10 [PATCH v13 00/12] Free some vmemmap pages of HugeTLB page Muchun Song
2021-01-17 15:10 ` [PATCH v13 01/12] mm: memory_hotplug: factor out bootmem core functions to bootmem_info.c Muchun Song
2021-01-25 2:48 ` Miaohe Lin
2021-01-17 15:10 ` [PATCH v13 02/12] mm: hugetlb: introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2021-01-24 23:58 ` David Rientjes
2021-01-25 3:16 ` Randy Dunlap
2021-01-25 4:06 ` [External] " Muchun Song
2021-01-25 4:08 ` Randy Dunlap
2021-01-25 5:06 ` Muchun Song
2021-01-25 18:47 ` David Rientjes
2021-01-26 2:45 ` Muchun Song
2021-01-26 20:13 ` David Rientjes
2021-01-26 2:07 ` Miaohe Lin
2021-01-17 15:10 ` [PATCH v13 03/12] mm: hugetlb: free the vmemmap pages associated with each HugeTLB page Muchun Song
2021-01-17 15:29 ` Muchun Song
2021-01-23 0:59 ` Mike Kravetz
2021-01-23 3:22 ` [External] " Muchun Song
2021-01-23 17:52 ` Oscar Salvador
2021-01-24 6:48 ` [External] " Muchun Song
2021-01-17 15:10 ` [PATCH v13 04/12] mm: hugetlb: defer freeing of HugeTLB pages Muchun Song
2021-01-24 23:55 ` David Rientjes
2021-01-25 3:58 ` [External] " Muchun Song
2021-01-17 15:10 ` [PATCH v13 05/12] mm: hugetlb: allocate the vmemmap pages associated with each HugeTLB page Muchun Song
2021-01-25 0:05 ` David Rientjes
2021-01-25 6:40 ` [External] " Muchun Song
2021-01-25 7:41 ` Muchun Song
2021-01-25 9:15 ` David Hildenbrand
2021-01-25 9:34 ` Muchun Song
2021-01-25 23:25 ` Mike Kravetz
2021-01-26 7:48 ` Oscar Salvador
2021-01-26 9:29 ` Oscar Salvador [this message]
2021-01-26 9:36 ` David Hildenbrand
2021-01-26 14:58 ` Oscar Salvador
2021-01-26 15:10 ` David Hildenbrand
2021-01-26 15:34 ` Oscar Salvador
2021-01-26 15:56 ` David Hildenbrand
2021-01-27 10:36 ` David Hildenbrand
2021-01-28 12:37 ` [External] " Muchun Song
2021-01-28 13:08 ` Muchun Song
2021-01-29 1:04 ` Mike Kravetz
2021-01-29 6:56 ` Muchun Song
2021-02-01 16:10 ` David Hildenbrand
2021-02-02 0:05 ` Mike Kravetz
2021-01-28 22:29 ` Oscar Salvador
2021-01-29 6:16 ` [External] " Muchun Song
2021-02-01 15:50 ` David Hildenbrand
2021-01-17 15:10 ` [PATCH v13 06/12] mm: hugetlb: set the PageHWPoison to the raw error page Muchun Song
2021-01-25 0:06 ` David Rientjes
2021-01-25 5:06 ` [External] " Muchun Song
2021-01-17 15:10 ` [PATCH v13 07/12] mm: hugetlb: flush work when dissolving a HugeTLB page Muchun Song
2021-01-17 15:10 ` [PATCH v13 08/12] mm: hugetlb: introduce PageHugeInflight Muchun Song
2021-01-17 15:10 ` [PATCH v13 09/12] mm: hugetlb: add a kernel parameter hugetlb_free_vmemmap Muchun Song
2021-01-25 11:43 ` David Hildenbrand
2021-01-25 12:08 ` Oscar Salvador
2021-01-25 12:31 ` [External] " Muchun Song
2021-01-25 12:30 ` Muchun Song
2021-01-17 15:10 ` [PATCH v13 10/12] mm: hugetlb: introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2021-01-17 15:10 ` [PATCH v13 11/12] mm: hugetlb: gather discrete indexes of tail page Muchun Song
2021-01-17 15:10 ` [PATCH v13 12/12] mm: hugetlb: optimize the code with the help of the compiler Muchun Song
2021-01-20 12:52 ` [PATCH v13 00/12] Free some vmemmap pages of HugeTLB page Muchun Song
2021-01-20 13:10 ` Oscar Salvador
2021-01-20 14:22 ` [External] " Muchun Song
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210126092942.GA10602@linux \
--to=osalvador@suse.de \
--cc=akpm@linux-foundation.org \
--cc=almasrymina@google.com \
--cc=anshuman.khandual@arm.com \
--cc=bp@alien8.de \
--cc=corbet@lwn.net \
--cc=dave.hansen@linux.intel.com \
--cc=david@redhat.com \
--cc=duanxiongchun@bytedance.com \
--cc=hpa@zytor.com \
--cc=jroedel@suse.de \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=mchehab+huawei@kernel.org \
--cc=mhocko@suse.com \
--cc=mike.kravetz@oracle.com \
--cc=mingo@redhat.com \
--cc=naoya.horiguchi@nec.com \
--cc=oneukum@suse.com \
--cc=paulmck@kernel.org \
--cc=pawan.kumar.gupta@linux.intel.com \
--cc=peterz@infradead.org \
--cc=rdunlap@infradead.org \
--cc=rientjes@google.com \
--cc=song.bao.hua@hisilicon.com \
--cc=songmuchun@bytedance.com \
--cc=tglx@linutronix.de \
--cc=viro@zeniv.linux.org.uk \
--cc=willy@infradead.org \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.