All of lore.kernel.org
 help / color / mirror / Atom feed
From: Oscar Salvador <osalvador@suse.de>
To: Muchun Song <songmuchun@bytedance.com>
Cc: corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
	mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org,
	peterz@infradead.org, viro@zeniv.linux.org.uk,
	akpm@linux-foundation.org, paulmck@kernel.org,
	mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
	rdunlap@infradead.org, oneukum@suse.com,
	anshuman.khandual@arm.com, jroedel@suse.de,
	almasrymina@google.com, rientjes@google.com, willy@infradead.org,
	mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com,
	naoya.horiguchi@nec.com, duanxiongchun@bytedance.com,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v10 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page
Date: Mon, 21 Dec 2020 10:11:23 +0100	[thread overview]
Message-ID: <20201221091123.GB14343@linux> (raw)
In-Reply-To: <20201217121303.13386-4-songmuchun@bytedance.com>

On Thu, Dec 17, 2020 at 08:12:55PM +0800, Muchun Song wrote:
> +static inline void free_bootmem_page(struct page *page)
> +{
> +	unsigned long magic = (unsigned long)page->freelist;
> +
> +	/*
> +	 * The reserve_bootmem_region sets the reserved flag on bootmem
> +	 * pages.
> +	 */
> +	VM_WARN_ON(page_ref_count(page) != 2);
> +
> +	if (magic == SECTION_INFO || magic == MIX_SECTION_INFO)
> +		put_page_bootmem(page);
> +	else
> +		VM_WARN_ON(1);

Ideally, I think we want to see what how the page looks since its state
is not what we expected, so maybe join both conditions and use dump_page().

> + * By removing redundant page structs for HugeTLB pages, memory can returned to
                                                                     ^^ be
> + * the buddy allocator for other uses.

[...]

> +void free_huge_page_vmemmap(struct hstate *h, struct page *head)
> +{
> +	unsigned long vmemmap_addr = (unsigned long)head;
> +
> +	if (!free_vmemmap_pages_per_hpage(h))
> +		return;
> +
> +	vmemmap_remap_free(vmemmap_addr + RESERVE_VMEMMAP_SIZE,
> +			   free_vmemmap_pages_size_per_hpage(h));

I am not sure what others think, but I would like to see vmemmap_remap_free taking
three arguments: start, end, and reuse addr, e.g:

 void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
      unsigned long vmemmap_addr = (unsigned long)head;
      unsigned long vmemmap_end, vmemmap_reuse;
      
      if (!free_vmemmap_pages_per_hpage(h))
              return;

      vmemmap_addr += RESERVE_MEMMAP_SIZE;
      vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
      vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
 
      vmemmap_remap_free(vmemmap_addr, vmemmap_end, vmemmap_reuse);
 }

The reason for me to do this is to let the callers of vmemmap_remap_free decide
__what__ they want to remap.

More on this below.


> +static void vmemmap_pte_range(pmd_t *pmd, unsigned long addr,
> +			      unsigned long end,
> +			      struct vmemmap_remap_walk *walk)
> +{
> +	pte_t *pte;
> +
> +	pte = pte_offset_kernel(pmd, addr);
> +
> +	if (walk->reuse_addr == addr) {
> +		BUG_ON(pte_none(*pte));
> +		walk->reuse_page = pte_page(*pte++);
> +		addr += PAGE_SIZE;
> +	}

Although it is quite obvious, a brief comment here pointing out what are we
doing and that this is meant to be set only once would be nice.


> +static void vmemmap_remap_range(unsigned long start, unsigned long end,
> +				struct vmemmap_remap_walk *walk)
> +{
> +	unsigned long addr = start - PAGE_SIZE;
> +	unsigned long next;
> +	pgd_t *pgd;
> +
> +	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
> +	VM_BUG_ON(!IS_ALIGNED(end, PAGE_SIZE));
> +
> +	walk->reuse_page = NULL;
> +	walk->reuse_addr = addr;

With the change I suggested above, struct vmemmap_remap_walk should be
initialitzed at once in vmemmap_remap_free, so this should not longer be needed.
(And btw, you do not need to set reuse_page to NULL, the way you init the struct
in vmemmap_remap_free makes sure to null any field you do not explicitly set).


> +static void vmemmap_remap_pte(pte_t *pte, unsigned long addr,
> +			      struct vmemmap_remap_walk *walk)
> +{
> +	/*
> +	 * Make the tail pages are mapped with read-only to catch
> +	 * illegal write operation to the tail pages.
        "Remap the tail pages as read-only to ..."

> +	 */
> +	pgprot_t pgprot = PAGE_KERNEL_RO;
> +	pte_t entry = mk_pte(walk->reuse_page, pgprot);
> +	struct page *page;
> +
> +	page = pte_page(*pte);

 struct page *page = pte_page(*pte);

since you did the same for the other two.

> +	list_add(&page->lru, walk->vmemmap_pages);
> +
> +	set_pte_at(&init_mm, addr, pte, entry);
> +}
> +
> +/**
> + * vmemmap_remap_free - remap the vmemmap virtual address range
> + *                      [start, start + size) to the page which
> + *                      [start - PAGE_SIZE, start) is mapped,
> + *                      then free vmemmap pages.
> + * @start:	start address of the vmemmap virtual address range
> + * @size:	size of the vmemmap virtual address range
> + */
> +void vmemmap_remap_free(unsigned long start, unsigned long size)
> +{
> +	unsigned long end = start + size;
> +	LIST_HEAD(vmemmap_pages);
> +
> +	struct vmemmap_remap_walk walk = {
> +		.remap_pte	= vmemmap_remap_pte,
> +		.vmemmap_pages	= &vmemmap_pages,
> +	};

As stated above, this would become:

 void vmemmap_remap_free(unsigned long start, unsigned long end,
                         usigned long reuse)
 {
       LIST_HEAD(vmemmap_pages);
       struct vmemmap_remap_walk walk = {
               .reuse_addr = reuse,
               .remap_pte = vmemmap_remap_pte,
               .vmemmap_pages = &vmemmap_pages,
       };

  You might have had your reasons to do this way, but this looks more natural
  to me, with the plus that callers of vmemmap_remap_free can specify
  what they want to remap.


-- 
Oscar Salvador
SUSE L3

  reply	other threads:[~2020-12-21  9:48 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-17 12:12 [PATCH v10 00/11] Free some vmemmap pages of HugeTLB page Muchun Song
2020-12-17 12:12 ` [PATCH v10 01/11] mm/memory_hotplug: Factor out bootmem core functions to bootmem_info.c Muchun Song
2020-12-17 12:12 ` [PATCH v10 02/11] mm/hugetlb: Introduce a new config HUGETLB_PAGE_FREE_VMEMMAP Muchun Song
2020-12-18 15:41   ` Oscar Salvador
2020-12-21 23:56   ` Mike Kravetz
2020-12-17 12:12 ` [PATCH v10 03/11] mm/hugetlb: Free the vmemmap pages associated with each HugeTLB page Muchun Song
2020-12-21  9:11   ` Oscar Salvador [this message]
2020-12-21 11:25     ` [External] " Muchun Song
2020-12-21 13:43       ` Oscar Salvador
2020-12-21 15:52         ` Muchun Song
2020-12-21 18:00           ` Oscar Salvador
2020-12-22  1:03             ` Mike Kravetz
2020-12-22  2:49             ` Muchun Song
2020-12-17 12:12 ` [PATCH v10 04/11] mm/hugetlb: Defer freeing of HugeTLB pages Muchun Song
2020-12-21 10:27   ` Oscar Salvador
2020-12-21 11:07     ` [External] " Muchun Song
2020-12-21 14:14       ` Oscar Salvador
2020-12-21 15:18         ` Muchun Song
2020-12-17 12:12 ` [PATCH v10 05/11] mm/hugetlb: Allocate the vmemmap pages associated with each HugeTLB page Muchun Song
2020-12-17 12:12 ` [PATCH v10 06/11] mm/hugetlb: Set the PageHWPoison to the raw error page Muchun Song
2020-12-17 12:12 ` [PATCH v10 07/11] mm/hugetlb: Flush work when dissolving hugetlb page Muchun Song
2020-12-21 10:40   ` Oscar Salvador
2020-12-21 11:07     ` [External] " Muchun Song
2020-12-17 12:13 ` [PATCH v10 08/11] mm/hugetlb: Add a kernel parameter hugetlb_free_vmemmap Muchun Song
2020-12-17 12:13 ` [PATCH v10 09/11] mm/hugetlb: Introduce nr_free_vmemmap_pages in the struct hstate Muchun Song
2020-12-21  8:16   ` Oscar Salvador
2020-12-21  9:33     ` [External] " Muchun Song
2020-12-17 12:13 ` [PATCH v10 10/11] mm/hugetlb: Gather discrete indexes of tail page Muchun Song
2020-12-18  9:06   ` Oscar Salvador
2020-12-18  9:41     ` [External] " Muchun Song
2020-12-17 12:13 ` [PATCH v10 11/11] mm/hugetlb: Optimize the code with the help of the compiler Muchun Song
2020-12-17 12:17 ` [PATCH v10 00/11] Free some vmemmap pages of HugeTLB page David Hildenbrand
2020-12-17 14:59 ` Oscar Salvador
2020-12-17 15:52   ` [External] " Muchun Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20201221091123.GB14343@linux \
    --to=osalvador@suse.de \
    --cc=akpm@linux-foundation.org \
    --cc=almasrymina@google.com \
    --cc=anshuman.khandual@arm.com \
    --cc=bp@alien8.de \
    --cc=corbet@lwn.net \
    --cc=dave.hansen@linux.intel.com \
    --cc=david@redhat.com \
    --cc=duanxiongchun@bytedance.com \
    --cc=hpa@zytor.com \
    --cc=jroedel@suse.de \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=mchehab+huawei@kernel.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=mingo@redhat.com \
    --cc=naoya.horiguchi@nec.com \
    --cc=oneukum@suse.com \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=song.bao.hua@hisilicon.com \
    --cc=songmuchun@bytedance.com \
    --cc=tglx@linutronix.de \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.