qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] initialize PG_reserved for tail pages of gigantic compound pages
From: Andrea Arcangeli @ 2013-10-10 16:12 UTC
  To: Andrew Morton
  Cc: kvm, Gleb Natapov, Hugh Dickins, qemu-devel, linux-kernel,
	linux-mm, Mel Gorman

Hi,

Large CC list because the patch below is important to merge before
3.12 final; either that, or 11feeb498086a3a5907b8148bdf1786a9b18fc55
should be reverted ASAP.

The optimization in 11feeb498086a3a5907b8148bdf1786a9b18fc55 avoids
dereferencing the head page during KVM MMIO vmexits, and it is a
worthwhile optimization.

However, for it to work, PG_reserved must be identical between the tail
and head pages of all compound pages (at least those that can end up
used as guest physical memory). That looked like a safe assumption to
make, and it is enforced everywhere except in the gigantic compound
page initialization code (i.e. KVM running on hugepagesz=1g didn't
work as expected).

This patch enforces the above assumption for gigantic compound pages
too. It has been verified to fix the gigantic compound page memory
leak when applied on top of
11feeb498086a3a5907b8148bdf1786a9b18fc55.

Enforcing that PG_reserved is clear on the tail pages of hugetlbfs
gigantic compound pages sounds safer regardless of commit
11feeb498086a3a5907b8148bdf1786a9b18fc55, because it keeps them
consistent with the other hugetlbfs page sizes (i.e. hugetlbfs page
order < MAX_ORDER).

Thanks,
Andrea

Andrea Arcangeli (1):
  mm: hugetlb: initialize PG_reserved for tail pages of gigantic
    compound pages

 mm/hugetlb.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)


* [Qemu-devel] [PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
From: Andrea Arcangeli @ 2013-10-10 16:12 UTC
  To: Andrew Morton
  Cc: kvm, Gleb Natapov, Hugh Dickins, qemu-devel, linux-kernel,
	linux-mm, Mel Gorman

11feeb498086a3a5907b8148bdf1786a9b18fc55 introduced a memory leak when
KVM is run on gigantic compound pages.

11feeb498086a3a5907b8148bdf1786a9b18fc55 depends on the assumption
that PG_reserved is identical for the head and all tail pages of a
compound page, so that if get_user_pages() returns a tail page, we
don't need to check the head page to know whether we are dealing with
a reserved page that requires different refcounting.

The assumption that PG_reserved is the same for head and tail pages is
certainly correct for THP and regular hugepages, but gigantic
hugepages allocated through bootmem don't clear PG_reserved on the
tail pages (PG_reserved is cleared later, and only if the gigantic
hugepage is freed).

This patch corrects the gigantic compound page initialization so that
we can retain the optimization in
11feeb498086a3a5907b8148bdf1786a9b18fc55. The same cacheline is
already being written to set PG_tail, so this won't affect the boot
time of large memory systems.

Reported-by: andy123 <ajs124.ajs124@gmail.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
---
 mm/hugetlb.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index b49579c..315450e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -695,8 +695,24 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
 	/* we rely on prep_new_huge_page to set the destructor */
 	set_compound_order(page, order);
 	__SetPageHead(page);
+	__ClearPageReserved(page);
 	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
 		__SetPageTail(p);
+		/*
+		 * For gigantic hugepages allocated through bootmem at
+		 * boot, it's safer to be consistent with the
+		 * not-gigantic hugepages and to clear the PG_reserved
+		 * bit from all tail pages too. Otherwise drivers using
+		 * get_user_pages() to access tail pages may get the
+		 * reference counting wrong if they see the
+		 * PG_reserved bitflag set on a tail page (even though
+		 * the head page didn't have PG_reserved set). Enforcing
+		 * this consistency between head and tail pages
+		 * allows drivers to optimize away a check on the head
+		 * page when they need to know if put_page is needed
+		 * after get_user_pages() or not.
+		 */
+		__ClearPageReserved(p);
 		set_page_count(p, 0);
 		p->first_page = page;
 	}
@@ -1329,9 +1345,9 @@ static void __init gather_bootmem_prealloc(void)
 #else
 		page = virt_to_page(m);
 #endif
-		__ClearPageReserved(page);
 		WARN_ON(page_count(page) != 1);
 		prep_compound_huge_page(page, h->order);
+		WARN_ON(PageReserved(page));
 		prep_new_huge_page(h, page, page_to_nid(page));
 		/*
 		 * If we had gigantic hugepages allocated at boot time, we need


* Re: [Qemu-devel] [PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
From: Rik van Riel @ 2013-10-10 17:51 UTC
  To: Andrea Arcangeli
  Cc: kvm, Gleb Natapov, Hugh Dickins, qemu-devel, linux-kernel,
	linux-mm, Mel Gorman, Andrew Morton

On 10/10/2013 12:12 PM, Andrea Arcangeli wrote:

Acked-by: Rik van Riel <riel@redhat.com>


* Re: [Qemu-devel] [PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantic compound pages
From: Rafael Aquini @ 2013-10-10 22:13 UTC
  To: Andrea Arcangeli
  Cc: kvm, Gleb Natapov, Hugh Dickins, qemu-devel, linux-kernel,
	linux-mm, Mel Gorman, Andrew Morton

On Thu, Oct 10, 2013 at 06:12:41PM +0200, Andrea Arcangeli wrote:

Acked-by: Rafael Aquini <aquini@redhat.com>



