From: Rafael Aquini <aquini@redhat.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
qemu-devel@nongnu.org, kvm@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, Gleb Natapov <gleb@redhat.com>,
Mel Gorman <mgorman@suse.de>, Rik van Riel <riel@redhat.com>,
Hugh Dickins <hughd@google.com>
Subject: Re: [PATCH] mm: hugetlb: initialize PG_reserved for tail pages of gigantig compound pages
Date: Thu, 10 Oct 2013 19:13:43 -0300
Message-ID: <20131010221343.GA23021@localhost.localdomain>
In-Reply-To: <1381421561-10203-2-git-send-email-aarcange@redhat.com>
On Thu, Oct 10, 2013 at 06:12:41PM +0200, Andrea Arcangeli wrote:
> 11feeb498086a3a5907b8148bdf1786a9b18fc55 introduced a memory leak when
> KVM is run on gigantic compound pages.
>
> 11feeb498086a3a5907b8148bdf1786a9b18fc55 depends on the assumption
> that PG_reserved is identical for all head and tail pages of a
> compound page, so that if get_user_pages() returns a tail page, we
> don't need to check the head page in order to know whether we are
> dealing with a reserved page that requires different refcounting.
>
> The assumption that PG_reserved is the same for head and tail pages is
> certainly correct for THP and regular hugepages, but gigantic
> hugepages allocated through bootmem don't clear the PG_reserved on the
> tail pages (the clearing of PG_reserved is done later only if the
> gigantic hugepage is freed).
>
> This patch corrects the gigantic compound page initialization so that
> we can retain the optimization in
> 11feeb498086a3a5907b8148bdf1786a9b18fc55. The cacheline was already
> modified in order to set PG_tail so this won't affect the boot time of
> large memory systems.
>
> Reported-by: andy123 <ajs124.ajs124@gmail.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> ---

Acked-by: Rafael Aquini <aquini@redhat.com>

> mm/hugetlb.c | 18 +++++++++++++++++-
> 1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index b49579c..315450e 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -695,8 +695,24 @@ static void prep_compound_gigantic_page(struct page *page, unsigned long order)
>  	/* we rely on prep_new_huge_page to set the destructor */
>  	set_compound_order(page, order);
>  	__SetPageHead(page);
> +	__ClearPageReserved(page);
>  	for (i = 1; i < nr_pages; i++, p = mem_map_next(p, page, i)) {
>  		__SetPageTail(p);
> +		/*
> +		 * For gigantic hugepages allocated through bootmem at
> +		 * boot, it's safer to be consistent with the
> +		 * not-gigantic hugepages and to clear the PG_reserved
> +		 * bit from all tail pages too. Otherwise drivers using
> +		 * get_user_pages() to access tail pages may get the
> +		 * reference counting wrong if they see the
> +		 * PG_reserved bitflag set on a tail page (even though
> +		 * the head page doesn't have PG_reserved set). Enforcing
> +		 * this consistency between head and tail pages
> +		 * allows drivers to optimize away a check on the head
> +		 * page when they need to know if put_page is needed
> +		 * after get_user_pages() or not.
> +		 */
> +		__ClearPageReserved(p);
>  		set_page_count(p, 0);
>  		p->first_page = page;
>  	}
> @@ -1329,9 +1345,9 @@ static void __init gather_bootmem_prealloc(void)
>  #else
>  		page = virt_to_page(m);
>  #endif
> -		__ClearPageReserved(page);
>  		WARN_ON(page_count(page) != 1);
>  		prep_compound_huge_page(page, h->order);
> +		WARN_ON(PageReserved(page));
>  		prep_new_huge_page(h, page, page_to_nid(page));
>  		/*
>  		 * If we had gigantic hugepages allocated at boot time, we need
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org. For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>