From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e32.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PM0tuv031601 for ; Mon, 25 Feb 2008 17:00:55 -0500 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PM1KGI178326 for ; Mon, 25 Feb 2008 15:01:20 -0700 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PM1Kjb028880 for ; Mon, 25 Feb 2008 15:01:20 -0700 From: Adam Litke Subject: [PATCH 0/3] hugetlb: Dynamic pool resize improvements Date: Mon, 25 Feb 2008 14:01:19 -0800 Message-Id: <20080225220119.23627.33676.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: linux-mm@kvack.org Cc: mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: This series of patches contains fixes for a few issues found with the new dynamically resizing hugetlb pool while stress testing. The first patch corrects the page count for surplus huge pages when they are first allocated. This avoids a BUG when CONFIG_DEBUG_VM is enabled. The second patch closes a difficult to trigger race when setting up a reservation involving surplus pages which could lead to reservations not being honored. The third patch is a minor performance optimization in gather_surplus_huge_pages(). Patches 1 and 2 are candidates for -stable. These patches were generated against 2.6.24 -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e36.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PM1WHK028492 for ; Mon, 25 Feb 2008 17:01:32 -0500 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PM1WCL214784 for ; Mon, 25 Feb 2008 15:01:32 -0700 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PM1VCp029542 for ; Mon, 25 Feb 2008 15:01:31 -0700 From: Adam Litke Subject: [PATCH 1/3] hugetlb: Correct page count for surplus huge pages Date: Mon, 25 Feb 2008 14:01:29 -0800 Message-Id: <20080225220129.23627.5152.stgit@kernel> In-Reply-To: <20080225220119.23627.33676.stgit@kernel> References: <20080225220119.23627.33676.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: linux-mm@kvack.org Cc: mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: Free pages in the hugetlb pool are free and as such have a reference count of zero. Regular allocations into the pool from the buddy are "freed" into the pool which results in their page_count dropping to zero. However, surplus pages are directly utilized by the caller without first being freed so an explicit reset of the reference count is needed. This hasn't effected end users because the bad page count is reset before the page is handed off. However, under CONFIG_DEBUG_VM this triggers a BUG when the page count is validated. Thanks go to Mel for first spotting this issue and providing an initial fix. Signed-off-by: Adam Litke Cc: Mel Gorman --- mm/hugetlb.c | 12 +++++++++--- 1 files changed, 9 insertions(+), 3 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index db861d8..026e5ee 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -267,6 +267,11 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma, spin_lock(&hugetlb_lock); if (page) { + /* + * This page is now managed by the hugetlb allocator and has + * no current users -- reset its reference count. + */ + set_page_count(page, 0); nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); /* @@ -345,13 +350,14 @@ free: enqueue_huge_page(page); else { /* - * Decrement the refcount and free the page using its - * destructor. This must be done with hugetlb_lock + * The page has a reference count of zero already, so + * call free_huge_page directly instead of using + * put_page. This must be done with hugetlb_lock * unlocked which is safe because free_huge_page takes * hugetlb_lock before deciding how to free the page. */ spin_unlock(&hugetlb_lock); - put_page(page); + free_huge_page(page); spin_lock(&hugetlb_lock); } } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e35.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PM1hPc030995 for ; Mon, 25 Feb 2008 17:01:43 -0500 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PM1hSH192758 for ; Mon, 25 Feb 2008 15:01:43 -0700 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PM1h9W023704 for ; Mon, 25 Feb 2008 15:01:43 -0700 From: Adam Litke Subject: [PATCH 2/3] hugetlb: Close a difficult to trigger reservation race Date: Mon, 25 Feb 2008 14:01:41 -0800 Message-Id: <20080225220141.23627.80568.stgit@kernel> In-Reply-To: <20080225220119.23627.33676.stgit@kernel> References: <20080225220119.23627.33676.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: linux-mm@kvack.org Cc: mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: If the following occurs in threads A and B: A) Allocates some surplus pages to satisfy a reservation B) Frees some huge pages A) A notices the extra free pages and drops hugetlb_lock to free some of its surplus pages back to the buddy allocator. B) Allocates some huge pages A) Reacquires hugetlb_lock and returns from gather_surplus_huge_pages() This series of events can result in a "short" reservation and subsequent application failure. Avoid this by commiting the reservation after pages have been allocated but before dropping the lock to free excess pages. For parity, release the reservation in return_unused_surplus_pages(). Thanks to Andy Whitcroft for discovering this. This does make the functions gather_surplus_huge_pages() and return_unused_surplus_pages() responsible for a bit more than their names imply. Does anyone support function name changes to create_reservation() and destroy_reservation() respectively? Signed-off-by: Adam Litke Cc: Mel Gorman Cc: Andy Whitcroft --- mm/hugetlb.c | 17 +++++++++++++---- 1 files changed, 13 insertions(+), 4 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 026e5ee..8296431 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -300,8 +300,10 @@ static int gather_surplus_pages(int delta) int needed, allocated; needed = (resv_huge_pages + delta) - free_huge_pages; - if (needed <= 0) + if (needed <= 0) { + resv_huge_pages += delta; return 0; + } allocated = 0; INIT_LIST_HEAD(&surplus_list); @@ -339,9 +341,12 @@ retry: * The surplus_list now contains _at_least_ the number of extra pages * needed to accomodate the reservation. Add the appropriate number * of pages to the hugetlb pool and free the extras back to the buddy - * allocator. + * allocator. Commit the entire reservation here to prevent another + * process from stealing the pages as they are added to the pool but + * before they are reserved. */ needed += allocated; + resv_huge_pages += delta; ret = 0; free: list_for_each_entry_safe(page, tmp, &surplus_list, lru) { @@ -376,6 +381,9 @@ static void return_unused_surplus_pages(unsigned long unused_resv_pages) struct page *page; unsigned long nr_pages; + /* Uncommit the reservation */ + resv_huge_pages -= unused_resv_pages; + nr_pages = min(unused_resv_pages, surplus_huge_pages); while (nr_pages) { @@ -1197,12 +1205,13 @@ static int hugetlb_acct_memory(long delta) if (gather_surplus_pages(delta) < 0) goto out; - if (delta > cpuset_mems_nr(free_huge_pages_node)) + if (delta > cpuset_mems_nr(free_huge_pages_node)) { + return_unused_surplus_pages(delta); goto out; + } } ret = 0; - resv_huge_pages += delta; if (delta < 0) return_unused_surplus_pages((unsigned long) -delta); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e33.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PM1tpc005503 for ; Mon, 25 Feb 2008 17:01:55 -0500 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PM1sMr189590 for ; Mon, 25 Feb 2008 15:01:54 -0700 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PM1sma031202 for ; Mon, 25 Feb 2008 15:01:54 -0700 From: Adam Litke Subject: [PATCH 3/3] hugetlb: Decrease hugetlb_lock cycling in gather_surplus_huge_pages Date: Mon, 25 Feb 2008 14:01:52 -0800 Message-Id: <20080225220152.23627.25591.stgit@kernel> In-Reply-To: <20080225220119.23627.33676.stgit@kernel> References: <20080225220119.23627.33676.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: linux-mm@kvack.org Cc: mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: To reduce the number of times we acquire and release hugetlb_lock when freeing excess huge pages, loop through the page list twice: once to siphon pages into the hugetlb pool, and again to free the excess pages back to the buddy allocator. This removes the lock/unlock around free_huge_page(). Note that we still visit each page only once since a page is always removed from the list when it is visited. Thanks Mel Gorman for this improvement. Signed-off-by: Adam Litke Cc: Mel Gorman --- mm/hugetlb.c | 17 ++++++++++++----- 1 files changed, 12 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8296431..306d762 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -349,11 +349,19 @@ retry: resv_huge_pages += delta; ret = 0; free: + /* Free the needed pages to the hugetlb pool */ list_for_each_entry_safe(page, tmp, &surplus_list, lru) { + if ((--needed) < 0) + break; list_del(&page->lru); - if ((--needed) >= 0) - enqueue_huge_page(page); - else { + enqueue_huge_page(page); + } + + /* Free unnecessary surplus pages to the buddy allocator */ + if (!list_empty(&surplus_list)) { + spin_unlock(&hugetlb_lock); + list_for_each_entry_safe(page, tmp, &surplus_list, lru) { + list_del(&page->lru); /* * The page has a reference count of zero already, so * call free_huge_page directly instead of using @@ -361,10 +369,9 @@ free: * unlocked which is safe because free_huge_page takes * hugetlb_lock before deciding how to free the page. */ - spin_unlock(&hugetlb_lock); free_huge_page(page); - spin_lock(&hugetlb_lock); } + spin_lock(&hugetlb_lock); } return ret; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e2.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PMQ7f2012035 for ; Mon, 25 Feb 2008 17:26:07 -0500 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PMQ680281934 for ; Mon, 25 Feb 2008 17:26:06 -0500 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PMQ67H024297 for ; Mon, 25 Feb 2008 17:26:06 -0500 Subject: Re: [PATCH 1/3] hugetlb: Correct page count for surplus huge pages From: Dave Hansen In-Reply-To: <20080225220129.23627.5152.stgit@kernel> References: <20080225220119.23627.33676.stgit@kernel> <20080225220129.23627.5152.stgit@kernel> Content-Type: text/plain Date: Mon, 25 Feb 2008 14:26:03 -0800 Message-Id: <1203978363.11846.10.camel@nimitz.home.sr71.net> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: linux-mm@kvack.org, mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: Mon, 2008-02-25 at 14:01 -0800, Adam Litke wrote: > > spin_lock(&hugetlb_lock); > if (page) { > + /* > + * This page is now managed by the hugetlb allocator and has > + * no current users -- reset its reference count. > + */ > + set_page_count(page, 0); So, they come out of the allocator and have a refcount of 1, and you want them to be consistent with the other huge pages that have a count of 0? I'd feel a lot better about this if you did a __put_page() then a atomic_read() or equivalent to double-check what's going on. (I basically suggested the same thing to Jon Tollefson on the ginormous page stuff). It just forces the thing to be more consistent. It also seems a bit goofy to me to zero the refcount here, then reset it to one later on in update_and_free_page(). I dunno. It just seems like every time something in here gets touched, three other things break. Makes me nervous. :( -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e5.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PMV3Tx030724 for ; Mon, 25 Feb 2008 17:31:03 -0500 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PMV3be254498 for ; Mon, 25 Feb 2008 17:31:03 -0500 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PMV3ee029543 for ; Mon, 25 Feb 2008 17:31:03 -0500 Subject: Re: [PATCH 3/3] hugetlb: Decrease hugetlb_lock cycling in gather_surplus_huge_pages From: Dave Hansen In-Reply-To: <20080225220152.23627.25591.stgit@kernel> References: <20080225220119.23627.33676.stgit@kernel> <20080225220152.23627.25591.stgit@kernel> Content-Type: text/plain Date: Mon, 25 Feb 2008 14:31:00 -0800 Message-Id: <1203978660.11846.11.camel@nimitz.home.sr71.net> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: linux-mm@kvack.org, mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: On Mon, 2008-02-25 at 14:01 -0800, Adam Litke wrote: > + /* Free unnecessary surplus pages to the buddy allocator */ > + if (!list_empty(&surplus_list)) { > + spin_unlock(&hugetlb_lock); > + list_for_each_entry_safe(page, tmp, &surplus_list, lru) { > + list_del(&page->lru); What is the surplus_list protected by? -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e34.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PMi3EU017383 for ; Mon, 25 Feb 2008 17:44:03 -0500 Received: from d03av04.boulder.ibm.com (d03av04.boulder.ibm.com [9.17.195.170]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PMiQbV177384 for ; Mon, 25 Feb 2008 15:44:26 -0700 Received: from d03av04.boulder.ibm.com (loopback [127.0.0.1]) by d03av04.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PMiQU3022864 for ; Mon, 25 Feb 2008 15:44:26 -0700 Subject: Re: [PATCH 3/3] hugetlb: Decrease hugetlb_lock cycling in gather_surplus_huge_pages From: Adam Litke In-Reply-To: <1203978660.11846.11.camel@nimitz.home.sr71.net> References: <20080225220119.23627.33676.stgit@kernel> <20080225220152.23627.25591.stgit@kernel> <1203978660.11846.11.camel@nimitz.home.sr71.net> Content-Type: text/plain Date: Mon, 25 Feb 2008 16:51:45 -0600 Message-Id: <1203979905.3837.21.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Dave Hansen Cc: linux-mm@kvack.org, mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: On Mon, 2008-02-25 at 14:31 -0800, Dave Hansen wrote: > On Mon, 2008-02-25 at 14:01 -0800, Adam Litke wrote: > > + /* Free unnecessary surplus pages to the buddy allocator */ > > + if (!list_empty(&surplus_list)) { > > + spin_unlock(&hugetlb_lock); > > + list_for_each_entry_safe(page, tmp, &surplus_list, lru) { > > + list_del(&page->lru); > > What is the surplus_list protected by? It's a local variable on the stack. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e6.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PMvcDp022157 for ; Mon, 25 Feb 2008 17:57:38 -0500 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PMtfh0247414 for ; Mon, 25 Feb 2008 17:55:41 -0500 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PMte4q001735 for ; Mon, 25 Feb 2008 17:55:41 -0500 Subject: Re: [PATCH 1/3] hugetlb: Correct page count for surplus huge pages From: Adam Litke In-Reply-To: <1203978363.11846.10.camel@nimitz.home.sr71.net> References: <20080225220119.23627.33676.stgit@kernel> <20080225220129.23627.5152.stgit@kernel> <1203978363.11846.10.camel@nimitz.home.sr71.net> Content-Type: text/plain Date: Mon, 25 Feb 2008 17:03:00 -0600 Message-Id: <1203980580.3837.30.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Dave Hansen Cc: linux-mm@kvack.org, mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: On Mon, 2008-02-25 at 14:26 -0800, Dave Hansen wrote: > Mon, 2008-02-25 at 14:01 -0800, Adam Litke wrote: > > > > spin_lock(&hugetlb_lock); > > if (page) { > > + /* > > + * This page is now managed by the hugetlb allocator and has > > + * no current users -- reset its reference count. > > + */ > > + set_page_count(page, 0); > > So, they come out of the allocator and have a refcount of 1, and you > want them to be consistent with the other huge pages that have a count > of 0? Yep all pages come out of the buddy allocator in this state. What is different in this case is that we are choosing not to enqueue it into the hugetlb pool right away since it might be immediately needed by the caller. > I'd feel a lot better about this if you did a __put_page() then a > atomic_read() or equivalent to double-check what's going on. (I > basically suggested the same thing to Jon Tollefson on the ginormous > page stuff). It just forces the thing to be more consistent. I could agree to this. > It also seems a bit goofy to me to zero the refcount here, then reset it > to one later on in update_and_free_page(). Yeah, it is a special case -- and commented accordingly. Do you have any ideas how to avoid it without the wasted time of an enqueue_huge_page()/dequeue_huge_page() cycle? > I dunno. It just seems like every time something in here gets touched, > three other things break. Makes me nervous. :( Now c'mon, that's not fair. I'd expect that sort of statement from the Hillary Clinton campaign, not you Dave :) -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e5.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1PNBpmt015239 for ; Mon, 25 Feb 2008 18:11:51 -0500 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1PNBpH6286654 for ; Mon, 25 Feb 2008 18:11:51 -0500 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1PNBp2t002587 for ; Mon, 25 Feb 2008 18:11:51 -0500 Subject: Re: [PATCH 1/3] hugetlb: Correct page count for surplus huge pages From: Dave Hansen In-Reply-To: <1203980580.3837.30.camel@localhost.localdomain> References: <20080225220119.23627.33676.stgit@kernel> <20080225220129.23627.5152.stgit@kernel> <1203978363.11846.10.camel@nimitz.home.sr71.net> <1203980580.3837.30.camel@localhost.localdomain> Content-Type: text/plain Date: Mon, 25 Feb 2008 15:11:49 -0800 Message-Id: <1203981109.11846.22.camel@nimitz.home.sr71.net> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: linux-mm@kvack.org, mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: On Mon, 2008-02-25 at 17:03 -0600, Adam Litke wrote: > > It also seems a bit goofy to me to zero the refcount here, then reset it > > to one later on in update_and_free_page(). > > Yeah, it is a special case -- and commented accordingly. Do you have > any ideas how to avoid it without the wasted time of an > enqueue_huge_page()/dequeue_huge_page() cycle? There are a couple of steps here, right? 1. alloc from the buddy list 2. initialize to set ->dtor, page->_count, etc... 3. enqueue_huge_page() 4. somebody does dequeue_huge_page() and gets it I wonder if it might get simpler if you just make the pages on the freelists "virgin buddy pages". Basically don't touch pages much until after they're dequeued. Flip flop (a la John Kerry) the order around a bit: 1. alloc from the buddy list 2. enqueue_huge_page() 3. somebody does dequeue_huge_page() and before it returns, we: 4. initialize to set ->dtor, page->_count, etc... This has the disadvantage of shifting some work from a "once per alloc from the buddy list" to "once per en/dequeue". Basically, just try and re-think when you turn pages from plain buddy pages into hugetlb-flavored pages. -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e32.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1QGjjri008290 for ; Tue, 26 Feb 2008 11:45:45 -0500 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1QGkDEc178240 for ; Tue, 26 Feb 2008 09:46:13 -0700 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1QGkC7u014109 for ; Tue, 26 Feb 2008 09:46:12 -0700 Subject: Re: [PATCH 1/3] hugetlb: Correct page count for surplus huge pages From: Adam Litke In-Reply-To: <1203981109.11846.22.camel@nimitz.home.sr71.net> References: <20080225220119.23627.33676.stgit@kernel> <20080225220129.23627.5152.stgit@kernel> <1203978363.11846.10.camel@nimitz.home.sr71.net> <1203980580.3837.30.camel@localhost.localdomain> <1203981109.11846.22.camel@nimitz.home.sr71.net> Content-Type: text/plain Date: Tue, 26 Feb 2008 10:53:35 -0600 Message-Id: <1204044815.3837.45.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Dave Hansen Cc: linux-mm@kvack.org, mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: On Mon, 2008-02-25 at 15:11 -0800, Dave Hansen wrote: > I wonder if it might get simpler if you just make the pages on the > freelists "virgin buddy pages". Basically don't touch pages much until > after they're dequeued. Flip flop (a la John Kerry) the order around a > bit: > > 1. alloc from the buddy list > 2. enqueue_huge_page() > 3. somebody does dequeue_huge_page() and before it returns, we: > 4. initialize to set ->dtor, page->_count, etc... > > This has the disadvantage of shifting some work from a "once per alloc > from the buddy list" to "once per en/dequeue". Basically, just try and > re-think when you turn pages from plain buddy pages into > hugetlb-flavored pages. This is an interesting idea and I will think about it some more. However, switching this around will introduce more of the churn that makes people nervous. So I would appeal that we put forth my original idea (with your suggested modification) because it is a simple and verifiable bug fix. Amended patch follows: commit 635ff936930f5c56be23feffd06764554f88f8ad Author: Adam Litke Date: Tue Feb 26 08:42:33 2008 -0800 hugetlb: Correct page count for surplus huge pages Free pages in the hugetlb pool are free and as such have a reference count of zero. Regular allocations into the pool from the buddy are "freed" into the pool which results in their page_count dropping to zero. However, surplus pages are directly utilized by the caller without first being freed so an explicit reset of the reference count is needed. This hasn't effected end users because the bad page count is reset before the page is handed off. However, under CONFIG_DEBUG_VM this triggers a BUG when the page count is validated. Thanks go to Mel for first spotting this issue and providing an initial fix. Signed-off-by: Adam Litke Cc: Mel Gorman diff --git a/mm/hugetlb.c b/mm/hugetlb.c index db861d8..5afcacb 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -267,6 +267,11 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma, spin_lock(&hugetlb_lock); if (page) { + /* + * This page is now managed by the hugetlb allocator and has + * no current users -- reset its reference count. + */ + BUG_ON(!put_page_testzero(page)); nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); /* @@ -345,13 +350,14 @@ free: enqueue_huge_page(page); else { /* - * Decrement the refcount and free the page using its - * destructor. This must be done with hugetlb_lock + * The page has a reference count of zero already, so + * call free_huge_page directly instead of using + * put_page. This must be done with hugetlb_lock * unlocked which is safe because free_huge_page takes * hugetlb_lock before deciding how to free the page. */ spin_unlock(&hugetlb_lock); - put_page(page); + free_huge_page(page); spin_lock(&hugetlb_lock); } } -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e1.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id m1QGmv0j009909 for ; Tue, 26 Feb 2008 11:48:57 -0500 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m1QGmvXK253600 for ; Tue, 26 Feb 2008 11:48:57 -0500 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m1QGmujq018502 for ; Tue, 26 Feb 2008 11:48:57 -0500 Subject: Re: [PATCH 1/3] hugetlb: Correct page count for surplus huge pages From: Dave Hansen In-Reply-To: <1204044815.3837.45.camel@localhost.localdomain> References: <20080225220119.23627.33676.stgit@kernel> <20080225220129.23627.5152.stgit@kernel> <1203978363.11846.10.camel@nimitz.home.sr71.net> <1203980580.3837.30.camel@localhost.localdomain> <1203981109.11846.22.camel@nimitz.home.sr71.net> <1204044815.3837.45.camel@localhost.localdomain> Content-Type: text/plain Date: Tue, 26 Feb 2008 08:48:54 -0800 Message-Id: <1204044534.1228.0.camel@nimitz.home.sr71.net> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: linux-mm@kvack.org, mel@csn.ul.ie, apw@shadowen.org, nacc@linux.vnet.ibm.com, agl@linux.vnet.ibm.com List-ID: On Tue, 2008-02-26 at 10:53 -0600, Adam Litke wrote: > This is an interesting idea and I will think about it some more. > However, switching this around will introduce more of the churn that > makes people nervous. Touche! > So I would appeal that we put forth my original > idea (with your suggested modification) because it is a simple and > verifiable bug fix. Looks good, thanks Adam. -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e33.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m23I6ZfB024124 for ; Mon, 3 Mar 2008 13:06:35 -0500 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m23I6ZFR066174 for ; Mon, 3 Mar 2008 11:06:35 -0700 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m23I6ZXN008662 for ; Mon, 3 Mar 2008 11:06:35 -0700 From: Adam Litke Subject: [PATCH 1/3] hugetlb: Correct page count for surplus huge pages Date: Mon, 03 Mar 2008 10:06:32 -0800 Message-Id: <20080303180632.5383.7661.stgit@kernel> In-Reply-To: <20080303180622.5383.20868.stgit@kernel> References: <20080303180622.5383.20868.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: Andrew Morton Cc: linux-mm@kvack.org, Dave Hansen , Andy Whitcroft , Mel Gorman , Adam Litke List-ID: Free pages in the hugetlb pool are free and as such have a reference count of zero. Regular allocations into the pool from the buddy are "freed" into the pool which results in their page_count dropping to zero. However, surplus pages can be directly utilized by the caller without first being freed to the pool. Therefore, a call to put_page_testzero() is in order so that such a page will be handed to the caller with a correct count. This has not effected end users because the bad page count is reset before the page is handed off. However, under CONFIG_DEBUG_VM this triggers a BUG when the page count is validated. Thanks go to Mel for first spotting this issue and providing an initial fix. Signed-off-by: Adam Litke Cc: Mel Gorman --- mm/hugetlb.c | 13 ++++++++++--- 1 files changed, 10 insertions(+), 3 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index db861d8..819d6d9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -267,6 +267,12 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma, spin_lock(&hugetlb_lock); if (page) { + /* + * This page is now managed by the hugetlb allocator and has + * no users -- drop the buddy allocator's reference. + */ + int page_count = put_page_testzero(page); + BUG_ON(page_count != 0); nid = page_to_nid(page); set_compound_page_dtor(page, free_huge_page); /* @@ -345,13 +351,14 @@ free: enqueue_huge_page(page); else { /* - * Decrement the refcount and free the page using its - * destructor. This must be done with hugetlb_lock + * The page has a reference count of zero already, so + * call free_huge_page directly instead of using + * put_page. This must be done with hugetlb_lock * unlocked which is safe because free_huge_page takes * hugetlb_lock before deciding how to free the page. */ spin_unlock(&hugetlb_lock); - put_page(page); + free_huge_page(page); spin_lock(&hugetlb_lock); } } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org