Subject: Re: [PATCH] hugetlb: Fix pool resizing corner case
From: Dave Hansen
To: Adam Litke
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Date: Wed, 03 Oct 2007 10:40:48 -0700
Message-Id: <1191433248.4939.79.camel@localhost>
In-Reply-To: <20071003154748.19516.90317.stgit@kernel>
References: <20071003154748.19516.90317.stgit@kernel>

On Wed, 2007-10-03 at 08:47 -0700, Adam Litke wrote:
> When shrinking the size of the hugetlb pool via the nr_hugepages sysctl,
> we are careful to keep enough pages around to satisfy reservations.  But
> the calculation is flawed for the following scenario:
>
> Action                         Pool Counters (Total, Free, Resv)
> ======                         =============
> Set pool to 1 page             1       1       0
> Map 1 page MAP_PRIVATE         1       1       0
> Touch the page to fault it in  1       0       0
> Set pool to 3 pages            3       2       0
> Map 2 pages MAP_SHARED         3       2       2
> Set pool to 2 pages            2       1       2  <-- Mistake, should be 3 2 2
> Touch the 2 shared pages       2       0       1  <-- Program crashes here
>
> The last touch above will terminate the process due to a lack of huge
> pages.
>
> This patch corrects the calculation so that it factors in pages being
> used for private mappings.  Andrew, this is a standalone fix suitable
> for mainline.  It is also now corrected in my latest dynamic pool
> resizing patchset, which I will send out soon.
>
> Signed-off-by: Adam Litke
> ---
>
>  mm/hugetlb.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 84c795e..7af3908 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
>  	for (i = 0; i < MAX_NUMNODES; ++i) {
>  		struct page *page, *next;
>  		list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
> +			if (count >= nr_huge_pages)
> +				return;
>  			if (PageHighMem(page))
>  				continue;
>  			list_del(&page->lru);
>  			update_and_free_page(page);
>  			free_huge_pages--;
>  			free_huge_pages_node[page_to_nid(page)]--;
> -			if (count >= nr_huge_pages)
> -				return;
>  		}
>  	}
>  }

That's an excellent problem description.  I'm just a bit hazy on how the
patch fixes it. :)

What is the actual error in this loop?  The fact that we can go trying
to free pages when the count is actually OK?

BTW, try_to_free_low(count) kinda sucks as a function name.  Is that
count the number of pages we're trying to end up with, or the total
number of low pages that we're trying to free?

Also, as I look at try_to_free_low(), why do we need to #ifdef it out
in the !CONFIG_HIGHMEM case?  If we have CONFIG_HIGHMEM=y, we still
might not have any _actual_ high memory.  So, the loop obviously
doesn't *hurt* when there is no high memory.
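Back to the "what is the actual error" question -- here is a toy
userspace model of what I *think* the old check placement does wrong.
The names and the whole harness are hypothetical; it only models the
counters, not the real kernel lists:

	#include <stdio.h>

	static unsigned long nr_huge_pages = 3;		/* Total */
	static unsigned long free_huge_pages = 2;	/* Free  */

	/* Pre-patch shape: free a page first, test afterwards. */
	static void shrink_check_last(unsigned long count)
	{
		while (free_huge_pages > 0) {
			nr_huge_pages--;
			free_huge_pages--;
			if (count >= nr_huge_pages)
				return;
		}
	}

	int main(void)
	{
		shrink_check_last(3);	/* ask for a count we already satisfy */
		printf("nr=%lu free=%lu\n", nr_huge_pages, free_huge_pages);
		/* Prints "nr=2 free=1": one page got freed even though
		 * count (3) was already >= nr_huge_pages (3).  With the
		 * check hoisted to the top of the loop body, nothing
		 * would be freed. */
		return 0;
	}

If that's the scenario, then moving the check above the free makes
sense to me.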
> @@ -251,7 +251,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
>  		return nr_huge_pages;
>
>  	spin_lock(&hugetlb_lock);
> -	count = max(count, resv_huge_pages);
> +	count = max(count, resv_huge_pages + nr_huge_pages - free_huge_pages);
>  	try_to_free_low(count);
>  	while (count < nr_huge_pages) {
>  		struct page *page = dequeue_huge_page(NULL, 0);

The real problem with this line is that "count" is too ambiguous. :)
We could rewrite the original max() line this way:

	if (resv_huge_pages > nr_of_pages_to_end_up_with)
		nr_of_pages_to_end_up_with = resv_huge_pages;
	try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);

which makes it clearer that the old code was setting the total number
of pages to the number of reserved pages, which is obviously screwy.

OK, so the new line is actually saying: "count can never go below
resv_huge_pages + nr_huge_pages - free_huge_pages"?  Could we change
try_to_free_low() to take the number of pages to free, instead of a
target count?

	if (count > free_huge_pages)
		count = free_huge_pages;
	try_to_free_nr_huge_pages(count);

I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
free_huge_pages" logic.  Could you elaborate a bit on what the rules
are there?

-- Dave
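P.S. Sanity-checking the new expression against your table (my
arithmetic, so double-check me): after "Map 2 pages MAP_SHARED" the
counters are (Total, Free, Resv) = (3, 2, 2), so:

	in_use = nr_huge_pages - free_huge_pages;	/* 3 - 2 = 1     */
	floor  = resv_huge_pages + in_use;		/* 2 + 1 = 3     */
	count  = max(count, floor);			/* max(2, 3) = 3 */

which lands on your "should be 3 2 2" line.  So the rule would be
"never shrink below reserved plus in-use" -- is that right?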