linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Dave Hansen <haveblue@us.ibm.com>
To: Adam Litke <agl@us.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH] hugetlb: Fix pool resizing corner case
Date: Wed, 03 Oct 2007 10:40:48 -0700	[thread overview]
Message-ID: <1191433248.4939.79.camel@localhost> (raw)
In-Reply-To: <20071003154748.19516.90317.stgit@kernel>

On Wed, 2007-10-03 at 08:47 -0700, Adam Litke wrote:
> When shrinking the size of the hugetlb pool via the nr_hugepages sysctl, we
> are careful to keep enough pages around to satisfy reservations.  But the
> calculation is flawed for the following scenario:
> 
> Action                          Pool Counters (Total, Free, Resv)
> ======                          =============
> Set pool to 1 page              1 1 0
> Map 1 page MAP_PRIVATE          1 1 0
> Touch the page to fault it in   1 0 0
> Set pool to 3 pages             3 2 0
> Map 2 pages MAP_SHARED          3 2 2
> Set pool to 2 pages             2 1 2 <-- Mistake, should be 3 2 2
> Touch the 2 shared pages        2 0 1 <-- Program crashes here
> 
> The last touch above will terminate the process due to lack of huge pages.
> 
> This patch corrects the calculation so that it factors in pages being used
> for private mappings.  Andrew, this is a standalone fix suitable for
> mainline.  It is also now corrected in my latest dynamic pool resizing
> patchset which I will send out soon.
> 
> Signed-off-by: Adam Litke <agl@us.ibm.com>
> ---
> 
>  mm/hugetlb.c |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 84c795e..7af3908 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
>  	for (i = 0; i < MAX_NUMNODES; ++i) {
>  		struct page *page, *next;
>  		list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
> +			if (count >= nr_huge_pages)
> +				return;
>  			if (PageHighMem(page))
>  				continue;
>  			list_del(&page->lru);
>  			update_and_free_page(page);
>  			free_huge_pages--;
>  			free_huge_pages_node[page_to_nid(page)]--;
> -			if (count >= nr_huge_pages)
> -				return;
>  		}
>  	}
>  }

That's an excellent problem description.  I'm just a bit hazy on how the
patch fixes it. :)

What is the actual error in this loop?  The fact that we can go trying
to free pages when the count is actually OK?

BTW, try_to_free_low(count) kinda sucks for a function name.  Is that
count the number of pages we're trying to end up with, or the total
number of low pages that we're trying to free?

Also, as I look at try_to_free_low(), why do we need to #ifdef it out in
the case of !HIGHMEM?  If we have CONFIG_HIGHMEM=yes, we still might not
have any _actual_ high memory.  So, they loop obviously doesn't *hurt*
when there is no high memory.  

> @@ -251,7 +251,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
>  		return nr_huge_pages;
> 
>  	spin_lock(&hugetlb_lock);
> -	count = max(count, resv_huge_pages);
> +	count = max(count, resv_huge_pages + nr_huge_pages - free_huge_pages);
>  	try_to_free_low(count);
>  	while (count < nr_huge_pages) {
>  		struct page *page = dequeue_huge_page(NULL, 0);

The real problem with this line is that "count" is too ambiguous. :)

We could rewrite the original max() line this way:

	if (resv_huge_pages > nr_of_pages_to_end_up_with)
		nr_of_pages_to_end_up_with = resv_huge_pages;
	try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);

Which makes it more clear that you're setting the number of total pages
to the number of reserved pages, which is obviously screwy.

OK, so this is actually saying: "count can never go below
resv_huge_pages+nr_huge_pages"?

Could we change try_to_free_low() to free a distinct number of pages?

	if (count > free_huge_pages)
		count = free_huge_pages;
	try_to_free_nr_huge_pages(count);

I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
free_huge_pages" logic.  Could you elaborate a bit there on what the
rules are?

-- Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  reply	other threads:[~2007-10-03 17:40 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-10-03 15:47 [PATCH] hugetlb: Fix pool resizing corner case Adam Litke
2007-10-03 17:40 ` Dave Hansen [this message]
2007-10-03 18:33   ` Adam Litke
2007-10-03 18:59     ` Dave Hansen
2007-10-03 19:08       ` Ken Chen
2007-10-03 19:03     ` Ken Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1191433248.4939.79.camel@localhost \
    --to=haveblue@us.ibm.com \
    --cc=agl@us.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).