* Re: [PATCH] hugetlb: Fix pool resizing corner case
@ 2007-10-03 18:33 ` Adam Litke
0 siblings, 0 replies; 12+ messages in thread
From: Adam Litke @ 2007-10-03 18:33 UTC (permalink / raw)
To: Dave Hansen; +Cc: Andrew Morton, linux-mm, linux-kernel
On Wed, 2007-10-03 at 10:40 -0700, Dave Hansen wrote:
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 84c795e..7af3908 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
> > for (i = 0; i < MAX_NUMNODES; ++i) {
> > struct page *page, *next;
> > list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
> > + if (count >= nr_huge_pages)
> > + return;
> > if (PageHighMem(page))
> > continue;
> > list_del(&page->lru);
> > update_and_free_page(page);
> > free_huge_pages--;
> > free_huge_pages_node[page_to_nid(page)]--;
> > - if (count >= nr_huge_pages)
> > - return;
> > }
> > }
> > }
>
> That's an excellent problem description. I'm just a bit hazy on how the
> patch fixes it. :)
>
> What is the actual error in this loop? The fact that we can go trying
> to free pages when the count is actually OK?
The above hunk serves only to change the behavior of try_to_free_low()
so that rather than always freeing _at_least_ one huge page, it can
return without having freed any pages.
> BTW, try_to_free_low(count) kinda sucks for a function name. Is that
> count the number of pages we're trying to end up with, or the total
> number of low pages that we're trying to free?
I agree the name sucks, but this is a bugfix patch.
> Also, as I look at try_to_free_low(), why do we need to #ifdef it out in
> the case of !HIGHMEM? If we have CONFIG_HIGHMEM=yes, we still might not
> have any _actual_ high memory. So, they loop obviously doesn't *hurt*
> when there is no high memory.
Maybe, but not really in-scope of what this patch is trying to
accomplish.
> > @@ -251,7 +251,7 @@ static unsigned long set_max_huge_pages(unsigned long count)
> > return nr_huge_pages;
> >
> > spin_lock(&hugetlb_lock);
> > - count = max(count, resv_huge_pages);
> > + count = max(count, resv_huge_pages + nr_huge_pages - free_huge_pages);
> > try_to_free_low(count);
> > while (count < nr_huge_pages) {
> > struct page *page = dequeue_huge_page(NULL, 0);
>
> The real problem with this line is that "count" is too ambiguous. :)
I agree, count is almost always a bad variable name :)
> We could rewrite the original max() line this way:
>
> if (resv_huge_pages > nr_of_pages_to_end_up_with)
> nr_of_pages_to_end_up_with = resv_huge_pages;
> try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);
>
> Which makes it more clear that you're setting the number of total pages
> to the number of reserved pages, which is obviously screwy.
>
> OK, so this is actually saying: "count can never go below
> resv_huge_pages+nr_huge_pages"?
Not quite. Count can never go below the number of reserved pages plus
pages allocated to MAP_PRIVATE mappings. That number is computed by:
(resv + (total - free)).
> Could we change try_to_free_low() to free a distinct number of pages?
>
> if (count > free_huge_pages)
> count = free_huge_pages;
> try_to_free_nr_huge_pages(count);
>
> I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
> free_huge_pages" logic. Could you elaborate a bit there on what the
> rules are?
The key is that we don't want to shrink the pool below the number of
pages we are committed to keeping around. Before this patch, we only
accounted for the pages we plan to hand out (reserved huge pages) but
not the ones we've already handed out (total - free). Does that make
sense?
--
Adam Litke - (agl at us.ibm.com)
IBM Linux Technology Center
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] hugetlb: Fix pool resizing corner case
2007-10-03 18:33 ` Adam Litke
@ 2007-10-03 18:59 ` Dave Hansen
-1 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2007-10-03 18:59 UTC (permalink / raw)
To: Adam Litke; +Cc: Andrew Morton, linux-mm, linux-kernel
On Wed, 2007-10-03 at 13:33 -0500, Adam Litke wrote:
> On Wed, 2007-10-03 at 10:40 -0700, Dave Hansen wrote:
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 84c795e..7af3908 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
> > > for (i = 0; i < MAX_NUMNODES; ++i) {
> > > struct page *page, *next;
> > > list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
> > > + if (count >= nr_huge_pages)
> > > + return;
> > > if (PageHighMem(page))
> > > continue;
> > > list_del(&page->lru);
> > > update_and_free_page(page);
> > > free_huge_pages--;
> > > free_huge_pages_node[page_to_nid(page)]--;
> > > - if (count >= nr_huge_pages)
> > > - return;
> > > }
> > > }
> > > }
> >
> > That's an excellent problem description. I'm just a bit hazy on how the
> > patch fixes it. :)
> >
> > What is the actual error in this loop? The fact that we can go trying
> > to free pages when the count is actually OK?
>
> The above hunk serves only to change the behavior of try_to_free_low()
> so that rather than always freeing _at_least_ one huge page, it can
> return without having freed any pages.
OK, that makes sense. Can you include that in the patch description?
> > BTW, try_to_free_low(count) kinda sucks for a function name. Is that
> > count the number of pages we're trying to end up with, or the total
> > number of low pages that we're trying to free?
>
> I agree the name sucks, but this is a bugfix patch.
Oh, yeah, none of what I was suggesting belongs in this patch. But,
they'd make good followups. I was just trying to get you to look at
what caused that bug to be introduced in the first place.
> > We could rewrite the original max() line this way:
> >
> > if (resv_huge_pages > nr_of_pages_to_end_up_with)
> > nr_of_pages_to_end_up_with = resv_huge_pages;
> > try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);
> >
> > Which makes it more clear that you're setting the number of total pages
> > to the number of reserved pages, which is obviously screwy.
> >
> > OK, so this is actually saying: "count can never go below
> > resv_huge_pages+nr_huge_pages"?
>
> Not quite. Count can never go below the number of reserved pages plus
> pages allocated to MAP_PRIVATE mappings. That number is computed by:
> (resv + (total - free)).
So, (total - free) equals the number of MAP_PRIVATE pages? Does that
imply that all reserved pages are shared and that all shared pages are
reserved?
This would be clearer even with
static inline int nr_map_private_hpages(void)
{
return total - free;
}
Especially, if we could get the "shared" name somewhere into the
reserved variable.
It makes a whole ton of sense to see:
int total_in_use = nr_shared() + nr_private();
if (nr_requested_pages < total_in_use)
uh_oh_cant_do_that();
That'd be a wonderful cleanup.
> > Could we change try_to_free_low() to free a distinct number of pages?
> >
> > if (count > free_huge_pages)
> > count = free_huge_pages;
> > try_to_free_nr_huge_pages(count);
> >
> > I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
> > free_huge_pages" logic. Could you elaborate a bit there on what the
> > rules are?
>
> The key is that we don't want to shrink the pool below the number of
> pages we are committed to keeping around. Before this patch, we only
> accounted for the pages we plan to hand out (reserved huge pages) but
> not the ones we've already handed out (total - free). Does that make
> sense?
Yes. I'm just starting to get the idea that pages are committed in
different ways. So, it makes good sense to have a nice function like
total_committed_hpages(), which explains all of this, and is called to
compare agaist the evil "count" variable. :)
-- Dave
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] hugetlb: Fix pool resizing corner case
@ 2007-10-03 18:59 ` Dave Hansen
0 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2007-10-03 18:59 UTC (permalink / raw)
To: Adam Litke; +Cc: Andrew Morton, linux-mm, linux-kernel
On Wed, 2007-10-03 at 13:33 -0500, Adam Litke wrote:
> On Wed, 2007-10-03 at 10:40 -0700, Dave Hansen wrote:
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index 84c795e..7af3908 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -224,14 +224,14 @@ static void try_to_free_low(unsigned long count)
> > > for (i = 0; i < MAX_NUMNODES; ++i) {
> > > struct page *page, *next;
> > > list_for_each_entry_safe(page, next, &hugepage_freelists[i], lru) {
> > > + if (count >= nr_huge_pages)
> > > + return;
> > > if (PageHighMem(page))
> > > continue;
> > > list_del(&page->lru);
> > > update_and_free_page(page);
> > > free_huge_pages--;
> > > free_huge_pages_node[page_to_nid(page)]--;
> > > - if (count >= nr_huge_pages)
> > > - return;
> > > }
> > > }
> > > }
> >
> > That's an excellent problem description. I'm just a bit hazy on how the
> > patch fixes it. :)
> >
> > What is the actual error in this loop? The fact that we can go trying
> > to free pages when the count is actually OK?
>
> The above hunk serves only to change the behavior of try_to_free_low()
> so that rather than always freeing _at_least_ one huge page, it can
> return without having freed any pages.
OK, that makes sense. Can you include that in the patch description?
> > BTW, try_to_free_low(count) kinda sucks for a function name. Is that
> > count the number of pages we're trying to end up with, or the total
> > number of low pages that we're trying to free?
>
> I agree the name sucks, but this is a bugfix patch.
Oh, yeah, none of what I was suggesting belongs in this patch. But,
they'd make good followups. I was just trying to get you to look at
what caused that bug to be introduced in the first place.
> > We could rewrite the original max() line this way:
> >
> > if (resv_huge_pages > nr_of_pages_to_end_up_with)
> > nr_of_pages_to_end_up_with = resv_huge_pages;
> > try_to_make_the_total_nr_of_hpages(nr_of_pages_to_end_up_with);
> >
> > Which makes it more clear that you're setting the number of total pages
> > to the number of reserved pages, which is obviously screwy.
> >
> > OK, so this is actually saying: "count can never go below
> > resv_huge_pages+nr_huge_pages"?
>
> Not quite. Count can never go below the number of reserved pages plus
> pages allocated to MAP_PRIVATE mappings. That number is computed by:
> (resv + (total - free)).
So, (total - free) equals the number of MAP_PRIVATE pages? Does that
imply that all reserved pages are shared and that all shared pages are
reserved?
This would be clearer even with
static inline int nr_map_private_hpages(void)
{
return total - free;
}
Especially, if we could get the "shared" name somewhere into the
reserved variable.
It makes a whole ton of sense to see:
int total_in_use = nr_shared() + nr_private();
if (nr_requested_pages < total_in_use)
uh_oh_cant_do_that();
That'd be a wonderful cleanup.
> > Could we change try_to_free_low() to free a distinct number of pages?
> >
> > if (count > free_huge_pages)
> > count = free_huge_pages;
> > try_to_free_nr_huge_pages(count);
> >
> > I feel a bit sketchy about the "resv_huge_pages + nr_huge_pages -
> > free_huge_pages" logic. Could you elaborate a bit there on what the
> > rules are?
>
> The key is that we don't want to shrink the pool below the number of
> pages we are committed to keeping around. Before this patch, we only
> accounted for the pages we plan to hand out (reserved huge pages) but
> not the ones we've already handed out (total - free). Does that make
> sense?
Yes. I'm just starting to get the idea that pages are committed in
different ways. So, it makes good sense to have a nice function like
total_committed_hpages(), which explains all of this, and is called to
compare agaist the evil "count" variable. :)
-- Dave
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] hugetlb: Fix pool resizing corner case
2007-10-03 18:59 ` Dave Hansen
@ 2007-10-03 19:08 ` Ken Chen
-1 siblings, 0 replies; 12+ messages in thread
From: Ken Chen @ 2007-10-03 19:08 UTC (permalink / raw)
To: Dave Hansen; +Cc: Adam Litke, Andrew Morton, linux-mm, linux-kernel
On 10/3/07, Dave Hansen <haveblue@us.ibm.com> wrote:
> > Not quite. Count can never go below the number of reserved pages plus
> > pages allocated to MAP_PRIVATE mappings. That number is computed by:
> > (resv + (total - free)).
>
> So, (total - free) equals the number of MAP_PRIVATE pages? Does that
> imply that all reserved pages are shared and that all shared pages are
> reserved?
no, not quite. In-use huge page (total - free) can be both private or
shared. resv_huge_pages counts number of pages that is committed for
shared mapping, but not yet faulted in.
What the equation does essentially is: resv_huge_pages + nr-huge-pages-in-use.
- Ken
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hugetlb: Fix pool resizing corner case
@ 2007-10-03 19:08 ` Ken Chen
0 siblings, 0 replies; 12+ messages in thread
From: Ken Chen @ 2007-10-03 19:08 UTC (permalink / raw)
To: Dave Hansen; +Cc: Adam Litke, Andrew Morton, linux-mm, linux-kernel
On 10/3/07, Dave Hansen <haveblue@us.ibm.com> wrote:
> > Not quite. Count can never go below the number of reserved pages plus
> > pages allocated to MAP_PRIVATE mappings. That number is computed by:
> > (resv + (total - free)).
>
> So, (total - free) equals the number of MAP_PRIVATE pages? Does that
> imply that all reserved pages are shared and that all shared pages are
> reserved?
no, not quite. In-use huge page (total - free) can be both private or
shared. resv_huge_pages counts number of pages that is committed for
shared mapping, but not yet faulted in.
What the equation does essentially is: resv_huge_pages + nr-huge-pages-in-use.
- Ken
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH] hugetlb: Fix pool resizing corner case
2007-10-03 18:33 ` Adam Litke
@ 2007-10-03 19:03 ` Ken Chen
-1 siblings, 0 replies; 12+ messages in thread
From: Ken Chen @ 2007-10-03 19:03 UTC (permalink / raw)
To: Adam Litke; +Cc: Dave Hansen, Andrew Morton, linux-mm, linux-kernel
On 10/3/07, Adam Litke <agl@us.ibm.com> wrote:
> The key is that we don't want to shrink the pool below the number of
> pages we are committed to keeping around. Before this patch, we only
> accounted for the pages we plan to hand out (reserved huge pages) but
> not the ones we've already handed out (total - free). Does that make
> sense?
Good catch, adam.
>From what I can see, the statement
if (count >= nr_huge_pages)
return nr_huge_pages;
in set_max_huge_pages() is useless because (1) we recalculate "count"
variable below it; and (2) both try_to_free_low() and the while loop
below the call to try_to_free_low() will terminate correctly. If you
feel like it, please clean it up as well.
If not, I'm fine with that.
Acked-by: Ken Chen <kenchen@google.com>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH] hugetlb: Fix pool resizing corner case
@ 2007-10-03 19:03 ` Ken Chen
0 siblings, 0 replies; 12+ messages in thread
From: Ken Chen @ 2007-10-03 19:03 UTC (permalink / raw)
To: Adam Litke; +Cc: Dave Hansen, Andrew Morton, linux-mm, linux-kernel
On 10/3/07, Adam Litke <agl@us.ibm.com> wrote:
> The key is that we don't want to shrink the pool below the number of
> pages we are committed to keeping around. Before this patch, we only
> accounted for the pages we plan to hand out (reserved huge pages) but
> not the ones we've already handed out (total - free). Does that make
> sense?
Good catch, adam.
>From what I can see, the statement
if (count >= nr_huge_pages)
return nr_huge_pages;
in set_max_huge_pages() is useless because (1) we recalculate "count"
variable below it; and (2) both try_to_free_low() and the while loop
below the call to try_to_free_low() will terminate correctly. If you
feel like it, please clean it up as well.
If not, I'm fine with that.
Acked-by: Ken Chen <kenchen@google.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 12+ messages in thread