From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id 142E16B005C for ; Wed, 17 Jun 2009 13:16:33 -0400 (EDT) Subject: Re: [PATCH 1/5] Free huge pages round robin to balance across nodes From: Lee Schermerhorn In-Reply-To: <20090617131833.GG28529@csn.ul.ie> References: <20090616135228.25248.22018.sendpatchset@lts-notebook> <20090616135236.25248.93692.sendpatchset@lts-notebook> <20090617131833.GG28529@csn.ul.ie> Content-Type: text/plain Date: Wed, 17 Jun 2009 13:16:58 -0400 Message-Id: <1245259018.6235.59.camel@lts-notebook> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org To: Mel Gorman Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Nishanth Aravamudan , Adam Litke , Andy Whitcroft , eric.whitney@hp.com List-ID: On Wed, 2009-06-17 at 14:18 +0100, Mel Gorman wrote: > On Tue, Jun 16, 2009 at 09:52:36AM -0400, Lee Schermerhorn wrote: > > [PATCH 1/5] Free huge pages round robin to balance across nodes > > > > Against: 17may09 mmotm > > > > Currently, altho' increasing nr_hugepages will [attempt to] > > distribute the new huge pages across all nodes in the system, > > reducing nr_hugepages will free or surplus all free pages > > from nodes in node id order. This patch frees huges pages > > from nodes in round robin fashion in an attempt to keep > > [persistent] hugepage allocates balanced across the nodes. > > > > New function free_pool_huge_page() is modeled on and > > performs roughly the inverse of alloc_fresh_huge_page(). > > Replaces dequeue_huge_page() which now has no callers > > and can be removed. > > > > Helper function hstate_next_to_free_node() uses new hstate > > member next_to_free_nid to distribute "frees" across all > > nodes with huge pages. > > > > I placed this patch first in the series because I think it > > [or something similar] should be applied independent of the > > rest of the series. > > > > Agreed. Reading though, I can't see any problems with the patch and it > does make the freeing symmetric with the allocation. > > For clarity though, would it be worth renaming hugetlb_next_nid to > next_to_alloc_nid so that there is a clear relationship in the > round-robin allocation and freeing of pages amoung online nodes? OK. I'll do that in the next version of the series. > > > Signed-off-by: Lee Schermerhorn > > > > include/linux/hugetlb.h | 1 > > mm/hugetlb.c | 68 +++++++++++++++++++++++++++++++++--------------- > > 2 files changed, 48 insertions(+), 21 deletions(-) > > > > Index: linux-2.6.30-rc8-mmotm-090603-1633/include/linux/hugetlb.h > > =================================================================== > > --- linux-2.6.30-rc8-mmotm-090603-1633.orig/include/linux/hugetlb.h 2009-06-04 12:59:29.000000000 -0400 > > +++ linux-2.6.30-rc8-mmotm-090603-1633/include/linux/hugetlb.h 2009-06-04 12:59:31.000000000 -0400 > > @@ -184,6 +184,7 @@ unsigned long hugetlb_get_unmapped_area( > > /* Defines one hugetlb page size */ > > struct hstate { > > int hugetlb_next_nid; > > + int next_to_free_nid; > > unsigned int order; > > unsigned long mask; > > unsigned long max_huge_pages; > > Index: linux-2.6.30-rc8-mmotm-090603-1633/mm/hugetlb.c > > =================================================================== > > --- linux-2.6.30-rc8-mmotm-090603-1633.orig/mm/hugetlb.c 2009-06-04 12:59:29.000000000 -0400 > > +++ linux-2.6.30-rc8-mmotm-090603-1633/mm/hugetlb.c 2009-06-04 12:59:31.000000000 -0400 > > @@ -455,24 +455,6 @@ static void enqueue_huge_page(struct hst > > h->free_huge_pages_node[nid]++; > > } > > > > -static struct page *dequeue_huge_page(struct hstate *h) > > -{ > > - int nid; > > - struct page *page = NULL; > > - > > - for (nid = 0; nid < MAX_NUMNODES; ++nid) { > > - if (!list_empty(&h->hugepage_freelists[nid])) { > > - page = list_entry(h->hugepage_freelists[nid].next, > > - struct page, lru); > > - list_del(&page->lru); > > - h->free_huge_pages--; > > - h->free_huge_pages_node[nid]--; > > - break; > > - } > > - } > > - return page; > > -} > > - > > static struct page *dequeue_huge_page_vma(struct hstate *h, > > struct vm_area_struct *vma, > > unsigned long address, int avoid_reserve) > > @@ -683,6 +665,52 @@ static int alloc_fresh_huge_page(struct > > return ret; > > } > > > > +/* > > + * helper for free_pool_huge_page() - find next node > > + * from which to free a huge page > > + */ > > +static int hstate_next_to_free_node(struct hstate *h) > > +{ > > + int next_nid; > > + next_nid = next_node(h->next_to_free_nid, node_online_map); > > + if (next_nid == MAX_NUMNODES) > > + next_nid = first_node(node_online_map); > > + h->next_to_free_nid = next_nid; > > + return next_nid; > > +} > > + > > +/* > > + * Free huge page from pool from next node to free. > > + * Attempt to keep persistent huge pages more or less > > + * balanced over allowed nodes. > > + * Called with hugetlb_lock locked. > > + */ > > +static int free_pool_huge_page(struct hstate *h) > > +{ > > + int start_nid; > > + int nid; > > + int ret = 0; > > + > > + start_nid = h->next_to_free_nid; > > + nid = h->next_to_free_nid; > > + > > + do { > > + if (!list_empty(&h->hugepage_freelists[nid])) { > > + struct page *page = > > + list_entry(h->hugepage_freelists[nid].next, > > + struct page, lru); > > + list_del(&page->lru); > > + h->free_huge_pages--; > > + h->free_huge_pages_node[nid]--; > > + update_and_free_page(h, page); > > + ret = 1; > > + } > > + nid = hstate_next_to_free_node(h); > > + } while (!ret && nid != start_nid); > > + > > + return ret; > > +} > > + > > static struct page *alloc_buddy_huge_page(struct hstate *h, > > struct vm_area_struct *vma, unsigned long address) > > { > > @@ -1226,10 +1254,8 @@ static unsigned long set_max_huge_pages( > > min_count = max(count, min_count); > > try_to_free_low(h, min_count); > > while (min_count < persistent_huge_pages(h)) { > > - struct page *page = dequeue_huge_page(h); > > - if (!page) > > + if (!free_pool_huge_page(h)) > > break; > > - update_and_free_page(h, page); > > } > > while (count < persistent_huge_pages(h)) { > > if (!adjust_pool_surplus(h, 1)) > > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org