From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e36.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id m3ELAImX022378 for ; Mon, 14 Apr 2008 17:10:18 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m3ELAIdf218280 for ; Mon, 14 Apr 2008 15:10:18 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m3ELAH91006245 for ; Mon, 14 Apr 2008 15:10:18 -0600 Date: Mon, 14 Apr 2008 14:10:17 -0700 From: Nishanth Aravamudan Subject: Re: [RFC][PATCH 2/5] hugetlb: numafy several functions Message-ID: <20080414211017.GC6350@us.ibm.com> References: <20080411234449.GE19078@us.ibm.com> <20080411234712.GF19078@us.ibm.com> <1208184770.17385.93.camel@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <1208184770.17385.93.camel@localhost.localdomain> Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: wli@holomorphy.com, clameter@sgi.com, luick@cray.com, Lee.Schermerhorn@hp.com, linux-mm@kvack.org, npiggin@suse.de List-ID: On 14.04.2008 [09:52:50 -0500], Adam Litke wrote: > > On Fri, 2008-04-11 at 16:47 -0700, Nishanth Aravamudan wrote: > > +#define persistent_huge_pages_node(nid) \ > > + (nr_huge_pages_node[nid] - surplus_huge_pages_node[nid]) > > +static ssize_t hugetlb_write_nr_hugepages_node(struct sys_device *dev, > > + const char *buf, size_t count) > > +{ > > + int nid = dev->id; > > + unsigned long target; > > + unsigned long free_on_other_nodes; > > + unsigned long nr_huge_pages_req = simple_strtoul(buf, NULL, 10); > > + > > + /* > > + * Increase the pool size on the node > > + * First take pages out of surplus state. Then make up the > > + * remaining difference by allocating fresh huge pages. > > + * > > + * We might race with alloc_buddy_huge_page() here and be unable > > + * to convert a surplus huge page to a normal huge page. That is > > + * not critical, though, it just means the overall size of the > > + * pool might be one hugepage larger than it needs to be, but > > + * within all the constraints specified by the sysctls. > > + */ > > + spin_lock(&hugetlb_lock); > > + while (surplus_huge_pages_node[nid] && > > + nr_huge_pages_req > persistent_huge_pages_node(nid)) { > > + if (!adjust_pool_surplus_node(-1, nid)) > > + break; > > + } > > + > > + while (nr_huge_pages_req > persistent_huge_pages_node(nid)) { > > + struct page *ret; > > + /* > > + * If this allocation races such that we no longer need the > > + * page, free_huge_page will handle it by freeing the page > > + * and reducing the surplus. > > + */ > > + spin_unlock(&hugetlb_lock); > > + ret = alloc_fresh_huge_page_node(nid); > > + spin_lock(&hugetlb_lock); > > + if (!ret) > > + goto out; > > + > > + } > > + > > + if (nr_huge_pages_req >= nr_huge_pages_node[nid]) > > + goto out; > > + > > + /* > > + * Decrease the pool size > > + * First return free pages to the buddy allocator (being careful > > + * to keep enough around to satisfy reservations). Then place > > + * pages into surplus state as needed so the pool will shrink > > + * to the desired size as pages become free. > > + * > > + * By placing pages into the surplus state independent of the > > + * overcommit value, we are allowing the surplus pool size to > > + * exceed overcommit. There are few sane options here. Since > > + * alloc_buddy_huge_page() is checking the global counter, > > + * though, we'll note that we're not allowed to exceed surplus > > + * and won't grow the pool anywhere else. Not until one of the > > + * sysctls are changed, or the surplus pages go out of use. > > + */ > > + free_on_other_nodes = free_huge_pages - free_huge_pages_node[nid]; > > + if (free_on_other_nodes >= resv_huge_pages) { > > + /* other nodes can satisfy reserve */ > > + target = nr_huge_pages_req; > > + } else { > > + /* this node needs some free to satisfy reserve */ > > + target = max((resv_huge_pages - free_on_other_nodes), > > + nr_huge_pages_req); > > + } > > + try_to_free_low_node(nid, target); > > + while (target < persistent_huge_pages_node(nid)) { > > + struct page *page = dequeue_huge_page_node(NULL, nid); > > + if (!page) > > + break; > > + update_and_free_page(nid, page); > > + } > > + > > + while (target < persistent_huge_pages_node(nid)) { > > + if (!adjust_pool_surplus_node(1, nid)) > > + break; > > + } > > +out: > > + spin_unlock(&hugetlb_lock); > > + return count; > > +} > > Hmm, this function looks very familiar ;) Is there any way we can > consolidate it with set_max_huge_pages()? Perhaps the new node helpers > from the beginning of this series will help? A good idea. I think I was more worried about getting something wrong if I did that in my first cut after the dynamic pool was merged and just hadn't recombined. I will work on it for the next version, once we've come up with a consensus on the interface's location. Thanks for the review, Nish -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org