From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e1.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id m4NKme2Z026900 for ; Fri, 23 May 2008 16:48:40 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v8.7) with ESMTP id m4NKmedg155434 for ; Fri, 23 May 2008 16:48:40 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id m4NKmdbP018856 for ; Fri, 23 May 2008 16:48:39 -0400 Date: Fri, 23 May 2008 13:48:37 -0700 From: Nishanth Aravamudan Subject: Re: [patch 04/18] hugetlb: modular state Message-ID: <20080523204837.GG23924@us.ibm.com> References: <20080423015302.745723000@nick.local0.net> <20080423015430.054070000@nick.local0.net> <20080425171346.GB9680@us.ibm.com> <20080523050246.GB13071@wotan.suse.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20080523050246.GB13071@wotan.suse.de> Sender: owner-linux-mm@kvack.org Return-Path: To: Nick Piggin Cc: akpm@linux-foundation.org, linux-mm@kvack.org, andi@firstfloor.org, kniht@linux.vnet.ibm.com, abh@cray.com, wli@holomorphy.com List-ID: On 23.05.2008 [07:02:47 +0200], Nick Piggin wrote: > On Fri, Apr 25, 2008 at 10:13:46AM -0700, Nishanth Aravamudan wrote: > > On 23.04.2008 [11:53:06 +1000], npiggin@suse.de wrote: > > > Large, but rather mechanical patch that converts most of the hugetlb.c > > > globals into structure members and passes them around. > > > > > > Right now there is only a single global hstate structure, but most of > > > the infrastructure to extend it is there. > > > > While going through the patches as I apply them to 2.6.25-mm1 (as none > > will apply cleanly so far :), I have a few comments. I like this patch > > overall. > > Thanks for all the feedback, and sorry for the delay. I'm just > rebasing things now and getting through all the feedback. > > I really do appreciate the comments and have made a lot of changes > that you've suggested... Great, I'm looking forward to the new series and seeing it get some wider testing in -mm. I'll throw Acks in, when they are posted. Let me also reiterate that your and Andi's work really does make a world of difference for the larger hugetlb userbase. The hstate idea and implementation really do make hugepages a lot more flexible then we were before, and I really applaud you both for the code. > > > /* > > > * Protects updates to hugepage_freelists, nr_huge_pages, and free_huge_pages > > > */ > > > static DEFINE_SPINLOCK(hugetlb_lock); > > > > Not sure if this makes sense or not, but would it be useful to make the > > lock be per-hstate? It is designed to protect the counters and the > > freelists, but those are per-hstate, right? Would need heavy testing, > > but might be useful for varying apps both trying to use different size > > hugepages simultaneously? > > Hmm, sure we could do that. Although obviously it would be another > patchset, and actually I'd be concerned about making hstate the > unit of scalability in hugetlbfs -- a single hstate should be > suffiicently scalable to handle workloads reasonably. > > Good point, but at any rate I guess this patchset isn't the place > to do it. Agreed. > > > int hugetlb_report_meminfo(char *buf) > > > { > > > + struct hstate *h = &global_hstate; > > > return sprintf(buf, > > > "HugePages_Total: %5lu\n" > > > "HugePages_Free: %5lu\n" > > > "HugePages_Rsvd: %5lu\n" > > > "HugePages_Surp: %5lu\n" > > > "Hugepagesize: %5lu kB\n", > > > - nr_huge_pages, > > > - free_huge_pages, > > > - resv_huge_pages, > > > - surplus_huge_pages, > > > - HPAGE_SIZE/1024); > > > + h->nr_huge_pages, > > > + h->free_huge_pages, > > > + h->resv_huge_pages, > > > + h->surplus_huge_pages, > > > + 1UL << (huge_page_order(h) + PAGE_SHIFT - 10)); > > > > "- 10"? I think this should be easier to get at then this? Oh I > > guess it's to get it into kilobytes... Seems kind of odd, but I > > guess it's fine. > > I agree it's not perfect, but I might just leave all these for > a subsequent patchset (or can stick improvements to the end of > this patchset). I can submit a sequence of cleanup patches myself, as well, they shouldn't block your posting. > > > static inline int is_file_hugepages(struct file *file) > > > { > > > if (file->f_op == &hugetlbfs_file_operations) > > > @@ -199,4 +196,71 @@ unsigned long hugetlb_get_unmapped_area( > > > unsigned long flags); > > > #endif /* HAVE_ARCH_HUGETLB_UNMAPPED_AREA */ > > > > > > +#ifdef CONFIG_HUGETLB_PAGE > > > > Why another block of HUGETLB_PAGE? Shouldn't this go at the end of the > > other one? And the !HUGETLB_PAGE within the corresponding #else? > > Hmm, possibly. As has been noted, the CONFIG_ things are a bit > broken, and they should just get merged into one. I'll steer > clear of that area for the moment, as everything is working now, > but consolidating the options and cleaning things up would be > a good idea. Yep, I'll add this as a tail-cleanup. Perhaps part of the overarching one of just getting rid of CONFIG_HUGETLBFS or CONFIG_HUGETLB_PAGE (have one config option, not two, since they are mutually dependent). > > > + > > > +/* Defines one hugetlb page size */ > > > +struct hstate { > > > + int hugetlb_next_nid; > > > + unsigned int order; > > > > Which is actually a shift, too, right? So why not just call it that? No > > function should be direclty accessing these members, so the function > > name indicates how the shift is being used? > > I don't feel strongly. If you really do, then I guess it could be > changed. > > > > > + unsigned long mask; > > > + unsigned long max_huge_pages; > > > + unsigned long nr_huge_pages; > > > + unsigned long free_huge_pages; > > > + unsigned long resv_huge_pages; > > > + unsigned long surplus_huge_pages; > > > + unsigned long nr_overcommit_huge_pages; > > > + struct list_head hugepage_freelists[MAX_NUMNODES]; > > > + unsigned int nr_huge_pages_node[MAX_NUMNODES]; > > > + unsigned int free_huge_pages_node[MAX_NUMNODES]; > > > + unsigned int surplus_huge_pages_node[MAX_NUMNODES]; > > > +}; > > > + > > > +extern struct hstate global_hstate; > > > + > > > +static inline struct hstate *hstate_vma(struct vm_area_struct *vma) > > > +{ > > > + return &global_hstate; > > > +} > > > > After having looked at this functions while reviewing, it does seem like > > it might be more intuitive to ready vma_hstate ("vma's hstate") rather > > than hstate_vma ("hstate's vma"?). But your call. > > Again I don't feel strongly. Hstate prefix has some upsides. I think you can leave both as is. Thanks, Nish -- Nishanth Aravamudan IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org