From: Mel Gorman <mel@csn.ul.ie>
To: Adam Litke <agl@us.ibm.com>
Cc: linux-mm@kvack.org, dean@arctic.org,
linux-kernel@vger.kernel.org, wli@holomorphy.com,
dwg@au1.ibm.com, andi@firstfloor.org, kenchen@google.com,
apw@shadowen.org
Subject: Re: [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork()
Date: Fri, 16 May 2008 13:11:13 +0100 [thread overview]
Message-ID: <20080516121113.GB2637@csn.ul.ie> (raw)
In-Reply-To: <1210798520.19507.53.camel@localhost.localdomain>
On (14/05/08 20:55), Adam Litke didst pronounce:
> While not perfect, these patches definitely provide improved semantics
> over what we currently have. This is the best we can do in an
> environment where swapping and/or page size demotion are not
> available.
>
And whatever about page size demotion in the future, I don't think we
want to go down the swapping route any time soon. 16MB huge pages would
be one thing but the 1GB pages would be ruinous.
> I don't see anything that would impact performance here and the patches
> pass basic functional testing and work fine with a benchmark I use for
> both functional and performance verification.
>
Thanks.
> On Wed, 2008-05-07 at 20:39 +0100, Mel Gorman wrote:
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-0010-move-hugetlb_acct_memory/include/linux/hugetlb.h linux-2.6.25-mm1-0020-map_private_reserve/include/linux/hugetlb.h
> > --- linux-2.6.25-mm1-0010-move-hugetlb_acct_memory/include/linux/hugetlb.h 2008-04-22 10:30:03.000000000 +0100
> > +++ linux-2.6.25-mm1-0020-map_private_reserve/include/linux/hugetlb.h 2008-05-07 18:29:27.000000000 +0100
> > @@ -17,6 +17,7 @@ static inline int is_vm_hugetlb_page(str
> > return vma->vm_flags & VM_HUGETLB;
> > }
> >
> > +void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
> > int hugetlb_sysctl_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
> > int hugetlb_overcommit_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
> > int hugetlb_treat_movable_handler(struct ctl_table *, int, struct file *, void __user *, size_t *, loff_t *);
> > @@ -30,7 +31,8 @@ int hugetlb_report_node_meminfo(int, cha
> > unsigned long hugetlb_total_pages(void);
> > int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> > unsigned long address, int write_access);
> > -int hugetlb_reserve_pages(struct inode *inode, long from, long to);
> > +int hugetlb_reserve_pages(struct inode *inode, long from, long to,
> > + struct vm_area_struct *vma);
> > void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed);
> >
> > extern unsigned long max_huge_pages;
> > @@ -58,6 +60,11 @@ static inline int is_vm_hugetlb_page(str
> > {
> > return 0;
> > }
> > +
> > +static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
> > +{
> > +}
>
> This is a curious convention. What purpose do the open and close braces
> serve? Is this the syntax for saying "please inline me but find the
> real definition of this function elsewhere"? You already declared it
> further above in hugetlb.h so I am confused.
This is the !CONFIG_HUGETLB_PAGE version where it's a no-op.
> > static inline unsigned long hugetlb_total_pages(void)
> > {
> > return 0;
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-0010-move-hugetlb_acct_memory/kernel/fork.c linux-2.6.25-mm1-0020-map_private_reserve/kernel/fork.c
> > --- linux-2.6.25-mm1-0010-move-hugetlb_acct_memory/kernel/fork.c 2008-04-22 10:30:04.000000000 +0100
> > +++ linux-2.6.25-mm1-0020-map_private_reserve/kernel/fork.c 2008-05-07 18:29:27.000000000 +0100
> > @@ -53,6 +53,7 @@
> > #include <linux/tty.h>
> > #include <linux/proc_fs.h>
> > #include <linux/blkdev.h>
> > +#include <linux/hugetlb.h>
> >
> > #include <asm/pgtable.h>
> > #include <asm/pgalloc.h>
> > @@ -282,6 +283,14 @@ static int dup_mmap(struct mm_struct *mm
> > }
> >
> > /*
> > + * Clear hugetlb-related page reserves for children. This only
> > + * affects MAP_PRIVATE mappings. Faults generated by the child
> > + * are not guaranteed to succeed, even if read-only
> > + */
> > + if (is_vm_hugetlb_page(tmp))
> > + reset_vma_resv_huge_pages(tmp);
> > +
> > + /*
> > * Link in the new vma and copy the page table entries.
> > */
> > *pprev = tmp;
> > diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.25-mm1-0010-move-hugetlb_acct_memory/mm/hugetlb.c linux-2.6.25-mm1-0020-map_private_reserve/mm/hugetlb.c
> > --- linux-2.6.25-mm1-0010-move-hugetlb_acct_memory/mm/hugetlb.c 2008-05-07 18:28:28.000000000 +0100
> > +++ linux-2.6.25-mm1-0020-map_private_reserve/mm/hugetlb.c 2008-05-07 18:39:34.000000000 +0100
> > @@ -40,6 +40,45 @@ static int hugetlb_next_nid;
> > */
> > static DEFINE_SPINLOCK(hugetlb_lock);
> >
> > +/*
> > + * These three helpers are used to track how many pages are reserved for
> > + * faults in a MAP_PRIVATE mapping. Only the process that called mmap()
> > + * is guaranteed to have their future faults succeed
> > + */
>
> Should we mention what lock(s) are required when manipulating
> vm_private_data?
I was depending on the existing locking for reserve pages to cover the
manipulation of private data - i.e. the hugetlb_lock.
> What do you think is the actual controlling lock here?
> This all looks safe to me (especially considering the
> hugetlb_instantiation_mutex) but I wouldn't mind identifying a
> finer-grained lock. I would say hugetlb_lock makes sense given the new
> way we consume resv_huge_pages in ???decrement_hugepage_resv_vma().
> Thoughts?
It's already depending on the hugetlb_lock because that is what is
required for resv_huge_pages and is the lock taken on those paths. Is
there something I am missing?
>
> > +static unsigned long vma_resv_huge_pages(struct vm_area_struct *vma)
> > +{
> > + VM_BUG_ON(!is_vm_hugetlb_page(vma));
> > + if (!(vma->vm_flags & VM_SHARED))
> > + return (unsigned long)vma->vm_private_data;
> > + return 0;
> > +}
>
> > +static void adjust_vma_resv_huge_pages(struct vm_area_struct *vma, int delta)
> > +{
> > + unsigned long reserve;
> > + VM_BUG_ON(vma->vm_flags & VM_SHARED);
> > + VM_BUG_ON(!is_vm_hugetlb_page(vma));
> > +
> > + reserve = (unsigned long)vma->vm_private_data + delta;
> > + vma->vm_private_data = (void *)reserve;
> > +}
> > +
> > +void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
> > +{
> > + VM_BUG_ON(!is_vm_hugetlb_page(vma));
> > + if (!(vma->vm_flags & VM_SHARED))
> > + vma->vm_private_data = (void *)0;
> > +}
> > +
> > +static void set_vma_resv_huge_pages(struct vm_area_struct *vma,
> > + unsigned long reserve)
> > +{
> > + VM_BUG_ON(!is_vm_hugetlb_page(vma));
> > + VM_BUG_ON(vma->vm_flags & VM_SHARED);
> > +
> > + vma->vm_private_data = (void *)reserve;
> > +}
> > +
> > static void clear_huge_page(struct page *page, unsigned long addr)
> > {
> > int i;
> > @@ -89,6 +128,24 @@ static struct page *dequeue_huge_page(vo
> > return page;
> > }
> >
> > +/* Decrement the reserved pages in the hugepage pool by one */
> > +static void decrement_hugepage_resv_vma(struct vm_area_struct *vma)
> > +{
> > + if (vma->vm_flags & VM_SHARED) {
> > + /* Shared mappings always use reserves */
> > + resv_huge_pages--;
> > + } else {
> > + /*
> > + * Only the process that called mmap() has reserves for
> > + * private mappings.
> > + */
> > + if (vma_resv_huge_pages(vma)) {
> > + resv_huge_pages--;
> > + adjust_vma_resv_huge_pages(vma, -1);
> > + }
> > + }
> > +}
>
> > static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma,
> > unsigned long address)
> > {
> > @@ -101,6 +158,16 @@ static struct page *dequeue_huge_page_vm
> > struct zone *zone;
> > struct zoneref *z;
> >
> > + /*
> > + * A child process with MAP_PRIVATE mappings created by their parent
> > + * have no page reserves. This check ensures that reservations are
> > + * not "stolen". The child may still get SIGKILLed
> > + */
> > + if (!(vma->vm_flags & VM_SHARED) &&
> > + !vma_resv_huge_pages(vma) &&
> > + free_huge_pages - resv_huge_pages == 0)
> > + return NULL;
> > +
> > for_each_zone_zonelist_nodemask(zone, z, zonelist,
> > MAX_NR_ZONES - 1, nodemask) {
> > nid = zone_to_nid(zone);
> > @@ -111,8 +178,8 @@ static struct page *dequeue_huge_page_vm
> > list_del(&page->lru);
> > free_huge_pages--;
> > free_huge_pages_node[nid]--;
> > - if (vma && vma->vm_flags & VM_MAYSHARE)
> > - resv_huge_pages--;
> > + decrement_hugepage_resv_vma(vma);
> > +
> > break;
> > }
> > }
> > @@ -461,55 +528,41 @@ static void return_unused_surplus_pages(
> > }
> > }
> >
> > -
> > -static struct page *alloc_huge_page_shared(struct vm_area_struct *vma,
> > - unsigned long addr)
> > +static struct page *alloc_huge_page(struct vm_area_struct *vma,
> > + unsigned long addr)
> > {
> > struct page *page;
> > + struct address_space *mapping = vma->vm_file->f_mapping;
> > + struct inode *inode = mapping->host;
> > + unsigned int chg = 0;
> > +
> > + /*
> > + * Private mappings belonging to a child will not have their own
> > + * reserves, nor will they have taken their quota. Check that
> > + * the quota can be made before satisfying the allocation
> > + */
> > + if (!(vma->vm_flags & VM_SHARED) &&
> > + !vma_resv_huge_pages(vma)) {
>
> I wonder if we should have a helper for this since I've now seen the
> same line-wrapped if-statement twice. How about something like:
>
> int hugetlb_resv_available(struct vm_area_struct *vma) {
> if (vma->vm_flags & VM_SHARED)
> return 1;
> else if (vma_resv_huge_pages(vma))
> return 1;
> return 0;
> }
>
I'll add a helper like this in the next version.
> > + chg = 1;
> > + if (hugetlb_get_quota(inode->i_mapping, chg))
> > + return ERR_PTR(-ENOSPC);
> > + }
> >
> > spin_lock(&hugetlb_lock);
> > page = dequeue_huge_page_vma(vma, addr);
> > spin_unlock(&hugetlb_lock);
> > - return page ? page : ERR_PTR(-VM_FAULT_OOM);
> > -}
> > -
> > -static struct page *alloc_huge_page_private(struct vm_area_struct *vma,
> > - unsigned long addr)
> > -{
> > - struct page *page = NULL;
> >
> > - if (hugetlb_get_quota(vma->vm_file->f_mapping, 1))
> > - return ERR_PTR(-VM_FAULT_SIGBUS);
> > -
> > - spin_lock(&hugetlb_lock);
> > - if (free_huge_pages > resv_huge_pages)
> > - page = dequeue_huge_page_vma(vma, addr);
> > - spin_unlock(&hugetlb_lock);
> > if (!page) {
> > page = alloc_buddy_huge_page(vma, addr);
> > if (!page) {
> > - hugetlb_put_quota(vma->vm_file->f_mapping, 1);
> > + hugetlb_put_quota(inode->i_mapping, chg);
> > return ERR_PTR(-VM_FAULT_OOM);
> > }
> > }
> > - return page;
> > -}
> >
> > -static struct page *alloc_huge_page(struct vm_area_struct *vma,
> > - unsigned long addr)
> > -{
> > - struct page *page;
> > - struct address_space *mapping = vma->vm_file->f_mapping;
> > -
> > - if (vma->vm_flags & VM_MAYSHARE)
> > - page = alloc_huge_page_shared(vma, addr);
> > - else
> > - page = alloc_huge_page_private(vma, addr);
> > + set_page_refcounted(page);
> > + set_page_private(page, (unsigned long) mapping);
> >
> > - if (!IS_ERR(page)) {
> > - set_page_refcounted(page);
> > - set_page_private(page, (unsigned long) mapping);
> > - }
> > return page;
> > }
> >
>
Thanks
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2008-05-16 12:11 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-05-07 19:38 [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v2 Mel Gorman
2008-05-07 19:38 ` [PATCH 1/3] Move hugetlb_acct_memory() Mel Gorman
2008-05-07 19:39 ` [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() Mel Gorman
2008-05-14 20:55 ` Adam Litke
2008-05-16 12:11 ` Mel Gorman [this message]
2008-05-07 19:39 ` [PATCH 3/3] Guarantee that COW faults for a process that called mmap(MAP_PRIVATE) on hugetlbfs will succeed Mel Gorman
2008-05-14 20:55 ` Adam Litke
2008-05-16 12:15 ` Mel Gorman
2008-05-08 1:48 ` [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v2 David Gibson
2008-05-08 6:56 ` Mel Gorman
2008-05-08 11:14 ` Andy Whitcroft
2008-05-09 0:02 ` David Gibson
2008-05-09 13:30 ` Mel Gorman
2008-05-13 18:12 ` Andrew Hastings
-- strict thread matches above, loose matches on Subject: below --
2008-05-20 16:28 [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v3 Mel Gorman
2008-05-20 16:29 ` [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() Mel Gorman
2008-05-27 18:50 [PATCH 0/3] Guarantee faults for processes that call mmap(MAP_PRIVATE) on hugetlbfs v4 Mel Gorman
2008-05-27 18:51 ` [PATCH 2/3] Reserve huge pages for reliable MAP_PRIVATE hugetlbfs mappings until fork() Mel Gorman
2008-05-28 13:52 ` Adam Litke
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20080516121113.GB2637@csn.ul.ie \
--to=mel@csn.ul.ie \
--cc=agl@us.ibm.com \
--cc=andi@firstfloor.org \
--cc=apw@shadowen.org \
--cc=dean@arctic.org \
--cc=dwg@au1.ibm.com \
--cc=kenchen@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=wli@holomorphy.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).