From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e32.co.us.ibm.com (8.12.11.20060308/8.13.8) with ESMTP id l9OCEYJD007484 for ; Wed, 24 Oct 2007 08:14:34 -0400 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9ODNagr071942 for ; Wed, 24 Oct 2007 07:23:41 -0600 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9ODNaqM029902 for ; Wed, 24 Oct 2007 07:23:36 -0600 From: Adam Litke Subject: [PATCH 0/3] hugetlb: Fix up filesystem quota accounting Date: Wed, 24 Oct 2007 06:23:35 -0700 Message-Id: <20071024132335.13013.76227.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: Andrew Morton Cc: linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: Hugetlbfs implements a quota system which can limit the amount of memory that can be used by the filesystem. Before allocating a new huge page for a file, the quota is checked and debited. The quota is then credited when truncating the file. I found a few bugs in the code for both MAP_PRIVATE and MAP_SHARED mappings. Before detailing the problems and my proposed solutions, we should agree on a definition of quotas that properly addresses both private and shared pages. Since the purpose of quotas is to limit total memory consumption on a per-filesystem basis, I argue that all pages allocated by the fs (private and shared) should be charged against quota. Private Mappings ================ The current code will debit quota for private pages sometimes, but will never credit it. At a minimum, this causes a leak in the quota accounting which renders the accounting essentially useless as it is. Shared pages have a one to one mapping with a hugetlbfs file and are easy to account by debiting on allocation and crediting on truncate. Private pages are anonymous in nature and have a many to one relationship with their hugetlbfs files (due to copy on write). Because private pages are not indexed by the mapping's radix tree, thier quota cannot be credited at file truncation time. Crediting must be done when the page is unmapped and freed. Shared Pages ============ I discovered an issue concerning the interaction between the MAP_SHARED reservation system and quotas. Since quota is not checked until page instantiation, an over-quota mmap/reservation will initially succeed. When instantiating the first over-quota page, the program will receive SIGBUS. This is inconsistent since the reservation is supposed to be a guarantee. The solution is to debit the full amount of quota at reservation time and credit the unused portion when the reservation is released. This patch series brings quotas back in line by making the following modifications: - Debit quota when allocating a COW page - Credit MAP_PRIVATE pages at unmap time (shared pages continue to be credited during truncation) - Debit entire amount of quota at shared mmap reservation time -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e2.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9ODNmgR007667 for ; Wed, 24 Oct 2007 09:23:48 -0400 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9ODNmTB138710 for ; Wed, 24 Oct 2007 09:23:48 -0400 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9ODNlUo023476 for ; Wed, 24 Oct 2007 09:23:48 -0400 From: Adam Litke Subject: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management Date: Wed, 24 Oct 2007 06:23:45 -0700 Message-Id: <20071024132345.13013.36192.stgit@kernel> In-Reply-To: <20071024132335.13013.76227.stgit@kernel> References: <20071024132335.13013.76227.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: Andrew Morton Cc: linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: The hugetlbfs quota management system was never taught to handle MAP_PRIVATE mappings when that support was added. Currently, quota is debited at page instantiation and credited at file truncation. This approach works correctly for shared pages but is incomplete for private pages. In addition to hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but this function does not respect quotas. Private huge pages are treated very much like normal, anonymous pages. They are not "backed" by the hugetlbfs file and are not stored in the mapping's radix tree. This means that private pages are invisible to truncate_hugepages() so that function will not credit the quota. This patch teaches the unmap path how to release fs quota analogous to truncate_hugepages(). If __unmap_hugepage_range() clears the last page reference it will credit the corresponding fs quota before calling the page dtor. This will catch pages that were mapped MAP_PRIVATE only as the file will always hold the last reference on a MAP_SHARED page. hugetlb_cow() is also updated to charge against quota before allocating the new page. Signed-off-by: Adam Litke Signed-off-by: Andy Whitcroft --- mm/hugetlb.c | 14 +++++++++++++- 1 files changed, 13 insertions(+), 1 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ae2959b..0d645ca 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -685,7 +685,17 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, flush_tlb_range(vma, start, end); list_for_each_entry_safe(page, tmp, &page_list, lru) { list_del(&page->lru); - put_page(page); + if (put_page_testzero(page)) { + /* + * When releasing the last reference to a page we must + * credit the quota. For MAP_PRIVATE pages this occurs + * when the last PTE is cleared, for MAP_SHARED pages + * this occurs when the file is truncated. + */ + VM_BUG_ON(PageMapping(page)); + hugetlb_put_quota(vma->vm_file->f_mapping); + free_huge_page(page); + } } } @@ -722,6 +732,8 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, set_huge_ptep_writable(vma, address, ptep); return 0; } + if (hugetlb_get_quota(vma->vm_file->f_mapping)) + return VM_FAULT_SIGBUS; page_cache_get(old_page); new_page = alloc_huge_page(vma, address); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e2.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9ODNxtx008097 for ; Wed, 24 Oct 2007 09:23:59 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9ODNxVK113818 for ; Wed, 24 Oct 2007 09:23:59 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9ODNxDR005277 for ; Wed, 24 Oct 2007 09:23:59 -0400 From: Adam Litke Subject: [PATCH 2/3] hugetlb: Allow bulk updating in hugetlb_*_quota() Date: Wed, 24 Oct 2007 06:23:57 -0700 Message-Id: <20071024132357.13013.67944.stgit@kernel> In-Reply-To: <20071024132335.13013.76227.stgit@kernel> References: <20071024132335.13013.76227.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: Andrew Morton Cc: linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to allow bulk updating of the sbinfo->free_blocks counter. This will be used by the next patch in the series. Signed-off-by: Adam Litke --- fs/hugetlbfs/inode.c | 12 ++++++------ include/linux/hugetlb.h | 4 ++-- mm/hugetlb.c | 12 ++++++------ 3 files changed, 14 insertions(+), 14 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 04598e1..df15dee 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -364,7 +364,7 @@ static void truncate_hugepages(struct inode *inode, loff_t lstart) ++next; truncate_huge_page(page); unlock_page(page); - hugetlb_put_quota(mapping); + hugetlb_put_quota(mapping, 1); freed++; } huge_pagevec_release(&pvec); @@ -859,15 +859,15 @@ out_free: return -ENOMEM; } -int hugetlb_get_quota(struct address_space *mapping) +int hugetlb_get_quota(struct address_space *mapping, long delta) { int ret = 0; struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb); if (sbinfo->free_blocks > -1) { spin_lock(&sbinfo->stat_lock); - if (sbinfo->free_blocks > 0) - sbinfo->free_blocks--; + if (sbinfo->free_blocks - delta >= 0) + sbinfo->free_blocks -= delta; else ret = -ENOMEM; spin_unlock(&sbinfo->stat_lock); @@ -876,13 +876,13 @@ int hugetlb_get_quota(struct address_space *mapping) return ret; } -void hugetlb_put_quota(struct address_space *mapping) +void hugetlb_put_quota(struct address_space *mapping, long delta) { struct hugetlbfs_sb_info *sbinfo = HUGETLBFS_SB(mapping->host->i_sb); if (sbinfo->free_blocks > -1) { spin_lock(&sbinfo->stat_lock); - sbinfo->free_blocks++; + sbinfo->free_blocks += delta; spin_unlock(&sbinfo->stat_lock); } } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ea0f50b..770dbed 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -165,8 +165,8 @@ static inline struct hugetlbfs_sb_info *HUGETLBFS_SB(struct super_block *sb) extern const struct file_operations hugetlbfs_file_operations; extern struct vm_operations_struct hugetlb_vm_ops; struct file *hugetlb_file_setup(const char *name, size_t); -int hugetlb_get_quota(struct address_space *mapping); -void hugetlb_put_quota(struct address_space *mapping); +int hugetlb_get_quota(struct address_space *mapping, long delta); +void hugetlb_put_quota(struct address_space *mapping, long delta); static inline int is_file_hugepages(struct file *file) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 0d645ca..eaade8c 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -693,7 +693,7 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, * this occurs when the file is truncated. */ VM_BUG_ON(PageMapping(page)); - hugetlb_put_quota(vma->vm_file->f_mapping); + hugetlb_put_quota(vma->vm_file->f_mapping, 1); free_huge_page(page); } } @@ -732,7 +732,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, set_huge_ptep_writable(vma, address, ptep); return 0; } - if (hugetlb_get_quota(vma->vm_file->f_mapping)) + if (hugetlb_get_quota(vma->vm_file->f_mapping, 1)) return VM_FAULT_SIGBUS; page_cache_get(old_page); @@ -784,11 +784,11 @@ retry: size = i_size_read(mapping->host) >> HPAGE_SHIFT; if (idx >= size) goto out; - if (hugetlb_get_quota(mapping)) + if (hugetlb_get_quota(mapping, 1)) goto out; page = alloc_huge_page(vma, address); if (!page) { - hugetlb_put_quota(mapping); + hugetlb_put_quota(mapping, 1); ret = VM_FAULT_OOM; goto out; } @@ -800,7 +800,7 @@ retry: err = add_to_page_cache(page, mapping, idx, GFP_KERNEL); if (err) { put_page(page); - hugetlb_put_quota(mapping); + hugetlb_put_quota(mapping, 1); if (err == -EEXIST) goto retry; goto out; @@ -834,7 +834,7 @@ out: backout: spin_unlock(&mm->page_table_lock); - hugetlb_put_quota(mapping); + hugetlb_put_quota(mapping, 1); unlock_page(page); put_page(page); goto out; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e4.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9ODOBis005256 for ; Wed, 24 Oct 2007 09:24:11 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9ODOB4J113576 for ; Wed, 24 Oct 2007 09:24:11 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9ODOA3O017192 for ; Wed, 24 Oct 2007 09:24:10 -0400 From: Adam Litke Subject: [PATCH 3/3] [PATCH] hugetlb: Enforce quotas during reservation for shared mappings Date: Wed, 24 Oct 2007 06:24:08 -0700 Message-Id: <20071024132408.13013.81566.stgit@kernel> In-Reply-To: <20071024132335.13013.76227.stgit@kernel> References: <20071024132335.13013.76227.stgit@kernel> Content-Type: text/plain; charset=utf-8; format=fixed Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Return-Path: To: Andrew Morton Cc: linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: When a MAP_SHARED mmap of a hugetlbfs file succeeds, huge pages are reserved to guarantee no problems will occur later when instantiating pages. If quotas are in force, page instantiation could fail due to a race with another process or an oversized (but approved) shared mapping. To prevent these scenarios, debit the quota for the full reservation amount up front and credit the unused quota when the reservation is released. Signed-off-by: Adam Litke --- mm/hugetlb.c | 18 ++++++++++++------ 1 files changed, 12 insertions(+), 6 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index eaade8c..5fc075e 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -769,6 +769,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, struct page *page; struct address_space *mapping; pte_t new_pte; + int shared_page = vma->vm_flags & VM_SHARED; mapping = vma->vm_file->f_mapping; idx = ((address - vma->vm_start) >> HPAGE_SHIFT) @@ -784,23 +785,24 @@ retry: size = i_size_read(mapping->host) >> HPAGE_SHIFT; if (idx >= size) goto out; - if (hugetlb_get_quota(mapping, 1)) + /* Shared pages are quota-accounted at reservation/mmap time */ + if (!shared_page && hugetlb_get_quota(mapping, 1)) goto out; page = alloc_huge_page(vma, address); if (!page) { - hugetlb_put_quota(mapping, 1); + if (!shared_page) + hugetlb_put_quota(mapping, 1); ret = VM_FAULT_OOM; goto out; } clear_huge_page(page, address); - if (vma->vm_flags & VM_SHARED) { + if (shared_page) { int err; err = add_to_page_cache(page, mapping, idx, GFP_KERNEL); if (err) { put_page(page); - hugetlb_put_quota(mapping, 1); if (err == -EEXIST) goto retry; goto out; @@ -834,7 +836,8 @@ out: backout: spin_unlock(&mm->page_table_lock); - hugetlb_put_quota(mapping, 1); + if (!shared_page) + hugetlb_put_quota(mapping, 1); unlock_page(page); put_page(page); goto out; @@ -1144,6 +1147,8 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to) if (chg < 0) return chg; + if (hugetlb_get_quota(inode->i_mapping, chg)) + return -ENOSPC; ret = hugetlb_acct_memory(chg); if (ret < 0) return ret; @@ -1154,5 +1159,6 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to) void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed) { long chg = region_truncate(&inode->i_mapping->private_list, offset); - hugetlb_acct_memory(freed - chg); + hugetlb_put_quota(inode->i_mapping, (chg - freed)); + hugetlb_acct_memory(-(chg - freed)); } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e3.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OIhbte013078 for ; Wed, 24 Oct 2007 14:43:37 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OIhbtf124806 for ; Wed, 24 Oct 2007 14:43:37 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OIhbjq032590 for ; Wed, 24 Oct 2007 14:43:37 -0400 Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management From: Dave Hansen In-Reply-To: <20071024132345.13013.36192.stgit@kernel> References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> Content-Type: text/plain Date: Wed, 24 Oct 2007 11:43:34 -0700 Message-Id: <1193251414.4039.14.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: Andrew Morton , linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: On Wed, 2007-10-24 at 06:23 -0700, Adam Litke wrote: > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -685,7 +685,17 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > flush_tlb_range(vma, start, end); > list_for_each_entry_safe(page, tmp, &page_list, lru) { > list_del(&page->lru); > - put_page(page); > + if (put_page_testzero(page)) { > + /* > + * When releasing the last reference to a page we must > + * credit the quota. For MAP_PRIVATE pages this occurs > + * when the last PTE is cleared, for MAP_SHARED pages > + * this occurs when the file is truncated. > + */ > + VM_BUG_ON(PageMapping(page)); > + hugetlb_put_quota(vma->vm_file->f_mapping); > + free_huge_page(page); > + } > } > } That's a pretty good mechanism to use. This particular nugget is for MAP_PRIVATE pages only, right? The shared ones should have another ref out on them for the 'mapping' too, so won't get released at unmap, right? It isn't obvious from the context here, but how did free_huge_page() get called for these pages before? Can you use free_pages_check() here to get some more comprehensive page tests? Just a cursory look at it makes me think that it should work. -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e33.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OJ397t008389 for ; Wed, 24 Oct 2007 15:03:09 -0400 Received: from d03av02.boulder.ibm.com (d03av02.boulder.ibm.com [9.17.195.168]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OJ35jm093024 for ; Wed, 24 Oct 2007 13:03:05 -0600 Received: from d03av02.boulder.ibm.com (loopback [127.0.0.1]) by d03av02.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OJ340E004591 for ; Wed, 24 Oct 2007 13:03:04 -0600 Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management From: Adam Litke In-Reply-To: <1193251414.4039.14.camel@localhost> References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> <1193251414.4039.14.camel@localhost> Content-Type: text/plain Date: Wed, 24 Oct 2007 14:03:03 -0500 Message-Id: <1193252583.18417.52.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Dave Hansen Cc: Andrew Morton , linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: On Wed, 2007-10-24 at 11:43 -0700, Dave Hansen wrote: > On Wed, 2007-10-24 at 06:23 -0700, Adam Litke wrote: > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -685,7 +685,17 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > > flush_tlb_range(vma, start, end); > > list_for_each_entry_safe(page, tmp, &page_list, lru) { > > list_del(&page->lru); > > - put_page(page); > > + if (put_page_testzero(page)) { > > + /* > > + * When releasing the last reference to a page we must > > + * credit the quota. For MAP_PRIVATE pages this occurs > > + * when the last PTE is cleared, for MAP_SHARED pages > > + * this occurs when the file is truncated. > > + */ > > + VM_BUG_ON(PageMapping(page)); > > + hugetlb_put_quota(vma->vm_file->f_mapping); > > + free_huge_page(page); > > + } > > } > > } > > That's a pretty good mechanism to use. > > This particular nugget is for MAP_PRIVATE pages only, right? The shared > ones should have another ref out on them for the 'mapping' too, so won't > get released at unmap, right? Yep that's right. Shared pages are released by truncate_hugepages() when the ref for the mapping is dropped. > It isn't obvious from the context here, but how did free_huge_page() get > called for these pages before? put_page() calls put_compound_page() which looks up the page dtor (free_huge_page) and calls it. > Can you use free_pages_check() here to get some more comprehensive page > tests? Just a cursory look at it makes me think that it should work. Yeah, that should work but I am a little hesitant to use free_pages_check() because it may happen to "just work." None of the other hugetlb code concerns itself with all of those page flags so it seems a little out of place to start caring only here. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e4.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OJ74BB010536 for ; Wed, 24 Oct 2007 15:07:04 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OJ74cs114352 for ; Wed, 24 Oct 2007 15:07:04 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OJ73tI012995 for ; Wed, 24 Oct 2007 15:07:03 -0400 Subject: Re: [PATCH 3/3] [PATCH] hugetlb: Enforce quotas during reservation for shared mappings From: Dave Hansen In-Reply-To: <20071024132408.13013.81566.stgit@kernel> References: <20071024132335.13013.76227.stgit@kernel> <20071024132408.13013.81566.stgit@kernel> Content-Type: text/plain Date: Wed, 24 Oct 2007 12:07:01 -0700 Message-Id: <1193252821.4039.33.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: Andrew Morton , linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: On Wed, 2007-10-24 at 06:24 -0700, Adam Litke wrote: > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index eaade8c..5fc075e 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -769,6 +769,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, > struct page *page; > struct address_space *mapping; > pte_t new_pte; > + int shared_page = vma->vm_flags & VM_SHARED; > > mapping = vma->vm_file->f_mapping; > idx = ((address - vma->vm_start) >> HPAGE_SHIFT) > @@ -784,23 +785,24 @@ retry: > size = i_size_read(mapping->host) >> HPAGE_SHIFT; > if (idx >= size) > goto out; > - if (hugetlb_get_quota(mapping, 1)) > + /* Shared pages are quota-accounted at reservation/mmap time */ > + if (!shared_page && hugetlb_get_quota(mapping, 1)) > goto out; > page = alloc_huge_page(vma, address); Since alloc_huge_page() gets the VMA it could, in theory, be doing the accounting. The other user, hugetlb_cow(), seems to have a similar code path. But, it doesn't have to worry about shared_page, right? We can only have COWs on MAP_PRIVATE. I'm just trying to find ways to future-proof the quotas since they already got screwed up once. The fewer call sites we have for them, the fewer places they can get screwed up. :) > if (!page) { > - hugetlb_put_quota(mapping, 1); > + if (!shared_page) > + hugetlb_put_quota(mapping, 1); > ret = VM_FAULT_OOM; > goto out; > } > clear_huge_page(page, address); > > - if (vma->vm_flags & VM_SHARED) { > + if (shared_page) { > int err; > > err = add_to_page_cache(page, mapping, idx, GFP_KERNEL); > if (err) { > put_page(page); > - hugetlb_put_quota(mapping, 1); > if (err == -EEXIST) > goto retry; > goto out; To where was this quota put moved? Is it because we're in a fault path here, and shared pages don't modify quotas during faults, only at mmap/truncate() time now? > backout: > spin_unlock(&mm->page_table_lock); > - hugetlb_put_quota(mapping, 1); > + if (!shared_page) > + hugetlb_put_quota(mapping, 1); > unlock_page(page); > put_page(page); > goto out; > @@ -1144,6 +1147,8 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to) > if (chg < 0) > return chg; > > + if (hugetlb_get_quota(inode->i_mapping, chg)) > + return -ENOSPC; > ret = hugetlb_acct_memory(chg); > if (ret < 0) > return ret; > @@ -1154,5 +1159,6 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to) > void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed) > { > long chg = region_truncate(&inode->i_mapping->private_list, offset); > - hugetlb_acct_memory(freed - chg); > + hugetlb_put_quota(inode->i_mapping, (chg - freed)); > + hugetlb_acct_memory(-(chg - freed)); > } Would it be any easier to just do all of the quota operations in _either_ truncate_hugepages() or in here? Could you skip the quota operation in truncate_hugepages()'s while() loop, and just put the quota for the entire region in hugetlb_unreserve_pages()? void hugetlb_unreserve_pages(struct inode *inode, long offset, long already_freed) { long total_truncated = region_truncate(&inode->i_mapping->private_list, offset); long newly_freed = total_truncated - already_freed; hugetlb_put_quota(inode->i_mapping, newly_freed); hugetlb_acct_memory(-newly_freed); } I do see several hugetlb_put_quota()/hugetlb_acct_memory() pairs next to each other. Do they deserve to be lumped together in one helper? -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e2.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OJIf4d009757 for ; Wed, 24 Oct 2007 15:18:41 -0400 Received: from d01av01.pok.ibm.com (d01av01.pok.ibm.com [9.56.224.215]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OJIfa2103266 for ; Wed, 24 Oct 2007 15:18:41 -0400 Received: from d01av01.pok.ibm.com (loopback [127.0.0.1]) by d01av01.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OJIejB022664 for ; Wed, 24 Oct 2007 15:18:40 -0400 Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management From: Dave Hansen In-Reply-To: <1193252583.18417.52.camel@localhost.localdomain> References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> <1193251414.4039.14.camel@localhost> <1193252583.18417.52.camel@localhost.localdomain> Content-Type: text/plain Date: Wed, 24 Oct 2007 12:18:38 -0700 Message-Id: <1193253518.4039.41.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: Andrew Morton , linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: On Wed, 2007-10-24 at 14:03 -0500, Adam Litke wrote: > On Wed, 2007-10-24 at 11:43 -0700, Dave Hansen wrote: > > On Wed, 2007-10-24 at 06:23 -0700, Adam Litke wrote: > > > --- a/mm/hugetlb.c > > > +++ b/mm/hugetlb.c > > > @@ -685,7 +685,17 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, > > > flush_tlb_range(vma, start, end); > > > list_for_each_entry_safe(page, tmp, &page_list, lru) { > > > list_del(&page->lru); > > > - put_page(page); > > > + if (put_page_testzero(page)) { > > > + /* > > > + * When releasing the last reference to a page we must > > > + * credit the quota. For MAP_PRIVATE pages this occurs > > > + * when the last PTE is cleared, for MAP_SHARED pages > > > + * this occurs when the file is truncated. > > > + */ Could we change this comment a bit? It isn't completely clear that _this_ code is never called for shared pages. /* * We'll never get here for MAP_SHARED. We always * have another reference out for the address_space, * and will never get to page_count()==0 during * an unmap. We put the quota for these in * some_function() instead. */ > > > + VM_BUG_ON(PageMapping(page)); > > > + hugetlb_put_quota(vma->vm_file->f_mapping); > > > + free_huge_page(page); > > > + } > > > } > > > } > > > > That's a pretty good mechanism to use. > > > > This particular nugget is for MAP_PRIVATE pages only, right? The shared > > ones should have another ref out on them for the 'mapping' too, so won't > > get released at unmap, right? > > Yep that's right. Shared pages are released by truncate_hugepages() > when the ref for the mapping is dropped. > > > It isn't obvious from the context here, but how did free_huge_page() get > > called for these pages before? > > put_page() calls put_compound_page() which looks up the page dtor > (free_huge_page) and calls it. Ahhh, and put_page_testzero() doesn't hook into the destructors at all. I wish it had an __ prefixed. :) > > Can you use free_pages_check() here to get some more comprehensive page > > tests? Just a cursory look at it makes me think that it should work. > > Yeah, that should work but I am a little hesitant to use > free_pages_check() because it may happen to "just work." None of the > other hugetlb code concerns itself with all of those page flags so it > seems a little out of place to start caring only here. I just think it is more comprehensive. If you're going to check PageMapping(), should you not also check page->mapping to make sure they are consistent? Anyway, bad_page() may also be more appropriate than a VM_BUG_ON(). It gets you a much prettier message. Your comment about this happening when the last PTE is shot down would be reinforced by having the page_mapcount() check in free_pages_check(). But, it would also be worthless since we don't do rmap for huge pages. ;) -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from zps75.corp.google.com (zps75.corp.google.com [172.25.146.75]) by smtp-out.google.com with ESMTP id l9OJLeB7025222 for ; Wed, 24 Oct 2007 20:21:41 +0100 Received: from rv-out-0910.google.com (rvfc24.prod.google.com [10.140.180.24]) by zps75.corp.google.com with ESMTP id l9OJLLRd020215 for ; Wed, 24 Oct 2007 12:21:40 -0700 Received: by rv-out-0910.google.com with SMTP id c24so260467rvf for ; Wed, 24 Oct 2007 12:21:40 -0700 (PDT) Message-ID: Date: Wed, 24 Oct 2007 12:21:40 -0700 From: "Ken Chen" Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management In-Reply-To: <1193252583.18417.52.camel@localhost.localdomain> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> <1193251414.4039.14.camel@localhost> <1193252583.18417.52.camel@localhost.localdomain> Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: Dave Hansen , Andrew Morton , linux-mm@kvack.org, Andy Whitcroft List-ID: On 10/24/07, Adam Litke wrote: > On Wed, 2007-10-24 at 11:43 -0700, Dave Hansen wrote: > > This particular nugget is for MAP_PRIVATE pages only, right? The shared > > ones should have another ref out on them for the 'mapping' too, so won't > > get released at unmap, right? > > Yep that's right. Shared pages are released by truncate_hugepages() > when the ref for the mapping is dropped. I think as a follow up patch, we should debit the quota in free_huge_page(), so you don't have to open code it like this and also consolidate calls to hugetlb_put_quota() in one place. It's cleaner that way. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e31.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OJqxvs011159 for ; Wed, 24 Oct 2007 15:52:59 -0400 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OJqxS3080276 for ; Wed, 24 Oct 2007 13:52:59 -0600 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OJqxlh021616 for ; Wed, 24 Oct 2007 13:52:59 -0600 Subject: Re: [PATCH 3/3] [PATCH] hugetlb: Enforce quotas during reservation for shared mappings From: Adam Litke In-Reply-To: <1193252821.4039.33.camel@localhost> References: <20071024132335.13013.76227.stgit@kernel> <20071024132408.13013.81566.stgit@kernel> <1193252821.4039.33.camel@localhost> Content-Type: text/plain Date: Wed, 24 Oct 2007 14:52:58 -0500 Message-Id: <1193255578.18417.63.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Dave Hansen Cc: Andrew Morton , linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: On Wed, 2007-10-24 at 12:07 -0700, Dave Hansen wrote: > On Wed, 2007-10-24 at 06:24 -0700, Adam Litke wrote: > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > > index eaade8c..5fc075e 100644 > > --- a/mm/hugetlb.c > > +++ b/mm/hugetlb.c > > @@ -769,6 +769,7 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma, > > struct page *page; > > struct address_space *mapping; > > pte_t new_pte; > > + int shared_page = vma->vm_flags & VM_SHARED; > > > > mapping = vma->vm_file->f_mapping; > > idx = ((address - vma->vm_start) >> HPAGE_SHIFT) > > @@ -784,23 +785,24 @@ retry: > > size = i_size_read(mapping->host) >> HPAGE_SHIFT; > > if (idx >= size) > > goto out; > > - if (hugetlb_get_quota(mapping, 1)) > > + /* Shared pages are quota-accounted at reservation/mmap time */ > > + if (!shared_page && hugetlb_get_quota(mapping, 1)) > > goto out; > > page = alloc_huge_page(vma, address); > > Since alloc_huge_page() gets the VMA it could, in theory, be doing the > accounting. The other user, hugetlb_cow(), seems to have a similar code > path. But, it doesn't have to worry about shared_page, right? We can > only have COWs on MAP_PRIVATE. > > I'm just trying to find ways to future-proof the quotas since they > already got screwed up once. The fewer call sites we have for them, the > fewer places they can get screwed up. :) Yep. Originally I wanted to put the hugetlb_get_quota() call inside alloc_huge_page() but the devil is in the details. Failure to get quota needs to result in a SIGBUS whereas a standard allocation failure is OOM. Because of this, we'd still need special handling of the alloc_huge_page() return value. While that can be done easily enough, I didn't think it was worth it. > > if (!page) { > > - hugetlb_put_quota(mapping, 1); > > + if (!shared_page) > > + hugetlb_put_quota(mapping, 1); > > ret = VM_FAULT_OOM; > > goto out; > > } > > clear_huge_page(page, address); > > > > - if (vma->vm_flags & VM_SHARED) { > > + if (shared_page) { > > int err; > > > > err = add_to_page_cache(page, mapping, idx, GFP_KERNEL); > > if (err) { > > put_page(page); > > - hugetlb_put_quota(mapping, 1); > > if (err == -EEXIST) > > goto retry; > > goto out; > > To where was this quota put moved? Is it because we're in a fault path > here, and shared pages don't modify quotas during faults, only at > mmap/truncate() time now? That's right. > > backout: > > spin_unlock(&mm->page_table_lock); > > - hugetlb_put_quota(mapping, 1); > > + if (!shared_page) > > + hugetlb_put_quota(mapping, 1); > > unlock_page(page); > > put_page(page); > > goto out; > > @@ -1144,6 +1147,8 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to) > > if (chg < 0) > > return chg; > > > > + if (hugetlb_get_quota(inode->i_mapping, chg)) > > + return -ENOSPC; > > ret = hugetlb_acct_memory(chg); > > if (ret < 0) > > return ret; > > @@ -1154,5 +1159,6 @@ int hugetlb_reserve_pages(struct inode *inode, long from, long to) > > void hugetlb_unreserve_pages(struct inode *inode, long offset, long freed) > > { > > long chg = region_truncate(&inode->i_mapping->private_list, offset); > > - hugetlb_acct_memory(freed - chg); > > + hugetlb_put_quota(inode->i_mapping, (chg - freed)); > > + hugetlb_acct_memory(-(chg - freed)); > > } > > Would it be any easier to just do all of the quota operations in > _either_ truncate_hugepages() or in here? Could you skip the quota > operation in truncate_hugepages()'s while() loop, and just put the quota > for the entire region in hugetlb_unreserve_pages()? Yep, we certainly could do that. I'll change it to that for the next version so we can see how it looks. > void hugetlb_unreserve_pages(struct inode *inode, long offset, long already_freed) > { > long total_truncated = region_truncate(&inode->i_mapping->private_list, offset); > long newly_freed = total_truncated - already_freed; > hugetlb_put_quota(inode->i_mapping, newly_freed); > hugetlb_acct_memory(-newly_freed); > } > > I do see several hugetlb_put_quota()/hugetlb_acct_memory() pairs next to > each other. Do they deserve to be lumped together in one helper? I don't really think putting them together in one helper would do anything to improve readability. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e4.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OK0PSm026208 for ; Wed, 24 Oct 2007 16:00:25 -0400 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OK0PwN133344 for ; Wed, 24 Oct 2007 16:00:25 -0400 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OK0O2v023582 for ; Wed, 24 Oct 2007 16:00:25 -0400 Subject: Re: [PATCH 3/3] [PATCH] hugetlb: Enforce quotas during reservation for shared mappings From: Dave Hansen In-Reply-To: <1193255578.18417.63.camel@localhost.localdomain> References: <20071024132335.13013.76227.stgit@kernel> <20071024132408.13013.81566.stgit@kernel> <1193252821.4039.33.camel@localhost> <1193255578.18417.63.camel@localhost.localdomain> Content-Type: text/plain Date: Wed, 24 Oct 2007 13:00:22 -0700 Message-Id: <1193256022.4039.58.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: Andrew Morton , linux-mm@kvack.org, Ken Chen , Andy Whitcroft List-ID: On Wed, 2007-10-24 at 14:52 -0500, Adam Litke wrote: > > > Since alloc_huge_page() gets the VMA it could, in theory, be doing the > > accounting. The other user, hugetlb_cow(), seems to have a similar code > > path. But, it doesn't have to worry about shared_page, right? We can > > only have COWs on MAP_PRIVATE. > > > > I'm just trying to find ways to future-proof the quotas since they > > already got screwed up once. The fewer call sites we have for them, the > > fewer places they can get screwed up. :) > > Yep. Originally I wanted to put the hugetlb_get_quota() call inside > alloc_huge_page() but the devil is in the details. Failure to get quota > needs to result in a SIGBUS whereas a standard allocation failure is > OOM. Because of this, we'd still need special handling of the > alloc_huge_page() return value. While that can be done easily enough, I > didn't think it was worth it. Does it need special handling if we just use ERR_PTR()s? -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay04.boulder.ibm.com (d03relay04.boulder.ibm.com [9.17.195.106]) by e35.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OK26qP000717 for ; Wed, 24 Oct 2007 16:02:06 -0400 Received: from d03av03.boulder.ibm.com (d03av03.boulder.ibm.com [9.17.195.169]) by d03relay04.boulder.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OK25Tk064482 for ; Wed, 24 Oct 2007 14:02:05 -0600 Received: from d03av03.boulder.ibm.com (loopback [127.0.0.1]) by d03av03.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OK25qE006758 for ; Wed, 24 Oct 2007 14:02:05 -0600 Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management From: Adam Litke In-Reply-To: References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> <1193251414.4039.14.camel@localhost> <1193252583.18417.52.camel@localhost.localdomain> Content-Type: text/plain Date: Wed, 24 Oct 2007 15:02:04 -0500 Message-Id: <1193256124.18417.70.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Ken Chen Cc: Dave Hansen , Andrew Morton , linux-mm@kvack.org, Andy Whitcroft List-ID: On Wed, 2007-10-24 at 12:21 -0700, Ken Chen wrote: > On 10/24/07, Adam Litke wrote: > > On Wed, 2007-10-24 at 11:43 -0700, Dave Hansen wrote: > > > This particular nugget is for MAP_PRIVATE pages only, right? The shared > > > ones should have another ref out on them for the 'mapping' too, so won't > > > get released at unmap, right? > > > > Yep that's right. Shared pages are released by truncate_hugepages() > > when the ref for the mapping is dropped. > > I think as a follow up patch, we should debit the quota in > free_huge_page(), so you don't have to open code it like this and also > consolidate calls to hugetlb_put_quota() in one place. It's cleaner > that way. At free_huge_page() time, you can't associate the page with a struct address_space so it becomes hard to credit the proper filesystem. When freeing the page, page->mapping is no longer valid (even for shared pages). -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d01relay02.pok.ibm.com (d01relay02.pok.ibm.com [9.56.227.234]) by e1.ny.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9OMCRE1022506 for ; Wed, 24 Oct 2007 18:12:27 -0400 Received: from d01av03.pok.ibm.com (d01av03.pok.ibm.com [9.56.224.217]) by d01relay02.pok.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9OMCQCq119988 for ; Wed, 24 Oct 2007 18:12:26 -0400 Received: from d01av03.pok.ibm.com (loopback [127.0.0.1]) by d01av03.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9OMCQtj002366 for ; Wed, 24 Oct 2007 18:12:26 -0400 Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management From: Dave Hansen In-Reply-To: <1193256124.18417.70.camel@localhost.localdomain> References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> <1193251414.4039.14.camel@localhost> <1193252583.18417.52.camel@localhost.localdomain> <1193256124.18417.70.camel@localhost.localdomain> Content-Type: text/plain Date: Wed, 24 Oct 2007 15:12:24 -0700 Message-Id: <1193263944.4039.87.camel@localhost> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Adam Litke Cc: Ken Chen , Andrew Morton , linux-mm@kvack.org, Andy Whitcroft List-ID: On Wed, 2007-10-24 at 15:02 -0500, Adam Litke wrote: > > I think as a follow up patch, we should debit the quota in > > free_huge_page(), so you don't have to open code it like this and also > > consolidate calls to hugetlb_put_quota() in one place. It's cleaner > > that way. > > At free_huge_page() time, you can't associate the page with a struct > address_space so it becomes hard to credit the proper filesystem. When > freeing the page, page->mapping is no longer valid (even for shared > pages). Why is that? Because we rely on put_page() calling into the destructors, and don't pass along mapping? There are basically two free paths: shared file truncating and the last vma using a MAP_PRIVATE page being munmap()'d. Your code just made it so that regular put_page() isn't called for the MAP_PRIVATE free case. The destructor is called manually. So, it doesn't really apply. For the shared case, the quota calls aren't even done during allocation and free, but at truncation, so they wouldn't have a VMA available to determine shared/private. But, I think what I'm realizing is that the free paths for shared vs. private are actually quite distinct. Even more now after your patches abolish using and actual put_page() (and the destructors) on private pages losing their last mapping. I think it may make a lot of sense to have {alloc,free}_{private,shared}_huge_page(). It'll really help readability, and I _think_ it gives you a handy dandy place to add the different quota operations needed. -- Dave -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from zps38.corp.google.com (zps38.corp.google.com [172.25.146.38]) by smtp-out.google.com with ESMTP id l9P5KUYl011679 for ; Wed, 24 Oct 2007 22:20:30 -0700 Received: from rv-out-0910.google.com (rvbl15.prod.google.com [10.140.88.15]) by zps38.corp.google.com with ESMTP id l9P5KIcm013289 for ; Wed, 24 Oct 2007 22:20:30 -0700 Received: by rv-out-0910.google.com with SMTP id l15so339907rvb for ; Wed, 24 Oct 2007 22:20:30 -0700 (PDT) Message-ID: Date: Wed, 24 Oct 2007 22:20:30 -0700 From: "Ken Chen" Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management In-Reply-To: <1193263944.4039.87.camel@localhost> MIME-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> <1193251414.4039.14.camel@localhost> <1193252583.18417.52.camel@localhost.localdomain> <1193256124.18417.70.camel@localhost.localdomain> <1193263944.4039.87.camel@localhost> Sender: owner-linux-mm@kvack.org Return-Path: To: Dave Hansen Cc: Adam Litke , Andrew Morton , linux-mm@kvack.org, Andy Whitcroft List-ID: On 10/24/07, Dave Hansen wrote: > But, I think what I'm realizing is that the free paths for shared vs. > private are actually quite distinct. Even more now after your patches > abolish using and actual put_page() (and the destructors) on private > pages losing their last mapping. > > I think it may make a lot of sense to have > {alloc,free}_{private,shared}_huge_page(). It'll really help > readability, and I _think_ it gives you a handy dandy place to add the > different quota operations needed. Here is my version of re-factoring hugetlb_put_quota() into free_huge_page. Not exactly what Dave suggested, but at least consolidate quota credit in one place. diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 12aca8e..6513f56 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -364,7 +364,6 @@ static void truncate_hugepages( ++next; truncate_huge_page(page); unlock_page(page); - hugetlb_put_quota(mapping); freed++; } huge_pagevec_release(&pvec); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8b809ec..70c58cd 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -116,6 +116,9 @@ static void free_huge_page(struct page *page) { int nid = page_to_nid(page); + struct address_space *mapping; + + mapping = (struct address_space *) page_private(page); BUG_ON(page_count(page)); INIT_LIST_HEAD(&page->lru); @@ -129,6 +132,9 @@ static void free_huge_page(struct page *page) enqueue_huge_page(page); } spin_unlock(&hugetlb_lock); + if (mapping) + hugetlb_put_quota(mapping); + set_page_private(page, 0); } /* @@ -369,6 +375,7 @@ static struct page *alloc_huge_page( spin_unlock(&hugetlb_lock); set_page_refcounted(page); + set_page_private(page, (unsigned long) vma->vm_file->f_mapping); return page; fail: @@ -382,6 +389,8 @@ fail: if (!use_reserved_page) page = alloc_buddy_huge_page(vma, addr); + if (page) + set_page_private(page, (unsigned long) vma->vm_file->f_mapping); return page; } @@ -788,7 +797,6 @@ retry: err = add_to_page_cache(page, mapping, idx, GFP_KERNEL); if (err) { put_page(page); - hugetlb_put_quota(mapping); if (err == -EEXIST) goto retry; goto out; @@ -822,7 +830,6 @@ out: backout: spin_unlock(&mm->page_table_lock); - hugetlb_put_quota(mapping); unlock_page(page); put_page(page); goto out; -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from d03relay02.boulder.ibm.com (d03relay02.boulder.ibm.com [9.17.195.227]) by e34.co.us.ibm.com (8.13.8/8.13.8) with ESMTP id l9PEt0O9000776 for ; Thu, 25 Oct 2007 10:55:00 -0400 Received: from d03av01.boulder.ibm.com (d03av01.boulder.ibm.com [9.17.195.167]) by d03relay02.boulder.ibm.com (8.13.8/8.13.8/NCO v8.5) with ESMTP id l9PEt0mK108752 for ; Thu, 25 Oct 2007 08:55:00 -0600 Received: from d03av01.boulder.ibm.com (loopback [127.0.0.1]) by d03av01.boulder.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id l9PEsxbe021183 for ; Thu, 25 Oct 2007 08:54:59 -0600 Subject: Re: [PATCH 1/3] [FIX] hugetlb: Fix broken fs quota management From: Adam Litke In-Reply-To: References: <20071024132335.13013.76227.stgit@kernel> <20071024132345.13013.36192.stgit@kernel> <1193251414.4039.14.camel@localhost> <1193252583.18417.52.camel@localhost.localdomain> <1193256124.18417.70.camel@localhost.localdomain> <1193263944.4039.87.camel@localhost> Content-Type: text/plain Date: Thu, 25 Oct 2007 09:54:58 -0500 Message-Id: <1193324098.18417.80.camel@localhost.localdomain> Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org Return-Path: To: Ken Chen Cc: Dave Hansen , Andrew Morton , linux-mm@kvack.org, Andy Whitcroft List-ID: On Wed, 2007-10-24 at 22:20 -0700, Ken Chen wrote: > On 10/24/07, Dave Hansen wrote: > > But, I think what I'm realizing is that the free paths for shared vs. > > private are actually quite distinct. Even more now after your patches > > abolish using and actual put_page() (and the destructors) on private > > pages losing their last mapping. > > > > I think it may make a lot of sense to have > > {alloc,free}_{private,shared}_huge_page(). It'll really help > > readability, and I _think_ it gives you a handy dandy place to add the > > different quota operations needed. > > Here is my version of re-factoring hugetlb_put_quota() into > free_huge_page. Not exactly what Dave suggested, but at least > consolidate quota credit in one place. I think consolidating quota into alloc/free_huge_page is a laudable goal, but it has some problems (see below) that I feel go beyond simply fixing the broken accounting (the initial purpose of the patches I was targeting for -rc2). > @@ -369,6 +375,7 @@ static struct page *alloc_huge_page( > > spin_unlock(&hugetlb_lock); > set_page_refcounted(page); > + set_page_private(page, (unsigned long) vma->vm_file->f_mapping); > return page; > > fail: This seems like a layering violation to me. We set page->private here, but if add_to_page_cache() is called on this page later, it will set page->mapping. Granted it will overwrite the exact same value, but it does so using a different field in the union. So I don't think we should be messing with page->private in this way. -- Adam Litke - (agl at us.ibm.com) IBM Linux Technology Center -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org