* put_page() tries to handle hugepages but fails
@ 2004-04-23 8:18 David Gibson
2004-04-23 8:34 ` Andrew Morton
0 siblings, 1 reply; 12+ messages in thread
From: David Gibson @ 2004-04-23 8:18 UTC (permalink / raw)
To: Andrew Morton; +Cc: apw, agl, mbligh, linux-kernel, linuxppc64-dev
Andrew, please apply.
The code of put_page() is misleading, in that it appears to have code
handling PageCompound pages (i.e. hugepages). However it won't
actually handle them correctly - __page_cache_release() will not work
properly on a compound. Instead, hugepages should be and are released
with huge_page_release() from mm/hugetlb.c. This patch removes the
broken PageCompound path from put_page(), replacing it with a
BUG_ON(). This also removes the initialization of page[1].mapping
from compoound pages, which was only ever used in this broken code
path.
Index: working-2.6/mm/swap.c
===================================================================
--- working-2.6.orig/mm/swap.c 2004-04-14 12:22:49.000000000 +1000
+++ working-2.6/mm/swap.c 2004-04-23 18:13:53.932263936 +1000
@@ -38,17 +38,7 @@
void put_page(struct page *page)
{
- if (unlikely(PageCompound(page))) {
- page = (struct page *)page->private;
- if (put_page_testzero(page)) {
- if (page[1].mapping) { /* destructor? */
- (*(void (*)(struct page *))page[1].mapping)(page);
- } else {
- __page_cache_release(page);
- }
- }
- return;
- }
+ BUG_ON(PageCompound(page));
if (!PageReserved(page) && put_page_testzero(page))
__page_cache_release(page);
}
Index: working-2.6/mm/page_alloc.c
===================================================================
--- working-2.6.orig/mm/page_alloc.c 2004-04-13 11:42:42.000000000 +1000
+++ working-2.6/mm/page_alloc.c 2004-04-23 18:18:06.799295248 +1000
@@ -118,7 +118,6 @@
int i;
int nr_pages = 1 << order;
- page[1].mapping = 0;
page[1].index = order;
for (i = 0; i < nr_pages; i++) {
struct page *p = page + i;
--
David Gibson | For every complex problem there is a
david AT gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: put_page() tries to handle hugepages but fails 2004-04-23 8:18 put_page() tries to handle hugepages but fails David Gibson @ 2004-04-23 8:34 ` Andrew Morton 2004-04-23 10:28 ` William Lee Irwin III 2004-04-27 4:36 ` David Gibson 0 siblings, 2 replies; 12+ messages in thread From: Andrew Morton @ 2004-04-23 8:34 UTC (permalink / raw) To: David Gibson; +Cc: apw, agl, mbligh, linux-kernel, linuxppc64-dev David Gibson <david@gibson.dropbear.id.au> wrote: > > Andrew, please apply. > > The code of put_page() is misleading, in that it appears to have code > handling PageCompound pages (i.e. hugepages). However it won't > actually handle them correctly - __page_cache_release() will not work > properly on a compound. Instead, hugepages should be and are released > with huge_page_release() from mm/hugetlb.c. This patch removes the > broken PageCompound path from put_page(), replacing it with a > BUG_ON(). This also removes the initialization of page[1].mapping > from compoound pages, which was only ever used in this broken code > path. We could certainly remove the test for a null destructor in there and require that compound pages have a destructor installed. But the main reason why that code is in there is for transparently handling direct-io into hugepage regions. That code does perform put_page against 4k pageframes within the huge page and it does follow the pointer to the head page. With your patch applied get_user_pages() and bio_release_pages() will manipulate the refcounts of the inner 4k pages rather than the head pages and things will explode. We could change follow_hugetlb_page() to always take a ref against the head page and we could teach bio_release_pages() to perform appropriate pfn masking to locate the head page, and perform similar tricks for futexes-in-large-pages. But with the code as-is the refcounting works transparently. If it's "broken" I wanna know why. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-23 8:34 ` Andrew Morton @ 2004-04-23 10:28 ` William Lee Irwin III 2004-04-23 10:35 ` Andrew Morton 2004-04-27 4:36 ` David Gibson 1 sibling, 1 reply; 12+ messages in thread From: William Lee Irwin III @ 2004-04-23 10:28 UTC (permalink / raw) To: Andrew Morton; +Cc: David Gibson, linux-kernel, linuxppc64-dev On Fri, Apr 23, 2004 at 01:34:37AM -0700, Andrew Morton wrote: > We could certainly remove the test for a null destructor in there and > require that compound pages have a destructor installed. > But the main reason why that code is in there is for transparently handling > direct-io into hugepage regions. That code does perform put_page against > 4k pageframes within the huge page and it does follow the pointer to the > head page. > With your patch applied get_user_pages() and bio_release_pages() will > manipulate the refcounts of the inner 4k pages rather than the head pages > and things will explode. > We could change follow_hugetlb_page() to always take a ref against the head > page and we could teach bio_release_pages() to perform appropriate pfn > masking to locate the head page, and perform similar tricks for > futexes-in-large-pages. But with the code as-is the refcounting works > transparently. > If it's "broken" I wanna know why. The destructor is never invoked from that path, but it's not the path that should free it anyway. To me it appears the call to __page_cache_release() on the head of the hugepage should just be removed in favor of doing nothing; at best it can only race against concurrent put_page(), see page_count(page) vanish, and accidentally call free_hot_page() against the head of the hugepage. As hugepages are never on the LRU, the remainder of __page_cache_release() should be a nop for them. Untested patch below. -- wli Index: wli-2.6.6-rc2-mm1/mm/swap.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/mm/swap.c 2004-04-21 05:19:58.000000000 -0700 +++ wli-2.6.6-rc2-mm1/mm/swap.c 2004-04-23 03:21:22.000000000 -0700 @@ -41,15 +41,12 @@ if (unlikely(PageCompound(page))) { page = (struct page *)page->private; if (put_page_testzero(page)) { - if (page[1].mapping) { /* destructor? */ - (*(void (*)(struct page *))page[1].mapping)(page); - } else { - __page_cache_release(page); - } + void (*destructor)(struct page *); + destructor = (void (*)(struct page *))page[1].mapping; + BUG_ON(!destructor); + (*destructor)(page); } - return; - } - if (!PageReserved(page) && put_page_testzero(page)) + } else if (!PageReserved(page) && put_page_testzero(page)) __page_cache_release(page); } EXPORT_SYMBOL(put_page); ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-23 10:28 ` William Lee Irwin III @ 2004-04-23 10:35 ` Andrew Morton 2004-04-23 10:47 ` William Lee Irwin III 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2004-04-23 10:35 UTC (permalink / raw) To: William Lee Irwin III; +Cc: david, linux-kernel, linuxppc64-dev William Lee Irwin III <wli@holomorphy.com> wrote: > > --- wli-2.6.6-rc2-mm1.orig/mm/swap.c 2004-04-21 05:19:58.000000000 -0700 > +++ wli-2.6.6-rc2-mm1/mm/swap.c 2004-04-23 03:21:22.000000000 -0700 > @@ -41,15 +41,12 @@ > if (unlikely(PageCompound(page))) { > page = (struct page *)page->private; > if (put_page_testzero(page)) { > - if (page[1].mapping) { /* destructor? */ > - (*(void (*)(struct page *))page[1].mapping)(page); > - } else { > - __page_cache_release(page); > - } > + void (*destructor)(struct page *); > + destructor = (void (*)(struct page *))page[1].mapping; > + BUG_ON(!destructor); > + (*destructor)(page); > } > - return; > - } > - if (!PageReserved(page) && put_page_testzero(page)) > + } else if (!PageReserved(page) && put_page_testzero(page)) > __page_cache_release(page); Sure. This of course duplicates huge_page_release(), which can be killed off. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-23 10:35 ` Andrew Morton @ 2004-04-23 10:47 ` William Lee Irwin III 2004-04-23 11:28 ` William Lee Irwin III 0 siblings, 1 reply; 12+ messages in thread From: William Lee Irwin III @ 2004-04-23 10:47 UTC (permalink / raw) To: Andrew Morton; +Cc: david, linux-kernel, linuxppc64-dev On Fri, Apr 23, 2004 at 03:35:22AM -0700, Andrew Morton wrote: > Sure. > This of course duplicates huge_page_release(), which can be killed off. Ah, but mm/hugetlb.c is putting the destructor in head->lru.prev not head[1].mapping; fix below along with nuking huge_page_release(). -- wli Index: wli-2.6.6-rc2-mm1/include/linux/hugetlb.h =================================================================== --- wli-2.6.6-rc2-mm1.orig/include/linux/hugetlb.h 2004-04-21 05:20:33.000000000 -0700 +++ wli-2.6.6-rc2-mm1/include/linux/hugetlb.h 2004-04-23 03:40:24.000000000 -0700 @@ -18,7 +18,6 @@ void zap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); -void huge_page_release(struct page *); int hugetlb_report_meminfo(char *); int is_hugepage_mem_enough(size_t); unsigned long hugetlb_total_pages(void); @@ -70,7 +69,6 @@ #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) #define zap_hugepage_range(vma, start, len) BUG() #define unmap_hugepage_range(vma, start, end) BUG() -#define huge_page_release(page) BUG() #define is_hugepage_mem_enough(size) 0 #define hugetlb_report_meminfo(buf) 0 #define mark_mm_hugetlb(mm, vma) do { } while (0) Index: wli-2.6.6-rc2-mm1/mm/swap.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/mm/swap.c 2004-04-21 05:19:58.000000000 -0700 +++ wli-2.6.6-rc2-mm1/mm/swap.c 2004-04-23 03:21:22.000000000 -0700 @@ -41,15 +41,12 @@ if (unlikely(PageCompound(page))) { page = (struct page *)page->private; if (put_page_testzero(page)) { - if (page[1].mapping) { /* destructor? */ - (*(void (*)(struct page *))page[1].mapping)(page); - } else { - __page_cache_release(page); - } + void (*destructor)(struct page *); + destructor = (void (*)(struct page *))page[1].mapping; + BUG_ON(!destructor); + (*destructor)(page); } - return; - } - if (!PageReserved(page) && put_page_testzero(page)) + } else if (!PageReserved(page) && put_page_testzero(page)) __page_cache_release(page); } EXPORT_SYMBOL(put_page); Index: wli-2.6.6-rc2-mm1/mm/hugetlb.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/mm/hugetlb.c 2004-04-21 05:19:58.000000000 -0700 +++ wli-2.6.6-rc2-mm1/mm/hugetlb.c 2004-04-23 03:45:41.000000000 -0700 @@ -78,20 +78,12 @@ free_huge_pages--; spin_unlock(&hugetlb_lock); set_page_count(page, 1); - page->lru.prev = (void *)free_huge_page; + page[1].mapping = (void *)free_huge_page; for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) clear_highpage(&page[i]); return page; } -void huge_page_release(struct page *page) -{ - if (!put_page_testzero(page)) - return; - - free_huge_page(page); -} - static int __init hugetlb_init(void) { unsigned long i; Index: wli-2.6.6-rc2-mm1/arch/i386/mm/hugetlbpage.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/arch/i386/mm/hugetlbpage.c 2004-04-21 05:20:29.000000000 -0700 +++ wli-2.6.6-rc2-mm1/arch/i386/mm/hugetlbpage.c 2004-04-23 03:40:04.000000000 -0700 @@ -220,7 +220,7 @@ if (pte_none(pte)) continue; page = pte_page(pte); - huge_page_release(page); + put_page(page); } mm->rss -= (end - start) >> PAGE_SHIFT; flush_tlb_range(vma, start, end); Index: wli-2.6.6-rc2-mm1/arch/ia64/mm/hugetlbpage.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/arch/ia64/mm/hugetlbpage.c 2004-04-21 05:19:41.000000000 -0700 +++ wli-2.6.6-rc2-mm1/arch/ia64/mm/hugetlbpage.c 2004-04-23 03:40:07.000000000 -0700 @@ -249,7 +249,7 @@ if (pte_none(*pte)) continue; page = pte_page(*pte); - huge_page_release(page); + put_page(page); pte_clear(pte); } mm->rss -= (end - start) >> PAGE_SHIFT; Index: wli-2.6.6-rc2-mm1/arch/ppc64/mm/hugetlbpage.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/arch/ppc64/mm/hugetlbpage.c 2004-04-21 05:20:29.000000000 -0700 +++ wli-2.6.6-rc2-mm1/arch/ppc64/mm/hugetlbpage.c 2004-04-23 03:40:10.000000000 -0700 @@ -394,7 +394,7 @@ flush_hash_hugepage(mm->context, addr, pte, local); - huge_page_release(page); + put_page(page); } mm->rss -= (end - start) >> PAGE_SHIFT; Index: wli-2.6.6-rc2-mm1/arch/sh/mm/hugetlbpage.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/arch/sh/mm/hugetlbpage.c 2004-04-21 05:19:43.000000000 -0700 +++ wli-2.6.6-rc2-mm1/arch/sh/mm/hugetlbpage.c 2004-04-23 03:40:14.000000000 -0700 @@ -200,7 +200,7 @@ if (pte_none(*pte)) continue; page = pte_page(*pte); - huge_page_release(page); + put_page(page); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { pte_clear(pte); pte++; Index: wli-2.6.6-rc2-mm1/arch/sparc64/mm/hugetlbpage.c =================================================================== --- wli-2.6.6-rc2-mm1.orig/arch/sparc64/mm/hugetlbpage.c 2004-04-21 05:19:43.000000000 -0700 +++ wli-2.6.6-rc2-mm1/arch/sparc64/mm/hugetlbpage.c 2004-04-23 03:40:17.000000000 -0700 @@ -198,7 +198,7 @@ if (pte_none(*pte)) continue; page = pte_page(*pte); - huge_page_release(page); + put_page(page); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { pte_clear(pte); pte++; ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-23 10:47 ` William Lee Irwin III @ 2004-04-23 11:28 ` William Lee Irwin III 0 siblings, 0 replies; 12+ messages in thread From: William Lee Irwin III @ 2004-04-23 11:28 UTC (permalink / raw) To: Andrew Morton, david, linux-kernel, linuxppc64-dev On Fri, Apr 23, 2004 at 03:35:22AM -0700, Andrew Morton wrote: >> Sure. >> This of course duplicates huge_page_release(), which can be killed off. On Fri, Apr 23, 2004 at 03:47:44AM -0700, William Lee Irwin III wrote: > Ah, but mm/hugetlb.c is putting the destructor in head->lru.prev not > head[1].mapping; fix below along with nuking huge_page_release(). I just noticed a general problem where ->mapping is often referred to naked and so either needs to be filled in for the base pages of the superpage or the examiners need to be referred to the head of the superpage. Likewise with ->index, which is apparently now used to hold the order. In general, if they're superpage properties, either the examiners must be referred to the head of the superpage, or (wastefully) the values must be replicated across the base pages. The current scheme to prevent radix tree deadlocks and other nastinesses with multiple insertions (e.g. inserting an entry into the radix tree for each base page, which I've seen naive abuses of shit themselves several times before) reduces the resolution of the pagecache indices, which is problematic in various ways with respect to finding the right base page without updating excessive numbers of callers. The reason why this is less of a problem than it sounds like is that the base page lookups usually go through pagetables, which effectively act either as radix trees with an insertion for each base page (normal cpus) or radix trees with low- resolution indices with procedural API's to recover offsets into the superpage and the proper base page (i386, ppc64). I'll be off the net all weekend, so hopefully the interested parties can use this info to resolve the remainder of the hugetlb put_page() issues. The work here is just cleaning up self-inconsistencies so it's not hard. -- wli ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-23 8:34 ` Andrew Morton 2004-04-23 10:28 ` William Lee Irwin III @ 2004-04-27 4:36 ` David Gibson 2004-04-27 4:41 ` David Gibson 2004-04-27 4:47 ` Andrew Morton 1 sibling, 2 replies; 12+ messages in thread From: David Gibson @ 2004-04-27 4:36 UTC (permalink / raw) To: Andrew Morton; +Cc: apw, agl, mbligh, linux-kernel, linuxppc64-dev On Fri, Apr 23, 2004 at 01:34:37AM -0700, Andrew Morton wrote: > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > Andrew, please apply. > > > > The code of put_page() is misleading, in that it appears to have code > > handling PageCompound pages (i.e. hugepages). However it won't > > actually handle them correctly - __page_cache_release() will not work > > properly on a compound. Instead, hugepages should be and are released > > with huge_page_release() from mm/hugetlb.c. This patch removes the > > broken PageCompound path from put_page(), replacing it with a > > BUG_ON(). This also removes the initialization of page[1].mapping > > from compoound pages, which was only ever used in this broken code > > path. > > We could certainly remove the test for a null destructor in there and > require that compound pages have a destructor installed. We could, but there seems little point, given that we have to test for PageCompound anyway, and the destructor is always the same for hugepages. > But the main reason why that code is in there is for transparently handling > direct-io into hugepage regions. That code does perform put_page against > 4k pageframes within the huge page and it does follow the pointer to the > head page. Ah. This I did not know. > With your patch applied get_user_pages() and bio_release_pages() will > manipulate the refcounts of the inner 4k pages rather than the head pages > and things will explode. Actually, no - bio_release_pages() will instantly BUG() with my patch. > We could change follow_hugetlb_page() to always take a ref against the head > page and we could teach bio_release_pages() to perform appropriate pfn > masking to locate the head page, and perform similar tricks for > futexes-in-large-pages. But with the code as-is the refcounting works > transparently. > > If it's "broken" I wanna know why. It's broken if put_page() ever actually reduces a hugepage's count to zero, because it will call __page_cache_release() which will do the wrong thing on hugepages. We could bypass that by requiring that a hugepage always has a destructor (and actually setting it in the correct place, which hugetlb.c currently doesn't as wli pointed out). But since we only have one destructor, why bother. Patch below makes put_page() work correctly on hugepages, and uses it (via page_cache_release()) in place of huge_page_release(). It also abolishes the destructor pointer - free_huge_page() is always used. Please apply. Index: working-2.6/mm/swap.c =================================================================== --- working-2.6.orig/mm/swap.c 2004-04-14 12:22:49.000000000 +1000 +++ working-2.6/mm/swap.c 2004-04-27 14:19:10.184343216 +1000 @@ -40,13 +40,8 @@ { if (unlikely(PageCompound(page))) { page = (struct page *)page->private; - if (put_page_testzero(page)) { - if (page[1].mapping) { /* destructor? */ - (*(void (*)(struct page *))page[1].mapping)(page); - } else { - __page_cache_release(page); - } - } + if (put_page_testzero(page)) + free_huge_page(page); return; } if (!PageReserved(page) && put_page_testzero(page)) Index: working-2.6/mm/hugetlb.c =================================================================== --- working-2.6.orig/mm/hugetlb.c 2004-04-14 12:22:49.000000000 +1000 +++ working-2.6/mm/hugetlb.c 2004-04-27 14:24:14.155325888 +1000 @@ -78,20 +78,11 @@ free_huge_pages--; spin_unlock(&hugetlb_lock); set_page_count(page, 1); - page->lru.prev = (void *)free_huge_page; for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) clear_highpage(&page[i]); return page; } -void huge_page_release(struct page *page) -{ - if (!put_page_testzero(page)) - return; - - free_huge_page(page); -} - static int __init hugetlb_init(void) { unsigned long i; Index: working-2.6/arch/i386/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/i386/mm/hugetlbpage.c 2004-04-22 09:09:11.000000000 +1000 +++ working-2.6/arch/i386/mm/hugetlbpage.c 2004-04-27 14:21:30.874387904 +1000 @@ -221,7 +221,7 @@ if (pte_none(pte)) continue; page = pte_page(pte); - huge_page_release(page); + page_cache_release(page); } mm->rss -= (end - start) >> PAGE_SHIFT; flush_tlb_range(vma, start, end); Index: working-2.6/arch/ppc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c 2004-04-20 10:50:06.000000000 +1000 +++ working-2.6/arch/ppc64/mm/hugetlbpage.c 2004-04-27 14:21:18.225313880 +1000 @@ -395,7 +395,7 @@ flush_hash_hugepage(mm->context, addr, pte, local); - huge_page_release(page); + page_cache_release(page); } mm->rss -= (end - start) >> PAGE_SHIFT; Index: working-2.6/arch/sh/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/sh/mm/hugetlbpage.c 2004-04-20 10:50:06.000000000 +1000 +++ working-2.6/arch/sh/mm/hugetlbpage.c 2004-04-27 14:22:19.234357656 +1000 @@ -200,7 +200,7 @@ if (pte_none(*pte)) continue; page = pte_page(*pte); - huge_page_release(page); + page_cache_release(page); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { pte_clear(pte); pte++; Index: working-2.6/arch/sparc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/sparc64/mm/hugetlbpage.c 2004-04-20 10:50:06.000000000 +1000 +++ working-2.6/arch/sparc64/mm/hugetlbpage.c 2004-04-27 14:22:34.274333944 +1000 @@ -198,7 +198,7 @@ if (pte_none(*pte)) continue; page = pte_page(*pte); - huge_page_release(page); + page_cache_release(page); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { pte_clear(pte); pte++; Index: working-2.6/fs/hugetlbfs/inode.c =================================================================== --- working-2.6.orig/fs/hugetlbfs/inode.c 2004-04-13 11:42:38.000000000 +1000 +++ working-2.6/fs/hugetlbfs/inode.c 2004-04-27 14:23:17.963348384 +1000 @@ -142,7 +142,7 @@ int i; for (i = 0; i < pagevec_count(pvec); ++i) - huge_page_release(pvec->pages[i]); + page_cache_release(pvec->pages[i]); pagevec_reinit(pvec); } @@ -152,7 +152,7 @@ clear_page_dirty(page); ClearPageUptodate(page); remove_from_page_cache(page); - huge_page_release(page); + page_cache_release(page); } void truncate_hugepages(struct address_space *mapping, loff_t lstart) Index: working-2.6/include/linux/hugetlb.h =================================================================== --- working-2.6.orig/include/linux/hugetlb.h 2004-04-20 10:50:08.000000000 +1000 +++ working-2.6/include/linux/hugetlb.h 2004-04-27 14:23:59.443298984 +1000 @@ -16,7 +16,6 @@ void zap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); -void huge_page_release(struct page *); int hugetlb_report_meminfo(char *); int is_hugepage_mem_enough(size_t); unsigned long hugetlb_total_pages(void); @@ -68,7 +67,6 @@ #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) #define zap_hugepage_range(vma, start, len) BUG() #define unmap_hugepage_range(vma, start, end) BUG() -#define huge_page_release(page) BUG() #define is_hugepage_mem_enough(size) 0 #define hugetlb_report_meminfo(buf) 0 #define mark_mm_hugetlb(mm, vma) do { } while (0) -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-27 4:36 ` David Gibson @ 2004-04-27 4:41 ` David Gibson 2004-04-27 4:47 ` Andrew Morton 1 sibling, 0 replies; 12+ messages in thread From: David Gibson @ 2004-04-27 4:41 UTC (permalink / raw) To: Andrew Morton, apw, agl, mbligh, linux-kernel, linuxppc64-dev On Tue, Apr 27, 2004 at 02:36:52PM +1000, David Gibson wrote: > On Fri, Apr 23, 2004 at 01:34:37AM -0700, Andrew Morton wrote: > > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > > > Andrew, please apply. > > > > > > The code of put_page() is misleading, in that it appears to have code > > > handling PageCompound pages (i.e. hugepages). However it won't > > > actually handle them correctly - __page_cache_release() will not work > > > properly on a compound. Instead, hugepages should be and are released > > > with huge_page_release() from mm/hugetlb.c. This patch removes the > > > broken PageCompound path from put_page(), replacing it with a > > > BUG_ON(). This also removes the initialization of page[1].mapping > > > from compoound pages, which was only ever used in this broken code > > > path. > > > > We could certainly remove the test for a null destructor in there and > > require that compound pages have a destructor installed. > > We could, but there seems little point, given that we have to test for > PageCompound anyway, and the destructor is always the same for > hugepages. > > > But the main reason why that code is in there is for transparently handling > > direct-io into hugepage regions. That code does perform put_page against > > 4k pageframes within the huge page and it does follow the pointer to the > > head page. > > Ah. This I did not know. > > > With your patch applied get_user_pages() and bio_release_pages() will > > manipulate the refcounts of the inner 4k pages rather than the head pages > > and things will explode. > > Actually, no - bio_release_pages() will instantly BUG() with my patch. > > > We could change follow_hugetlb_page() to always take a ref against the head > > page and we could teach bio_release_pages() to perform appropriate pfn > > masking to locate the head page, and perform similar tricks for > > futexes-in-large-pages. But with the code as-is the refcounting works > > transparently. > > > > If it's "broken" I wanna know why. > > It's broken if put_page() ever actually reduces a hugepage's count to > zero, because it will call __page_cache_release() which will do the > wrong thing on hugepages. We could bypass that by requiring that a > hugepage always has a destructor (and actually setting it in the > correct place, which hugetlb.c currently doesn't as wli pointed out). > But since we only have one destructor, why bother. > > Patch below makes put_page() work correctly on hugepages, and uses it > (via page_cache_release()) in place of huge_page_release(). It also > abolishes the destructor pointer - free_huge_page() is always used. > Please apply. Damn. Supplied patch generated a warning. Fixed version below: Index: working-2.6/mm/swap.c =================================================================== --- working-2.6.orig/mm/swap.c 2004-04-14 12:22:49.000000000 +1000 +++ working-2.6/mm/swap.c 2004-04-27 14:42:40.257309016 +1000 @@ -30,6 +30,7 @@ #include <linux/cpu.h> #include <linux/notifier.h> #include <linux/init.h> +#include <linux/hugetlb.h> /* How many pages do we try to swap or page in/out together? */ int page_cluster; @@ -40,13 +41,8 @@ { if (unlikely(PageCompound(page))) { page = (struct page *)page->private; - if (put_page_testzero(page)) { - if (page[1].mapping) { /* destructor? */ - (*(void (*)(struct page *))page[1].mapping)(page); - } else { - __page_cache_release(page); - } - } + if (put_page_testzero(page)) + free_huge_page(page); return; } if (!PageReserved(page) && put_page_testzero(page)) Index: working-2.6/mm/hugetlb.c =================================================================== --- working-2.6.orig/mm/hugetlb.c 2004-04-14 12:22:49.000000000 +1000 +++ working-2.6/mm/hugetlb.c 2004-04-27 14:24:14.155325888 +1000 @@ -78,20 +78,11 @@ free_huge_pages--; spin_unlock(&hugetlb_lock); set_page_count(page, 1); - page->lru.prev = (void *)free_huge_page; for (i = 0; i < (HPAGE_SIZE/PAGE_SIZE); ++i) clear_highpage(&page[i]); return page; } -void huge_page_release(struct page *page) -{ - if (!put_page_testzero(page)) - return; - - free_huge_page(page); -} - static int __init hugetlb_init(void) { unsigned long i; Index: working-2.6/arch/i386/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/i386/mm/hugetlbpage.c 2004-04-22 09:09:11.000000000 +1000 +++ working-2.6/arch/i386/mm/hugetlbpage.c 2004-04-27 14:21:30.874387904 +1000 @@ -221,7 +221,7 @@ if (pte_none(pte)) continue; page = pte_page(pte); - huge_page_release(page); + page_cache_release(page); } mm->rss -= (end - start) >> PAGE_SHIFT; flush_tlb_range(vma, start, end); Index: working-2.6/arch/ppc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/ppc64/mm/hugetlbpage.c 2004-04-20 10:50:06.000000000 +1000 +++ working-2.6/arch/ppc64/mm/hugetlbpage.c 2004-04-27 14:21:18.225313880 +1000 @@ -395,7 +395,7 @@ flush_hash_hugepage(mm->context, addr, pte, local); - huge_page_release(page); + page_cache_release(page); } mm->rss -= (end - start) >> PAGE_SHIFT; Index: working-2.6/arch/sh/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/sh/mm/hugetlbpage.c 2004-04-20 10:50:06.000000000 +1000 +++ working-2.6/arch/sh/mm/hugetlbpage.c 2004-04-27 14:22:19.234357656 +1000 @@ -200,7 +200,7 @@ if (pte_none(*pte)) continue; page = pte_page(*pte); - huge_page_release(page); + page_cache_release(page); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { pte_clear(pte); pte++; Index: working-2.6/arch/sparc64/mm/hugetlbpage.c =================================================================== --- working-2.6.orig/arch/sparc64/mm/hugetlbpage.c 2004-04-20 10:50:06.000000000 +1000 +++ working-2.6/arch/sparc64/mm/hugetlbpage.c 2004-04-27 14:22:34.274333944 +1000 @@ -198,7 +198,7 @@ if (pte_none(*pte)) continue; page = pte_page(*pte); - huge_page_release(page); + page_cache_release(page); for (i = 0; i < (1 << HUGETLB_PAGE_ORDER); i++) { pte_clear(pte); pte++; Index: working-2.6/fs/hugetlbfs/inode.c =================================================================== --- working-2.6.orig/fs/hugetlbfs/inode.c 2004-04-13 11:42:38.000000000 +1000 +++ working-2.6/fs/hugetlbfs/inode.c 2004-04-27 14:23:17.963348384 +1000 @@ -142,7 +142,7 @@ int i; for (i = 0; i < pagevec_count(pvec); ++i) - huge_page_release(pvec->pages[i]); + page_cache_release(pvec->pages[i]); pagevec_reinit(pvec); } @@ -152,7 +152,7 @@ clear_page_dirty(page); ClearPageUptodate(page); remove_from_page_cache(page); - huge_page_release(page); + page_cache_release(page); } void truncate_hugepages(struct address_space *mapping, loff_t lstart) Index: working-2.6/include/linux/hugetlb.h =================================================================== --- working-2.6.orig/include/linux/hugetlb.h 2004-04-20 10:50:08.000000000 +1000 +++ working-2.6/include/linux/hugetlb.h 2004-04-27 14:23:59.443298984 +1000 @@ -16,7 +16,6 @@ void zap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); void unmap_hugepage_range(struct vm_area_struct *, unsigned long, unsigned long); int hugetlb_prefault(struct address_space *, struct vm_area_struct *); -void huge_page_release(struct page *); int hugetlb_report_meminfo(char *); int is_hugepage_mem_enough(size_t); unsigned long hugetlb_total_pages(void); @@ -68,7 +67,6 @@ #define hugetlb_prefault(mapping, vma) ({ BUG(); 0; }) #define zap_hugepage_range(vma, start, len) BUG() #define unmap_hugepage_range(vma, start, end) BUG() -#define huge_page_release(page) BUG() #define is_hugepage_mem_enough(size) 0 #define hugetlb_report_meminfo(buf) 0 #define mark_mm_hugetlb(mm, vma) do { } while (0) -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-27 4:36 ` David Gibson 2004-04-27 4:41 ` David Gibson @ 2004-04-27 4:47 ` Andrew Morton 2004-04-27 5:05 ` David Gibson 1 sibling, 1 reply; 12+ messages in thread From: Andrew Morton @ 2004-04-27 4:47 UTC (permalink / raw) To: David Gibson; +Cc: apw, agl, mbligh, linux-kernel, linuxppc64-dev David Gibson <david@gibson.dropbear.id.au> wrote: > > + if (put_page_testzero(page)) > + free_huge_page(page); Well yes, but this is assuming that compound pages are always hugetlb pages. It's true at present, but it doesn't have to always be true. The cost of the destructor is zilch, so why not? Please review the changes which went into 2.6.6-rc2-mm2. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-27 4:47 ` Andrew Morton @ 2004-04-27 5:05 ` David Gibson 2004-04-27 5:15 ` Andrew Morton 0 siblings, 1 reply; 12+ messages in thread From: David Gibson @ 2004-04-27 5:05 UTC (permalink / raw) To: Andrew Morton; +Cc: apw, agl, mbligh, linux-kernel, linuxppc64-dev On Mon, Apr 26, 2004 at 09:47:57PM -0700, Andrew Morton wrote: > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > + if (put_page_testzero(page)) > > + free_huge_page(page); > > Well yes, but this is assuming that compound pages are always hugetlb pages. > > It's true at present, but it doesn't have to always be true. The cost of > the destructor is zilch, so why not? True enough, I guess I was just following the "don't build it till you need it" philosophy. > Please review the changes which went into 2.6.6-rc2-mm2. Sorry, should have done that earlier. Looks reasonable with two small exceptions: 1) put_page() can still theoretically call __page_cache_release() which is wrong (and makes the code misleading) - patch below replaces this with a BUG() if there is no destructor. 2) what about wli's concern that mapping may be accessed without first checking for a PageCompound? Index: working-2.6/mm/swap.c =================================================================== --- working-2.6.orig/mm/swap.c 2004-04-14 12:22:49.000000000 +1000 +++ working-2.6/mm/swap.c 2004-04-27 15:02:30.046342392 +1000 @@ -41,11 +41,9 @@ if (unlikely(PageCompound(page))) { page = (struct page *)page->private; if (put_page_testzero(page)) { - if (page[1].mapping) { /* destructor? */ - (*(void (*)(struct page *))page[1].mapping)(page); - } else { - __page_cache_release(page); - } + BUG_ON(! page[1].mapping); + + (*(void (*)(struct page *))page[1].mapping)(page); } return; } -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-27 5:05 ` David Gibson @ 2004-04-27 5:15 ` Andrew Morton 2004-04-27 5:16 ` David Gibson 0 siblings, 1 reply; 12+ messages in thread From: Andrew Morton @ 2004-04-27 5:15 UTC (permalink / raw) To: David Gibson; +Cc: apw, agl, mbligh, linux-kernel, linuxppc64-dev David Gibson <david@gibson.dropbear.id.au> wrote: > > 1) put_page() can still theoretically call > __page_cache_release() which is wrong (and makes the code misleading) Don't think so? void put_page(struct page *page) { if (unlikely(PageCompound(page))) { page = (struct page *)page->private; if (put_page_testzero(page)) { void (*dtor)(struct page *page); dtor = (void (*)(struct page *))page[1].mapping; (*dtor)(page); } return; } if (!PageReserved(page) && put_page_testzero(page)) __page_cache_release(page); } > - patch below replaces this with a BUG() if there is no destructor. 2) > what about wli's concern that mapping may be accessed without first > checking for a PageCompound? That shouldn't be happening (should it?). If someone had such a page they'd need to lock it to play with ->mapping and I cannot think of any code which does all that and yet could stumble across a compound page? ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: put_page() tries to handle hugepages but fails 2004-04-27 5:15 ` Andrew Morton @ 2004-04-27 5:16 ` David Gibson 0 siblings, 0 replies; 12+ messages in thread From: David Gibson @ 2004-04-27 5:16 UTC (permalink / raw) To: Andrew Morton; +Cc: apw, agl, mbligh, linux-kernel, linuxppc64-dev On Mon, Apr 26, 2004 at 10:15:19PM -0700, Andrew Morton wrote: > David Gibson <david@gibson.dropbear.id.au> wrote: > > > > 1) put_page() can still theoretically call > > __page_cache_release() which is wrong (and makes the code misleading) > > Don't think so? > > void put_page(struct page *page) > { > if (unlikely(PageCompound(page))) { > page = (struct page *)page->private; > if (put_page_testzero(page)) { > void (*dtor)(struct page *page); > > dtor = (void (*)(struct page *))page[1].mapping; > (*dtor)(page); > } > return; > } > if (!PageReserved(page) && put_page_testzero(page)) > __page_cache_release(page); > } Dang. Missed that patch in the series. My mistake. -- David Gibson | For every complex problem there is a david AT gibson.dropbear.id.au | solution which is simple, neat and | wrong. http://www.ozlabs.org/people/dgibson ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2004-04-27 5:20 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2004-04-23 8:18 put_page() tries to handle hugepages but fails David Gibson 2004-04-23 8:34 ` Andrew Morton 2004-04-23 10:28 ` William Lee Irwin III 2004-04-23 10:35 ` Andrew Morton 2004-04-23 10:47 ` William Lee Irwin III 2004-04-23 11:28 ` William Lee Irwin III 2004-04-27 4:36 ` David Gibson 2004-04-27 4:41 ` David Gibson 2004-04-27 4:47 ` Andrew Morton 2004-04-27 5:05 ` David Gibson 2004-04-27 5:15 ` Andrew Morton 2004-04-27 5:16 ` David Gibson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox