* [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock

From: David Rientjes @ 2012-09-20  1:19 UTC
To: Linus Torvalds, Andrew Morton
Cc: Andrea Arcangeli, Naoya Horiguchi, KAMEZAWA Hiroyuki, Hugh Dickins,
    linux-kernel, linux-mm, stable

When a transparent hugepage is mapped and it is included in an mlock()
range, follow_page() incorrectly avoids setting the page's mlock bit and
moving it to the unevictable lru.

This is evident if you try to mlock(), munlock(), and then mlock() a
range again.  Currently:

	#define MAP_SIZE	(4 << 30)	/* 4GB */

	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	mlock(ptr, MAP_SIZE);

	$ grep -E "Unevictable|Inactive\(anon" /proc/meminfo
	Inactive(anon):     6304 kB
	Unevictable:     4213924 kB

	munlock(ptr, MAP_SIZE);

	Inactive(anon):  4186252 kB
	Unevictable:       19652 kB

	mlock(ptr, MAP_SIZE);

	Inactive(anon):  4198556 kB
	Unevictable:       21684 kB

Notice that less than 2MB was added to the unevictable list; this is
because these pages in the range are not transparent hugepages since the
4GB range was allocated with mmap() and has no specific alignment.  If
posix_memalign() were used instead, unevictable would not have grown at
all on the second mlock().

The fix is to call mlock_vma_page() so that the mlock bit is set and the
page is added to the unevictable list.

With this patch:

	mlock(ptr, MAP_SIZE);

	Inactive(anon):     4056 kB
	Unevictable:     4213940 kB

	munlock(ptr, MAP_SIZE);

	Inactive(anon):  4198268 kB
	Unevictable:       19636 kB

	mlock(ptr, MAP_SIZE);

	Inactive(anon):     4008 kB
	Unevictable:     4213940 kB

Cc: stable@vger.kernel.org [v2.6.38+]
Signed-off-by: David Rientjes <rientjes@google.com>
---
 include/linux/huge_mm.h |  2 +-
 mm/huge_memory.c        | 11 ++++++++++-
 mm/memory.c             |  2 +-
 3 files changed, 12 insertions(+), 3 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -12,7 +12,7 @@ extern int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			       unsigned long address, pmd_t *pmd,
 			       pmd_t orig_pmd);
 extern pgtable_t get_pmd_huge_pte(struct mm_struct *mm);
-extern struct page *follow_trans_huge_pmd(struct mm_struct *mm,
+extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 					  unsigned long addr,
 					  pmd_t *pmd,
 					  unsigned int flags);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -997,11 +997,12 @@ out:
 	return ret;
 }
 
-struct page *follow_trans_huge_pmd(struct mm_struct *mm,
+struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned long addr,
 				   pmd_t *pmd,
 				   unsigned int flags)
 {
+	struct mm_struct *mm = vma->vm_mm;
 	struct page *page = NULL;
 
 	assert_spin_locked(&mm->page_table_lock);
@@ -1024,6 +1025,14 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm,
 		_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
 		set_pmd_at(mm, addr & HPAGE_PMD_MASK, pmd, _pmd);
 	}
+	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
+		if (page->mapping && trylock_page(page)) {
+			lru_add_drain();
+			if (page->mapping)
+				mlock_vma_page(page);
+			unlock_page(page);
+		}
+	}
 	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
 	VM_BUG_ON(!PageCompound(page));
 	if (flags & FOLL_GET)
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1521,7 +1521,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 			spin_unlock(&mm->page_table_lock);
 			wait_split_huge_page(vma->anon_vma, pmd);
 		} else {
-			page = follow_trans_huge_pmd(mm, address,
+			page = follow_trans_huge_pmd(vma, address,
 						     pmd, flags);
 			spin_unlock(&mm->page_table_lock);
 			goto out;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to
majordomo@kvack.org.  For more info on Linux MM, see http://www.linux-mm.org/.
Don't email: email@kvack.org
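A small user-space helper makes the before/after comparison above scriptable from the reproducer itself, instead of grepping /proc/meminfo between runs. This is an editorial sketch, not part of the patch: `meminfo_kb()` and its calling convention are invented here, and it simply parses one meminfo-style line.

```c
#include <string.h>
#include <stdlib.h>

/*
 * Given one /proc/meminfo line such as "Unevictable:  4213924 kB",
 * return its value in kB if the key matches, or -1 otherwise.
 * A reproducer can fgets() each line of /proc/meminfo and call this
 * with "Unevictable" and "Mlocked" before and after each mlock()/
 * munlock() step shown in the commit message above.
 */
static long meminfo_kb(const char *line, const char *key)
{
	size_t len = strlen(key);

	/* The key must match exactly up to the ':' separator. */
	if (strncmp(line, key, len) != 0 || line[len] != ':')
		return -1;
	/* strtol() skips the padding whitespace before the number. */
	return strtol(line + len + 1, NULL, 10);
}
```

Pairing this with the mlock()/munlock() calls from the commit message turns the manual grep into an automated pass/fail check for the Unevictable accounting.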
* Re: [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock

From: Hugh Dickins @ 2012-09-20  4:01 UTC
To: David Rientjes
Cc: Linus Torvalds, Andrew Morton, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm, stable

On Wed, 18 Sep 2012, David Rientjes wrote:

> When a transparent hugepage is mapped and it is included in an mlock()
> range, follow_page() incorrectly avoids setting the page's mlock bit and
> moving it to the unevictable lru.
>
> This is evident if you try to mlock(), munlock(), and then mlock() a
> range again.  Currently:
>
> 	#define MAP_SIZE	(4 << 30)	/* 4GB */
>
> 	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
> 			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> 	mlock(ptr, MAP_SIZE);
>
> 	$ grep -E "Unevictable|Inactive\(anon" /proc/meminfo
> 	Inactive(anon):     6304 kB
> 	Unevictable:     4213924 kB
>
> 	munlock(ptr, MAP_SIZE);
>
> 	Inactive(anon):  4186252 kB
> 	Unevictable:       19652 kB
>
> 	mlock(ptr, MAP_SIZE);
>
> 	Inactive(anon):  4198556 kB
> 	Unevictable:       21684 kB
>
> Notice that less than 2MB was added to the unevictable list; this is
> because these pages in the range are not transparent hugepages since the
> 4GB range was allocated with mmap() and has no specific alignment.  If
> posix_memalign() were used instead, unevictable would not have grown at
> all on the second mlock().
>
> The fix is to call mlock_vma_page() so that the mlock bit is set and the
> page is added to the unevictable list.
>
> With this patch:
>
> 	mlock(ptr, MAP_SIZE);
>
> 	Inactive(anon):     4056 kB
> 	Unevictable:     4213940 kB
>
> 	munlock(ptr, MAP_SIZE);
>
> 	Inactive(anon):  4198268 kB
> 	Unevictable:       19636 kB
>
> 	mlock(ptr, MAP_SIZE);
>
> 	Inactive(anon):     4008 kB
> 	Unevictable:     4213940 kB
>
> Cc: stable@vger.kernel.org [v2.6.38+]
> Signed-off-by: David Rientjes <rientjes@google.com>

Good catch, and the patch looks right to me, as far as it goes:
but does it go far enough?

I hesitate because it looks as if the NR_MLOCK zone page state is
maintained (with incs and decs) in ignorance of THP; so although you
will be correcting the Unevictable kB with your mlock_vma_page(), the
Mlocked kB just above it in /proc/meminfo would still be wrong?

And this is all a matter of the numbers shown in /proc/meminfo, isn't
it?  Hmm, and some list balancing ratios, I suppose.  I mean, these
pages are effectively locked in memory, aren't they, even without being
properly counted?  When page reclaim comes to evict them, it will find
them in a VM_LOCKED area and then move to unevictable.  Ah, but probably
after splitting the THP: it would be nice to avoid that.

I suppose I'm not sure whether this is material for late-3.6:
surely it's not a fix for a recent regression?

> ---
>  include/linux/huge_mm.h |  2 +-
>  mm/huge_memory.c        | 11 ++++++++++-
>  mm/memory.c             |  2 +-
>  3 files changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -12,7 +12,7 @@ extern int do_huge_pmd_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
>  			       unsigned long address, pmd_t *pmd,
>  			       pmd_t orig_pmd);
>  extern pgtable_t get_pmd_huge_pte(struct mm_struct *mm);
> -extern struct page *follow_trans_huge_pmd(struct mm_struct *mm,
> +extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  					  unsigned long addr,
>  					  pmd_t *pmd,
>  					  unsigned int flags);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -997,11 +997,12 @@ out:
>  	return ret;
>  }
>
> -struct page *follow_trans_huge_pmd(struct mm_struct *mm,
> +struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  				   unsigned long addr,
>  				   pmd_t *pmd,
>  				   unsigned int flags)
>  {
> +	struct mm_struct *mm = vma->vm_mm;
>  	struct page *page = NULL;
>
>  	assert_spin_locked(&mm->page_table_lock);
> @@ -1024,6 +1025,14 @@ struct page *follow_trans_huge_pmd(struct mm_struct *mm,
>  		_pmd = pmd_mkyoung(pmd_mkdirty(*pmd));
>  		set_pmd_at(mm, addr & HPAGE_PMD_MASK, pmd, _pmd);
>  	}
> +	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
> +		if (page->mapping && trylock_page(page)) {
> +			lru_add_drain();
> +			if (page->mapping)

Amusingly, in another thread (for mmotm), Hannes and I were discussing
this very code block that you have copied from follow_page(), and we
concluded that this page->mapping check is not necessary.  But you're
absolutely right to copy it as is, then I can come and remove it later.

Hugh

> +				mlock_vma_page(page);
> +			unlock_page(page);
> +		}
> +	}
>  	page += (addr & ~HPAGE_PMD_MASK) >> PAGE_SHIFT;
>  	VM_BUG_ON(!PageCompound(page));
>  	if (flags & FOLL_GET)
> diff --git a/mm/memory.c b/mm/memory.c
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -1521,7 +1521,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>  			spin_unlock(&mm->page_table_lock);
>  			wait_split_huge_page(vma->anon_vma, pmd);
>  		} else {
> -			page = follow_trans_huge_pmd(mm, address,
> +			page = follow_trans_huge_pmd(vma, address,
>  						     pmd, flags);
>  			spin_unlock(&mm->page_table_lock);
>  			goto out;
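The trylock_page() block discussed above is a best-effort pattern: the caller never sleeps on the page lock; if the lock is contended it simply skips the mlock bookkeeping, and reclaim will move the page to the unevictable list later when it finds it in a VM_LOCKED vma. A toy user-space model of that shape, added editorially here (the `toy_*` names are invented, and the page lock is modeled as a C11 atomic flag rather than the kernel's PG_locked bit):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for struct page: a lock bit and an mlocked bit. */
struct toy_page {
	atomic_flag locked;	/* models PG_locked */
	bool mlocked;		/* models PG_mlocked */
};

/* trylock_page() analogue: test-and-set that never blocks. */
static bool toy_trylock_page(struct toy_page *page)
{
	return !atomic_flag_test_and_set(&page->locked);
}

static void toy_unlock_page(struct toy_page *page)
{
	atomic_flag_clear(&page->locked);
}

/*
 * Best-effort marking: if the lock is contended, return false and do
 * nothing, leaving the fixup to a later pass (reclaim, in the kernel).
 */
static bool try_mark_mlocked(struct toy_page *page)
{
	if (!toy_trylock_page(page))
		return false;	/* contended: skip rather than sleep */
	page->mlocked = true;
	toy_unlock_page(page);
	return true;
}
```

The point of the sketch is only the control flow: correctness does not depend on winning the trylock, because a missed page is still effectively locked in memory and gets counted later.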
* Re: [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock

From: David Rientjes @ 2012-09-27  1:40 UTC
To: Hugh Dickins, Andrew Morton
Cc: Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi, KAMEZAWA Hiroyuki,
    Johannes Weiner, Michel Lespinasse, linux-kernel, linux-mm, stable

On Wed, 19 Sep 2012, Hugh Dickins wrote:

> Good catch, and the patch looks right to me, as far as it goes:
> but does it go far enough?
>
> I hesitate because it looks as if the NR_MLOCK zone page state is
> maintained (with incs and decs) in ignorance of THP; so although
> you will be correcting the Unevictable kB with your mlock_vma_page(),
> the Mlocked kB just above it in /proc/meminfo would still be wrong?

Indeed, NR_MLOCK is a separate problem with regard to thp and it's
currently incremented once for every hugepage rather than HPAGE_PMD_NR.
mlock_vma_page() needs to increment by hpage_nr_pages(page) like
add_page_to_lru_list() does.

> I suppose I'm not sure whether this is material for late-3.6:
> surely it's not a fix for a recent regression?

Ok, sounds good.  If there's no objection, I'd like to ask Andrew to
apply this to -mm and remove the cc to stable@vger.kernel.org since the
mlock_vma_page() problem above is separate and doesn't conflict with
this code, so I'll send a followup patch to address that.

Thanks!
* Re: [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock

From: Hugh Dickins @ 2012-09-27  1:54 UTC
To: David Rientjes
Cc: Andrew Morton, Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm, stable

On Wed, 26 Sep 2012, David Rientjes wrote:

> On Wed, 19 Sep 2012, Hugh Dickins wrote:
>
> > Good catch, and the patch looks right to me, as far as it goes:
> > but does it go far enough?
> >
> > I hesitate because it looks as if the NR_MLOCK zone page state is
> > maintained (with incs and decs) in ignorance of THP; so although
> > you will be correcting the Unevictable kB with your mlock_vma_page(),
> > the Mlocked kB just above it in /proc/meminfo would still be wrong?
>
> Indeed, NR_MLOCK is a separate problem with regard to thp and it's
> currently incremented once for every hugepage rather than HPAGE_PMD_NR.
> mlock_vma_page() needs to increment by hpage_nr_pages(page) like
> add_page_to_lru_list() does.
>
> > I suppose I'm not sure whether this is material for late-3.6:
> > surely it's not a fix for a recent regression?
>
> Ok, sounds good.  If there's no objection, I'd like to ask Andrew to
> apply this to -mm and remove the cc to stable@vger.kernel.org since the
> mlock_vma_page() problem above is separate and doesn't conflict with
> this code, so I'll send a followup patch to address that.
>
> Thanks!

Sounds right, certainly no objection from me, thanks for taking care
of it.

Hugh
* [patch] mm, thp: fix mlock statistics

From: David Rientjes @ 2012-09-27  2:29 UTC
To: Hugh Dickins, Andrew Morton
Cc: Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi, KAMEZAWA Hiroyuki,
    Johannes Weiner, Michel Lespinasse, linux-kernel, linux-mm, stable

NR_MLOCK is only accounted in single page units: there's no logic to
handle transparent hugepages.  This patch checks the appropriate number
of pages to adjust the statistics by so that the correct amount of
memory is reflected.

Currently:

	$ grep Mlocked /proc/meminfo
	Mlocked:       19636 kB

	#define MAP_SIZE	(4 << 30)	/* 4GB */

	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
	mlock(ptr, MAP_SIZE);

	$ grep Mlocked /proc/meminfo
	Mlocked:       29844 kB

	munlock(ptr, MAP_SIZE);

	$ grep Mlocked /proc/meminfo
	Mlocked:       19636 kB

And with this patch:

	$ grep Mlock /proc/meminfo
	Mlocked:       19636 kB

	mlock(ptr, MAP_SIZE);

	$ grep Mlock /proc/meminfo
	Mlocked:     4213664 kB

	munlock(ptr, MAP_SIZE);

	$ grep Mlock /proc/meminfo
	Mlocked:       19636 kB

Reported-by: Hugh Dickens <hughd@google.com>
Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/internal.h   | 3 ++-
 mm/mlock.c      | 6 ++++--
 mm/page_alloc.c | 2 +-
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -180,7 +180,8 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
 		return 0;
 
 	if (!TestSetPageMlocked(page)) {
-		inc_zone_page_state(page, NR_MLOCK);
+		mod_zone_page_state(page_zone(page), NR_MLOCK,
+				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 	}
 	return 1;
diff --git a/mm/mlock.c b/mm/mlock.c
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -81,7 +81,8 @@ void mlock_vma_page(struct page *page)
 	BUG_ON(!PageLocked(page));
 
 	if (!TestSetPageMlocked(page)) {
-		inc_zone_page_state(page, NR_MLOCK);
+		mod_zone_page_state(page_zone(page), NR_MLOCK,
+				    hpage_nr_pages(page));
 		count_vm_event(UNEVICTABLE_PGMLOCKED);
 		if (!isolate_lru_page(page))
 			putback_lru_page(page);
@@ -108,7 +109,8 @@ void munlock_vma_page(struct page *page)
 	BUG_ON(!PageLocked(page));
 
 	if (TestClearPageMlocked(page)) {
-		dec_zone_page_state(page, NR_MLOCK);
+		mod_zone_page_state(page_zone(page), NR_MLOCK,
+				    -hpage_nr_pages(page));
 		if (!isolate_lru_page(page)) {
 			int ret = SWAP_AGAIN;
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -604,7 +604,7 @@ out:
  */
 static inline void free_page_mlock(struct page *page)
 {
-	__dec_zone_page_state(page, NR_MLOCK);
+	__mod_zone_page_state(page_zone(page), NR_MLOCK, -hpage_nr_pages(page));
 	__count_vm_event(UNEVICTABLE_MLOCKFREED);
 }
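The accounting error this patch fixes can be modeled outside the kernel: the old code always adjusted NR_MLOCK by 1, while a transparent hugepage spans HPAGE_PMD_NR base pages, so mlock/munlock of a THP left the counter off by 511 pages each way (with 4K base pages and 2MB hugepages on x86). A minimal editorial sketch follows; the struct and helper names mirror kernel identifiers but are hypothetical stand-ins, and the 512 value is an assumption about the configuration:

```c
/* 2MB hugepage / 4K base page, as on x86 with the default config. */
#define HPAGE_PMD_NR 512

struct acct_page {
	int is_trans_huge;	/* models PageTransHuge() */
};

/* Per-page counter delta: one THP accounts for HPAGE_PMD_NR pages. */
static int hpage_nr_pages(const struct acct_page *page)
{
	return page->is_trans_huge ? HPAGE_PMD_NR : 1;
}

/*
 * With these deltas, an mlock followed by a munlock of the same mix of
 * pages returns NR_MLOCK exactly to its starting value, which is the
 * invariant the inc-by-1 code broke for THP.
 */
static long mlock_delta(const struct acct_page *page)
{
	return hpage_nr_pages(page);
}

static long munlock_delta(const struct acct_page *page)
{
	return -hpage_nr_pages(page);
}
```

Under the old inc-by-1 scheme a single 2MB THP would move Mlocked by only 4 kB instead of 2048 kB, which is exactly the 29844 kB vs 4213664 kB difference shown in the commit message.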
* Re: [patch] mm, thp: fix mlock statistics

From: Hugh Dickins @ 2012-09-28  1:32 UTC
To: David Rientjes
Cc: Andrew Morton, Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm, stable

On Wed, 26 Sep 2012, David Rientjes wrote:

> NR_MLOCK is only accounted in single page units: there's no logic to
> handle transparent hugepages.  This patch checks the appropriate number
> of pages to adjust the statistics by so that the correct amount of
> memory is reflected.
>
> Currently:
>
> 	$ grep Mlocked /proc/meminfo
> 	Mlocked:       19636 kB
>
> 	#define MAP_SIZE	(4 << 30)	/* 4GB */
>
> 	void *ptr = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
> 			 MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
> 	mlock(ptr, MAP_SIZE);
>
> 	$ grep Mlocked /proc/meminfo
> 	Mlocked:       29844 kB
>
> 	munlock(ptr, MAP_SIZE);
>
> 	$ grep Mlocked /proc/meminfo
> 	Mlocked:       19636 kB
>
> And with this patch:
>
> 	$ grep Mlock /proc/meminfo
> 	Mlocked:       19636 kB
>
> 	mlock(ptr, MAP_SIZE);
>
> 	$ grep Mlock /proc/meminfo
> 	Mlocked:     4213664 kB
>
> 	munlock(ptr, MAP_SIZE);
>
> 	$ grep Mlock /proc/meminfo
> 	Mlocked:       19636 kB
>
> Reported-by: Hugh Dickens <hughd@google.com>

I do prefer Dickins :)

> Signed-off-by: David Rientjes <rientjes@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

Yes, this now seems to be working nicely, thanks.

I would have preferred you to omit the free_page_mlock() part, since
that sets me wondering about what flags might be set to mean what at
that point; but since it should never get there anyway, and we'll be
removing it entirely from v3.7, never mind.  (In doing that, I shall
need to consider whether clear_page_mlock() then needs hpage_nr_pages,
but your patch below is perfectly correct to omit it.)

If I understand aright, in another (thp: avoid VM_BUG_ON) thread, Linus
remarks that he's noticed this and your matching Unevictable patch (that
I had thought too late for v3.6), and is hoping for Acks so that he can
put them into v3.6 after all.

So despite my earlier reluctance, please take this as an Ack on that one
too (I was testing them together): it'll be odd if one of them goes to
stable and the other not, but we can sort that out with GregKH later.

Hugh

> ---
>  mm/internal.h   | 3 ++-
>  mm/mlock.c      | 6 ++++--
>  mm/page_alloc.c | 2 +-
>  3 files changed, 7 insertions(+), 4 deletions(-)
>
> diff --git a/mm/internal.h b/mm/internal.h
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -180,7 +180,8 @@ static inline int mlocked_vma_newpage(struct vm_area_struct *vma,
>  		return 0;
>
>  	if (!TestSetPageMlocked(page)) {
> -		inc_zone_page_state(page, NR_MLOCK);
> +		mod_zone_page_state(page_zone(page), NR_MLOCK,
> +				    hpage_nr_pages(page));
>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>  	}
>  	return 1;
> diff --git a/mm/mlock.c b/mm/mlock.c
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -81,7 +81,8 @@ void mlock_vma_page(struct page *page)
>  	BUG_ON(!PageLocked(page));
>
>  	if (!TestSetPageMlocked(page)) {
> -		inc_zone_page_state(page, NR_MLOCK);
> +		mod_zone_page_state(page_zone(page), NR_MLOCK,
> +				    hpage_nr_pages(page));
>  		count_vm_event(UNEVICTABLE_PGMLOCKED);
>  		if (!isolate_lru_page(page))
>  			putback_lru_page(page);
> @@ -108,7 +109,8 @@ void munlock_vma_page(struct page *page)
>  	BUG_ON(!PageLocked(page));
>
>  	if (TestClearPageMlocked(page)) {
> -		dec_zone_page_state(page, NR_MLOCK);
> +		mod_zone_page_state(page_zone(page), NR_MLOCK,
> +				    -hpage_nr_pages(page));
>  		if (!isolate_lru_page(page)) {
>  			int ret = SWAP_AGAIN;
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -604,7 +604,7 @@ out:
>   */
>  static inline void free_page_mlock(struct page *page)
>  {
> -	__dec_zone_page_state(page, NR_MLOCK);
> +	__mod_zone_page_state(page_zone(page), NR_MLOCK, -hpage_nr_pages(page));
>  	__count_vm_event(UNEVICTABLE_MLOCKFREED);
>  }
* Re: [patch] mm, thp: fix mlock statistics

From: Andrew Morton @ 2012-10-03 20:10 UTC
To: Hugh Dickins
Cc: David Rientjes, Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm, stable

On Thu, 27 Sep 2012 18:32:33 -0700 (PDT) Hugh Dickins <hughd@google.com> wrote:

> So despite my earlier reluctance, please take this as an Ack on that
> one too (I was testing them together): it'll be odd if one of them goes
> to stable and the other not, but we can sort that out with GregKH later.

Yes, all this code has changed so much since 3.6 that new patches will
need to be prepared for -stable.

The free_page_mlock() hunk gets dropped because free_page_mlock() is
removed.  And clear_page_mlock() doesn't need this treatment.  But
please check my handiwork.
* [patch -mm] mm, thp: fix mlock statistics fix

From: David Rientjes @ 2012-10-03 21:10 UTC
To: Andrew Morton
Cc: Hugh Dickins, Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm

On Wed, 3 Oct 2012, Andrew Morton wrote:

> The free_page_mlock() hunk gets dropped because free_page_mlock() is
> removed.  And clear_page_mlock() doesn't need this treatment.  But
> please check my handiwork.

I reviewed what was merged into -mm and clear_page_mlock() does need
this fix as well.  It's an easy fix, there's no need to pass "anon" into
clear_page_mlock() since PageHuge() is already checked in its only
caller.


mm, thp: fix mlock statistics fix

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/mlock.c | 3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/mm/mlock.c b/mm/mlock.c
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -56,7 +56,8 @@ void clear_page_mlock(struct page *page)
 	if (!TestClearPageMlocked(page))
 		return;
 
-	dec_zone_page_state(page, NR_MLOCK);
+	mod_zone_page_state(page_zone(page), NR_MLOCK,
+			    -hpage_nr_pages(page));
 	count_vm_event(UNEVICTABLE_PGCLEARED);
 	if (!isolate_lru_page(page)) {
 		putback_lru_page(page);
* Re: [patch -mm] mm, thp: fix mlock statistics fix

From: Andrew Morton @ 2012-10-03 21:25 UTC
To: David Rientjes
Cc: Hugh Dickins, Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm

On Wed, 3 Oct 2012 14:10:41 -0700 (PDT) David Rientjes <rientjes@google.com> wrote:

> > The free_page_mlock() hunk gets dropped because free_page_mlock() is
> > removed.  And clear_page_mlock() doesn't need this treatment.  But
> > please check my handiwork.
>
> I reviewed what was merged into -mm and clear_page_mlock() does need
> this fix as well.

argh, it got me *again*.  grr.


From: Andrew Morton <akpm@linux-foundation.org>
Subject: mm: document PageHuge somewhat

Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 mm/hugetlb.c | 5 +++++
 1 file changed, 5 insertions(+)

diff -puN mm/hugetlb.c~mm-document-pagehuge-somewhat mm/hugetlb.c
--- a/mm/hugetlb.c~mm-document-pagehuge-somewhat
+++ a/mm/hugetlb.c
@@ -671,6 +671,11 @@ static void prep_compound_gigantic_page(
 	}
 }
 
+/*
+ * PageHuge() only returns true for hugetlbfs pages, but not for normal or
+ * transparent huge pages.  See the PageTransHuge() documentation for more
+ * details.
+ */
 int PageHuge(struct page *page)
 {
 	compound_page_dtor *dtor;
_
* Re: [patch -mm] mm, thp: fix mlock statistics fix

From: Hugh Dickins @ 2012-10-03 21:46 UTC
To: Andrew Morton
Cc: David Rientjes, Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm

On Wed, 3 Oct 2012, Andrew Morton wrote:

> On Wed, 3 Oct 2012 14:10:41 -0700 (PDT)
> David Rientjes <rientjes@google.com> wrote:
>
> > > The free_page_mlock() hunk gets dropped because free_page_mlock()
> > > is removed.  And clear_page_mlock() doesn't need this treatment.
> > > But please check my handiwork.
> >
> > I reviewed what was merged into -mm and clear_page_mlock() does need
> > this fix as well.
>
> argh, it got me *again*.  grr.

I've no objection to more documentation on PageHuge, but neither you nor
it were to blame for that "oversight".  It's simply that David's
original patch clearly did not need such a change in clear_page_mlock(),
because it could never be necessary from where it was then called; but I
changed where it's called, whereupon it becomes evident that the extra
is needed.

"evident" puts it rather too strongly.  Most munlocking happens through
munlock_vma_page() instead, but the clear_page_mlock() path covers
truncation.  THPages cannot be file pages at present, but perhaps they
could be anonymous pages COWed from file pages (I've not checked the
exact criteria THP applies)?  In which case, subject to truncation too.

Hugh

> From: Andrew Morton <akpm@linux-foundation.org>
> Subject: mm: document PageHuge somewhat
>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
>
>  mm/hugetlb.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff -puN mm/hugetlb.c~mm-document-pagehuge-somewhat mm/hugetlb.c
> --- a/mm/hugetlb.c~mm-document-pagehuge-somewhat
> +++ a/mm/hugetlb.c
> @@ -671,6 +671,11 @@ static void prep_compound_gigantic_page(
>  	}
>  }
>
> +/*
> + * PageHuge() only returns true for hugetlbfs pages, but not for normal
> + * or transparent huge pages.  See the PageTransHuge() documentation
> + * for more details.
> + */
>  int PageHuge(struct page *page)
>  {
>  	compound_page_dtor *dtor;
> _
* Re: [patch -mm] mm, thp: fix mlock statistics fix

From: Hugh Dickins @ 2012-10-03 21:29 UTC
To: David Rientjes
Cc: Andrew Morton, Linus Torvalds, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm

On Wed, 3 Oct 2012, David Rientjes wrote:

> On Wed, 3 Oct 2012, Andrew Morton wrote:
>
> > The free_page_mlock() hunk gets dropped because free_page_mlock() is
> > removed.  And clear_page_mlock() doesn't need this treatment.  But
> > please check my handiwork.
>
> I reviewed what was merged into -mm and clear_page_mlock() does need
> this fix as well.  It's an easy fix, there's no need to pass "anon"
> into clear_page_mlock() since PageHuge() is already checked in its
> only caller.
>
> mm, thp: fix mlock statistics fix
>
> Signed-off-by: David Rientjes <rientjes@google.com>

Acked-by: Hugh Dickins <hughd@google.com>

Thanks for providing that, David, I was just on the point of building
and working out a test, suspecting that what you've added is necessary.

Probably some equivalent always was missing, but between the THP Mlock
counting issue that you're fixing, and the Mlock counting issue that I'm
fixing by moving the clear_page_mlock, it's hard to say just where.

While clear_page_mlock was being called from truncate.c, we knew that it
couldn't happen on a THP.  But now that it's from page_remove_rmap, yes,
we do want to add in this additional fix.

Hugh

> ---
>  mm/mlock.c | 3 ++-
>  1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/mm/mlock.c b/mm/mlock.c
> --- a/mm/mlock.c
> +++ b/mm/mlock.c
> @@ -56,7 +56,8 @@ void clear_page_mlock(struct page *page)
>  	if (!TestClearPageMlocked(page))
>  		return;
>
> -	dec_zone_page_state(page, NR_MLOCK);
> +	mod_zone_page_state(page_zone(page), NR_MLOCK,
> +			    -hpage_nr_pages(page));
>  	count_vm_event(UNEVICTABLE_PGCLEARED);
>  	if (!isolate_lru_page(page)) {
>  		putback_lru_page(page);
* Re: [patch] mm, thp: fix mlock statistics

From: Andrea Arcangeli @ 2012-09-28 13:51 UTC
To: David Rientjes
Cc: Hugh Dickins, Andrew Morton, Linus Torvalds, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse, linux-kernel,
    linux-mm, stable

On Wed, Sep 26, 2012 at 07:29:58PM -0700, David Rientjes wrote:

> NR_MLOCK is only accounted in single page units: there's no logic to
> handle transparent hugepages.  This patch checks the appropriate number
> of pages to adjust the statistics by so that the correct amount of
> memory is reflected.

*snip*

> Reported-by: Hugh Dickens <hughd@google.com>
> Signed-off-by: David Rientjes <rientjes@google.com>
> ---
>  mm/internal.h   | 3 ++-
>  mm/mlock.c      | 6 ++++--
>  mm/page_alloc.c | 2 +-
>  3 files changed, 7 insertions(+), 4 deletions(-)

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Thanks!
Andrea
* Re: [patch] mm, thp: fix mlock statistics
  2012-09-27  2:29 ` [patch] mm, thp: fix mlock statistics David Rientjes
  2012-09-28  1:32   ` Hugh Dickins
  2012-09-28 13:51   ` [patch] mm, thp: fix mlock statistics Andrea Arcangeli
@ 2012-09-29  0:11   ` Michel Lespinasse
  2 siblings, 0 replies; 15+ messages in thread

From: Michel Lespinasse @ 2012-09-29  0:11 UTC (permalink / raw)
To: David Rientjes
Cc: Hugh Dickins, Andrew Morton, Linus Torvalds, Andrea Arcangeli,
    Naoya Horiguchi, KAMEZAWA Hiroyuki, Johannes Weiner,
    linux-kernel, linux-mm, stable

On Wed, Sep 26, 2012 at 7:29 PM, David Rientjes <rientjes@google.com> wrote:
> NR_MLOCK is only accounted in single-page units: there's no logic to
> handle transparent hugepages.  This patch checks the appropriate number
> of pages to adjust the statistics by so that the correct amount of memory
> is reflected.
>
> Reported-by: Hugh Dickins <hughd@google.com>
> Signed-off-by: David Rientjes <rientjes@google.com>

Looks good, thanks!

Reviewed-by: Michel Lespinasse <walken@google.com>

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
* Re: [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock
  2012-09-27  1:40 ` David Rientjes
  2012-09-27  1:54   ` Hugh Dickins
  2012-09-27  2:29   ` [patch] mm, thp: fix mlock statistics David Rientjes
@ 2012-10-01 15:22   ` Linus Torvalds
  2 siblings, 0 replies; 15+ messages in thread

From: Linus Torvalds @ 2012-10-01 15:22 UTC (permalink / raw)
To: David Rientjes
Cc: Hugh Dickins, Andrew Morton, Andrea Arcangeli, Naoya Horiguchi,
    KAMEZAWA Hiroyuki, Johannes Weiner, Michel Lespinasse,
    linux-kernel, linux-mm, stable

On Wed, Sep 26, 2012 at 6:40 PM, David Rientjes <rientjes@google.com> wrote:
>
> Ok, sounds good.  If there's no objection, I'd like to ask Andrew to apply
> this to -mm and remove the cc to stable@vger.kernel.org since the
> mlock_vma_page() problem above is separate and doesn't conflict with this
> code, so I'll send a followup patch to address that.

So I deferred this (and the "mm, thp: fix mlock statistics" one) to
after 3.6, because Andrea indicated that they aren't critical.

Now I'd be ready to take them, but I suspect they are already in
Andrew's queue and I can forget about them. Please holler if I need to
take the two thp patches directly.

Linus
* Re: [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock
  2012-09-20  1:19 [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock David Rientjes
  2012-09-20  4:01 ` Hugh Dickins
@ 2012-09-28 13:30 ` Andrea Arcangeli
  1 sibling, 0 replies; 15+ messages in thread

From: Andrea Arcangeli @ 2012-09-28 13:30 UTC (permalink / raw)
To: David Rientjes
Cc: Linus Torvalds, Andrew Morton, Naoya Horiguchi, KAMEZAWA Hiroyuki,
    Hugh Dickins, linux-kernel, linux-mm, stable

Hi David,

On Wed, Sep 19, 2012 at 06:19:27PM -0700, David Rientjes wrote:
> +	if ((flags & FOLL_MLOCK) && (vma->vm_flags & VM_LOCKED)) {
> +		if (page->mapping && trylock_page(page)) {
> +			lru_add_drain();
> +			if (page->mapping)
> +				mlock_vma_page(page);
> +			unlock_page(page);
> +		}
> +	}

Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>

Without the patch the kernel would be perfectly fine too; this only
serves to show more up-to-date values in meminfo.  The meminfo values
would eventually come back in sync once vmscan started walking the LRUs,
and the old behavior still happens when the trylock fails.

Without the patch the refiling events happen lazily, as needed; with it
they happen even when they're not needed.  In some ways we could drop
this (and the 4k case as well) and improve performance overall.  But
transparent hugepages must behave identically to 4k pages, so unless we
remove it from the 4k case, it's certainly good to apply the above.

The patch can be deferred to 3.7 if needed.

Thanks!
Andrea
end of thread, other threads:[~2012-10-03 21:47 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-20  1:19 [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock David Rientjes
2012-09-20  4:01 ` Hugh Dickins
2012-09-27  1:40   ` David Rientjes
2012-09-27  1:54     ` Hugh Dickins
2012-09-27  2:29     ` [patch] mm, thp: fix mlock statistics David Rientjes
2012-09-28  1:32       ` Hugh Dickins
2012-10-03 20:10         ` Andrew Morton
2012-10-03 21:10           ` [patch -mm] mm, thp: fix mlock statistics fix David Rientjes
2012-10-03 21:25             ` Andrew Morton
2012-10-03 21:46               ` Hugh Dickins
2012-10-03 21:29             ` Hugh Dickins
2012-09-28 13:51       ` [patch] mm, thp: fix mlock statistics Andrea Arcangeli
2012-09-29  0:11       ` Michel Lespinasse
2012-10-01 15:22     ` [patch for-3.6] mm, thp: fix mapped pages avoiding unevictable list on mlock Linus Torvalds
2012-09-28 13:30 ` Andrea Arcangeli