* THP, rmap and page_referenced_one() @ 2011-03-07 6:50 Michel Lespinasse 2011-03-07 7:33 ` KOSAKI Motohiro 2011-03-07 8:35 ` Minchan Kim 0 siblings, 2 replies; 11+ messages in thread From: Michel Lespinasse @ 2011-03-07 6:50 UTC (permalink / raw) To: Andrea Arcangeli, Rik van Riel, Hugh Dickins; +Cc: Andrew Morton, linux-mm Hi, I have been wondering about the following: Before the THP work, the if (vma->vm_flags & VM_LOCKED) test in page_referenced_one() was placed after the page_check_address() call, but now it is placed above it. Could this be a problem ? My understanding is that the page_check_address() check may return false positives - for example, if an anon page was created before a process forked, rmap will indicate that the page could be mapped in both of the processes, even though one of them might have since broken COW. What would happen if the child process mlocks the corresponding VMA ? my understanding is that this would break COW, but not cause rmap to be updated, so the parent's page would still be marked in rmap as being possibly mapped in the children's VM_LOCKED vma. With the VM_LOCKED check now placed above the page_check_address() call, this would cause vmscan to see both the parent's and the child's pages as being unevictable. Am I missing something there ? In particular, I am not sure if marking the children's VMA as mlocked would somehow cause rmap to realize it can't share pages with the parent anymore (but I don't think that's the case, and it could cause other issues if it was...) -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-07 6:50 THP, rmap and page_referenced_one() Michel Lespinasse @ 2011-03-07 7:33 ` KOSAKI Motohiro 2011-03-07 8:35 ` Minchan Kim 1 sibling, 0 replies; 11+ messages in thread From: KOSAKI Motohiro @ 2011-03-07 7:33 UTC (permalink / raw) To: Michel Lespinasse Cc: kosaki.motohiro, Andrea Arcangeli, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm > Hi, > > I have been wondering about the following: > > Before the THP work, the if (vma->vm_flags & VM_LOCKED) test in > page_referenced_one() was placed after the page_check_address() call, > but now it is placed above it. Could this be a problem ? > > My understanding is that the page_check_address() check may return > false positives - for example, if an anon page was created before a > process forked, rmap will indicate that the page could be mapped in > both of the processes, even though one of them might have since broken > COW. What would happen if the child process mlocks the corresponding > VMA ? my understanding is that this would break COW, but not cause > rmap to be updated, so the parent's page would still be marked in rmap > as being possibly mapped in the children's VM_LOCKED vma. With the > VM_LOCKED check now placed above the page_check_address() call, this > would cause vmscan to see both the parent's and the child's pages as > being unevictable. > > Am I missing something there ? In particular, I am not sure if marking > the children's VMA as mlocked would somehow cause rmap to realize it > can't share pages with the parent anymore (but I don't think that's > the case, and it could cause other issues if it was...) Hi I think you are right. page_check_address() should be called before VM_LOCKED check. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-07 6:50 THP, rmap and page_referenced_one() Michel Lespinasse 2011-03-07 7:33 ` KOSAKI Motohiro @ 2011-03-07 8:35 ` Minchan Kim 2011-03-07 10:35 ` Michel Lespinasse 1 sibling, 1 reply; 11+ messages in thread From: Minchan Kim @ 2011-03-07 8:35 UTC (permalink / raw) To: Michel Lespinasse Cc: Andrea Arcangeli, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm On Mon, Mar 7, 2011 at 3:50 PM, Michel Lespinasse <walken@google.com> wrote: > Hi, > > I have been wondering about the following: > > Before the THP work, the if (vma->vm_flags & VM_LOCKED) test in > page_referenced_one() was placed after the page_check_address() call, > but now it is placed above it. Could this be a problem ? > > My understanding is that the page_check_address() check may return > false positives - for example, if an anon page was created before a > process forked, rmap will indicate that the page could be mapped in > both of the processes, even though one of them might have since broken > COW. What would happen if the child process mlocks the corresponding > VMA ? my understanding is that this would break COW, but not cause > rmap to be updated, so the parent's page would still be marked in rmap > as being possibly mapped in the children's VM_LOCKED vma. With the > VM_LOCKED check now placed above the page_check_address() call, this > would cause vmscan to see both the parent's and the child's pages as > being unevictable. I agree. There are two processes called P_A, P_B. P_B is child of P_A. A page "page A" is share between V_A(A's VMA)and V_B(B's VMA) since P_B is created by forking from P_A. When P_B calls mlock the V_B, P_B allocates new page B instead of reusing page A by COW and mapped P_B's page table but rmap of page A still indicates page A is mapped by V_A and V_B. The page_check_address can filter this situation that V_B doesn't include page A any more. So page_check_address should be placed before checking the VM_LOCKED. I think it's valuable to add the comment why we need page_check_address should be placed before the checking VM_LOCKED. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-07 8:35 ` Minchan Kim @ 2011-03-07 10:35 ` Michel Lespinasse 2011-03-08 11:32 ` Andrea Arcangeli 0 siblings, 1 reply; 11+ messages in thread From: Michel Lespinasse @ 2011-03-07 10:35 UTC (permalink / raw) To: Minchan Kim Cc: Andrea Arcangeli, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm There is also the issue that *mapcount will be decremented even if the pmd turns out not to point to the given page. page_referenced() will stop looking at rmap's candidate mappings once the refcount hits zero, so the decrement will cause an actual mapping to be ignored. -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-07 10:35 ` Michel Lespinasse @ 2011-03-08 11:32 ` Andrea Arcangeli 2011-03-08 12:21 ` Michel Lespinasse 2011-03-08 13:57 ` Rik van Riel 0 siblings, 2 replies; 11+ messages in thread From: Andrea Arcangeli @ 2011-03-08 11:32 UTC (permalink / raw) To: Michel Lespinasse Cc: Minchan Kim, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm Hello, On Mon, Mar 07, 2011 at 02:35:15AM -0800, Michel Lespinasse wrote: > There is also the issue that *mapcount will be decremented even if the > pmd turns out not to point to the given page. page_referenced() will > stop looking at rmap's candidate mappings once the refcount hits zero, > so the decrement will cause an actual mapping to be ignored. Both the mapcount and vm_flags problems would worst case lead to some page ("parent page" in the example provided with mlock in the child) being evicted too early, not to become unevictable. The mapcount is actually a bigger concern to me as it doesn't require the race. I also added a proper comment before page_check_address* as suggested by Minchan. Frankly when I updated the code I didn't realize that was the actual reason VM_LOCKED was placed in the middle between page_check_address and ptep_clear_flush_young_notify, now it's clear why... I only run some basic testing, please review. I seen no reason to return "referenced = 0" if the pmd is still splitting. So I let it go ahead now and test_and_set_bit the accessed bit even on a splitting pmd. After all the tlb miss could still activate the young bit on a pmd while it's in splitting state. There's no check for splitting in the pmdp_clear_flush_young. The secondary mmu has no secondary spte mapped while it's set to splitting so it shouldn't matter for it if we clear the young bit (and new secondary mmu page faults will wait on splitting to clear and __split_huge_page_map to finish before going ahead creating new secondary sptes with 4k granularity). Here a fix: === Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found From: Andrea Arcangeli <aarcange@redhat.com> When vmscan.c calls page_referenced, if an anon page was created before a process forked, rmap will search for it in both of the processes, even though one of them might have since broken COW. If the child process mlocks the vma where the COWed page belongs to, page_referenced() running on the page mapped by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading to the references on the parent page being ignored and evicting the parent page too early). *mapcount would also be decremented by page_referenced_one even if the page wasn't found by page_check_address. This also let pmdp_clear_flush_young_notify() go ahead on a pmd_trans_splitting() pmd. We hold the page_table_lock so __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to complete before it can modify the pmd. The pmd is also still mapped in userland so the young bit may materialize through a tlb miss before split_huge_page_map runs. This will provide a more accurate page_referenced() behavior during split_huge_page(). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> --- diff --git a/mm/rmap.c b/mm/rmap.c index f21f4a1..e8924bc 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -497,41 +497,62 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, struct mm_struct *mm = vma->vm_mm; int referenced = 0; - /* - * Don't want to elevate referenced for mlocked page that gets this far, - * in order that it progresses to try_to_unmap and is moved to the - * unevictable list. - */ - if (vma->vm_flags & VM_LOCKED) { - *mapcount = 0; /* break early from loop */ - *vm_flags |= VM_LOCKED; - goto out; - } - - /* Pretend the page is referenced if the task has the - swap token and is in the middle of a page fault. */ - if (mm != current->mm && has_swap_token(mm) && - rwsem_is_locked(&mm->mmap_sem)) - referenced++; - if (unlikely(PageTransHuge(page))) { pmd_t *pmd; spin_lock(&mm->page_table_lock); + /* + * We must update *vm_flags and *mapcount only if + * page_check_address_pmd() succeeds in finding the + * page. + */ pmd = page_check_address_pmd(page, mm, address, PAGE_CHECK_ADDRESS_PMD_FLAG); - if (pmd && !pmd_trans_splitting(*pmd) && - pmdp_clear_flush_young_notify(vma, address, pmd)) + if (!pmd) { + spin_unlock(&mm->page_table_lock); + goto out; + } + + /* + * Don't want to elevate referenced for mlocked page + * that gets this far, in order that it progresses to + * try_to_unmap and is moved to the unevictable list. + */ + if (vma->vm_flags & VM_LOCKED) { + spin_unlock(&mm->page_table_lock); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + + /* go ahead even if the pmd is pmd_trans_splitting() */ + if (pmdp_clear_flush_young_notify(vma, address, pmd)) referenced++; spin_unlock(&mm->page_table_lock); } else { pte_t *pte; spinlock_t *ptl; + /* + * We must update *vm_flags and *mapcount only if + * page_check_address() succeeds in finding the page. + */ pte = page_check_address(page, mm, address, &ptl, 0); if (!pte) goto out; + /* + * Don't want to elevate referenced for mlocked page + * that gets this far, in order that it progresses to + * try_to_unmap and is moved to the unevictable list. + */ + if (vma->vm_flags & VM_LOCKED) { + pte_unmap_unlock(pte, ptl); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + if (ptep_clear_flush_young_notify(vma, address, pte)) { /* * Don't treat a reference through a sequentially read @@ -546,6 +567,12 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, pte_unmap_unlock(pte, ptl); } + /* Pretend the page is referenced if the task has the + swap token and is in the middle of a page fault. */ + if (mm != current->mm && has_swap_token(mm) && + rwsem_is_locked(&mm->mmap_sem)) + referenced++; + (*mapcount)--; if (referenced) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-08 11:32 ` Andrea Arcangeli @ 2011-03-08 12:21 ` Michel Lespinasse 2011-03-08 12:58 ` Andrea Arcangeli 2011-03-08 13:57 ` Rik van Riel 1 sibling, 1 reply; 11+ messages in thread From: Michel Lespinasse @ 2011-03-08 12:21 UTC (permalink / raw) To: Andrea Arcangeli Cc: Minchan Kim, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote: > I only run some basic testing, please review. I seen no reason to > return "referenced = 0" if the pmd is still splitting. So I let it go > ahead now and test_and_set_bit the accessed bit even on a splitting > pmd. After all the tlb miss could still activate the young bit on a > pmd while it's in splitting state. There's no check for splitting in > the pmdp_clear_flush_young. The secondary mmu has no secondary spte > mapped while it's set to splitting so it shouldn't matter for it if we > clear the young bit (and new secondary mmu page faults will wait on > splitting to clear and __split_huge_page_map to finish before going > ahead creating new secondary sptes with 4k granularity). Agree, the pmd_trans_splitting check didn't seem necessary. Thanks for the patch, looks fine, I only have a couple nitpicks regarding code comments: > === > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > From: Andrea Arcangeli <aarcange@redhat.com> > > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > > *mapcount would also be decremented by page_referenced_one even if the page > wasn't found by page_check_address. > > This also let pmdp_clear_flush_young_notify() go ahead on a > pmd_trans_splitting() pmd. We hold the page_table_lock so > __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() > to complete before it can modify the pmd. The pmd is also still mapped > in userland so the young bit may materialize through a tlb miss before > split_huge_page_map runs. This will provide a more accurate > page_referenced() behavior during split_huge_page(). > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> > --- > > diff --git a/mm/rmap.c b/mm/rmap.c > index f21f4a1..e8924bc 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -497,41 +497,62 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, > struct mm_struct *mm = vma->vm_mm; > int referenced = 0; > > - /* > - * Don't want to elevate referenced for mlocked page that gets this far, > - * in order that it progresses to try_to_unmap and is moved to the > - * unevictable list. > - */ > - if (vma->vm_flags & VM_LOCKED) { > - *mapcount = 0; /* break early from loop */ > - *vm_flags |= VM_LOCKED; > - goto out; > - } > - > - /* Pretend the page is referenced if the task has the > - swap token and is in the middle of a page fault. */ > - if (mm != current->mm && has_swap_token(mm) && > - rwsem_is_locked(&mm->mmap_sem)) > - referenced++; > - > if (unlikely(PageTransHuge(page))) { > pmd_t *pmd; > > spin_lock(&mm->page_table_lock); > + /* > + * We must update *vm_flags and *mapcount only if > + * page_check_address_pmd() succeeds in finding the > + * page. > + */ I would propose: /* * rmap might return false positives; we must filter these out * using page_check_address_pmd(). */ (sorry for the nitpick, it's OK if you prefer your own version :) > pmd = page_check_address_pmd(page, mm, address, > PAGE_CHECK_ADDRESS_PMD_FLAG); > - if (pmd && !pmd_trans_splitting(*pmd) && > - pmdp_clear_flush_young_notify(vma, address, pmd)) > + if (!pmd) { > + spin_unlock(&mm->page_table_lock); > + goto out; > + } > + > + /* > + * Don't want to elevate referenced for mlocked page > + * that gets this far, in order that it progresses to > + * try_to_unmap and is moved to the unevictable list. > + */ Actually page_check_references() now ignores the referenced value when vm_flags & VM_LOCKED is set, so it'd be best to lose the above comment IMO. > + if (vma->vm_flags & VM_LOCKED) { > + spin_unlock(&mm->page_table_lock); > + *mapcount = 0; /* break early from loop */ > + *vm_flags |= VM_LOCKED; > + goto out; > + } > + > + /* go ahead even if the pmd is pmd_trans_splitting() */ > + if (pmdp_clear_flush_young_notify(vma, address, pmd)) > referenced++; > spin_unlock(&mm->page_table_lock); > } else { > pte_t *pte; > spinlock_t *ptl; > > + /* > + * We must update *vm_flags and *mapcount only if > + * page_check_address() succeeds in finding the page. > + */ (same as above) > pte = page_check_address(page, mm, address, &ptl, 0); > if (!pte) > goto out; > > + /* > + * Don't want to elevate referenced for mlocked page > + * that gets this far, in order that it progresses to > + * try_to_unmap and is moved to the unevictable list. > + */ (same as above) > + if (vma->vm_flags & VM_LOCKED) { > + pte_unmap_unlock(pte, ptl); > + *mapcount = 0; /* break early from loop */ > + *vm_flags |= VM_LOCKED; > + goto out; > + } > + > if (ptep_clear_flush_young_notify(vma, address, pte)) { > /* > * Don't treat a reference through a sequentially read > @@ -546,6 +567,12 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, > pte_unmap_unlock(pte, ptl); > } > > + /* Pretend the page is referenced if the task has the > + swap token and is in the middle of a page fault. */ > + if (mm != current->mm && has_swap_token(mm) && > + rwsem_is_locked(&mm->mmap_sem)) > + referenced++; > + > (*mapcount)--; > > if (referenced) -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-08 12:21 ` Michel Lespinasse @ 2011-03-08 12:58 ` Andrea Arcangeli 2011-03-08 22:48 ` Minchan Kim 2011-03-09 9:18 ` Johannes Weiner 0 siblings, 2 replies; 11+ messages in thread From: Andrea Arcangeli @ 2011-03-08 12:58 UTC (permalink / raw) To: Michel Lespinasse Cc: Minchan Kim, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm On Tue, Mar 08, 2011 at 04:21:15AM -0800, Michel Lespinasse wrote: > On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote: > > I only run some basic testing, please review. I seen no reason to > > return "referenced = 0" if the pmd is still splitting. So I let it go > > ahead now and test_and_set_bit the accessed bit even on a splitting > > pmd. After all the tlb miss could still activate the young bit on a > > pmd while it's in splitting state. There's no check for splitting in > > the pmdp_clear_flush_young. The secondary mmu has no secondary spte > > mapped while it's set to splitting so it shouldn't matter for it if we > > clear the young bit (and new secondary mmu page faults will wait on > > splitting to clear and __split_huge_page_map to finish before going > > ahead creating new secondary sptes with 4k granularity). > > Agree, the pmd_trans_splitting check didn't seem necessary. > > Thanks for the patch, looks fine, I only have a couple nitpicks regarding > code comments: > Ok, updated comments... thanks for the quick review. Try #2: === Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found From: Andrea Arcangeli <aarcange@redhat.com> When vmscan.c calls page_referenced, if an anon page was created before a process forked, rmap will search for it in both of the processes, even though one of them might have since broken COW. If the child process mlocks the vma where the COWed page belongs to, page_referenced() running on the page mapped by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading to the references on the parent page being ignored and evicting the parent page too early). *mapcount would also be decremented by page_referenced_one even if the page wasn't found by page_check_address. This also let pmdp_clear_flush_young_notify() go ahead on a pmd_trans_splitting() pmd. We hold the page_table_lock so __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to complete before it can modify the pmd. The pmd is also still mapped in userland so the young bit may materialize through a tlb miss before split_huge_page_map runs. This will provide a more accurate page_referenced() behavior during split_huge_page(). Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: Michel Lespinasse <walken@google.com> Reviewed-by: Michel Lespinasse <walken@google.com> --- --- mm/rmap.c | 54 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 19 deletions(-) --- a/mm/rmap.c +++ b/mm/rmap.c @@ -497,41 +497,51 @@ int page_referenced_one(struct page *pag struct mm_struct *mm = vma->vm_mm; int referenced = 0; - /* - * Don't want to elevate referenced for mlocked page that gets this far, - * in order that it progresses to try_to_unmap and is moved to the - * unevictable list. - */ - if (vma->vm_flags & VM_LOCKED) { - *mapcount = 0; /* break early from loop */ - *vm_flags |= VM_LOCKED; - goto out; - } - - /* Pretend the page is referenced if the task has the - swap token and is in the middle of a page fault. */ - if (mm != current->mm && has_swap_token(mm) && - rwsem_is_locked(&mm->mmap_sem)) - referenced++; - if (unlikely(PageTransHuge(page))) { pmd_t *pmd; spin_lock(&mm->page_table_lock); + /* + * rmap might return false positives; we must filter + * these out using page_check_address_pmd(). + */ pmd = page_check_address_pmd(page, mm, address, PAGE_CHECK_ADDRESS_PMD_FLAG); - if (pmd && !pmd_trans_splitting(*pmd) && - pmdp_clear_flush_young_notify(vma, address, pmd)) + if (!pmd) { + spin_unlock(&mm->page_table_lock); + goto out; + } + + if (vma->vm_flags & VM_LOCKED) { + spin_unlock(&mm->page_table_lock); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + + /* go ahead even if the pmd is pmd_trans_splitting() */ + if (pmdp_clear_flush_young_notify(vma, address, pmd)) referenced++; spin_unlock(&mm->page_table_lock); } else { pte_t *pte; spinlock_t *ptl; + /* + * rmap might return false positives; we must filter + * these out using page_check_address(). + */ pte = page_check_address(page, mm, address, &ptl, 0); if (!pte) goto out; + if (vma->vm_flags & VM_LOCKED) { + pte_unmap_unlock(pte, ptl); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + if (ptep_clear_flush_young_notify(vma, address, pte)) { /* * Don't treat a reference through a sequentially read @@ -546,6 +556,12 @@ int page_referenced_one(struct page *pag pte_unmap_unlock(pte, ptl); } + /* Pretend the page is referenced if the task has the + swap token and is in the middle of a page fault. */ + if (mm != current->mm && has_swap_token(mm) && + rwsem_is_locked(&mm->mmap_sem)) + referenced++; + (*mapcount)--; if (referenced) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-08 12:58 ` Andrea Arcangeli @ 2011-03-08 22:48 ` Minchan Kim 2011-03-09 9:18 ` Johannes Weiner 1 sibling, 0 replies; 11+ messages in thread From: Minchan Kim @ 2011-03-08 22:48 UTC (permalink / raw) To: Andrea Arcangeli Cc: Michel Lespinasse, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm On Tue, Mar 8, 2011 at 9:58 PM, Andrea Arcangeli <aarcange@redhat.com> wrote: > On Tue, Mar 08, 2011 at 04:21:15AM -0800, Michel Lespinasse wrote: >> On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote: >> > I only run some basic testing, please review. I seen no reason to >> > return "referenced = 0" if the pmd is still splitting. So I let it go >> > ahead now and test_and_set_bit the accessed bit even on a splitting >> > pmd. After all the tlb miss could still activate the young bit on a >> > pmd while it's in splitting state. There's no check for splitting in >> > the pmdp_clear_flush_young. The secondary mmu has no secondary spte >> > mapped while it's set to splitting so it shouldn't matter for it if we >> > clear the young bit (and new secondary mmu page faults will wait on >> > splitting to clear and __split_huge_page_map to finish before going >> > ahead creating new secondary sptes with 4k granularity). >> >> Agree, the pmd_trans_splitting check didn't seem necessary. >> >> Thanks for the patch, looks fine, I only have a couple nitpicks regarding >> code comments: >> > > Ok, updated comments... thanks for the quick review. Try #2: > > === > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > From: Andrea Arcangeli <aarcange@redhat.com> > > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > > *mapcount would also be decremented by page_referenced_one even if the page > wasn't found by page_check_address. > > This also let pmdp_clear_flush_young_notify() go ahead on a > pmd_trans_splitting() pmd. We hold the page_table_lock so > __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to > complete before it can modify the pmd. The pmd is also still mapped in userland > so the young bit may materialize through a tlb miss before split_huge_page_map > runs. This will provide a more accurate page_referenced() behavior during > split_huge_page(). > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > Reported-by: Michel Lespinasse <walken@google.com> > Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-08 12:58 ` Andrea Arcangeli 2011-03-08 22:48 ` Minchan Kim @ 2011-03-09 9:18 ` Johannes Weiner 1 sibling, 0 replies; 11+ messages in thread From: Johannes Weiner @ 2011-03-09 9:18 UTC (permalink / raw) To: Andrea Arcangeli Cc: Michel Lespinasse, Minchan Kim, Rik van Riel, Hugh Dickins, Andrew Morton, linux-mm On Tue, Mar 08, 2011 at 01:58:30PM +0100, Andrea Arcangeli wrote: > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > > *mapcount would also be decremented by page_referenced_one even if the page > wasn't found by page_check_address. > > This also let pmdp_clear_flush_young_notify() go ahead on a > pmd_trans_splitting() pmd. We hold the page_table_lock so > __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to > complete before it can modify the pmd. The pmd is also still mapped in userland > so the young bit may materialize through a tlb miss before split_huge_page_map > runs. This will provide a more accurate page_referenced() behavior during > split_huge_page(). > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > Reported-by: Michel Lespinasse <walken@google.com> > Reviewed-by: Michel Lespinasse <walken@google.com> Reviewed-by: Johannes Weiner <hannes@cmpxchg.org> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-08 11:32 ` Andrea Arcangeli 2011-03-08 12:21 ` Michel Lespinasse @ 2011-03-08 13:57 ` Rik van Riel 2011-03-09 7:11 ` KOSAKI Motohiro 1 sibling, 1 reply; 11+ messages in thread From: Rik van Riel @ 2011-03-08 13:57 UTC (permalink / raw) To: Andrea Arcangeli Cc: Michel Lespinasse, Minchan Kim, Hugh Dickins, Andrew Morton, linux-mm On 03/08/2011 06:32 AM, Andrea Arcangeli wrote: > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > From: Andrea Arcangeli<aarcange@redhat.com> > > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > Signed-off-by: Andrea Arcangeli<aarcange@redhat.com> > Reported-by: Michel Lespinasse<walken@google.com> Reviewed-by: Rik van Riel<riel@redhat.com> -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: THP, rmap and page_referenced_one() 2011-03-08 13:57 ` Rik van Riel @ 2011-03-09 7:11 ` KOSAKI Motohiro 0 siblings, 0 replies; 11+ messages in thread From: KOSAKI Motohiro @ 2011-03-09 7:11 UTC (permalink / raw) To: Rik van Riel Cc: kosaki.motohiro, Andrea Arcangeli, Michel Lespinasse, Minchan Kim, Hugh Dickins, Andrew Morton, linux-mm > On 03/08/2011 06:32 AM, Andrea Arcangeli wrote: > > > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > > > From: Andrea Arcangeli<aarcange@redhat.com> > > > > When vmscan.c calls page_referenced, if an anon page was created before a > > process forked, rmap will search for it in both of the processes, even though > > one of them might have since broken COW. If the child process mlocks the vma > > where the COWed page belongs to, page_referenced() running on the page mapped > > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > > to the references on the parent page being ignored and evicting the parent page > > too early). > > > Signed-off-by: Andrea Arcangeli<aarcange@redhat.com> > > Reported-by: Michel Lespinasse<walken@google.com> > > Reviewed-by: Rik van Riel<riel@redhat.com> Thank you, Andrea. Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2011-03-09 9:18 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-03-07 6:50 THP, rmap and page_referenced_one() Michel Lespinasse 2011-03-07 7:33 ` KOSAKI Motohiro 2011-03-07 8:35 ` Minchan Kim 2011-03-07 10:35 ` Michel Lespinasse 2011-03-08 11:32 ` Andrea Arcangeli 2011-03-08 12:21 ` Michel Lespinasse 2011-03-08 12:58 ` Andrea Arcangeli 2011-03-08 22:48 ` Minchan Kim 2011-03-09 9:18 ` Johannes Weiner 2011-03-08 13:57 ` Rik van Riel 2011-03-09 7:11 ` KOSAKI Motohiro
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).