From: Michel Lespinasse <walken@google.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>
Subject: Re: THP, rmap and page_referenced_one()
Date: Tue, 8 Mar 2011 04:21:15 -0800 [thread overview]
Message-ID: <20110308122115.GA28054@google.com> (raw)
In-Reply-To: <20110308113245.GR25641@random.random>
On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote:
> I only run some basic testing, please review. I seen no reason to
> return "referenced = 0" if the pmd is still splitting. So I let it go
> ahead now and test_and_set_bit the accessed bit even on a splitting
> pmd. After all the tlb miss could still activate the young bit on a
> pmd while it's in splitting state. There's no check for splitting in
> the pmdp_clear_flush_young. The secondary mmu has no secondary spte
> mapped while it's set to splitting so it shouldn't matter for it if we
> clear the young bit (and new secondary mmu page faults will wait on
> splitting to clear and __split_huge_page_map to finish before going
> ahead creating new secondary sptes with 4k granularity).
Agree, the pmd_trans_splitting check didn't seem necessary.
Thanks for the patch, looks fine, I only have a couple nitpicks regarding
code comments:
> ===
> Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> When vmscan.c calls page_referenced, if an anon page was created before a
> process forked, rmap will search for it in both of the processes, even though
> one of them might have since broken COW. If the child process mlocks the vma
> where the COWed page belongs to, page_referenced() running on the page mapped
> by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading
> to the references on the parent page being ignored and evicting the parent page
> too early).
>
> *mapcount would also be decremented by page_referenced_one even if the page
> wasn't found by page_check_address.
>
> This also let pmdp_clear_flush_young_notify() go ahead on a
> pmd_trans_splitting() pmd. We hold the page_table_lock so
> __split_huge_page_map() must wait the pmdp_clear_flush_young_notify()
> to complete before it can modify the pmd. The pmd is also still mapped
> in userland so the young bit may materialize through a tlb miss before
> split_huge_page_map runs. This will provide a more accurate
> page_referenced() behavior during split_huge_page().
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
> ---
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f21f4a1..e8924bc 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -497,41 +497,62 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> struct mm_struct *mm = vma->vm_mm;
> int referenced = 0;
>
> - /*
> - * Don't want to elevate referenced for mlocked page that gets this far,
> - * in order that it progresses to try_to_unmap and is moved to the
> - * unevictable list.
> - */
> - if (vma->vm_flags & VM_LOCKED) {
> - *mapcount = 0; /* break early from loop */
> - *vm_flags |= VM_LOCKED;
> - goto out;
> - }
> -
> - /* Pretend the page is referenced if the task has the
> - swap token and is in the middle of a page fault. */
> - if (mm != current->mm && has_swap_token(mm) &&
> - rwsem_is_locked(&mm->mmap_sem))
> - referenced++;
> -
> if (unlikely(PageTransHuge(page))) {
> pmd_t *pmd;
>
> spin_lock(&mm->page_table_lock);
> + /*
> + * We must update *vm_flags and *mapcount only if
> + * page_check_address_pmd() succeeds in finding the
> + * page.
> + */
I would propose:
/*
* rmap might return false positives; we must filter these out
* using page_check_address_pmd().
*/
(sorry for the nitpick, it's OK if you prefer your own version :)
> pmd = page_check_address_pmd(page, mm, address,
> PAGE_CHECK_ADDRESS_PMD_FLAG);
> - if (pmd && !pmd_trans_splitting(*pmd) &&
> - pmdp_clear_flush_young_notify(vma, address, pmd))
> + if (!pmd) {
> + spin_unlock(&mm->page_table_lock);
> + goto out;
> + }
> +
> + /*
> + * Don't want to elevate referenced for mlocked page
> + * that gets this far, in order that it progresses to
> + * try_to_unmap and is moved to the unevictable list.
> + */
Actually page_check_references() now ignores the referenced value when
vm_flags & VM_LOCKED is set, so it'd be best to lose the above comment IMO.
> + if (vma->vm_flags & VM_LOCKED) {
> + spin_unlock(&mm->page_table_lock);
> + *mapcount = 0; /* break early from loop */
> + *vm_flags |= VM_LOCKED;
> + goto out;
> + }
> +
> + /* go ahead even if the pmd is pmd_trans_splitting() */
> + if (pmdp_clear_flush_young_notify(vma, address, pmd))
> referenced++;
> spin_unlock(&mm->page_table_lock);
> } else {
> pte_t *pte;
> spinlock_t *ptl;
>
> + /*
> + * We must update *vm_flags and *mapcount only if
> + * page_check_address() succeeds in finding the page.
> + */
(same as above)
> pte = page_check_address(page, mm, address, &ptl, 0);
> if (!pte)
> goto out;
>
> + /*
> + * Don't want to elevate referenced for mlocked page
> + * that gets this far, in order that it progresses to
> + * try_to_unmap and is moved to the unevictable list.
> + */
(same as above)
> + if (vma->vm_flags & VM_LOCKED) {
> + pte_unmap_unlock(pte, ptl);
> + *mapcount = 0; /* break early from loop */
> + *vm_flags |= VM_LOCKED;
> + goto out;
> + }
> +
> if (ptep_clear_flush_young_notify(vma, address, pte)) {
> /*
> * Don't treat a reference through a sequentially read
> @@ -546,6 +567,12 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> pte_unmap_unlock(pte, ptl);
> }
>
> + /* Pretend the page is referenced if the task has the
> + swap token and is in the middle of a page fault. */
> + if (mm != current->mm && has_swap_token(mm) &&
> + rwsem_is_locked(&mm->mmap_sem))
> + referenced++;
> +
> (*mapcount)--;
>
> if (referenced)
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-03-08 12:21 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-07 6:50 THP, rmap and page_referenced_one() Michel Lespinasse
2011-03-07 7:33 ` KOSAKI Motohiro
2011-03-07 8:35 ` Minchan Kim
2011-03-07 10:35 ` Michel Lespinasse
2011-03-08 11:32 ` Andrea Arcangeli
2011-03-08 12:21 ` Michel Lespinasse [this message]
2011-03-08 12:58 ` Andrea Arcangeli
2011-03-08 22:48 ` Minchan Kim
2011-03-09 9:18 ` Johannes Weiner
2011-03-08 13:57 ` Rik van Riel
2011-03-09 7:11 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110308122115.GA28054@google.com \
--to=walken@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=minchan.kim@gmail.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.