From: Michel Lespinasse <walken@google.com>
To: Andrea Arcangeli <aarcange@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>,
Rik van Riel <riel@redhat.com>, Hugh Dickins <hughd@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
linux-mm <linux-mm@kvack.org>
Subject: Re: THP, rmap and page_referenced_one()
Date: Tue, 8 Mar 2011 04:21:15 -0800 [thread overview]
Message-ID: <20110308122115.GA28054@google.com> (raw)
In-Reply-To: <20110308113245.GR25641@random.random>
On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote:
> I only run some basic testing, please review. I seen no reason to
> return "referenced = 0" if the pmd is still splitting. So I let it go
> ahead now and test_and_set_bit the accessed bit even on a splitting
> pmd. After all the tlb miss could still activate the young bit on a
> pmd while it's in splitting state. There's no check for splitting in
> the pmdp_clear_flush_young. The secondary mmu has no secondary spte
> mapped while it's set to splitting so it shouldn't matter for it if we
> clear the young bit (and new secondary mmu page faults will wait on
> splitting to clear and __split_huge_page_map to finish before going
> ahead creating new secondary sptes with 4k granularity).
Agree, the pmd_trans_splitting check didn't seem necessary.
Thanks for the patch, looks fine, I only have a couple nitpicks regarding
code comments:
> ===
> Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found
>
> From: Andrea Arcangeli <aarcange@redhat.com>
>
> When vmscan.c calls page_referenced, if an anon page was created before a
> process forked, rmap will search for it in both of the processes, even though
> one of them might have since broken COW. If the child process mlocks the vma
> where the COWed page belongs to, page_referenced() running on the page mapped
> by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading
> to the references on the parent page being ignored and evicting the parent page
> too early).
>
> *mapcount would also be decremented by page_referenced_one even if the page
> wasn't found by page_check_address.
>
> This also let pmdp_clear_flush_young_notify() go ahead on a
> pmd_trans_splitting() pmd. We hold the page_table_lock so
> __split_huge_page_map() must wait the pmdp_clear_flush_young_notify()
> to complete before it can modify the pmd. The pmd is also still mapped
> in userland so the young bit may materialize through a tlb miss before
> split_huge_page_map runs. This will provide a more accurate
> page_referenced() behavior during split_huge_page().
>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Michel Lespinasse <walken@google.com>
> ---
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index f21f4a1..e8924bc 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -497,41 +497,62 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> struct mm_struct *mm = vma->vm_mm;
> int referenced = 0;
>
> - /*
> - * Don't want to elevate referenced for mlocked page that gets this far,
> - * in order that it progresses to try_to_unmap and is moved to the
> - * unevictable list.
> - */
> - if (vma->vm_flags & VM_LOCKED) {
> - *mapcount = 0; /* break early from loop */
> - *vm_flags |= VM_LOCKED;
> - goto out;
> - }
> -
> - /* Pretend the page is referenced if the task has the
> - swap token and is in the middle of a page fault. */
> - if (mm != current->mm && has_swap_token(mm) &&
> - rwsem_is_locked(&mm->mmap_sem))
> - referenced++;
> -
> if (unlikely(PageTransHuge(page))) {
> pmd_t *pmd;
>
> spin_lock(&mm->page_table_lock);
> + /*
> + * We must update *vm_flags and *mapcount only if
> + * page_check_address_pmd() succeeds in finding the
> + * page.
> + */
I would propose:
/*
* rmap might return false positives; we must filter these out
* using page_check_address_pmd().
*/
(sorry for the nitpick, it's OK if you prefer your own version :)
> pmd = page_check_address_pmd(page, mm, address,
> PAGE_CHECK_ADDRESS_PMD_FLAG);
> - if (pmd && !pmd_trans_splitting(*pmd) &&
> - pmdp_clear_flush_young_notify(vma, address, pmd))
> + if (!pmd) {
> + spin_unlock(&mm->page_table_lock);
> + goto out;
> + }
> +
> + /*
> + * Don't want to elevate referenced for mlocked page
> + * that gets this far, in order that it progresses to
> + * try_to_unmap and is moved to the unevictable list.
> + */
Actually page_check_references() now ignores the referenced value when
vm_flags & VM_LOCKED is set, so it'd be best to lose the above comment IMO.
> + if (vma->vm_flags & VM_LOCKED) {
> + spin_unlock(&mm->page_table_lock);
> + *mapcount = 0; /* break early from loop */
> + *vm_flags |= VM_LOCKED;
> + goto out;
> + }
> +
> + /* go ahead even if the pmd is pmd_trans_splitting() */
> + if (pmdp_clear_flush_young_notify(vma, address, pmd))
> referenced++;
> spin_unlock(&mm->page_table_lock);
> } else {
> pte_t *pte;
> spinlock_t *ptl;
>
> + /*
> + * We must update *vm_flags and *mapcount only if
> + * page_check_address() succeeds in finding the page.
> + */
(same as above)
> pte = page_check_address(page, mm, address, &ptl, 0);
> if (!pte)
> goto out;
>
> + /*
> + * Don't want to elevate referenced for mlocked page
> + * that gets this far, in order that it progresses to
> + * try_to_unmap and is moved to the unevictable list.
> + */
(same as above)
> + if (vma->vm_flags & VM_LOCKED) {
> + pte_unmap_unlock(pte, ptl);
> + *mapcount = 0; /* break early from loop */
> + *vm_flags |= VM_LOCKED;
> + goto out;
> + }
> +
> if (ptep_clear_flush_young_notify(vma, address, pte)) {
> /*
> * Don't treat a reference through a sequentially read
> @@ -546,6 +567,12 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
> pte_unmap_unlock(pte, ptl);
> }
>
> + /* Pretend the page is referenced if the task has the
> + swap token and is in the middle of a page fault. */
> + if (mm != current->mm && has_swap_token(mm) &&
> + rwsem_is_locked(&mm->mmap_sem))
> + referenced++;
> +
> (*mapcount)--;
>
> if (referenced)
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2011-03-08 12:21 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-03-07 6:50 THP, rmap and page_referenced_one() Michel Lespinasse
2011-03-07 7:33 ` KOSAKI Motohiro
2011-03-07 8:35 ` Minchan Kim
2011-03-07 10:35 ` Michel Lespinasse
2011-03-08 11:32 ` Andrea Arcangeli
2011-03-08 12:21 ` Michel Lespinasse [this message]
2011-03-08 12:58 ` Andrea Arcangeli
2011-03-08 22:48 ` Minchan Kim
2011-03-09 9:18 ` Johannes Weiner
2011-03-08 13:57 ` Rik van Riel
2011-03-09 7:11 ` KOSAKI Motohiro
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110308122115.GA28054@google.com \
--to=walken@google.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=hughd@google.com \
--cc=linux-mm@kvack.org \
--cc=minchan.kim@gmail.com \
--cc=riel@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).