From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with ESMTP id D943F8D0039 for ; Mon, 7 Mar 2011 01:50:27 -0500 (EST) Received: from hpaq12.eem.corp.google.com (hpaq12.eem.corp.google.com [172.25.149.12]) by smtp-out.google.com with ESMTP id p276oNQc002043 for ; Sun, 6 Mar 2011 22:50:23 -0800 Received: from qyk7 (qyk7.prod.google.com [10.241.83.135]) by hpaq12.eem.corp.google.com with ESMTP id p276oDdC012663 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Sun, 6 Mar 2011 22:50:22 -0800 Received: by qyk7 with SMTP id 7so3207486qyk.3 for ; Sun, 06 Mar 2011 22:50:21 -0800 (PST) MIME-Version: 1.0 Date: Sun, 6 Mar 2011 22:50:21 -0800 Message-ID: Subject: THP, rmap and page_referenced_one() From: Michel Lespinasse Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli , Rik van Riel , Hugh Dickins Cc: Andrew Morton , linux-mm Hi, I have been wondering about the following: Before the THP work, the if (vma->vm_flags & VM_LOCKED) test in page_referenced_one() was placed after the page_check_address() call, but now it is placed above it. Could this be a problem ? My understanding is that the page_check_address() check may return false positives - for example, if an anon page was created before a process forked, rmap will indicate that the page could be mapped in both of the processes, even though one of them might have since broken COW. What would happen if the child process mlocks the corresponding VMA ? my understanding is that this would break COW, but not cause rmap to be updated, so the parent's page would still be marked in rmap as being possibly mapped in the children's VM_LOCKED vma. With the VM_LOCKED check now placed above the page_check_address() call, this would cause vmscan to see both the parent's and the child's pages as being unevictable. Am I missing something there ? In particular, I am not sure if marking the children's VMA as mlocked would somehow cause rmap to realize it can't share pages with the parent anymore (but I don't think that's the case, and it could cause other issues if it was...) -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 549FC8D0039 for ; Mon, 7 Mar 2011 02:33:16 -0500 (EST) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id D19633EE0BB for ; Mon, 7 Mar 2011 16:33:08 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id BAAE045DE53 for ; Mon, 7 Mar 2011 16:33:08 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id A534A45DE4D for ; Mon, 7 Mar 2011 16:33:08 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 979221DB8041 for ; Mon, 7 Mar 2011 16:33:08 +0900 (JST) Received: from ml13.s.css.fujitsu.com (ml13.s.css.fujitsu.com [10.240.81.133]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 5F3891DB803F for ; Mon, 7 Mar 2011 16:33:08 +0900 (JST) From: KOSAKI Motohiro Subject: Re: THP, rmap and page_referenced_one() In-Reply-To: References: Message-Id: <20110307162920.89FB.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit Date: Mon, 7 Mar 2011 16:33:07 +0900 (JST) Sender: owner-linux-mm@kvack.org List-ID: To: Michel Lespinasse Cc: kosaki.motohiro@jp.fujitsu.com, Andrea Arcangeli , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm > Hi, > > I have been wondering about the following: > > Before the THP work, the if (vma->vm_flags & VM_LOCKED) test in > page_referenced_one() was placed after the page_check_address() call, > but now it is placed above it. Could this be a problem ? > > My understanding is that the page_check_address() check may return > false positives - for example, if an anon page was created before a > process forked, rmap will indicate that the page could be mapped in > both of the processes, even though one of them might have since broken > COW. What would happen if the child process mlocks the corresponding > VMA ? my understanding is that this would break COW, but not cause > rmap to be updated, so the parent's page would still be marked in rmap > as being possibly mapped in the children's VM_LOCKED vma. With the > VM_LOCKED check now placed above the page_check_address() call, this > would cause vmscan to see both the parent's and the child's pages as > being unevictable. > > Am I missing something there ? In particular, I am not sure if marking > the children's VMA as mlocked would somehow cause rmap to realize it > can't share pages with the parent anymore (but I don't think that's > the case, and it could cause other issues if it was...) Hi I think you are right. page_check_address() should be called before VM_LOCKED check. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id 595798D0039 for ; Mon, 7 Mar 2011 03:35:39 -0500 (EST) Received: by iwl42 with SMTP id 42so5020004iwl.14 for ; Mon, 07 Mar 2011 00:35:37 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: Date: Mon, 7 Mar 2011 17:35:37 +0900 Message-ID: Subject: Re: THP, rmap and page_referenced_one() From: Minchan Kim Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Michel Lespinasse Cc: Andrea Arcangeli , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm On Mon, Mar 7, 2011 at 3:50 PM, Michel Lespinasse wrote: > Hi, > > I have been wondering about the following: > > Before the THP work, the if (vma->vm_flags & VM_LOCKED) test in > page_referenced_one() was placed after the page_check_address() call, > but now it is placed above it. Could this be a problem ? > > My understanding is that the page_check_address() check may return > false positives - for example, if an anon page was created before a > process forked, rmap will indicate that the page could be mapped in > both of the processes, even though one of them might have since broken > COW. What would happen if the child process mlocks the corresponding > VMA ? my understanding is that this would break COW, but not cause > rmap to be updated, so the parent's page would still be marked in rmap > as being possibly mapped in the children's VM_LOCKED vma. With the > VM_LOCKED check now placed above the page_check_address() call, this > would cause vmscan to see both the parent's and the child's pages as > being unevictable. I agree. There are two processes called P_A, P_B. P_B is child of P_A. A page "page A" is share between V_A(A's VMA)and V_B(B's VMA) since P_B is created by forking from P_A. When P_B calls mlock the V_B, P_B allocates new page B instead of reusing page A by COW and mapped P_B's page table but rmap of page A still indicates page A is mapped by V_A and V_B. The page_check_address can filter this situation that V_B doesn't include page A any more. So page_check_address should be placed before checking the VM_LOCKED. I think it's valuable to add the comment why we need page_check_address should be placed before the checking VM_LOCKED. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail143.messagelabs.com (mail143.messagelabs.com [216.82.254.35]) by kanga.kvack.org (Postfix) with ESMTP id 164288D0039 for ; Mon, 7 Mar 2011 05:35:20 -0500 (EST) Received: from hpaq14.eem.corp.google.com (hpaq14.eem.corp.google.com [172.25.149.14]) by smtp-out.google.com with ESMTP id p27AZHXg002057 for ; Mon, 7 Mar 2011 02:35:18 -0800 Received: from gye5 (gye5.prod.google.com [10.243.50.5]) by hpaq14.eem.corp.google.com with ESMTP id p27AZ6Fw024526 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Mon, 7 Mar 2011 02:35:16 -0800 Received: by gye5 with SMTP id 5so1899143gye.38 for ; Mon, 07 Mar 2011 02:35:16 -0800 (PST) MIME-Version: 1.0 In-Reply-To: References: Date: Mon, 7 Mar 2011 02:35:15 -0800 Message-ID: Subject: Re: THP, rmap and page_referenced_one() From: Michel Lespinasse Content-Type: text/plain; charset=ISO-8859-1 Sender: owner-linux-mm@kvack.org List-ID: To: Minchan Kim Cc: Andrea Arcangeli , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm There is also the issue that *mapcount will be decremented even if the pmd turns out not to point to the given page. page_referenced() will stop looking at rmap's candidate mappings once the refcount hits zero, so the decrement will cause an actual mapping to be ignored. -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail202.messagelabs.com (mail202.messagelabs.com [216.82.254.227]) by kanga.kvack.org (Postfix) with SMTP id 255258D0039 for ; Tue, 8 Mar 2011 06:32:53 -0500 (EST) Date: Tue, 8 Mar 2011 12:32:45 +0100 From: Andrea Arcangeli Subject: Re: THP, rmap and page_referenced_one() Message-ID: <20110308113245.GR25641@random.random> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: Sender: owner-linux-mm@kvack.org List-ID: To: Michel Lespinasse Cc: Minchan Kim , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm Hello, On Mon, Mar 07, 2011 at 02:35:15AM -0800, Michel Lespinasse wrote: > There is also the issue that *mapcount will be decremented even if the > pmd turns out not to point to the given page. page_referenced() will > stop looking at rmap's candidate mappings once the refcount hits zero, > so the decrement will cause an actual mapping to be ignored. Both the mapcount and vm_flags problems would worst case lead to some page ("parent page" in the example provided with mlock in the child) being evicted too early, not to become unevictable. The mapcount is actually a bigger concern to me as it doesn't require the race. I also added a proper comment before page_check_address* as suggested by Minchan. Frankly when I updated the code I didn't realize that was the actual reason VM_LOCKED was placed in the middle between page_check_address and ptep_clear_flush_young_notify, now it's clear why... I only run some basic testing, please review. I seen no reason to return "referenced = 0" if the pmd is still splitting. So I let it go ahead now and test_and_set_bit the accessed bit even on a splitting pmd. After all the tlb miss could still activate the young bit on a pmd while it's in splitting state. There's no check for splitting in the pmdp_clear_flush_young. The secondary mmu has no secondary spte mapped while it's set to splitting so it shouldn't matter for it if we clear the young bit (and new secondary mmu page faults will wait on splitting to clear and __split_huge_page_map to finish before going ahead creating new secondary sptes with 4k granularity). Here a fix: === Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found From: Andrea Arcangeli When vmscan.c calls page_referenced, if an anon page was created before a process forked, rmap will search for it in both of the processes, even though one of them might have since broken COW. If the child process mlocks the vma where the COWed page belongs to, page_referenced() running on the page mapped by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading to the references on the parent page being ignored and evicting the parent page too early). *mapcount would also be decremented by page_referenced_one even if the page wasn't found by page_check_address. This also let pmdp_clear_flush_young_notify() go ahead on a pmd_trans_splitting() pmd. We hold the page_table_lock so __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to complete before it can modify the pmd. The pmd is also still mapped in userland so the young bit may materialize through a tlb miss before split_huge_page_map runs. This will provide a more accurate page_referenced() behavior during split_huge_page(). Signed-off-by: Andrea Arcangeli Reported-by: Michel Lespinasse --- diff --git a/mm/rmap.c b/mm/rmap.c index f21f4a1..e8924bc 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -497,41 +497,62 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, struct mm_struct *mm = vma->vm_mm; int referenced = 0; - /* - * Don't want to elevate referenced for mlocked page that gets this far, - * in order that it progresses to try_to_unmap and is moved to the - * unevictable list. - */ - if (vma->vm_flags & VM_LOCKED) { - *mapcount = 0; /* break early from loop */ - *vm_flags |= VM_LOCKED; - goto out; - } - - /* Pretend the page is referenced if the task has the - swap token and is in the middle of a page fault. */ - if (mm != current->mm && has_swap_token(mm) && - rwsem_is_locked(&mm->mmap_sem)) - referenced++; - if (unlikely(PageTransHuge(page))) { pmd_t *pmd; spin_lock(&mm->page_table_lock); + /* + * We must update *vm_flags and *mapcount only if + * page_check_address_pmd() succeeds in finding the + * page. + */ pmd = page_check_address_pmd(page, mm, address, PAGE_CHECK_ADDRESS_PMD_FLAG); - if (pmd && !pmd_trans_splitting(*pmd) && - pmdp_clear_flush_young_notify(vma, address, pmd)) + if (!pmd) { + spin_unlock(&mm->page_table_lock); + goto out; + } + + /* + * Don't want to elevate referenced for mlocked page + * that gets this far, in order that it progresses to + * try_to_unmap and is moved to the unevictable list. + */ + if (vma->vm_flags & VM_LOCKED) { + spin_unlock(&mm->page_table_lock); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + + /* go ahead even if the pmd is pmd_trans_splitting() */ + if (pmdp_clear_flush_young_notify(vma, address, pmd)) referenced++; spin_unlock(&mm->page_table_lock); } else { pte_t *pte; spinlock_t *ptl; + /* + * We must update *vm_flags and *mapcount only if + * page_check_address() succeeds in finding the page. + */ pte = page_check_address(page, mm, address, &ptl, 0); if (!pte) goto out; + /* + * Don't want to elevate referenced for mlocked page + * that gets this far, in order that it progresses to + * try_to_unmap and is moved to the unevictable list. + */ + if (vma->vm_flags & VM_LOCKED) { + pte_unmap_unlock(pte, ptl); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + if (ptep_clear_flush_young_notify(vma, address, pte)) { /* * Don't treat a reference through a sequentially read @@ -546,6 +567,12 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, pte_unmap_unlock(pte, ptl); } + /* Pretend the page is referenced if the task has the + swap token and is in the middle of a page fault. */ + if (mm != current->mm && has_swap_token(mm) && + rwsem_is_locked(&mm->mmap_sem)) + referenced++; + (*mapcount)--; if (referenced) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail191.messagelabs.com (mail191.messagelabs.com [216.82.242.19]) by kanga.kvack.org (Postfix) with ESMTP id 9E3408D0039 for ; Tue, 8 Mar 2011 07:21:31 -0500 (EST) Received: from hpaq12.eem.corp.google.com (hpaq12.eem.corp.google.com [172.25.149.12]) by smtp-out.google.com with ESMTP id p28CLTWm014350 for ; Tue, 8 Mar 2011 04:21:29 -0800 Received: from pzk33 (pzk33.prod.google.com [10.243.19.161]) by hpaq12.eem.corp.google.com with ESMTP id p28CLMSC022221 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Tue, 8 Mar 2011 04:21:27 -0800 Received: by pzk33 with SMTP id 33so1326507pzk.14 for ; Tue, 08 Mar 2011 04:21:21 -0800 (PST) Date: Tue, 8 Mar 2011 04:21:15 -0800 From: Michel Lespinasse Subject: Re: THP, rmap and page_referenced_one() Message-ID: <20110308122115.GA28054@google.com> References: <20110308113245.GR25641@random.random> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110308113245.GR25641@random.random> Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli Cc: Minchan Kim , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote: > I only run some basic testing, please review. I seen no reason to > return "referenced = 0" if the pmd is still splitting. So I let it go > ahead now and test_and_set_bit the accessed bit even on a splitting > pmd. After all the tlb miss could still activate the young bit on a > pmd while it's in splitting state. There's no check for splitting in > the pmdp_clear_flush_young. The secondary mmu has no secondary spte > mapped while it's set to splitting so it shouldn't matter for it if we > clear the young bit (and new secondary mmu page faults will wait on > splitting to clear and __split_huge_page_map to finish before going > ahead creating new secondary sptes with 4k granularity). Agree, the pmd_trans_splitting check didn't seem necessary. Thanks for the patch, looks fine, I only have a couple nitpicks regarding code comments: > === > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > From: Andrea Arcangeli > > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > > *mapcount would also be decremented by page_referenced_one even if the page > wasn't found by page_check_address. > > This also let pmdp_clear_flush_young_notify() go ahead on a > pmd_trans_splitting() pmd. We hold the page_table_lock so > __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() > to complete before it can modify the pmd. The pmd is also still mapped > in userland so the young bit may materialize through a tlb miss before > split_huge_page_map runs. This will provide a more accurate > page_referenced() behavior during split_huge_page(). > > Signed-off-by: Andrea Arcangeli > Reported-by: Michel Lespinasse Reviewed-by: Michel Lespinasse > --- > > diff --git a/mm/rmap.c b/mm/rmap.c > index f21f4a1..e8924bc 100644 > --- a/mm/rmap.c > +++ b/mm/rmap.c > @@ -497,41 +497,62 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, > struct mm_struct *mm = vma->vm_mm; > int referenced = 0; > > - /* > - * Don't want to elevate referenced for mlocked page that gets this far, > - * in order that it progresses to try_to_unmap and is moved to the > - * unevictable list. > - */ > - if (vma->vm_flags & VM_LOCKED) { > - *mapcount = 0; /* break early from loop */ > - *vm_flags |= VM_LOCKED; > - goto out; > - } > - > - /* Pretend the page is referenced if the task has the > - swap token and is in the middle of a page fault. */ > - if (mm != current->mm && has_swap_token(mm) && > - rwsem_is_locked(&mm->mmap_sem)) > - referenced++; > - > if (unlikely(PageTransHuge(page))) { > pmd_t *pmd; > > spin_lock(&mm->page_table_lock); > + /* > + * We must update *vm_flags and *mapcount only if > + * page_check_address_pmd() succeeds in finding the > + * page. > + */ I would propose: /* * rmap might return false positives; we must filter these out * using page_check_address_pmd(). */ (sorry for the nitpick, it's OK if you prefer your own version :) > pmd = page_check_address_pmd(page, mm, address, > PAGE_CHECK_ADDRESS_PMD_FLAG); > - if (pmd && !pmd_trans_splitting(*pmd) && > - pmdp_clear_flush_young_notify(vma, address, pmd)) > + if (!pmd) { > + spin_unlock(&mm->page_table_lock); > + goto out; > + } > + > + /* > + * Don't want to elevate referenced for mlocked page > + * that gets this far, in order that it progresses to > + * try_to_unmap and is moved to the unevictable list. > + */ Actually page_check_references() now ignores the referenced value when vm_flags & VM_LOCKED is set, so it'd be best to lose the above comment IMO. > + if (vma->vm_flags & VM_LOCKED) { > + spin_unlock(&mm->page_table_lock); > + *mapcount = 0; /* break early from loop */ > + *vm_flags |= VM_LOCKED; > + goto out; > + } > + > + /* go ahead even if the pmd is pmd_trans_splitting() */ > + if (pmdp_clear_flush_young_notify(vma, address, pmd)) > referenced++; > spin_unlock(&mm->page_table_lock); > } else { > pte_t *pte; > spinlock_t *ptl; > > + /* > + * We must update *vm_flags and *mapcount only if > + * page_check_address() succeeds in finding the page. > + */ (same as above) > pte = page_check_address(page, mm, address, &ptl, 0); > if (!pte) > goto out; > > + /* > + * Don't want to elevate referenced for mlocked page > + * that gets this far, in order that it progresses to > + * try_to_unmap and is moved to the unevictable list. > + */ (same as above) > + if (vma->vm_flags & VM_LOCKED) { > + pte_unmap_unlock(pte, ptl); > + *mapcount = 0; /* break early from loop */ > + *vm_flags |= VM_LOCKED; > + goto out; > + } > + > if (ptep_clear_flush_young_notify(vma, address, pte)) { > /* > * Don't treat a reference through a sequentially read > @@ -546,6 +567,12 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma, > pte_unmap_unlock(pte, ptl); > } > > + /* Pretend the page is referenced if the task has the > + swap token and is in the middle of a page fault. */ > + if (mm != current->mm && has_swap_token(mm) && > + rwsem_is_locked(&mm->mmap_sem)) > + referenced++; > + > (*mapcount)--; > > if (referenced) -- Michel "Walken" Lespinasse A program is never fully debugged until the last user dies. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with SMTP id 888E88D0039 for ; Tue, 8 Mar 2011 07:58:59 -0500 (EST) Date: Tue, 8 Mar 2011 13:58:30 +0100 From: Andrea Arcangeli Subject: Re: THP, rmap and page_referenced_one() Message-ID: <20110308125830.GS25641@random.random> References: <20110308113245.GR25641@random.random> <20110308122115.GA28054@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110308122115.GA28054@google.com> Sender: owner-linux-mm@kvack.org List-ID: To: Michel Lespinasse Cc: Minchan Kim , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm On Tue, Mar 08, 2011 at 04:21:15AM -0800, Michel Lespinasse wrote: > On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote: > > I only run some basic testing, please review. I seen no reason to > > return "referenced = 0" if the pmd is still splitting. So I let it go > > ahead now and test_and_set_bit the accessed bit even on a splitting > > pmd. After all the tlb miss could still activate the young bit on a > > pmd while it's in splitting state. There's no check for splitting in > > the pmdp_clear_flush_young. The secondary mmu has no secondary spte > > mapped while it's set to splitting so it shouldn't matter for it if we > > clear the young bit (and new secondary mmu page faults will wait on > > splitting to clear and __split_huge_page_map to finish before going > > ahead creating new secondary sptes with 4k granularity). > > Agree, the pmd_trans_splitting check didn't seem necessary. > > Thanks for the patch, looks fine, I only have a couple nitpicks regarding > code comments: > Ok, updated comments... thanks for the quick review. Try #2: === Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found From: Andrea Arcangeli When vmscan.c calls page_referenced, if an anon page was created before a process forked, rmap will search for it in both of the processes, even though one of them might have since broken COW. If the child process mlocks the vma where the COWed page belongs to, page_referenced() running on the page mapped by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading to the references on the parent page being ignored and evicting the parent page too early). *mapcount would also be decremented by page_referenced_one even if the page wasn't found by page_check_address. This also let pmdp_clear_flush_young_notify() go ahead on a pmd_trans_splitting() pmd. We hold the page_table_lock so __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to complete before it can modify the pmd. The pmd is also still mapped in userland so the young bit may materialize through a tlb miss before split_huge_page_map runs. This will provide a more accurate page_referenced() behavior during split_huge_page(). Signed-off-by: Andrea Arcangeli Reported-by: Michel Lespinasse Reviewed-by: Michel Lespinasse --- --- mm/rmap.c | 54 +++++++++++++++++++++++++++++++++++------------------- 1 file changed, 35 insertions(+), 19 deletions(-) --- a/mm/rmap.c +++ b/mm/rmap.c @@ -497,41 +497,51 @@ int page_referenced_one(struct page *pag struct mm_struct *mm = vma->vm_mm; int referenced = 0; - /* - * Don't want to elevate referenced for mlocked page that gets this far, - * in order that it progresses to try_to_unmap and is moved to the - * unevictable list. - */ - if (vma->vm_flags & VM_LOCKED) { - *mapcount = 0; /* break early from loop */ - *vm_flags |= VM_LOCKED; - goto out; - } - - /* Pretend the page is referenced if the task has the - swap token and is in the middle of a page fault. */ - if (mm != current->mm && has_swap_token(mm) && - rwsem_is_locked(&mm->mmap_sem)) - referenced++; - if (unlikely(PageTransHuge(page))) { pmd_t *pmd; spin_lock(&mm->page_table_lock); + /* + * rmap might return false positives; we must filter + * these out using page_check_address_pmd(). + */ pmd = page_check_address_pmd(page, mm, address, PAGE_CHECK_ADDRESS_PMD_FLAG); - if (pmd && !pmd_trans_splitting(*pmd) && - pmdp_clear_flush_young_notify(vma, address, pmd)) + if (!pmd) { + spin_unlock(&mm->page_table_lock); + goto out; + } + + if (vma->vm_flags & VM_LOCKED) { + spin_unlock(&mm->page_table_lock); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + + /* go ahead even if the pmd is pmd_trans_splitting() */ + if (pmdp_clear_flush_young_notify(vma, address, pmd)) referenced++; spin_unlock(&mm->page_table_lock); } else { pte_t *pte; spinlock_t *ptl; + /* + * rmap might return false positives; we must filter + * these out using page_check_address(). + */ pte = page_check_address(page, mm, address, &ptl, 0); if (!pte) goto out; + if (vma->vm_flags & VM_LOCKED) { + pte_unmap_unlock(pte, ptl); + *mapcount = 0; /* break early from loop */ + *vm_flags |= VM_LOCKED; + goto out; + } + if (ptep_clear_flush_young_notify(vma, address, pte)) { /* * Don't treat a reference through a sequentially read @@ -546,6 +556,12 @@ int page_referenced_one(struct page *pag pte_unmap_unlock(pte, ptl); } + /* Pretend the page is referenced if the task has the + swap token and is in the middle of a page fault. */ + if (mm != current->mm && has_swap_token(mm) && + rwsem_is_locked(&mm->mmap_sem)) + referenced++; + (*mapcount)--; if (referenced) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with SMTP id A0D528D0039 for ; Tue, 8 Mar 2011 08:57:52 -0500 (EST) Message-ID: <4D7635DA.9030707@redhat.com> Date: Tue, 08 Mar 2011 08:57:46 -0500 From: Rik van Riel MIME-Version: 1.0 Subject: Re: THP, rmap and page_referenced_one() References: <20110308113245.GR25641@random.random> In-Reply-To: <20110308113245.GR25641@random.random> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli Cc: Michel Lespinasse , Minchan Kim , Hugh Dickins , Andrew Morton , linux-mm On 03/08/2011 06:32 AM, Andrea Arcangeli wrote: > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > From: Andrea Arcangeli > > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > Signed-off-by: Andrea Arcangeli > Reported-by: Michel Lespinasse Reviewed-by: Rik van Riel -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail190.messagelabs.com (mail190.messagelabs.com [216.82.249.51]) by kanga.kvack.org (Postfix) with ESMTP id A67748D0039 for ; Tue, 8 Mar 2011 17:48:56 -0500 (EST) Received: by iwl42 with SMTP id 42so7204534iwl.14 for ; Tue, 08 Mar 2011 14:48:54 -0800 (PST) MIME-Version: 1.0 In-Reply-To: <20110308125830.GS25641@random.random> References: <20110308113245.GR25641@random.random> <20110308122115.GA28054@google.com> <20110308125830.GS25641@random.random> Date: Wed, 9 Mar 2011 07:48:54 +0900 Message-ID: Subject: Re: THP, rmap and page_referenced_one() From: Minchan Kim Content-Type: text/plain; charset=UTF-8 Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli Cc: Michel Lespinasse , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm On Tue, Mar 8, 2011 at 9:58 PM, Andrea Arcangeli wrote: > On Tue, Mar 08, 2011 at 04:21:15AM -0800, Michel Lespinasse wrote: >> On Tue, Mar 08, 2011 at 12:32:45PM +0100, Andrea Arcangeli wrote: >> > I only run some basic testing, please review. I seen no reason to >> > return "referenced = 0" if the pmd is still splitting. So I let it go >> > ahead now and test_and_set_bit the accessed bit even on a splitting >> > pmd. After all the tlb miss could still activate the young bit on a >> > pmd while it's in splitting state. There's no check for splitting in >> > the pmdp_clear_flush_young. The secondary mmu has no secondary spte >> > mapped while it's set to splitting so it shouldn't matter for it if we >> > clear the young bit (and new secondary mmu page faults will wait on >> > splitting to clear and __split_huge_page_map to finish before going >> > ahead creating new secondary sptes with 4k granularity). >> >> Agree, the pmd_trans_splitting check didn't seem necessary. >> >> Thanks for the patch, looks fine, I only have a couple nitpicks regarding >> code comments: >> > > Ok, updated comments... thanks for the quick review. Try #2: > > === > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > From: Andrea Arcangeli > > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > > *mapcount would also be decremented by page_referenced_one even if the page > wasn't found by page_check_address. > > This also let pmdp_clear_flush_young_notify() go ahead on a > pmd_trans_splitting() pmd. We hold the page_table_lock so > __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to > complete before it can modify the pmd. The pmd is also still mapped in userland > so the young bit may materialize through a tlb miss before split_huge_page_map > runs. This will provide a more accurate page_referenced() behavior during > split_huge_page(). > > Signed-off-by: Andrea Arcangeli > Reported-by: Michel Lespinasse > Reviewed-by: Michel Lespinasse Reviewed-by: Minchan Kim -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail144.messagelabs.com (mail144.messagelabs.com [216.82.254.51]) by kanga.kvack.org (Postfix) with ESMTP id 007868D0039 for ; Wed, 9 Mar 2011 02:11:19 -0500 (EST) Received: from m4.gw.fujitsu.co.jp (unknown [10.0.50.74]) by fgwmail6.fujitsu.co.jp (Postfix) with ESMTP id E82E43EE0AE for ; Wed, 9 Mar 2011 16:11:16 +0900 (JST) Received: from smail (m4 [127.0.0.1]) by outgoing.m4.gw.fujitsu.co.jp (Postfix) with ESMTP id CCFEC45DE50 for ; Wed, 9 Mar 2011 16:11:16 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (s4.gw.fujitsu.co.jp [10.0.50.94]) by m4.gw.fujitsu.co.jp (Postfix) with ESMTP id B63ED45DE4F for ; Wed, 9 Mar 2011 16:11:16 +0900 (JST) Received: from s4.gw.fujitsu.co.jp (localhost.localdomain [127.0.0.1]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id A73FF1DB803B for ; Wed, 9 Mar 2011 16:11:16 +0900 (JST) Received: from ml14.s.css.fujitsu.com (ml14.s.css.fujitsu.com [10.240.81.134]) by s4.gw.fujitsu.co.jp (Postfix) with ESMTP id 711F31DB802F for ; Wed, 9 Mar 2011 16:11:16 +0900 (JST) From: KOSAKI Motohiro Subject: Re: THP, rmap and page_referenced_one() In-Reply-To: <4D7635DA.9030707@redhat.com> References: <20110308113245.GR25641@random.random> <4D7635DA.9030707@redhat.com> Message-Id: <20110309161132.0409.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: 8bit Date: Wed, 9 Mar 2011 16:11:15 +0900 (JST) Sender: owner-linux-mm@kvack.org List-ID: To: Rik van Riel Cc: kosaki.motohiro@jp.fujitsu.com, Andrea Arcangeli , Michel Lespinasse , Minchan Kim , Hugh Dickins , Andrew Morton , linux-mm > On 03/08/2011 06:32 AM, Andrea Arcangeli wrote: > > > Subject: thp: fix page_referenced to modify mapcount/vm_flags only if page is found > > > > From: Andrea Arcangeli > > > > When vmscan.c calls page_referenced, if an anon page was created before a > > process forked, rmap will search for it in both of the processes, even though > > one of them might have since broken COW. If the child process mlocks the vma > > where the COWed page belongs to, page_referenced() running on the page mapped > > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > > to the references on the parent page being ignored and evicting the parent page > > too early). > > > Signed-off-by: Andrea Arcangeli > > Reported-by: Michel Lespinasse > > Reviewed-by: Rik van Riel Thank you, Andrea. Reviewed-by: KOSAKI Motohiro -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail172.messagelabs.com (mail172.messagelabs.com [216.82.254.3]) by kanga.kvack.org (Postfix) with ESMTP id C68108D0039 for ; Wed, 9 Mar 2011 04:18:57 -0500 (EST) Date: Wed, 9 Mar 2011 10:18:40 +0100 From: Johannes Weiner Subject: Re: THP, rmap and page_referenced_one() Message-ID: <20110309091840.GC30778@cmpxchg.org> References: <20110308113245.GR25641@random.random> <20110308122115.GA28054@google.com> <20110308125830.GS25641@random.random> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20110308125830.GS25641@random.random> Sender: owner-linux-mm@kvack.org List-ID: To: Andrea Arcangeli Cc: Michel Lespinasse , Minchan Kim , Rik van Riel , Hugh Dickins , Andrew Morton , linux-mm On Tue, Mar 08, 2011 at 01:58:30PM +0100, Andrea Arcangeli wrote: > When vmscan.c calls page_referenced, if an anon page was created before a > process forked, rmap will search for it in both of the processes, even though > one of them might have since broken COW. If the child process mlocks the vma > where the COWed page belongs to, page_referenced() running on the page mapped > by the parent would lead to *vm_flags getting VM_LOCKED set erroneously (leading > to the references on the parent page being ignored and evicting the parent page > too early). > > *mapcount would also be decremented by page_referenced_one even if the page > wasn't found by page_check_address. > > This also let pmdp_clear_flush_young_notify() go ahead on a > pmd_trans_splitting() pmd. We hold the page_table_lock so > __split_huge_page_map() must wait the pmdp_clear_flush_young_notify() to > complete before it can modify the pmd. The pmd is also still mapped in userland > so the young bit may materialize through a tlb miss before split_huge_page_map > runs. This will provide a more accurate page_referenced() behavior during > split_huge_page(). > > Signed-off-by: Andrea Arcangeli > Reported-by: Michel Lespinasse > Reviewed-by: Michel Lespinasse Reviewed-by: Johannes Weiner -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: email@kvack.org