* [PATCH] mm: Check if PTE is already allocated during page fault @ 2011-04-15 10:12 Mel Gorman 2011-04-15 13:23 ` Rik van Riel ` (3 more replies) 0 siblings, 4 replies; 15+ messages in thread From: Mel Gorman @ 2011-04-15 10:12 UTC (permalink / raw) To: akpm Cc: Andrea Arcangeli, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable With transparent hugepage support, handle_mm_fault() has to be careful that a normal PMD has been established before handling a PTE fault. To achieve this, it used __pte_alloc() directly instead of pte_alloc_map as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() is called once it is known the PMD is safe. pte_alloc_map() is smart enough to check if a PTE is already present before calling __pte_alloc but this check was lost. As a consequence, PTEs may be allocated unnecessarily and the page table lock taken. Thi useless PTE does get cleaned up but it's a performance hit which is visible in page_test from aim9. This patch simply re-adds the check normally done by pte_alloc_map to check if the PTE needs to be allocated before taking the page table lock. The effect is noticable in page_test from aim9. AIM9 2.6.38-vanilla 2.6.38-checkptenone creat-clo 446.10 ( 0.00%) 424.47 (-5.10%) page_test 38.10 ( 0.00%) 42.04 ( 9.37%) brk_test 52.45 ( 0.00%) 51.57 (-1.71%) exec_test 382.00 ( 0.00%) 456.90 (16.39%) fork_test 60.11 ( 0.00%) 67.79 (11.34%) MMTests Statistics: duration Total Elapsed Time (seconds) 611.90 612.22 (While this affects 2.6.38, it is a performance rather than a functional bug and normally outside the rules -stable. While the big performance differences are to a microbench, the difference in fork and exec performance may be significant enough that -stable wants to consider the patch) Reported-by: Raz Ben Yehuda <raziebe@gmail.com> Signed-off-by: Mel Gorman <mgorman@suse.de> -- mm/memory.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index 5823698..1659574 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, * run pte_offset_map on the pmd, if an huge pmd could * materialize from under us from a different thread. */ - if (unlikely(__pte_alloc(mm, vma, pmd, address))) + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) return VM_FAULT_OOM; /* if an huge pmd materialized from under us just retry later */ if (unlikely(pmd_trans_huge(*pmd))) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-15 10:12 [PATCH] mm: Check if PTE is already allocated during page fault Mel Gorman @ 2011-04-15 13:23 ` Rik van Riel 2011-04-15 14:39 ` Andrea Arcangeli ` (2 subsequent siblings) 3 siblings, 0 replies; 15+ messages in thread From: Rik van Riel @ 2011-04-15 13:23 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Andrea Arcangeli, raz ben yehuda, kosaki.motohiro, lkml, linux-mm, stable On 04/15/2011 06:12 AM, Mel Gorman wrote: > This patch simply re-adds the check normally done by pte_alloc_map to > check if the PTE needs to be allocated before taking the page table > lock. The effect is noticable in page_test from aim9. > Reported-by: Raz Ben Yehuda<raziebe@gmail.com> > Signed-off-by: Mel Gorman<mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> -- All rights reversed -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-15 10:12 [PATCH] mm: Check if PTE is already allocated during page fault Mel Gorman 2011-04-15 13:23 ` Rik van Riel @ 2011-04-15 14:39 ` Andrea Arcangeli 2011-04-15 15:06 ` Andrea Arcangeli 2011-04-21 6:59 ` Minchan Kim 2011-04-27 13:50 ` Johannes Weiner 3 siblings, 1 reply; 15+ messages in thread From: Andrea Arcangeli @ 2011-04-15 14:39 UTC (permalink / raw) To: Mel Gorman Cc: akpm, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Fri, Apr 15, 2011 at 11:12:48AM +0100, Mel Gorman wrote: > diff --git a/mm/memory.c b/mm/memory.c > index 5823698..1659574 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > * run pte_offset_map on the pmd, if an huge pmd could > * materialize from under us from a different thread. > */ > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > return VM_FAULT_OOM; > /* if an huge pmd materialized from under us just retry later */ > if (unlikely(pmd_trans_huge(*pmd))) Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-15 14:39 ` Andrea Arcangeli @ 2011-04-15 15:06 ` Andrea Arcangeli 2011-04-18 7:21 ` raz ben yehuda 2011-04-18 10:23 ` Mel Gorman 0 siblings, 2 replies; 15+ messages in thread From: Andrea Arcangeli @ 2011-04-15 15:06 UTC (permalink / raw) To: Mel Gorman Cc: akpm, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Fri, Apr 15, 2011 at 04:39:16PM +0200, Andrea Arcangeli wrote: > On Fri, Apr 15, 2011 at 11:12:48AM +0100, Mel Gorman wrote: > > diff --git a/mm/memory.c b/mm/memory.c > > index 5823698..1659574 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > * run pte_offset_map on the pmd, if an huge pmd could > > * materialize from under us from a different thread. > > */ > > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) I started hacking on this and I noticed it'd be better to extend the unlikely through the end. At first review I didn't notice the parenthesis closure stops after pte_none and __pte_alloc is now uncovered. I'd prefer this: if (unlikely(pmd_none(*pmd) && __pte_alloc(mm, vma, pmd, address))) I mean the real unlikely thing is that we return VM_FAULT_OOM, if we end up calling __pte_alloc or not, depends on the app. Generally it sounds more frequent that the pte is not none, so it's not wrong, but it's even less likely that __pte_alloc fails so that can be taken into account too, and __pte_alloc runs still quite frequently. So either above or: if (unlikely(pmd_none(*pmd)) && unlikely(__pte_alloc(mm, vma, pmd, address))) I generally prefer unlikely only when it's 100% sure thing it's less likely (like the VM_FAULT_OOM), so the first version I guess it's enough (I'm afraid unlikely for pte_none too, may make gcc generate a far away jump possibly going out of l1 icache for a case that is only 512 times less likely at best). My point is that it's certainly hugely more unlikely that __pte_alloc fails than the pte is none. This is a real nitpick though ;). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-15 15:06 ` Andrea Arcangeli @ 2011-04-18 7:21 ` raz ben yehuda 2011-04-18 10:23 ` Mel Gorman 1 sibling, 0 replies; 15+ messages in thread From: raz ben yehuda @ 2011-04-18 7:21 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, akpm, riel, kosaki.motohiro, lkml, linux-mm, stable, shai patch works great. thank you Andrea. On Fri, 2011-04-15 at 17:06 +0200, Andrea Arcangeli wrote: > On Fri, Apr 15, 2011 at 04:39:16PM +0200, Andrea Arcangeli wrote: > > On Fri, Apr 15, 2011 at 11:12:4A.M +0100, Mel Gorman wrote: > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 5823698..1659574 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > > * run pte_offset_map on the pmd, if an huge pmd could > > > * materialize from under us from a different thread. > > > */ > > > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > > > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > > I started hacking on this and I noticed it'd be better to extend the > unlikely through the end. At first review I didn't notice the > parenthesis closure stops after pte_none and __pte_alloc is now > uncovered. I'd prefer this: > > if (unlikely(pmd_none(*pmd) && __pte_alloc(mm, vma, pmd, address))) > > I mean the real unlikely thing is that we return VM_FAULT_OOM, if we > end up calling __pte_alloc or not, depends on the app. Generally it > sounds more frequent that the pte is not none, so it's not wrong, but > it's even less likely that __pte_alloc fails so that can be taken into > account too, and __pte_alloc runs still quite frequently. So either > above or: > > if (unlikely(pmd_none(*pmd)) && unlikely(__pte_alloc(mm, vma, pmd, address))) > > I generally prefer unlikely only when it's 100% sure thing it's less > likely (like the VM_FAULT_OOM), so the first version I guess it's > enough (I'm afraid unlikely for pte_none too, may make gcc generate a > far away jump possibly going out of l1 icache for a case that is only > 512 times less likely at best). My point is that it's certainly hugely > more unlikely that __pte_alloc fails than the pte is none. > > This is a real nitpick though ;). -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-15 15:06 ` Andrea Arcangeli 2011-04-18 7:21 ` raz ben yehuda @ 2011-04-18 10:23 ` Mel Gorman 1 sibling, 0 replies; 15+ messages in thread From: Mel Gorman @ 2011-04-18 10:23 UTC (permalink / raw) To: Andrea Arcangeli Cc: akpm, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Fri, Apr 15, 2011 at 05:06:06PM +0200, Andrea Arcangeli wrote: > On Fri, Apr 15, 2011 at 04:39:16PM +0200, Andrea Arcangeli wrote: > > On Fri, Apr 15, 2011 at 11:12:48AM +0100, Mel Gorman wrote: > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 5823698..1659574 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > > * run pte_offset_map on the pmd, if an huge pmd could > > > * materialize from under us from a different thread. > > > */ > > > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > > > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > > I started hacking on this and I noticed it'd be better to extend the > unlikely through the end. At first review I didn't notice the > parenthesis closure stops after pte_none and __pte_alloc is now > uncovered. I'd prefer this: > > if (unlikely(pmd_none(*pmd) && __pte_alloc(mm, vma, pmd, address))) > I had this at one point and then decided to match what we do for pte_alloc_map(). My reasoning was that the most important part of this check is pmd_none(). It's relatively unlikely we even call __pte_alloc which is why I didn't think it belonged in the unlikely block. I also preferred being consistent with pte_alloc_map. > I mean the real unlikely thing is that we return VM_FAULT_OOM, if we > end up calling __pte_alloc or not, depends on the app. Generally it > sounds more frequent that the pte is not none, so it's not wrong, but > it's even less likely that __pte_alloc fails so that can be taken into > account too, and __pte_alloc runs still quite frequently. So either > above or: > > if (unlikely(pmd_none(*pmd)) && unlikely(__pte_alloc(mm, vma, pmd, address))) > I'd prefer this than putting everything inside the same unlikely block. But if this makes a noticeable, why do we not do it for pte_alloc_map, pmd_alloc and other similar functions? > I generally prefer unlikely only when it's 100% sure thing it's less > likely (like the VM_FAULT_OOM), so the first version I guess it's > enough (I'm afraid unlikely for pte_none too, may make gcc generate a > far away jump possibly going out of l1 icache for a case that is only > 512 times less likely at best). My point is that it's certainly hugely > more unlikely that __pte_alloc fails than the pte is none. > For the bug fix, it's best to match what pte_alloc_map, pmd_alloc, pud_alloc and others do in terms of how it uses unlikely. If what we are currently doing is sub-optimal, a single patch should change all the helpers. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-15 10:12 [PATCH] mm: Check if PTE is already allocated during page fault Mel Gorman 2011-04-15 13:23 ` Rik van Riel 2011-04-15 14:39 ` Andrea Arcangeli @ 2011-04-21 6:59 ` Minchan Kim 2011-04-21 11:08 ` Mel Gorman 2011-04-27 13:50 ` Johannes Weiner 3 siblings, 1 reply; 15+ messages in thread From: Minchan Kim @ 2011-04-21 6:59 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Andrea Arcangeli, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable Hi Mel, On Fri, Apr 15, 2011 at 7:12 PM, Mel Gorman <mgorman@suse.de> wrote: > With transparent hugepage support, handle_mm_fault() has to be careful > that a normal PMD has been established before handling a PTE fault. To > achieve this, it used __pte_alloc() directly instead of pte_alloc_map > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() > is called once it is known the PMD is safe. > > pte_alloc_map() is smart enough to check if a PTE is already present > before calling __pte_alloc but this check was lost. As a consequence, > PTEs may be allocated unnecessarily and the page table lock taken. > Thi useless PTE does get cleaned up but it's a performance hit which > is visible in page_test from aim9. > > This patch simply re-adds the check normally done by pte_alloc_map to > check if the PTE needs to be allocated before taking the page table > lock. The effect is noticable in page_test from aim9. > > AIM9 > 2.6.38-vanilla 2.6.38-checkptenone > creat-clo 446.10 ( 0.00%) 424.47 (-5.10%) > page_test 38.10 ( 0.00%) 42.04 ( 9.37%) > brk_test 52.45 ( 0.00%) 51.57 (-1.71%) > exec_test 382.00 ( 0.00%) 456.90 (16.39%) > fork_test 60.11 ( 0.00%) 67.79 (11.34%) > MMTests Statistics: duration > Total Elapsed Time (seconds) 611.90 612.22 > > (While this affects 2.6.38, it is a performance rather than a > functional bug and normally outside the rules -stable. While the big > performance differences are to a microbench, the difference in fork > and exec performance may be significant enough that -stable wants to > consider the patch) > > Reported-by: Raz Ben Yehuda <raziebe@gmail.com> > Signed-off-by: Mel Gorman <mgorman@suse.de> > -- > mm/memory.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/memory.c b/mm/memory.c > index 5823698..1659574 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > * run pte_offset_map on the pmd, if an huge pmd could > * materialize from under us from a different thread. > */ > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > return VM_FAULT_OOM; > /* if an huge pmd materialized from under us just retry later */ > if (unlikely(pmd_trans_huge(*pmd))) > Reviewed-by: Minchan Kim <minchan.kim@gmail.com> Sorry for jumping in too late. I have a just nitpick. We have another place, do_huge_pmd_anonymous_page. Although it isn't workload of page_test, is it valuable to expand your patch to cover it? If there is workload there are many thread and share one shared anon vma in ALWAYS THP mode, same problem would happen. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-21 6:59 ` Minchan Kim @ 2011-04-21 11:08 ` Mel Gorman 2011-04-21 14:26 ` Minchan Kim 0 siblings, 1 reply; 15+ messages in thread From: Mel Gorman @ 2011-04-21 11:08 UTC (permalink / raw) To: Minchan Kim Cc: akpm, Andrea Arcangeli, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Thu, Apr 21, 2011 at 03:59:47PM +0900, Minchan Kim wrote: > Hi Mel, > > On Fri, Apr 15, 2011 at 7:12 PM, Mel Gorman <mgorman@suse.de> wrote: > > With transparent hugepage support, handle_mm_fault() has to be careful > > that a normal PMD has been established before handling a PTE fault. To > > achieve this, it used __pte_alloc() directly instead of pte_alloc_map > > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() > > is called once it is known the PMD is safe. > > > > pte_alloc_map() is smart enough to check if a PTE is already present > > before calling __pte_alloc but this check was lost. As a consequence, > > PTEs may be allocated unnecessarily and the page table lock taken. > > Thi useless PTE does get cleaned up but it's a performance hit which > > is visible in page_test from aim9. > > > > This patch simply re-adds the check normally done by pte_alloc_map to > > check if the PTE needs to be allocated before taking the page table > > lock. The effect is noticable in page_test from aim9. > > > > AIM9 > > 2.6.38-vanilla 2.6.38-checkptenone > > creat-clo 446.10 ( 0.00%) 424.47 (-5.10%) > > page_test 38.10 ( 0.00%) 42.04 ( 9.37%) > > brk_test 52.45 ( 0.00%) 51.57 (-1.71%) > > exec_test 382.00 ( 0.00%) 456.90 (16.39%) > > fork_test 60.11 ( 0.00%) 67.79 (11.34%) > > MMTests Statistics: duration > > Total Elapsed Time (seconds) 611.90 612.22 > > > > (While this affects 2.6.38, it is a performance rather than a > > functional bug and normally outside the rules -stable. While the big > > performance differences are to a microbench, the difference in fork > > and exec performance may be significant enough that -stable wants to > > consider the patch) > > > > Reported-by: Raz Ben Yehuda <raziebe@gmail.com> > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > -- > > mm/memory.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/memory.c b/mm/memory.c > > index 5823698..1659574 100644 > > --- a/mm/memory.c > > +++ b/mm/memory.c > > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > * run pte_offset_map on the pmd, if an huge pmd could > > * materialize from under us from a different thread. > > */ > > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > > return VM_FAULT_OOM; > > /* if an huge pmd materialized from under us just retry later */ > > if (unlikely(pmd_trans_huge(*pmd))) > > > > Reviewed-by: Minchan Kim <minchan.kim@gmail.com> > > Sorry for jumping in too late. I have a just nitpick. > Better late than never :) > We have another place, do_huge_pmd_anonymous_page. > Although it isn't workload of page_test, is it valuable to expand your > patch to cover it? > If there is workload there are many thread and share one shared anon > vma in ALWAYS THP mode, same problem would happen. We already checked pmd_none() in handle_mm_fault() before calling into do_huge_pmd_anonymous_page(). We could race for the fault while attempting to allocate a huge page but it wouldn't be as severe a problem particularly as it is encountered after failing a 2M allocation. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-21 11:08 ` Mel Gorman @ 2011-04-21 14:26 ` Minchan Kim 2011-04-21 16:00 ` Mel Gorman 0 siblings, 1 reply; 15+ messages in thread From: Minchan Kim @ 2011-04-21 14:26 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Andrea Arcangeli, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Thu, Apr 21, 2011 at 12:08:41PM +0100, Mel Gorman wrote: > On Thu, Apr 21, 2011 at 03:59:47PM +0900, Minchan Kim wrote: > > Hi Mel, > > > > On Fri, Apr 15, 2011 at 7:12 PM, Mel Gorman <mgorman@suse.de> wrote: > > > With transparent hugepage support, handle_mm_fault() has to be careful > > > that a normal PMD has been established before handling a PTE fault. To > > > achieve this, it used __pte_alloc() directly instead of pte_alloc_map > > > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() > > > is called once it is known the PMD is safe. > > > > > > pte_alloc_map() is smart enough to check if a PTE is already present > > > before calling __pte_alloc but this check was lost. As a consequence, > > > PTEs may be allocated unnecessarily and the page table lock taken. > > > Thi useless PTE does get cleaned up but it's a performance hit which > > > is visible in page_test from aim9. > > > > > > This patch simply re-adds the check normally done by pte_alloc_map to > > > check if the PTE needs to be allocated before taking the page table > > > lock. The effect is noticable in page_test from aim9. > > > > > > AIM9 > > > 2.6.38-vanilla 2.6.38-checkptenone > > > creat-clo 446.10 ( 0.00%) 424.47 (-5.10%) > > > page_test 38.10 ( 0.00%) 42.04 ( 9.37%) > > > brk_test 52.45 ( 0.00%) 51.57 (-1.71%) > > > exec_test 382.00 ( 0.00%) 456.90 (16.39%) > > > fork_test 60.11 ( 0.00%) 67.79 (11.34%) > > > MMTests Statistics: duration > > > Total Elapsed Time (seconds) 611.90 612.22 > > > > > > (While this affects 2.6.38, it is a performance rather than a > > > functional bug and normally outside the rules -stable. While the big > > > performance differences are to a microbench, the difference in fork > > > and exec performance may be significant enough that -stable wants to > > > consider the patch) > > > > > > Reported-by: Raz Ben Yehuda <raziebe@gmail.com> > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > > -- > > > mm/memory.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > index 5823698..1659574 100644 > > > --- a/mm/memory.c > > > +++ b/mm/memory.c > > > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > > * run pte_offset_map on the pmd, if an huge pmd could > > > * materialize from under us from a different thread. > > > */ > > > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > > > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > > > return VM_FAULT_OOM; > > > /* if an huge pmd materialized from under us just retry later */ > > > if (unlikely(pmd_trans_huge(*pmd))) > > > > > > > Reviewed-by: Minchan Kim <minchan.kim@gmail.com> > > > > Sorry for jumping in too late. I have a just nitpick. > > > > Better late than never :) > > > We have another place, do_huge_pmd_anonymous_page. > > Although it isn't workload of page_test, is it valuable to expand your > > patch to cover it? > > If there is workload there are many thread and share one shared anon > > vma in ALWAYS THP mode, same problem would happen. > > We already checked pmd_none() in handle_mm_fault() before calling > into do_huge_pmd_anonymous_page(). We could race for the fault while > attempting to allocate a huge page but it wouldn't be as severe a > problem particularly as it is encountered after failing a 2M allocation. Right you are. Fail ot 2M allocation would affect as throttle. Thanks. As I failed let you add the check, I have to reveal my mind. :) Actually, what I want is consistency of the code. The code have been same in two places but you find the problem in page_test of aim9, you changed one of them slightly. I think in future someone will have a question about that and he will start grep git log but it will take a long time as the log is buried other code piled up. I hope adding the comment in this case. /* * PTEs may be allocated unnecessarily and the page table lock taken. * The useless PTE does get cleaned up but it's a performance hit in * some micro-benchmark. Let's check pmd_none before __pte_alloc to * reduce the overhead. */ - if (unlikely(__pte_alloc(mm, vma, pmd, address))) + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) If you mind it as someone who have a question can find the log at last although he need some time, I wouldn't care of the nitpick any more. :) It's up to you. Thanks, Mel. > > -- > Mel Gorman > SUSE Labs -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-21 14:26 ` Minchan Kim @ 2011-04-21 16:00 ` Mel Gorman 2011-04-21 16:14 ` Andrea Arcangeli 2011-04-22 1:01 ` Minchan Kim 0 siblings, 2 replies; 15+ messages in thread From: Mel Gorman @ 2011-04-21 16:00 UTC (permalink / raw) To: Minchan Kim Cc: akpm, Andrea Arcangeli, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Thu, Apr 21, 2011 at 11:26:36PM +0900, Minchan Kim wrote: > On Thu, Apr 21, 2011 at 12:08:41PM +0100, Mel Gorman wrote: > > On Thu, Apr 21, 2011 at 03:59:47PM +0900, Minchan Kim wrote: > > > Hi Mel, > > > > > > On Fri, Apr 15, 2011 at 7:12 PM, Mel Gorman <mgorman@suse.de> wrote: > > > > With transparent hugepage support, handle_mm_fault() has to be careful > > > > that a normal PMD has been established before handling a PTE fault. To > > > > achieve this, it used __pte_alloc() directly instead of pte_alloc_map > > > > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() > > > > is called once it is known the PMD is safe. > > > > > > > > pte_alloc_map() is smart enough to check if a PTE is already present > > > > before calling __pte_alloc but this check was lost. As a consequence, > > > > PTEs may be allocated unnecessarily and the page table lock taken. > > > > Thi useless PTE does get cleaned up but it's a performance hit which > > > > is visible in page_test from aim9. > > > > > > > > This patch simply re-adds the check normally done by pte_alloc_map to > > > > check if the PTE needs to be allocated before taking the page table > > > > lock. The effect is noticable in page_test from aim9. > > > > > > > > AIM9 > > > > 2.6.38-vanilla 2.6.38-checkptenone > > > > creat-clo 446.10 ( 0.00%) 424.47 (-5.10%) > > > > page_test 38.10 ( 0.00%) 42.04 ( 9.37%) > > > > brk_test 52.45 ( 0.00%) 51.57 (-1.71%) > > > > exec_test 382.00 ( 0.00%) 456.90 (16.39%) > > > > fork_test 60.11 ( 0.00%) 67.79 (11.34%) > > > > MMTests Statistics: duration > > > > Total Elapsed Time (seconds) 611.90 612.22 > > > > > > > > (While this affects 2.6.38, it is a performance rather than a > > > > functional bug and normally outside the rules -stable. While the big > > > > performance differences are to a microbench, the difference in fork > > > > and exec performance may be significant enough that -stable wants to > > > > consider the patch) > > > > > > > > Reported-by: Raz Ben Yehuda <raziebe@gmail.com> > > > > Signed-off-by: Mel Gorman <mgorman@suse.de> > > > > -- > > > > mm/memory.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/mm/memory.c b/mm/memory.c > > > > index 5823698..1659574 100644 > > > > --- a/mm/memory.c > > > > +++ b/mm/memory.c > > > > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > > > * run pte_offset_map on the pmd, if an huge pmd could > > > > * materialize from under us from a different thread. > > > > */ > > > > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > > > > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > > > > return VM_FAULT_OOM; > > > > /* if an huge pmd materialized from under us just retry later */ > > > > if (unlikely(pmd_trans_huge(*pmd))) > > > > > > > > > > Reviewed-by: Minchan Kim <minchan.kim@gmail.com> > > > > > > Sorry for jumping in too late. I have a just nitpick. > > > > > > > Better late than never :) > > > > > We have another place, do_huge_pmd_anonymous_page. > > > Although it isn't workload of page_test, is it valuable to expand your > > > patch to cover it? > > > If there is workload there are many thread and share one shared anon > > > vma in ALWAYS THP mode, same problem would happen. > > > > We already checked pmd_none() in handle_mm_fault() before calling > > into do_huge_pmd_anonymous_page(). We could race for the fault while > > attempting to allocate a huge page but it wouldn't be as severe a > > problem particularly as it is encountered after failing a 2M allocation. > > Right you are. Fail ot 2M allocation would affect as throttle. > Thanks. > > As I failed let you add the check, I have to reveal my mind. :) > Actually, what I want is consistency of the code. This is a stronger arguement than as a performance fix. I was concerned that if such a check was added that it would confuse someone in a years time trying to figure out why the pmd_none check was really necessary. > The code have been same in two places but you find the problem in page_test of aim9, > you changed one of them slightly. I think in future someone will > have a question about that and he will start grep git log but it will take > a long time as the log is buried other code piled up. > Fair point. > I hope adding the comment in this case. > > /* > * PTEs may be allocated unnecessarily and the page table lock taken. > * The useless PTE does get cleaned up but it's a performance hit in > * some micro-benchmark. Let's check pmd_none before __pte_alloc to > * reduce the overhead. > */ > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) > I think a better justification is /* * Even though handle_mm_fault has already checked pmd_none, we * have failed a huge allocation at this point during which a * valid PTE could have been inserted. Double check a PTE alloc * is still necessary to avoid additional overhead */ > If you mind it as someone who have a question can find the log at last > although he need some time, I wouldn't care of the nitpick any more. :) > It's up to you. > If you want to create a new patch with either your comment or mine (whichever you prefer) I'll add my ack. I'm about to drop offline for a few days but if it's still there Tuesday, I'll put together an appropriate patch and submit. I'd keep it separate from the other patch because it's a performance fix (which I'd like to see in -stable) where as this is more of a cleanup IMO. Thanks -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-21 16:00 ` Mel Gorman @ 2011-04-21 16:14 ` Andrea Arcangeli 2011-04-22 0:54 ` Minchan Kim 2011-04-22 1:01 ` Minchan Kim 1 sibling, 1 reply; 15+ messages in thread From: Andrea Arcangeli @ 2011-04-21 16:14 UTC (permalink / raw) To: Mel Gorman Cc: Minchan Kim, akpm, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Thu, Apr 21, 2011 at 05:00:57PM +0100, Mel Gorman wrote: > If you want to create a new patch with either your comment or mine > (whichever you prefer) I'll add my ack. I'm about to drop offline > for a few days but if it's still there Tuesday, I'll put together an > appropriate patch and submit. I'd keep it separate from the other patch > because it's a performance fix (which I'd like to see in -stable) where > as this is more of a cleanup IMO. I think the older patch should have more priority agreed. This one may actually waste cpu cycles overall, rather than saving them, it shouldn't be a common occurrence. >From a code consistency point of view maybe we should just implement a pte_alloc macro (to put after pte_alloc_map) and use it in both places, and hide the glory details of the unlikely in the macro. When implementing pte_alloc, I suggest also adding unlikely to both, I mean we added unlikely to the fast path ok, but __pte_alloc is orders of magnitude less likely to fail than pte_none, and it still runs 1 every 512 4k page faults, so I think __pte_alloc deserves an unlikely too. Minchan, you suggested this cleanup, so I suggest you to send a patch, but if you're busy we can help. Thanks! Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-21 16:14 ` Andrea Arcangeli @ 2011-04-22 0:54 ` Minchan Kim 2011-04-26 13:00 ` Andrea Arcangeli 0 siblings, 1 reply; 15+ messages in thread From: Minchan Kim @ 2011-04-22 0:54 UTC (permalink / raw) To: Andrea Arcangeli Cc: Mel Gorman, akpm, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable Hi Andrea, On Fri, Apr 22, 2011 at 1:14 AM, Andrea Arcangeli <aarcange@redhat.com> wrote: > On Thu, Apr 21, 2011 at 05:00:57PM +0100, Mel Gorman wrote: >> If you want to create a new patch with either your comment or mine >> (whichever you prefer) I'll add my ack. I'm about to drop offline >> for a few days but if it's still there Tuesday, I'll put together an >> appropriate patch and submit. I'd keep it separate from the other patch >> because it's a performance fix (which I'd like to see in -stable) where >> as this is more of a cleanup IMO. > > I think the older patch should have more priority agreed. This one may > actually waste cpu cycles overall, rather than saving them, it > shouldn't be a common occurrence. > > From a code consistency point of view maybe we should just implement a > pte_alloc macro (to put after pte_alloc_map) and use it in both > places, and hide the glory details of the unlikely in the macro. When > implementing pte_alloc, I suggest also adding unlikely to both, I mean > we added unlikely to the fast path ok, but __pte_alloc is orders of > magnitude less likely to fail than pte_none, and it still runs 1 every > 512 4k page faults, so I think __pte_alloc deserves an unlikely too. > > Minchan, you suggested this cleanup, so I suggest you to send a patch, > but if you're busy we can help. It's no problem to send a patch but I can do it at out-of-office time. Maybe weekend. :) Before doing that, let's clear the point. You mentioned it shouldn't be a common occurrence but you are suggesting we should do for code consistency POV. Am I right? > > Thanks! > Andrea > -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-22 0:54 ` Minchan Kim @ 2011-04-26 13:00 ` Andrea Arcangeli 0 siblings, 0 replies; 15+ messages in thread From: Andrea Arcangeli @ 2011-04-26 13:00 UTC (permalink / raw) To: Minchan Kim Cc: Mel Gorman, akpm, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Fri, Apr 22, 2011 at 09:54:24AM +0900, Minchan Kim wrote: > Before doing that, let's clear the point. You mentioned it shouldn't > be a common occurrence but you are suggesting we should do for code > consistency POV. Am I right? Yes, for code consistency and cleanup. Thanks, Andrea -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-21 16:00 ` Mel Gorman 2011-04-21 16:14 ` Andrea Arcangeli @ 2011-04-22 1:01 ` Minchan Kim 1 sibling, 0 replies; 15+ messages in thread From: Minchan Kim @ 2011-04-22 1:01 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Andrea Arcangeli, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Fri, Apr 22, 2011 at 1:00 AM, Mel Gorman <mgorman@suse.de> wrote: > On Thu, Apr 21, 2011 at 11:26:36PM +0900, Minchan Kim wrote: >> On Thu, Apr 21, 2011 at 12:08:41PM +0100, Mel Gorman wrote: >> > On Thu, Apr 21, 2011 at 03:59:47PM +0900, Minchan Kim wrote: >> > > Hi Mel, >> > > >> > > On Fri, Apr 15, 2011 at 7:12 PM, Mel Gorman <mgorman@suse.de> wrote: >> > > > With transparent hugepage support, handle_mm_fault() has to be careful >> > > > that a normal PMD has been established before handling a PTE fault. To >> > > > achieve this, it used __pte_alloc() directly instead of pte_alloc_map >> > > > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() >> > > > is called once it is known the PMD is safe. >> > > > >> > > > pte_alloc_map() is smart enough to check if a PTE is already present >> > > > before calling __pte_alloc but this check was lost. As a consequence, >> > > > PTEs may be allocated unnecessarily and the page table lock taken. >> > > > Thi useless PTE does get cleaned up but it's a performance hit which >> > > > is visible in page_test from aim9. >> > > > >> > > > This patch simply re-adds the check normally done by pte_alloc_map to >> > > > check if the PTE needs to be allocated before taking the page table >> > > > lock. The effect is noticable in page_test from aim9. >> > > > >> > > > AIM9 >> > > > 2.6.38-vanilla 2.6.38-checkptenone >> > > > creat-clo 446.10 ( 0.00%) 424.47 (-5.10%) >> > > > page_test 38.10 ( 0.00%) 42.04 ( 9.37%) >> > > > brk_test 52.45 ( 0.00%) 51.57 (-1.71%) >> > > > exec_test 382.00 ( 0.00%) 456.90 (16.39%) >> > > > fork_test 60.11 ( 0.00%) 67.79 (11.34%) >> > > > MMTests Statistics: duration >> > > > Total Elapsed Time (seconds) 611.90 612.22 >> > > > >> > > > (While this affects 2.6.38, it is a performance rather than a >> > > > functional bug and normally outside the rules -stable. While the big >> > > > performance differences are to a microbench, the difference in fork >> > > > and exec performance may be significant enough that -stable wants to >> > > > consider the patch) >> > > > >> > > > Reported-by: Raz Ben Yehuda <raziebe@gmail.com> >> > > > Signed-off-by: Mel Gorman <mgorman@suse.de> >> > > > -- >> > > > mm/memory.c | 2 +- >> > > > 1 file changed, 1 insertion(+), 1 deletion(-) >> > > > >> > > > diff --git a/mm/memory.c b/mm/memory.c >> > > > index 5823698..1659574 100644 >> > > > --- a/mm/memory.c >> > > > +++ b/mm/memory.c >> > > > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> > > > * run pte_offset_map on the pmd, if an huge pmd could >> > > > * materialize from under us from a different thread. >> > > > */ >> > > > - if (unlikely(__pte_alloc(mm, vma, pmd, address))) >> > > > + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) >> > > > return VM_FAULT_OOM; >> > > > /* if an huge pmd materialized from under us just retry later */ >> > > > if (unlikely(pmd_trans_huge(*pmd))) >> > > > >> > > >> > > Reviewed-by: Minchan Kim <minchan.kim@gmail.com> >> > > >> > > Sorry for jumping in too late. I have a just nitpick. >> > > >> > >> > Better late than never :) >> > >> > > We have another place, do_huge_pmd_anonymous_page. >> > > Although it isn't workload of page_test, is it valuable to expand your >> > > patch to cover it? >> > > If there is workload there are many thread and share one shared anon >> > > vma in ALWAYS THP mode, same problem would happen. >> > >> > We already checked pmd_none() in handle_mm_fault() before calling >> > into do_huge_pmd_anonymous_page(). We could race for the fault while >> > attempting to allocate a huge page but it wouldn't be as severe a >> > problem particularly as it is encountered after failing a 2M allocation. >> >> Right you are. Fail ot 2M allocation would affect as throttle. >> Thanks. >> >> As I failed let you add the check, I have to reveal my mind. :) >> Actually, what I want is consistency of the code. > > This is a stronger arguement than as a performance fix. I was concerned > that if such a check was added that it would confuse someone in a years > time trying to figure out why the pmd_none check was really necessary. > >> The code have been same in two places but you find the problem in page_test of aim9, >> you changed one of them slightly. I think in future someone will >> have a question about that and he will start grep git log but it will take >> a long time as the log is buried other code piled up. >> > > Fair point. > >> I hope adding the comment in this case. >> >> /* >> * PTEs may be allocated unnecessarily and the page table lock taken. >> * The useless PTE does get cleaned up but it's a performance hit in >> * some micro-benchmark. Let's check pmd_none before __pte_alloc to >> * reduce the overhead. >> */ >> - if (unlikely(__pte_alloc(mm, vma, pmd, address))) >> + if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address)) >> > > I think a better justification is > > /* > * Even though handle_mm_fault has already checked pmd_none, we > * have failed a huge allocation at this point during which a > * valid PTE could have been inserted. Double check a PTE alloc > * is still necessary to avoid additional overhead > */ > Hmm. If we disable thp, the comment about failing a huge allocation. was not true. So I prefer mine :) But Andrea suggested defining new pte_alloc which checks pmd_none internally for code consistency POV. In such case, I have no concern about comment. Is it okay? >> If you mind it as someone who have a question can find the log at last >> although he need some time, I wouldn't care of the nitpick any more. :) >> It's up to you. >> > > If you want to create a new patch with either your comment or mine > (whichever you prefer) I'll add my ack. I'm about to drop offline > for a few days but if it's still there Tuesday, I'll put together an > appropriate patch and submit. I'd keep it separate from the other patch > because it's a performance fix (which I'd like to see in -stable) where > as this is more of a cleanup IMO. Okay. Thanks, Mel. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] mm: Check if PTE is already allocated during page fault 2011-04-15 10:12 [PATCH] mm: Check if PTE is already allocated during page fault Mel Gorman ` (2 preceding siblings ...) 2011-04-21 6:59 ` Minchan Kim @ 2011-04-27 13:50 ` Johannes Weiner 3 siblings, 0 replies; 15+ messages in thread From: Johannes Weiner @ 2011-04-27 13:50 UTC (permalink / raw) To: Mel Gorman Cc: akpm, Andrea Arcangeli, raz ben yehuda, riel, kosaki.motohiro, lkml, linux-mm, stable On Fri, Apr 15, 2011 at 11:12:48AM +0100, Mel Gorman wrote: > With transparent hugepage support, handle_mm_fault() has to be careful > that a normal PMD has been established before handling a PTE fault. To > achieve this, it used __pte_alloc() directly instead of pte_alloc_map > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map() > is called once it is known the PMD is safe. > > pte_alloc_map() is smart enough to check if a PTE is already present > before calling __pte_alloc but this check was lost. As a consequence, > PTEs may be allocated unnecessarily and the page table lock taken. > Thi useless PTE does get cleaned up but it's a performance hit which > is visible in page_test from aim9. > > This patch simply re-adds the check normally done by pte_alloc_map to > check if the PTE needs to be allocated before taking the page table > lock. The effect is noticable in page_test from aim9. > > AIM9 > 2.6.38-vanilla 2.6.38-checkptenone > creat-clo 446.10 ( 0.00%) 424.47 (-5.10%) > page_test 38.10 ( 0.00%) 42.04 ( 9.37%) > brk_test 52.45 ( 0.00%) 51.57 (-1.71%) > exec_test 382.00 ( 0.00%) 456.90 (16.39%) > fork_test 60.11 ( 0.00%) 67.79 (11.34%) > MMTests Statistics: duration > Total Elapsed Time (seconds) 611.90 612.22 > > (While this affects 2.6.38, it is a performance rather than a > functional bug and normally outside the rules -stable. While the big > performance differences are to a microbench, the difference in fork > and exec performance may be significant enough that -stable wants to > consider the patch) > > Reported-by: Raz Ben Yehuda <raziebe@gmail.com> > Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Johannes Weiner <hannes@cmpxchg.org> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2011-04-27 13:51 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-04-15 10:12 [PATCH] mm: Check if PTE is already allocated during page fault Mel Gorman 2011-04-15 13:23 ` Rik van Riel 2011-04-15 14:39 ` Andrea Arcangeli 2011-04-15 15:06 ` Andrea Arcangeli 2011-04-18 7:21 ` raz ben yehuda 2011-04-18 10:23 ` Mel Gorman 2011-04-21 6:59 ` Minchan Kim 2011-04-21 11:08 ` Mel Gorman 2011-04-21 14:26 ` Minchan Kim 2011-04-21 16:00 ` Mel Gorman 2011-04-21 16:14 ` Andrea Arcangeli 2011-04-22 0:54 ` Minchan Kim 2011-04-26 13:00 ` Andrea Arcangeli 2011-04-22 1:01 ` Minchan Kim 2011-04-27 13:50 ` Johannes Weiner
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).