All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mel Gorman <mgorman@suse.de>
To: akpm@linux-foundation.org
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	raz ben yehuda <raziebe@gmail.com>,
	riel@redhat.com, kosaki.motohiro@jp.fujitsu.com,
	lkml <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, stable@kernel.org
Subject: [PATCH] mm: Check if PTE is already allocated during page fault
Date: Fri, 15 Apr 2011 11:12:48 +0100	[thread overview]
Message-ID: <20110415101248.GB22688@suse.de> (raw)

With transparent hugepage support, handle_mm_fault() has to be careful
that a normal PMD has been established before handling a PTE fault. To
achieve this, it used __pte_alloc() directly instead of pte_alloc_map
as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map()
is called once it is known the PMD is safe.

pte_alloc_map() is smart enough to check if a PTE is already present
before calling __pte_alloc but this check was lost. As a consequence,
PTEs may be allocated unnecessarily and the page table lock taken.
Thi useless PTE does get cleaned up but it's a performance hit which
is visible in page_test from aim9.

This patch simply re-adds the check normally done by pte_alloc_map to
check if the PTE needs to be allocated before taking the page table
lock. The effect is noticable in page_test from aim9.

AIM9
                2.6.38-vanilla 2.6.38-checkptenone
creat-clo      446.10 ( 0.00%)   424.47 (-5.10%)
page_test       38.10 ( 0.00%)    42.04 ( 9.37%)
brk_test        52.45 ( 0.00%)    51.57 (-1.71%)
exec_test      382.00 ( 0.00%)   456.90 (16.39%)
fork_test       60.11 ( 0.00%)    67.79 (11.34%)
MMTests Statistics: duration
Total Elapsed Time (seconds)                611.90    612.22

(While this affects 2.6.38, it is a performance rather than a
functional bug and normally outside the rules -stable. While the big
performance differences are to a microbench, the difference in fork
and exec performance may be significant enough that -stable wants to
consider the patch)

Reported-by: Raz Ben Yehuda <raziebe@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
-- 
 mm/memory.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 5823698..1659574 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * run pte_offset_map on the pmd, if an huge pmd could
 	 * materialize from under us from a different thread.
 	 */
-	if (unlikely(__pte_alloc(mm, vma, pmd, address)))
+	if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address))
 		return VM_FAULT_OOM;
 	/* if an huge pmd materialized from under us just retry later */
 	if (unlikely(pmd_trans_huge(*pmd)))

WARNING: multiple messages have this Message-ID (diff)
From: Mel Gorman <mgorman@suse.de>
To: akpm@linux-foundation.org
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	raz ben yehuda <raziebe@gmail.com>,
	riel@redhat.com, kosaki.motohiro@jp.fujitsu.com,
	lkml <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, stable@kernel.org
Subject: [PATCH] mm: Check if PTE is already allocated during page fault
Date: Fri, 15 Apr 2011 11:12:48 +0100	[thread overview]
Message-ID: <20110415101248.GB22688@suse.de> (raw)

With transparent hugepage support, handle_mm_fault() has to be careful
that a normal PMD has been established before handling a PTE fault. To
achieve this, it used __pte_alloc() directly instead of pte_alloc_map
as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map()
is called once it is known the PMD is safe.

pte_alloc_map() is smart enough to check if a PTE is already present
before calling __pte_alloc but this check was lost. As a consequence,
PTEs may be allocated unnecessarily and the page table lock taken.
Thi useless PTE does get cleaned up but it's a performance hit which
is visible in page_test from aim9.

This patch simply re-adds the check normally done by pte_alloc_map to
check if the PTE needs to be allocated before taking the page table
lock. The effect is noticable in page_test from aim9.

AIM9
                2.6.38-vanilla 2.6.38-checkptenone
creat-clo      446.10 ( 0.00%)   424.47 (-5.10%)
page_test       38.10 ( 0.00%)    42.04 ( 9.37%)
brk_test        52.45 ( 0.00%)    51.57 (-1.71%)
exec_test      382.00 ( 0.00%)   456.90 (16.39%)
fork_test       60.11 ( 0.00%)    67.79 (11.34%)
MMTests Statistics: duration
Total Elapsed Time (seconds)                611.90    612.22

(While this affects 2.6.38, it is a performance rather than a
functional bug and normally outside the rules -stable. While the big
performance differences are to a microbench, the difference in fork
and exec performance may be significant enough that -stable wants to
consider the patch)

Reported-by: Raz Ben Yehuda <raziebe@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
-- 
 mm/memory.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index 5823698..1659574 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * run pte_offset_map on the pmd, if an huge pmd could
 	 * materialize from under us from a different thread.
 	 */
-	if (unlikely(__pte_alloc(mm, vma, pmd, address)))
+	if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address))
 		return VM_FAULT_OOM;
 	/* if an huge pmd materialized from under us just retry later */
 	if (unlikely(pmd_trans_huge(*pmd)))

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

             reply	other threads:[~2011-04-15 10:12 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-15 10:12 Mel Gorman [this message]
2011-04-15 10:12 ` [PATCH] mm: Check if PTE is already allocated during page fault Mel Gorman
2011-04-15 13:23 ` Rik van Riel
2011-04-15 13:23   ` Rik van Riel
2011-04-15 14:39 ` Andrea Arcangeli
2011-04-15 14:39   ` Andrea Arcangeli
2011-04-15 15:06   ` Andrea Arcangeli
2011-04-15 15:06     ` Andrea Arcangeli
2011-04-18  7:21     ` raz ben yehuda
2011-04-18  7:21       ` raz ben yehuda
2011-04-18 10:23     ` Mel Gorman
2011-04-18 10:23       ` Mel Gorman
2011-04-21  6:59 ` Minchan Kim
2011-04-21  6:59   ` Minchan Kim
2011-04-21 11:08   ` Mel Gorman
2011-04-21 11:08     ` Mel Gorman
2011-04-21 14:26     ` Minchan Kim
2011-04-21 14:26       ` Minchan Kim
2011-04-21 16:00       ` Mel Gorman
2011-04-21 16:00         ` Mel Gorman
2011-04-21 16:14         ` Andrea Arcangeli
2011-04-21 16:14           ` Andrea Arcangeli
2011-04-22  0:54           ` Minchan Kim
2011-04-22  0:54             ` Minchan Kim
2011-04-26 13:00             ` Andrea Arcangeli
2011-04-26 13:00               ` Andrea Arcangeli
2011-04-22  1:01         ` Minchan Kim
2011-04-22  1:01           ` Minchan Kim
2011-04-27 13:50 ` Johannes Weiner
2011-04-27 13:50   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110415101248.GB22688@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=raziebe@gmail.com \
    --cc=riel@redhat.com \
    --cc=stable@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.