Re: [PATCH] mm: Check if PTE is already allocated during page fault

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Mel Gorman <mgorman@suse.de>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: akpm@linux-foundation.org, Andrea Arcangeli <aarcange@redhat.com>,
	raz ben yehuda <raziebe@gmail.com>,
	riel@redhat.com, kosaki.motohiro@jp.fujitsu.com,
	lkml <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, stable@kernel.org
Subject: Re: [PATCH] mm: Check if PTE is already allocated during page fault
Date: Thu, 21 Apr 2011 12:08:41 +0100	[thread overview]
Message-ID: <20110421110841.GA612@suse.de> (raw)
In-Reply-To: <BANLkTik7H+cmA8iToV4j1ncbQqeraCaeTg@mail.gmail.com>

On Thu, Apr 21, 2011 at 03:59:47PM +0900, Minchan Kim wrote:
> Hi Mel,
> 
> On Fri, Apr 15, 2011 at 7:12 PM, Mel Gorman <mgorman@suse.de> wrote:
> > With transparent hugepage support, handle_mm_fault() has to be careful
> > that a normal PMD has been established before handling a PTE fault. To
> > achieve this, it used __pte_alloc() directly instead of pte_alloc_map
> > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map()
> > is called once it is known the PMD is safe.
> >
> > pte_alloc_map() is smart enough to check if a PTE is already present
> > before calling __pte_alloc but this check was lost. As a consequence,
> > PTEs may be allocated unnecessarily and the page table lock taken.
> > Thi useless PTE does get cleaned up but it's a performance hit which
> > is visible in page_test from aim9.
> >
> > This patch simply re-adds the check normally done by pte_alloc_map to
> > check if the PTE needs to be allocated before taking the page table
> > lock. The effect is noticable in page_test from aim9.
> >
> > AIM9
> >                2.6.38-vanilla 2.6.38-checkptenone
> > creat-clo      446.10 ( 0.00%)   424.47 (-5.10%)
> > page_test       38.10 ( 0.00%)    42.04 ( 9.37%)
> > brk_test        52.45 ( 0.00%)    51.57 (-1.71%)
> > exec_test      382.00 ( 0.00%)   456.90 (16.39%)
> > fork_test       60.11 ( 0.00%)    67.79 (11.34%)
> > MMTests Statistics: duration
> > Total Elapsed Time (seconds)                611.90    612.22
> >
> > (While this affects 2.6.38, it is a performance rather than a
> > functional bug and normally outside the rules -stable. While the big
> > performance differences are to a microbench, the difference in fork
> > and exec performance may be significant enough that -stable wants to
> > consider the patch)
> >
> > Reported-by: Raz Ben Yehuda <raziebe@gmail.com>
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > --
> >  mm/memory.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 5823698..1659574 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >         * run pte_offset_map on the pmd, if an huge pmd could
> >         * materialize from under us from a different thread.
> >         */
> > -       if (unlikely(__pte_alloc(mm, vma, pmd, address)))
> > +       if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address))
> >                return VM_FAULT_OOM;
> >        /* if an huge pmd materialized from under us just retry later */
> >        if (unlikely(pmd_trans_huge(*pmd)))
> >
> 
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 
> Sorry for jumping in too late. I have a just nitpick.
> 

Better late than never :)

> We have another place, do_huge_pmd_anonymous_page.
> Although it isn't workload of page_test, is it valuable to expand your
> patch to cover it?
> If there is workload there are many thread and share one shared anon
> vma in ALWAYS THP mode, same problem would happen.

We already checked pmd_none() in handle_mm_fault() before calling
into do_huge_pmd_anonymous_page(). We could race for the fault while
attempting to allocate a huge page but it wouldn't be as severe a
problem particularly as it is encountered after failing a 2M allocation.

-- 
Mel Gorman
SUSE Labs

WARNING: multiple messages have this Message-ID (diff)

From: Mel Gorman <mgorman@suse.de>
To: Minchan Kim <minchan.kim@gmail.com>
Cc: akpm@linux-foundation.org, Andrea Arcangeli <aarcange@redhat.com>,
	raz ben yehuda <raziebe@gmail.com>,
	riel@redhat.com, kosaki.motohiro@jp.fujitsu.com,
	lkml <linux-kernel@vger.kernel.org>,
	linux-mm@kvack.org, stable@kernel.org
Subject: Re: [PATCH] mm: Check if PTE is already allocated during page fault
Date: Thu, 21 Apr 2011 12:08:41 +0100	[thread overview]
Message-ID: <20110421110841.GA612@suse.de> (raw)
In-Reply-To: <BANLkTik7H+cmA8iToV4j1ncbQqeraCaeTg@mail.gmail.com>

On Thu, Apr 21, 2011 at 03:59:47PM +0900, Minchan Kim wrote:
> Hi Mel,
> 
> On Fri, Apr 15, 2011 at 7:12 PM, Mel Gorman <mgorman@suse.de> wrote:
> > With transparent hugepage support, handle_mm_fault() has to be careful
> > that a normal PMD has been established before handling a PTE fault. To
> > achieve this, it used __pte_alloc() directly instead of pte_alloc_map
> > as pte_alloc_map is unsafe to run against a huge PMD. pte_offset_map()
> > is called once it is known the PMD is safe.
> >
> > pte_alloc_map() is smart enough to check if a PTE is already present
> > before calling __pte_alloc but this check was lost. As a consequence,
> > PTEs may be allocated unnecessarily and the page table lock taken.
> > Thi useless PTE does get cleaned up but it's a performance hit which
> > is visible in page_test from aim9.
> >
> > This patch simply re-adds the check normally done by pte_alloc_map to
> > check if the PTE needs to be allocated before taking the page table
> > lock. The effect is noticable in page_test from aim9.
> >
> > AIM9
> >                2.6.38-vanilla 2.6.38-checkptenone
> > creat-clo      446.10 ( 0.00%)   424.47 (-5.10%)
> > page_test       38.10 ( 0.00%)    42.04 ( 9.37%)
> > brk_test        52.45 ( 0.00%)    51.57 (-1.71%)
> > exec_test      382.00 ( 0.00%)   456.90 (16.39%)
> > fork_test       60.11 ( 0.00%)    67.79 (11.34%)
> > MMTests Statistics: duration
> > Total Elapsed Time (seconds)                611.90    612.22
> >
> > (While this affects 2.6.38, it is a performance rather than a
> > functional bug and normally outside the rules -stable. While the big
> > performance differences are to a microbench, the difference in fork
> > and exec performance may be significant enough that -stable wants to
> > consider the patch)
> >
> > Reported-by: Raz Ben Yehuda <raziebe@gmail.com>
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > --
> >  mm/memory.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 5823698..1659574 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -3322,7 +3322,7 @@ int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
> >         * run pte_offset_map on the pmd, if an huge pmd could
> >         * materialize from under us from a different thread.
> >         */
> > -       if (unlikely(__pte_alloc(mm, vma, pmd, address)))
> > +       if (unlikely(pmd_none(*pmd)) && __pte_alloc(mm, vma, pmd, address))
> >                return VM_FAULT_OOM;
> >        /* if an huge pmd materialized from under us just retry later */
> >        if (unlikely(pmd_trans_huge(*pmd)))
> >
> 
> Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
> 
> Sorry for jumping in too late. I have a just nitpick.
> 

Better late than never :)

> We have another place, do_huge_pmd_anonymous_page.
> Although it isn't workload of page_test, is it valuable to expand your
> patch to cover it?
> If there is workload there are many thread and share one shared anon
> vma in ALWAYS THP mode, same problem would happen.

We already checked pmd_none() in handle_mm_fault() before calling
into do_huge_pmd_anonymous_page(). We could race for the fault while
attempting to allocate a huge page but it wouldn't be as severe a
problem particularly as it is encountered after failing a 2M allocation.

-- 
Mel Gorman
SUSE Labs

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

next prev parent reply	other threads:[~2011-04-21 11:08 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-04-15 10:12 [PATCH] mm: Check if PTE is already allocated during page fault Mel Gorman
2011-04-15 10:12 ` Mel Gorman
2011-04-15 13:23 ` Rik van Riel
2011-04-15 13:23   ` Rik van Riel
2011-04-15 14:39 ` Andrea Arcangeli
2011-04-15 14:39   ` Andrea Arcangeli
2011-04-15 15:06   ` Andrea Arcangeli
2011-04-15 15:06     ` Andrea Arcangeli
2011-04-18  7:21     ` raz ben yehuda
2011-04-18  7:21       ` raz ben yehuda
2011-04-18 10:23     ` Mel Gorman
2011-04-18 10:23       ` Mel Gorman
2011-04-21  6:59 ` Minchan Kim
2011-04-21  6:59   ` Minchan Kim
2011-04-21 11:08   ` Mel Gorman [this message]
2011-04-21 11:08     ` Mel Gorman
2011-04-21 14:26     ` Minchan Kim
2011-04-21 14:26       ` Minchan Kim
2011-04-21 16:00       ` Mel Gorman
2011-04-21 16:00         ` Mel Gorman
2011-04-21 16:14         ` Andrea Arcangeli
2011-04-21 16:14           ` Andrea Arcangeli
2011-04-22  0:54           ` Minchan Kim
2011-04-22  0:54             ` Minchan Kim
2011-04-26 13:00             ` Andrea Arcangeli
2011-04-26 13:00               ` Andrea Arcangeli
2011-04-22  1:01         ` Minchan Kim
2011-04-22  1:01           ` Minchan Kim
2011-04-27 13:50 ` Johannes Weiner
2011-04-27 13:50   ` Johannes Weiner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110421110841.GA612@suse.de \
    --to=mgorman@suse.de \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=minchan.kim@gmail.com \
    --cc=raziebe@gmail.com \
    --cc=riel@redhat.com \
    --cc=stable@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.