[PATCH 02/14] mm: fix major/minor fault accounting on retried fault

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 02/14] mm: fix major/minor fault accounting on retried fault
  2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
@ 2009-04-07 11:50 ` Wu Fengguang
  0 siblings, 0 replies; 7+ messages in thread
From: Wu Fengguang @ 2009-04-07 11:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Benjamin Herrenschmidt, Ying Han, Wu Fengguang, David Rientjes,
	Hugh Dickins, Ingo Molnar, Lee Schermerhorn, Mike Waychison,
	Nick Piggin, Peter Zijlstra, Rohit Seth, Edwin, H. Peter Anvin,
	LKML, linux-mm, linux-fsdevel

[-- Attachment #1: filemap-major-fault-retry.patch --]
[-- Type: text/plain, Size: 1558 bytes --]

VM_FAULT_RETRY does make major/minor faults accounting a bit twisted..

Cc: Ying Han <yinghan@google.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
---
 arch/x86/mm/fault.c |    2 ++
 mm/memory.c         |   22 ++++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

--- mm.orig/arch/x86/mm/fault.c
+++ mm/arch/x86/mm/fault.c
@@ -1160,6 +1160,8 @@ good_area:
 	if (fault & VM_FAULT_RETRY) {
 		if (retry_flag) {
 			retry_flag = 0;
+			tsk->maj_flt++;
+			tsk->min_flt--;
 			goto retry;
 		}
 		BUG();
--- mm.orig/mm/memory.c
+++ mm/mm/memory.c
@@ -2882,26 +2882,32 @@ int handle_mm_fault(struct mm_struct *mm
 	pud_t *pud;
 	pmd_t *pmd;
 	pte_t *pte;
+	int ret;
 
 	__set_current_state(TASK_RUNNING);
 
-	count_vm_event(PGFAULT);
-
-	if (unlikely(is_vm_hugetlb_page(vma)))
-		return hugetlb_fault(mm, vma, address, write_access);
+	if (unlikely(is_vm_hugetlb_page(vma))) {
+		ret = hugetlb_fault(mm, vma, address, write_access);
+		goto out;
+	}
 
+	ret = VM_FAULT_OOM;
 	pgd = pgd_offset(mm, address);
 	pud = pud_alloc(mm, pgd, address);
 	if (!pud)
-		return VM_FAULT_OOM;
+		goto out;
 	pmd = pmd_alloc(mm, pud, address);
 	if (!pmd)
-		return VM_FAULT_OOM;
+		goto out;
 	pte = pte_alloc_map(mm, pmd, address);
 	if (!pte)
-		return VM_FAULT_OOM;
+		goto out;
 
-	return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+	ret = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
+out:
+	if (!(ret & VM_FAULT_RETRY))
+		count_vm_event(PGFAULT);
+	return ret;
 }
 
 #ifndef __PAGETABLE_PUD_FOLDED

-- 


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 02/14] mm: fix major/minor fault accounting on retried  fault
       [not found] ` <20090407072132.943283183@intel.com>
@ 2009-04-07 19:58   ` Ying Han
  2009-04-07 22:45     ` Wu Fengguang
  0 siblings, 1 reply; 7+ messages in thread
From: Ying Han @ 2009-04-07 19:58 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> VM_FAULT_RETRY does make major/minor faults accounting a bit twisted..
>
> Cc: Ying Han <yinghan@google.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  arch/x86/mm/fault.c |    2 ++
>  mm/memory.c         |   22 ++++++++++++++--------
>  2 files changed, 16 insertions(+), 8 deletions(-)
>
> --- mm.orig/arch/x86/mm/fault.c
> +++ mm/arch/x86/mm/fault.c
> @@ -1160,6 +1160,8 @@ good_area:
>        if (fault & VM_FAULT_RETRY) {
>                if (retry_flag) {
>                        retry_flag = 0;
> +                       tsk->maj_flt++;
> +                       tsk->min_flt--;
>                        goto retry;
>                }
>                BUG();
sorry, little bit confuse here. are we assuming the retry path will
return min_flt as always?


> --- mm.orig/mm/memory.c
> +++ mm/mm/memory.c
> @@ -2882,26 +2882,32 @@ int handle_mm_fault(struct mm_struct *mm
>        pud_t *pud;
>        pmd_t *pmd;
>        pte_t *pte;
> +       int ret;
>
>        __set_current_state(TASK_RUNNING);
>
> -       count_vm_event(PGFAULT);
> -
> -       if (unlikely(is_vm_hugetlb_page(vma)))
> -               return hugetlb_fault(mm, vma, address, write_access);
> +       if (unlikely(is_vm_hugetlb_page(vma))) {
> +               ret = hugetlb_fault(mm, vma, address, write_access);
> +               goto out;
> +       }
>
> +       ret = VM_FAULT_OOM;
>        pgd = pgd_offset(mm, address);
>        pud = pud_alloc(mm, pgd, address);
>        if (!pud)
> -               return VM_FAULT_OOM;
> +               goto out;
>        pmd = pmd_alloc(mm, pud, address);
>        if (!pmd)
> -               return VM_FAULT_OOM;
> +               goto out;
>        pte = pte_alloc_map(mm, pmd, address);
>        if (!pte)
> -               return VM_FAULT_OOM;
> +               goto out;
>
> -       return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> +       ret = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> +out:
> +       if (!(ret & VM_FAULT_RETRY))
> +               count_vm_event(PGFAULT);
> +       return ret;
>  }
>
>  #ifndef __PAGETABLE_PUD_FOLDED
>
> --
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
       [not found] ` <20090407072133.053995305@intel.com>
@ 2009-04-07 20:03   ` Ying Han
  2009-04-07 23:27     ` Wu Fengguang
  0 siblings, 1 reply; 7+ messages in thread
From: Ying Han @ 2009-04-07 20:03 UTC (permalink / raw)
  To: Wu Fengguang; +Cc: Andrew Morton, LKML, linux-fsdevel, linux-mm

On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> Cc: Ying Han <yinghan@google.com>
> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> ---
>  mm/memory.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> --- mm.orig/mm/memory.c
> +++ mm/mm/memory.c
> @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
>  {
>        pgoff_t pgoff = (((address & PAGE_MASK)
>                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> -       int write = write_access & ~FAULT_FLAG_RETRY;
> -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
> +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
>
> -       flags |= (write_access & FAULT_FLAG_RETRY);
>        pte_unmap(page_table);
>        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
>  }
So, we got rid of FAULT_FLAG_RETRY flag?
> --
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 02/14] mm: fix major/minor fault accounting on retried fault
  2009-04-07 19:58   ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Ying Han
@ 2009-04-07 22:45     ` Wu Fengguang
  0 siblings, 0 replies; 7+ messages in thread
From: Wu Fengguang @ 2009-04-07 22:45 UTC (permalink / raw)
  To: Ying Han
  Cc: Andrew Morton, LKML, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

On Wed, Apr 08, 2009 at 03:58:16AM +0800, Ying Han wrote:
> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > VM_FAULT_RETRY does make major/minor faults accounting a bit twisted..
> >
> > Cc: Ying Han <yinghan@google.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  arch/x86/mm/fault.c |    2 ++
> >  mm/memory.c         |   22 ++++++++++++++--------
> >  2 files changed, 16 insertions(+), 8 deletions(-)
> >
> > --- mm.orig/arch/x86/mm/fault.c
> > +++ mm/arch/x86/mm/fault.c
> > @@ -1160,6 +1160,8 @@ good_area:
> >        if (fault & VM_FAULT_RETRY) {
> >                if (retry_flag) {
> >                        retry_flag = 0;
> > +                       tsk->maj_flt++;
> > +                       tsk->min_flt--;
> >                        goto retry;
> >                }
> >                BUG();
> sorry, little bit confuse here. are we assuming the retry path will
> return min_flt as always?

Sure - except for some really exceptional ftruncate cases.
The page was there ready, and we'll retry immediately.

maj_flt/min_flt are not _exact_ numbers by their nature, so 99.9%
accuracy shall be fine.

Thanks,
Fengguang

> > --- mm.orig/mm/memory.c
> > +++ mm/mm/memory.c
> > @@ -2882,26 +2882,32 @@ int handle_mm_fault(struct mm_struct *mm
> >        pud_t *pud;
> >        pmd_t *pmd;
> >        pte_t *pte;
> > +       int ret;
> >
> >        __set_current_state(TASK_RUNNING);
> >
> > -       count_vm_event(PGFAULT);
> > -
> > -       if (unlikely(is_vm_hugetlb_page(vma)))
> > -               return hugetlb_fault(mm, vma, address, write_access);
> > +       if (unlikely(is_vm_hugetlb_page(vma))) {
> > +               ret = hugetlb_fault(mm, vma, address, write_access);
> > +               goto out;
> > +       }
> >
> > +       ret = VM_FAULT_OOM;
> >        pgd = pgd_offset(mm, address);
> >        pud = pud_alloc(mm, pgd, address);
> >        if (!pud)
> > -               return VM_FAULT_OOM;
> > +               goto out;
> >        pmd = pmd_alloc(mm, pud, address);
> >        if (!pmd)
> > -               return VM_FAULT_OOM;
> > +               goto out;
> >        pte = pte_alloc_map(mm, pmd, address);
> >        if (!pte)
> > -               return VM_FAULT_OOM;
> > +               goto out;
> >
> > -       return handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> > +       ret = handle_pte_fault(mm, vma, address, pte, pmd, write_access);
> > +out:
> > +       if (!(ret & VM_FAULT_RETRY))
> > +               count_vm_event(PGFAULT);
> > +       return ret;
> >  }
> >
> >  #ifndef __PAGETABLE_PUD_FOLDED
> >
> > --
> >
> >

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-07 20:03   ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Ying Han
@ 2009-04-07 23:27     ` Wu Fengguang
  2009-04-08  1:17       ` Ying Han
  0 siblings, 1 reply; 7+ messages in thread
From: Wu Fengguang @ 2009-04-07 23:27 UTC (permalink / raw)
  To: Ying Han
  Cc: Andrew Morton, LKML, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

On Wed, Apr 08, 2009 at 04:03:36AM +0800, Ying Han wrote:
> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > Cc: Ying Han <yinghan@google.com>
> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> > ---
> >  mm/memory.c |    4 +---
> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >
> > --- mm.orig/mm/memory.c
> > +++ mm/mm/memory.c
> > @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
> >  {
> >        pgoff_t pgoff = (((address & PAGE_MASK)
> >                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > -       int write = write_access & ~FAULT_FLAG_RETRY;
> > -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
> > +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
> >
> > -       flags |= (write_access & FAULT_FLAG_RETRY);
> >        pte_unmap(page_table);
> >        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
> >  }
> So, we got rid of FAULT_FLAG_RETRY flag?

Seems yes for the current mm tree, see the following two commits.

I did this patch on seeing 761fe7bc8193b7. But a closer look
indicates that the following two patches disable the filemap
VM_FAULT_RETRY part totally...

Anyway, if these two patches are to be reverted somehow(I guess yes),
this patch shall be _ignored_.

btw, do you have any test case and performance numbers for
FAULT_FLAG_RETRY? And possible overheads for (the worst case)
sparse random mmap reads on a sparse file?  I cannot find any
in your changelogs..

Thanks,
Fengguang


commit 761fe7bc8193b7858b7dc7eb4a026dc66e49fe1f
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon Feb 9 21:08:50 2009 +0100

    A shot in the dark :(
    
    Cc: Mike Waychison <mikew@google.com>
    Cc: Ying Han <yinghan@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index bac7d7a..1c6736d 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -1139,8 +1139,6 @@ good_area:
                return;
        }
 
-       write |= retry_flag;
-
        /*
         * If for any reason at all we couldn't handle the fault,
         * make sure we exit gracefully rather than endlessly redo


commit f01ca7a68c37680a4eee22a8722a713c5102b3bb
Author: Andrew Morton <akpm@linux-foundation.org>
Date:   Mon Feb 9 21:08:50 2009 +0100

    Untangle the `write' boolean from the FAULT_FLAG_foo non-boolean field.
    
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Hugh Dickins <hugh@veritas.com>
    Cc: Ingo Molnar <mingo@elte.hu>
    Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
    Cc: Mike Waychison <mikew@google.com>
    Cc: Nick Piggin <npiggin@suse.de>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Rohit Seth <rohitseth@google.com>
    Cc: T<F6>r<F6>k Edwin <edwintorok@gmail.com>
    Cc: Valdis.Kletnieks@vt.edu
    Cc: Ying Han <yinghan@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index b2cc88f..bac7d7a 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -978,7 +978,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
        struct mm_struct *mm;
        int write;
        int fault;
-       unsigned int retry_flag = FAULT_FLAG_RETRY;
+       int retry_flag = 1;
 
        tsk = current;
        mm = tsk->mm;
@@ -1140,6 +1140,7 @@ good_area:
        }
 
        write |= retry_flag;
+
        /*
         * If for any reason at all we couldn't handle the fault,
         * make sure we exit gracefully rather than endlessly redo
@@ -1159,8 +1160,8 @@ good_area:
         * be removed or changed after the retry.
         */
        if (fault & VM_FAULT_RETRY) {
-               if (write & FAULT_FLAG_RETRY) {
-                       retry_flag &= ~FAULT_FLAG_RETRY;
+               if (retry_flag) {
+                       retry_flag = 0;
                        goto retry;
                }


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-07 23:27     ` Wu Fengguang
@ 2009-04-08  1:17       ` Ying Han
  2009-04-08  2:29         ` Wu Fengguang
  0 siblings, 1 reply; 7+ messages in thread
From: Ying Han @ 2009-04-08  1:17 UTC (permalink / raw)
  To: Wu Fengguang
  Cc: Andrew Morton, LKML, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

On Tue, Apr 7, 2009 at 4:27 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> On Wed, Apr 08, 2009 at 04:03:36AM +0800, Ying Han wrote:
>> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
>> > Cc: Ying Han <yinghan@google.com>
>> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
>> > ---
>> >  mm/memory.c |    4 +---
>> >  1 file changed, 1 insertion(+), 3 deletions(-)
>> >
>> > --- mm.orig/mm/memory.c
>> > +++ mm/mm/memory.c
>> > @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
>> >  {
>> >        pgoff_t pgoff = (((address & PAGE_MASK)
>> >                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
>> > -       int write = write_access & ~FAULT_FLAG_RETRY;
>> > -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
>> > +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
>> >
>> > -       flags |= (write_access & FAULT_FLAG_RETRY);
>> >        pte_unmap(page_table);
>> >        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
>> >  }
>> So, we got rid of FAULT_FLAG_RETRY flag?
>
> Seems yes for the current mm tree, see the following two commits.
>
> I did this patch on seeing 761fe7bc8193b7. But a closer look
> indicates that the following two patches disable the filemap
> VM_FAULT_RETRY part totally...
>
> Anyway, if these two patches are to be reverted somehow(I guess yes),
> this patch shall be _ignored_.
>
> btw, do you have any test case and performance numbers for
> FAULT_FLAG_RETRY? And possible overheads for (the worst case)
> sparse random mmap reads on a sparse file?  I cannot find any
> in your changelogs..

here is the benchmark i posted on [V1] but somehow missed in [V2] describtion

Benchmarks:
case 1. one application has a high count of threads each faulting in
different pages of a hugefile. Benchmark indicate that this double data
structure walking in case of major fault results in << 1% performance hit.

case 2. add another thread in the above application which in a tight loop of
mmap()/munmap(). Here we measure loop count in the new thread while other
threads doing the same amount of work as case one. we got << 3% performance
hit on the Complete Time(benchmark value for case one) and 10% performance
improvement on the mmap()/munmap() counter.

This patch helps a lot in cases we have writer which is waitting behind all
readers, so it could execute much faster.

--Ying

>
> Thanks,
> Fengguang
>
>
> commit 761fe7bc8193b7858b7dc7eb4a026dc66e49fe1f
> Author: Andrew Morton <akpm@linux-foundation.org>
> Date:   Mon Feb 9 21:08:50 2009 +0100
>
>    A shot in the dark :(
>
>    Cc: Mike Waychison <mikew@google.com>
>    Cc: Ying Han <yinghan@google.com>
>    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index bac7d7a..1c6736d 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -1139,8 +1139,6 @@ good_area:
>                return;
>        }
>
> -       write |= retry_flag;
> -
>        /*
>         * If for any reason at all we couldn't handle the fault,
>         * make sure we exit gracefully rather than endlessly redo
>
>
> commit f01ca7a68c37680a4eee22a8722a713c5102b3bb
> Author: Andrew Morton <akpm@linux-foundation.org>
> Date:   Mon Feb 9 21:08:50 2009 +0100
>
>    Untangle the `write' boolean from the FAULT_FLAG_foo non-boolean field.
>
>    Cc: "H. Peter Anvin" <hpa@zytor.com>
>    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>    Cc: David Rientjes <rientjes@google.com>
>    Cc: Hugh Dickins <hugh@veritas.com>
>    Cc: Ingo Molnar <mingo@elte.hu>
>    Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
>    Cc: Mike Waychison <mikew@google.com>
>    Cc: Nick Piggin <npiggin@suse.de>
>    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>    Cc: Rohit Seth <rohitseth@google.com>
>    Cc: T<F6>r<F6>k Edwin <edwintorok@gmail.com>
>    Cc: Valdis.Kletnieks@vt.edu
>    Cc: Ying Han <yinghan@google.com>
>    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
>
> diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
> index b2cc88f..bac7d7a 100644
> --- a/arch/x86/mm/fault.c
> +++ b/arch/x86/mm/fault.c
> @@ -978,7 +978,7 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
>        struct mm_struct *mm;
>        int write;
>        int fault;
> -       unsigned int retry_flag = FAULT_FLAG_RETRY;
> +       int retry_flag = 1;
>
>        tsk = current;
>        mm = tsk->mm;
> @@ -1140,6 +1140,7 @@ good_area:
>        }
>
>        write |= retry_flag;
> +
>        /*
>         * If for any reason at all we couldn't handle the fault,
>         * make sure we exit gracefully rather than endlessly redo
> @@ -1159,8 +1160,8 @@ good_area:
>         * be removed or changed after the retry.
>         */
>        if (fault & VM_FAULT_RETRY) {
> -               if (write & FAULT_FLAG_RETRY) {
> -                       retry_flag &= ~FAULT_FLAG_RETRY;
> +               if (retry_flag) {
> +                       retry_flag = 0;
>                        goto retry;
>                }
>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code
  2009-04-08  1:17       ` Ying Han
@ 2009-04-08  2:29         ` Wu Fengguang
  0 siblings, 0 replies; 7+ messages in thread
From: Wu Fengguang @ 2009-04-08  2:29 UTC (permalink / raw)
  To: Ying Han
  Cc: Andrew Morton, LKML, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org

Hi Ying Han,

On Wed, Apr 08, 2009 at 09:17:26AM +0800, Ying Han wrote:
> On Tue, Apr 7, 2009 at 4:27 PM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> > On Wed, Apr 08, 2009 at 04:03:36AM +0800, Ying Han wrote:
> >> On Tue, Apr 7, 2009 at 12:17 AM, Wu Fengguang <fengguang.wu@intel.com> wrote:
> >> > Cc: Ying Han <yinghan@google.com>
> >> > Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
> >> > ---
> >> >  mm/memory.c |    4 +---
> >> >  1 file changed, 1 insertion(+), 3 deletions(-)
> >> >
> >> > --- mm.orig/mm/memory.c
> >> > +++ mm/mm/memory.c
> >> > @@ -2766,10 +2766,8 @@ static int do_linear_fault(struct mm_str
> >> >  {
> >> >        pgoff_t pgoff = (((address & PAGE_MASK)
> >> >                        - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> >> > -       int write = write_access & ~FAULT_FLAG_RETRY;
> >> > -       unsigned int flags = (write ? FAULT_FLAG_WRITE : 0);
> >> > +       unsigned int flags = (write_access ? FAULT_FLAG_WRITE : 0);
> >> >
> >> > -       flags |= (write_access & FAULT_FLAG_RETRY);
> >> >        pte_unmap(page_table);
> >> >        return __do_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
> >> >  }
> >> So, we got rid of FAULT_FLAG_RETRY flag?
> >
> > Seems yes for the current mm tree, see the following two commits.
> >
> > I did this patch on seeing 761fe7bc8193b7. But a closer look
> > indicates that the following two patches disable the filemap
> > VM_FAULT_RETRY part totally...
> >
> > Anyway, if these two patches are to be reverted somehow(I guess yes),
> > this patch shall be _ignored_.
> >
> > btw, do you have any test case and performance numbers for
> > FAULT_FLAG_RETRY? And possible overheads for (the worst case)
> > sparse random mmap reads on a sparse file?  I cannot find any
> > in your changelogs..
> 
> here is the benchmark i posted on [V1] but somehow missed in [V2] describtion
> 
> Benchmarks:
> case 1. one application has a high count of threads each faulting in
> different pages of a hugefile. Benchmark indicate that this double data
> structure walking in case of major fault results in << 1% performance hit.
> 
> case 2. add another thread in the above application which in a tight loop of
> mmap()/munmap(). Here we measure loop count in the new thread while other
> threads doing the same amount of work as case one. we got << 3% performance
> hit on the Complete Time(benchmark value for case one) and 10% performance
> improvement on the mmap()/munmap() counter.
> 
> This patch helps a lot in cases we have writer which is waitting behind all
> readers, so it could execute much faster.
> 

Just tested the sparse-random-read-on-sparse-file case, and found the
performance impact to be 0.4% (8.706s vs 8.744s). Kind of acceptable.

without FAULT_FLAG_RETRY:
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.28s user 5.39s system 99% cpu 8.692 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.17s user 5.54s system 99% cpu 8.742 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.18s user 5.48s system 99% cpu 8.684 total

FAULT_FLAG_RETRY:
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.18s user 5.63s system 99% cpu 8.825 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.22s user 5.47s system 99% cpu 8.718 total
iotrace.rb --load stride-100 --mplay /mnt/btrfs-ram/sparse  3.13s user 5.55s system 99% cpu 8.690 total

In the above faked workload, the mmap read page offsets are loaded from
stride-100 and performed on /mnt/btrfs-ram/sparse, which are created by:

                seq 0 100 1000000 > stride-100
                dd if=/dev/zero of=/mnt/btrfs-ram/sparse bs=1M count=1 seek=1024000

Thanks,
Fengguang

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2009-04-08  2:30 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20090407071729.233579162@intel.com>
     [not found] ` <20090407072132.943283183@intel.com>
2009-04-07 19:58   ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Ying Han
2009-04-07 22:45     ` Wu Fengguang
     [not found] ` <20090407072133.053995305@intel.com>
2009-04-07 20:03   ` [PATCH 03/14] mm: remove FAULT_FLAG_RETRY dead code Ying Han
2009-04-07 23:27     ` Wu Fengguang
2009-04-08  1:17       ` Ying Han
2009-04-08  2:29         ` Wu Fengguang
2009-04-07 11:50 [PATCH 00/14] filemap and readahead fixes Wu Fengguang
2009-04-07 11:50 ` [PATCH 02/14] mm: fix major/minor fault accounting on retried fault Wu Fengguang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox