linux-mm.kvack.org archive mirror
* performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
@ 2014-02-18  8:01 Yuanhan Liu
  2014-03-07  8:22 ` Yuanhan Liu
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Yuanhan Liu @ 2014-02-18  8:01 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel

Hi,

Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
kswapd") caused a big performance regression(73%) for vm-scalability/
lru-file-readonce testcase on a system with 256G memory without swap.

That testcase simply looks like this:
     truncate -s 1T /tmp/vm-scalability.img
     mkfs.xfs -q /tmp/vm-scalability.img
     mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability

     SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
     for i in `seq 1 120`; do
         truncate $SPARESE_FILE-$i -s 36G
         timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
     done

     wait

Actually, it's not the newly added code (obeying proportional scanning)
in that commit that caused the regression. Instead, it's the following
change:
+
+               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
+                       continue;
+


-               if (nr_reclaimed >= nr_to_reclaim &&
-                   sc->priority < DEF_PRIORITY)
+               if (global_reclaim(sc) && !current_is_kswapd())
                        break;

The difference is that, before this commit, we might reclaim more than
requested in the first reclaim round (sc->priority == DEF_PRIORITY).
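
To make the difference concrete, here is a toy userspace simulation (a
sketch only, not the kernel code; the scan target below and the assumption
that every scanned page gets reclaimed are made up) of how many pages a
single direct-reclaim pass frees at sc->priority == DEF_PRIORITY before
and after the commit:

/*
 * Toy simulation, NOT kernel code: how many pages does one direct-reclaim
 * pass free under the old and the new break condition in shrink_lruvec()?
 */
#include <stdio.h>

#define SWAP_CLUSTER_MAX	32	/* nr_to_reclaim for direct reclaim */
#define DEF_PRIORITY		12

static unsigned long reclaim_pass(unsigned long scan_target, int priority,
				  int after_e82e0561)
{
	unsigned long nr_to_reclaim = SWAP_CLUSTER_MAX;
	unsigned long nr_reclaimed = 0;
	unsigned long nr = scan_target;

	while (nr) {
		unsigned long batch = nr < SWAP_CLUSTER_MAX ? nr : SWAP_CLUSTER_MAX;

		nr -= batch;
		nr_reclaimed += batch;	/* assume every scanned page is reclaimed */

		if (nr_reclaimed < nr_to_reclaim)
			continue;

		if (after_e82e0561)
			break;		/* e82e0561: direct reclaim stops at once */
		if (priority < DEF_PRIORITY)
			break;		/* old code: keep going at DEF_PRIORITY */
	}
	return nr_reclaimed;
}

int main(void)
{
	unsigned long target = 4096;	/* made-up scan target for one pass */

	printf("old code: %lu pages reclaimed\n",
	       reclaim_pass(target, DEF_PRIORITY, 0));
	printf("e82e0561: %lu pages reclaimed\n",
	       reclaim_pass(target, DEF_PRIORITY, 1));
	return 0;
}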

So, for a testcase like lru-file-readonce, the dirty rate is fast, and
reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to keep up
with the dirty rate. And thus page allocations stall, and performance drops:

   O for e82e0561
   * for parent commit

                                proc-vmstat.allocstall

     2e+06 ++---------------------------------------------------------------+
   1.8e+06 O+              O                O               O               |
           |                                                                |
   1.6e+06 ++                                                               |
   1.4e+06 ++                                                               |
           |                                                                |
   1.2e+06 ++                                                               |
     1e+06 ++                                                               |
    800000 ++                                                               |
           |                                                                |
    600000 ++                                                               |
    400000 ++                                                               |
           |                                                                |
    200000 *+..............*................*...............*...............*
         0 ++---------------------------------------------------------------+

                               vm-scalability.throughput

   2.2e+07 ++---------------------------------------------------------------+
           |                                                                |
     2e+07 *+..............*................*...............*...............*
   1.8e+07 ++                                                               |
           |                                                                |
   1.6e+07 ++                                                               |
           |                                                                |
   1.4e+07 ++                                                               |
           |                                                                |
   1.2e+07 ++                                                               |
     1e+07 ++                                                               |
           |                                                                |
     8e+06 ++              O                O               O               |
           O                                                                |
     6e+06 ++---------------------------------------------------------------+

I made a patch which simply keeps reclaiming more if sc->priority == DEF_PRIORITY.
I'm not sure whether it's the right way to go. Anyway, I've pasted it here for comments.

---
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 26ad67f..37004a8 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1828,7 +1828,16 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
-	bool scan_adjusted = false;
+	/*
+	 * On large memory systems, direct reclaiming of SWAP_CLUSTER_MAX
+	 * pages each time may not keep up with the dirty rate in some cases
+	 * (say, vm-scalability/lru-file-readonce), which may increase the
+	 * page allocation stall latency in the end.
+	 *
+	 * Here we try to reclaim more than requested for the first round
+	 * (sc->priority == DEF_PRIORITY) to reduce such latency.
+	 */
+	bool scan_adjusted = sc->priority == DEF_PRIORITY;
 
 	get_scan_count(lruvec, sc, nr);
 
-- 
1.7.7.6


	--yliu


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-02-18  8:01 performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd") Yuanhan Liu
@ 2014-03-07  8:22 ` Yuanhan Liu
  2014-03-12 16:54 ` Mel Gorman
  2014-03-20 10:03 ` Bob Liu
  2 siblings, 0 replies; 10+ messages in thread
From: Yuanhan Liu @ 2014-03-07  8:22 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel

ping...

On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
> Hi,
> 
> Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> kswapd") caused a big performance regression(73%) for vm-scalability/
> lru-file-readonce testcase on a system with 256G memory without swap.
> 
> That testcase simply looks like this:
>      truncate -s 1T /tmp/vm-scalability.img
>      mkfs.xfs -q /tmp/vm-scalability.img
>      mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> 
>      SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
>      for i in `seq 1 120`; do
>          truncate $SPARESE_FILE-$i -s 36G
>          timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
>      done
> 
>      wait
> 
> Actually, it's not the newly added code (obeying proportional scanning)
> in that commit that caused the regression. Instead, it's the following
> change:
> +
> +               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> +                       continue;
> +
> 
> 
> -               if (nr_reclaimed >= nr_to_reclaim &&
> -                   sc->priority < DEF_PRIORITY)
> +               if (global_reclaim(sc) && !current_is_kswapd())
>                         break;
> 
> The difference is that, before this commit, we might reclaim more than
> requested in the first reclaim round (sc->priority == DEF_PRIORITY).
> 
> So, for a testcase like lru-file-readonce, the dirty rate is fast, and
> reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to keep up
> with the dirty rate. And thus page allocations stall, and performance drops:
> 
>    O for e82e0561
>    * for parent commit
> 
>                                 proc-vmstat.allocstall
> 
>      2e+06 ++---------------------------------------------------------------+
>    1.8e+06 O+              O                O               O               |
>            |                                                                |
>    1.6e+06 ++                                                               |
>    1.4e+06 ++                                                               |
>            |                                                                |
>    1.2e+06 ++                                                               |
>      1e+06 ++                                                               |
>     800000 ++                                                               |
>            |                                                                |
>     600000 ++                                                               |
>     400000 ++                                                               |
>            |                                                                |
>     200000 *+..............*................*...............*...............*
>          0 ++---------------------------------------------------------------+
> 
>                                vm-scalability.throughput
> 
>    2.2e+07 ++---------------------------------------------------------------+
>            |                                                                |
>      2e+07 *+..............*................*...............*...............*
>    1.8e+07 ++                                                               |
>            |                                                                |
>    1.6e+07 ++                                                               |
>            |                                                                |
>    1.4e+07 ++                                                               |
>            |                                                                |
>    1.2e+07 ++                                                               |
>      1e+07 ++                                                               |
>            |                                                                |
>      8e+06 ++              O                O               O               |
>            O                                                                |
>      6e+06 ++---------------------------------------------------------------+
> 
> I made a patch which simply keeps reclaiming more if sc->priority == DEF_PRIORITY.
> I'm not sure whether it's the right way to go. Anyway, I've pasted it here for comments.
> 
> ---
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 26ad67f..37004a8 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1828,7 +1828,16 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
>  	unsigned long nr_reclaimed = 0;
>  	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
>  	struct blk_plug plug;
> -	bool scan_adjusted = false;
> +	/*
> +	 * On large memory systems, direct reclaiming of SWAP_CLUSTER_MAX
> +	 * pages each time may not keep up with the dirty rate in some cases
> +	 * (say, vm-scalability/lru-file-readonce), which may increase the
> +	 * page allocation stall latency in the end.
> +	 *
> +	 * Here we try to reclaim more than requested for the first round
> +	 * (sc->priority == DEF_PRIORITY) to reduce such latency.
> +	 */
> +	bool scan_adjusted = sc->priority == DEF_PRIORITY;
>  
>  	get_scan_count(lruvec, sc, nr);
>  
> -- 
> 1.7.7.6
> 
> 
> 	--yliu


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-02-18  8:01 performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd") Yuanhan Liu
  2014-03-07  8:22 ` Yuanhan Liu
@ 2014-03-12 16:54 ` Mel Gorman
  2014-03-13 12:44   ` Hugh Dickins
  2014-03-14  4:54   ` Yuanhan Liu
  2014-03-20 10:03 ` Bob Liu
  2 siblings, 2 replies; 10+ messages in thread
From: Mel Gorman @ 2014-03-12 16:54 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: linux-mm, linux-kernel

On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
> Hi,
> 
> Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> kswapd") caused a big performance regression(73%) for vm-scalability/
> lru-file-readonce testcase on a system with 256G memory without swap.
> 
> That testcase simply looks like this:
>      truncate -s 1T /tmp/vm-scalability.img
>      mkfs.xfs -q /tmp/vm-scalability.img
>      mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> 
>      SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
>      for i in `seq 1 120`; do
>          truncate $SPARESE_FILE-$i -s 36G
>          timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
>      done
> 
>      wait
> 

The filename implies that it's a sparse file with no IO but does not say
what the truncate function/program/whatever actually does. If it's really a
sparse file then the dd process should be reading zeros and writing them to
NULL without IO. Where are pages being dirtied? Does the truncate command
really create a sparse file or is it something else?

> Actually, it's not the newly added code (obeying proportional scanning)
> in that commit that caused the regression. Instead, it's the following
> change:
> +
> +               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> +                       continue;
> +
> 
> 
> -               if (nr_reclaimed >= nr_to_reclaim &&
> -                   sc->priority < DEF_PRIORITY)
> +               if (global_reclaim(sc) && !current_is_kswapd())
>                         break;
> 
> The difference is that, before this commit, we might reclaim more than
> requested in the first reclaim round (sc->priority == DEF_PRIORITY).
> 
> So, for a testcase like lru-file-readonce, the dirty rate is fast, and
> reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to keep up
> with the dirty rate. And thus page allocations stall, and performance drops:
> 
>    O for e82e0561
>    * for parent commit
> 
>                                 proc-vmstat.allocstall
> 
>      2e+06 ++---------------------------------------------------------------+
>    1.8e+06 O+              O                O               O               |
>            |                                                                |
>    1.6e+06 ++                                                               |
>    1.4e+06 ++                                                               |
>            |                                                                |
>    1.2e+06 ++                                                               |
>      1e+06 ++                                                               |
>     800000 ++                                                               |
>            |                                                                |
>     600000 ++                                                               |
>     400000 ++                                                               |
>            |                                                                |
>     200000 *+..............*................*...............*...............*
>          0 ++---------------------------------------------------------------+
> 
>                                vm-scalability.throughput
> 
>    2.2e+07 ++---------------------------------------------------------------+
>            |                                                                |
>      2e+07 *+..............*................*...............*...............*
>    1.8e+07 ++                                                               |
>            |                                                                |
>    1.6e+07 ++                                                               |
>            |                                                                |
>    1.4e+07 ++                                                               |
>            |                                                                |
>    1.2e+07 ++                                                               |
>      1e+07 ++                                                               |
>            |                                                                |
>      8e+06 ++              O                O               O               |
>            O                                                                |
>      6e+06 ++---------------------------------------------------------------+
> 
> I made a patch which simply keeps reclaiming more if sc->priority == DEF_PRIORITY.
> I'm not sure whether it's the right way to go. Anyway, I've pasted it here for comments.
> 

The impact of the patch is that a direct reclaimer will now scan and
reclaim more pages than requested so the unlucky reclaiming process will
stall for longer than it should while others make forward progress.

That would explain the difference in allocstall figure as each stall is
now doing more work than it did previously. The throughput figure is
harder to explain. What is it measuring?

Any idea why kswapd is failing to keep up?

I'm not saying the patch is wrong but there appears to be more going on
than is explained in the changelog. Is the full source of the benchmark
suite available? If so, can you point me to it and the exact commands
you use to run the testcase please?

Thanks.

-- 
Mel Gorman
SUSE Labs


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-03-12 16:54 ` Mel Gorman
@ 2014-03-13 12:44   ` Hugh Dickins
  2014-03-14 14:21     ` Mel Gorman
  2014-03-14  4:54   ` Yuanhan Liu
  1 sibling, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2014-03-13 12:44 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Yuanhan Liu, Suleiman Souhlal, linux-mm, linux-kernel

On Wed, 12 Mar 2014, Mel Gorman wrote:
> On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
> > Hi,
> > 
> > Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> > kswapd") caused a big performance regression(73%) for vm-scalability/
> > lru-file-readonce testcase on a system with 256G memory without swap.
> > 
> > That testcase simply looks like this:
> >      truncate -s 1T /tmp/vm-scalability.img
> >      mkfs.xfs -q /tmp/vm-scalability.img
> >      mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> > 
> >      SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
> >      for i in `seq 1 120`; do
> >          truncate $SPARESE_FILE-$i -s 36G
> >          timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
> >      done
> > 
> >      wait
> > 
> 
> The filename implies that it's a sparse file with no IO but does not say
> what the truncate function/program/whatever actually does. If it's really a
> sparse file then the dd process should be reading zeros and writing them to
> NULL without IO. Where are pages being dirtied? Does the truncate command
> really create a sparse file or is it something else?
> 
> > Actually, it's not the newly added code (obeying proportional scanning)
> > in that commit that caused the regression. Instead, it's the following
> > change:
> > +
> > +               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > +                       continue;
> > +
> > 
> > 
> > -               if (nr_reclaimed >= nr_to_reclaim &&
> > -                   sc->priority < DEF_PRIORITY)
> > +               if (global_reclaim(sc) && !current_is_kswapd())
> >                         break;
> > 
> > The difference is that, before this commit, we might reclaim more than
> > requested in the first reclaim round (sc->priority == DEF_PRIORITY).
> > 
> > So, for a testcase like lru-file-readonce, the dirty rate is fast, and
> > reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to keep up
> > with the dirty rate. And thus page allocations stall, and performance drops:
...
> > I made a patch which simply keeps reclaiming more if sc->priority == DEF_PRIORITY.
> > I'm not sure whether it's the right way to go. Anyway, I've pasted it here for comments.
> > 
> 
> The impact of the patch is that a direct reclaimer will now scan and
> reclaim more pages than requested so the unlucky reclaiming process will
> stall for longer than it should while others make forward progress.
> 
> That would explain the difference in allocstall figure as each stall is
> now doing more work than it did previously. The throughput figure is
> harder to explain. What is it measuring?
> 
> Any idea why kswapd is failing to keep up?
> 
> I'm not saying the patch is wrong but there appears to be more going on
> than is explained in the changelog. Is the full source of the benchmark
> suite available? If so, can you point me to it and the exact commands
> you use to run the testcase please?

I missed Yuanhan's mail, but seeing your reply reminds me of another
issue with that proportionality patch - or perhaps more thought would
show them to be two sides of the same issue, with just one fix required.
Let me throw our patch into the cauldron.

[PATCH] mm: revisit shrink_lruvec's attempt at proportionality

We have a memcg reclaim test which exerts a certain amount of pressure,
and expects to see a certain range of page reclaim in response.  It's a
very wide range allowed, but the test repeatably failed on v3.11 onwards,
because reclaim goes wild and frees up almost everything.

This wild behaviour bisects to Mel's "scan_adjusted" commit e82e0561dae9
"mm: vmscan: obey proportional scanning requirements for kswapd".  That
attempts to achieve proportionality between anon and file lrus: to the
extent that once one of those is empty, it then tries to empty the other.
Stop that.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

We've been running happily with this for months; but all that time it's
been on my TODO list with a "needs more thought" tag before we could
upstream it, and I never got around to that.  We also have a somewhat
similar, but older and quite independent, fix to get_scan_count() from
Suleiman, which I'd meant to send along at the same time: I'll dig that
one out tomorrow or the day after.

 mm/vmscan.c |   14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

--- 3.14-rc6/mm/vmscan.c	2014-02-02 18:49:07.949302116 -0800
+++ linux/mm/vmscan.c	2014-03-13 04:38:04.664030175 -0700
@@ -2019,7 +2019,6 @@ static void shrink_lruvec(struct lruvec
 	unsigned long nr_reclaimed = 0;
 	unsigned long nr_to_reclaim = sc->nr_to_reclaim;
 	struct blk_plug plug;
-	bool scan_adjusted = false;
 
 	get_scan_count(lruvec, sc, nr);
 
@@ -2042,7 +2041,7 @@ static void shrink_lruvec(struct lruvec
 			}
 		}
 
-		if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
+		if (nr_reclaimed < nr_to_reclaim)
 			continue;
 
 		/*
@@ -2064,6 +2063,15 @@ static void shrink_lruvec(struct lruvec
 		nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
 		nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
 
+		/*
+		 * It's just vindictive to attack the larger once the smaller
+		 * has gone to zero.  And given the way we stop scanning the
+		 * smaller below, this makes sure that we only make one nudge
+		 * towards proportionality once we've got nr_to_reclaim.
+		 */
+		if (!nr_file || !nr_anon)
+			break;
+
 		if (nr_file > nr_anon) {
 			unsigned long scan_target = targets[LRU_INACTIVE_ANON] +
 						targets[LRU_ACTIVE_ANON] + 1;
@@ -2093,8 +2101,6 @@ static void shrink_lruvec(struct lruvec
 		nr_scanned = targets[lru] - nr[lru];
 		nr[lru] = targets[lru] * (100 - percentage) / 100;
 		nr[lru] -= min(nr[lru], nr_scanned);
-
-		scan_adjusted = true;
 	}
 	blk_finish_plug(&plug);
 	sc->nr_reclaimed += nr_reclaimed;


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-03-12 16:54 ` Mel Gorman
  2014-03-13 12:44   ` Hugh Dickins
@ 2014-03-14  4:54   ` Yuanhan Liu
  1 sibling, 0 replies; 10+ messages in thread
From: Yuanhan Liu @ 2014-03-14  4:54 UTC (permalink / raw)
  To: Mel Gorman; +Cc: linux-mm, linux-kernel

On Wed, Mar 12, 2014 at 04:54:47PM +0000, Mel Gorman wrote:
> On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
> > Hi,
> > 
> > Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> > kswapd") caused a big performance regression(73%) for vm-scalability/
> > lru-file-readonce testcase on a system with 256G memory without swap.
> > 
> > That testcase simply looks like this:
> >      truncate -s 1T /tmp/vm-scalability.img
> >      mkfs.xfs -q /tmp/vm-scalability.img
> >      mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> > 
> >      SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
> >      for i in `seq 1 120`; do
> >          truncate $SPARESE_FILE-$i -s 36G
> >          timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
> >      done
> > 
> >      wait
> > 
> 
> The filename implies that it's a sparse file with no IO but does not say
> what the truncate function/program/whatever actually does.

It's actually the /usr/bin/truncate file from coreutils.
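
For what it's worth, here is a rough sketch (not part of the test suite,
just an illustration) of how one could check that the file really is
sparse, by comparing its apparent size with the bytes actually allocated:

/* Sketch only: print apparent size vs. allocated bytes for a file. */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	struct stat st;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	if (stat(argv[1], &st) != 0) {
		perror("stat");
		return 1;
	}
	printf("%s: apparent %lld bytes, allocated %lld bytes\n", argv[1],
	       (long long)st.st_size,
	       (long long)st.st_blocks * 512);	/* st_blocks is in 512-byte units */
	return 0;
}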

> If it's really a
> sparse file then the dd process should be reading zeros and writing them to
> NULL without IO. Where are pages being dirtied?

Sorry, my bad. I was wrong; I meant "the speed of allocating new
pages", not "the speed of dirtying pages".

> Does the truncate command
> really create a sparse file or is it something else?
> 
> > Actually, it's not the newly added code (obeying proportional scanning)
> > in that commit that caused the regression. Instead, it's the following
> > change:
> > +
> > +               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > +                       continue;
> > +
> > 
> > 
> > -               if (nr_reclaimed >= nr_to_reclaim &&
> > -                   sc->priority < DEF_PRIORITY)
> > +               if (global_reclaim(sc) && !current_is_kswapd())
> >                         break;
> > 
> > The difference is that, before this commit, we might reclaim more than
> > requested in the first reclaim round (sc->priority == DEF_PRIORITY).
> > 
> > So, for a testcase like lru-file-readonce, the dirty rate is fast, and
> > reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to keep up
> > with the dirty rate. And thus page allocations stall, and performance drops:
> > 
> >    O for e82e0561
> >    * for parent commit
> > 
> >                                 proc-vmstat.allocstall
> > 
> >      2e+06 ++---------------------------------------------------------------+
> >    1.8e+06 O+              O                O               O               |
> >            |                                                                |
> >    1.6e+06 ++                                                               |
> >    1.4e+06 ++                                                               |
> >            |                                                                |
> >    1.2e+06 ++                                                               |
> >      1e+06 ++                                                               |
> >     800000 ++                                                               |
> >            |                                                                |
> >     600000 ++                                                               |
> >     400000 ++                                                               |
> >            |                                                                |
> >     200000 *+..............*................*...............*...............*
> >          0 ++---------------------------------------------------------------+
> > 
> >                                vm-scalability.throughput
> > 
> >    2.2e+07 ++---------------------------------------------------------------+
> >            |                                                                |
> >      2e+07 *+..............*................*...............*...............*
> >    1.8e+07 ++                                                               |
> >            |                                                                |
> >    1.6e+07 ++                                                               |
> >            |                                                                |
> >    1.4e+07 ++                                                               |
> >            |                                                                |
> >    1.2e+07 ++                                                               |
> >      1e+07 ++                                                               |
> >            |                                                                |
> >      8e+06 ++              O                O               O               |
> >            O                                                                |
> >      6e+06 ++---------------------------------------------------------------+
> > 
> > I made a patch which simply keeps reclaiming more if sc->priority == DEF_PRIORITY.
> > I'm not sure whether it's the right way to go. Anyway, I've pasted it here for comments.
> > 
> 
> The impact of the patch is that a direct reclaimer will now scan and
> reclaim more pages than requested so the unlucky reclaiming process will
> stall for longer than it should while others make forward progress.
> 
> That would explain the difference in allocstall figure as each stall is
> now doing more work than it did previously. The throughput figure is
> harder to explain. What is it measuring?

It's just the sum of all the dd outputs, like the following:

	18267619328 bytes (18 GB) copied, 299.999 s, 60.9 MB/s
	4532509+0 records in
	4532508+0 records out
	18565152768 bytes (19 GB) copied, 299.999 s, 61.9 MB/s
	4487453+0 records in
	...

And as you can see, the average dd throughput is about 60 MB/s; however,
it's about 170 MB/s without this bad commit.

> 
> Any idea why kswapd is failing to keep up?

I don't know. But isn't that normal for a case like this?

> 
> I'm not saying the patch is wrong but there appears to be more going on
> than is explained in the changelog. Is the full source of the benchmark
> suite available? If so, can you point me to it and the exact commands
> you use to run the testcase please?

https://github.com/aristeu/vm-scalability/blob/master/case-lru-file-readonce

There, nr_cpu is 120, as I showed in my earlier email.

Thanks.

	--yliu


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-03-13 12:44   ` Hugh Dickins
@ 2014-03-14 14:21     ` Mel Gorman
  2014-03-16  3:56       ` Hugh Dickins
  0 siblings, 1 reply; 10+ messages in thread
From: Mel Gorman @ 2014-03-14 14:21 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Yuanhan Liu, Suleiman Souhlal, linux-mm, linux-kernel

On Thu, Mar 13, 2014 at 05:44:57AM -0700, Hugh Dickins wrote:
> On Wed, 12 Mar 2014, Mel Gorman wrote:
> > On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
> > > Hi,
> > > 
> > > Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> > > kswapd") caused a big performance regression(73%) for vm-scalability/
> > > lru-file-readonce testcase on a system with 256G memory without swap.
> > > 
> > > That testcase simply looks like this:
> > >      truncate -s 1T /tmp/vm-scalability.img
> > >      mkfs.xfs -q /tmp/vm-scalability.img
> > >      mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> > > 
> > >      SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
> > >      for i in `seq 1 120`; do
> > >          truncate $SPARESE_FILE-$i -s 36G
> > >          timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
> > >      done
> > > 
> > >      wait
> > > 
> > 
> > The filename implies that it's a sparse file with no IO but does not say
> > what the truncate function/program/whatever actually does. If it's really a
> > sparse file then the dd process should be reading zeros and writing them to
> > NULL without IO. Where are pages being dirtied? Does the truncate command
> > really create a sparse file or is it something else?
> > 
> > > Actually, it's not the newly added code (obeying proportional scanning)
> > > in that commit that caused the regression. Instead, it's the following
> > > change:
> > > +
> > > +               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > > +                       continue;
> > > +
> > > 
> > > 
> > > -               if (nr_reclaimed >= nr_to_reclaim &&
> > > -                   sc->priority < DEF_PRIORITY)
> > > +               if (global_reclaim(sc) && !current_is_kswapd())
> > >                         break;
> > > 
> > > The difference is that, before this commit, we might reclaim more than
> > > requested in the first reclaim round (sc->priority == DEF_PRIORITY).
> > > 
> > > So, for a testcase like lru-file-readonce, the dirty rate is fast, and
> > > reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to keep up
> > > with the dirty rate. And thus page allocations stall, and performance drops:
> ...
> > > I made a patch which simply keeps reclaiming more if sc->priority == DEF_PRIORITY.
> > > I'm not sure whether it's the right way to go. Anyway, I've pasted it here for comments.
> > > 
> > 
> > The impact of the patch is that a direct reclaimer will now scan and
> > reclaim more pages than requested so the unlucky reclaiming process will
> > stall for longer than it should while others make forward progress.
> > 
> > That would explain the difference in allocstall figure as each stall is
> > now doing more work than it did previously. The throughput figure is
> > harder to explain. What is it measuring?
> > 
> > Any idea why kswapd is failing to keep up?
> > 
> > I'm not saying the patch is wrong but there appears to be more going on
> > than is explained in the changelog. Is the full source of the benchmark
> > suite available? If so, can you point me to it and the exact commands
> > you use to run the testcase please?
> 
> I missed Yuanhan's mail, but seeing your reply reminds me of another
> issue with that proportionality patch - or perhaps more thought would
> show them to be two sides of the same issue, with just one fix required.
> Let me throw our patch into the cauldron.
> 
> [PATCH] mm: revisit shrink_lruvec's attempt at proportionality
> 
> We have a memcg reclaim test which exerts a certain amount of pressure,
> and expects to see a certain range of page reclaim in response.  It's a
> very wide range allowed, but the test repeatably failed on v3.11 onwards,
> because reclaim goes wild and frees up almost everything.
> 
> This wild behaviour bisects to Mel's "scan_adjusted" commit e82e0561dae9
> "mm: vmscan: obey proportional scanning requirements for kswapd".  That
> attempts to achieve proportionality between anon and file lrus: to the
> extent that once one of those is empty, it then tries to empty the other.
> Stop that.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> 
> We've been running happily with this for months; but all that time it's
> been on my TODO list with a "needs more thought" tag before we could
> upstream it, and I never got around to that.  We also have a somewhat
> similar, but older and quite independent, fix to get_scan_count() from
> Suleiman, which I'd meant to send along at the same time: I'll dig that
> one out tomorrow or the day after.
> 

I ran a battery of page reclaim related tests against it on top of
3.14-rc6. Workloads showed small improvements in their absolute performance
but actual IO behaviour looked much better in some tests.  This is the
iostats summary for the test that showed the biggest difference -- dd of
a large file on ext3.

 	                3.14.0-rc6	3.14.0-rc6
	                   vanilla	proportional-v1r1
Mean	sda-avgqz 	1045.64		224.18	
Mean	sda-await 	2120.12		506.77	
Mean	sda-r_await	18.61		19.78	
Mean	sda-w_await	11089.60	2126.35	
Max 	sda-avgqz 	2294.39		787.13	
Max 	sda-await 	7074.79		2371.67	
Max 	sda-r_await	503.00		414.00	
Max 	sda-w_await	35721.93	7249.84	

Not all workloads benefitted. The same workload on ext4 showed no useful
difference. btrfs looks like

 	             3.14.0-rc6	3.14.0-rc6
	               vanilla	proportional-v1r1
Mean	sda-avgqz 	762.69		650.39	
Mean	sda-await 	2438.46		2495.15	
Mean	sda-r_await	44.18		47.20	
Mean	sda-w_await	6109.19		5139.86	
Max 	sda-avgqz 	2203.50		1870.78	
Max 	sda-await 	7098.26		6847.21	
Max 	sda-r_await	63.02		156.00	
Max 	sda-w_await	19921.70	11085.13	

Better but not as dramatically so. I didn't analyse why. A workload that
had a large anonymous mapping with large amounts of IO in the background
did not show any regressions, so based on that and the fact that the patch looks
ok, here goes nothing;

Acked-by: Mel Gorman <mgorman@suse.de>

You say it's already been tested for months but it would be nice if the
workload that generated this thread was also tested.  Regrettably I'm not
going to have the chance to setup and do it myself for some time.

-- 
Mel Gorman
SUSE Labs


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-03-14 14:21     ` Mel Gorman
@ 2014-03-16  3:56       ` Hugh Dickins
  2014-03-18  6:38         ` Yuanhan Liu
  0 siblings, 1 reply; 10+ messages in thread
From: Hugh Dickins @ 2014-03-16  3:56 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Johannes Weiner, Hugh Dickins, Yuanhan Liu,
	Suleiman Souhlal, linux-mm, linux-kernel

On Fri, 14 Mar 2014, Mel Gorman wrote:
> On Thu, Mar 13, 2014 at 05:44:57AM -0700, Hugh Dickins wrote:
> > On Wed, 12 Mar 2014, Mel Gorman wrote:
> > > On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
> > > > Hi,
> > > > 
> > > > Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> > > > kswapd") caused a big performance regression(73%) for vm-scalability/
> > > > lru-file-readonce testcase on a system with 256G memory without swap.
> > > > 
> > > > That testcase simply looks like this:
> > > >      truncate -s 1T /tmp/vm-scalability.img
> > > >      mkfs.xfs -q /tmp/vm-scalability.img
> > > >      mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> > > > 
> > > >      SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
> > > >      for i in `seq 1 120`; do
> > > >          truncate $SPARESE_FILE-$i -s 36G
> > > >          timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
> > > >      done
> > > > 
> > > >      wait
> > > > 
> > > 
> > > The filename implies that it's a sparse file with no IO but does not say
> > > what the truncate function/program/whatever actually does. If it's really a
> > > sparse file then the dd process should be reading zeros and writing them to
> > > NULL without IO. Where are pages being dirtied? Does the truncate command
> > > really create a sparse file or is it something else?
> > > 
> > > > Actually, it's not the newly added code (obeying proportional scanning)
> > > > in that commit that caused the regression. Instead, it's the following
> > > > change:
> > > > +
> > > > +               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> > > > +                       continue;
> > > > +
> > > > 
> > > > 
> > > > -               if (nr_reclaimed >= nr_to_reclaim &&
> > > > -                   sc->priority < DEF_PRIORITY)
> > > > +               if (global_reclaim(sc) && !current_is_kswapd())
> > > >                         break;
> > > > 
> > > > The difference is that, before this commit, we might reclaim more than
> > > > requested in the first reclaim round (sc->priority == DEF_PRIORITY).
> > > > 
> > > > So, for a testcase like lru-file-readonce, the dirty rate is fast, and
> > > > reclaiming SWAP_CLUSTER_MAX (32 pages) each time is not enough to keep up
> > > > with the dirty rate. And thus page allocations stall, and performance drops:
> > ...
> > > > I made a patch which simply keeps reclaiming more if sc->priority == DEF_PRIORITY.
> > > > I'm not sure whether it's the right way to go. Anyway, I've pasted it here for comments.
> > > > 
> > > 
> > > The impact of the patch is that a direct reclaimer will now scan and
> > > reclaim more pages than requested so the unlucky reclaiming process will
> > > stall for longer than it should while others make forward progress.
> > > 
> > > That would explain the difference in allocstall figure as each stall is
> > > now doing more work than it did previously. The throughput figure is
> > > harder to explain. What is it measuring?
> > > 
> > > Any idea why kswapd is failing to keep up?
> > > 
> > > I'm not saying the patch is wrong but there appears to be more going on
> > > than is explained in the changelog. Is the full source of the benchmark
> > > suite available? If so, can you point me to it and the exact commands
> > > you use to run the testcase please?
> > 
> > I missed Yuanhan's mail, but seeing your reply reminds me of another
> > issue with that proportionality patch - or perhaps more thought would
> > show them to be two sides of the same issue, with just one fix required.
> > Let me throw our patch into the cauldron.
> > 
> > [PATCH] mm: revisit shrink_lruvec's attempt at proportionality
> > 
> > We have a memcg reclaim test which exerts a certain amount of pressure,
> > and expects to see a certain range of page reclaim in response.  It's a
> > very wide range allowed, but the test repeatably failed on v3.11 onwards,
> > because reclaim goes wild and frees up almost everything.
> > 
> > This wild behaviour bisects to Mel's "scan_adjusted" commit e82e0561dae9
> > "mm: vmscan: obey proportional scanning requirements for kswapd".  That
> > attempts to achieve proportionality between anon and file lrus: to the
> > extent that once one of those is empty, it then tries to empty the other.
> > Stop that.
> > 
> > Signed-off-by: Hugh Dickins <hughd@google.com>
> > ---
> > 
> > We've been running happily with this for months; but all that time it's
> > been on my TODO list with a "needs more thought" tag before we could
> > upstream it, and I never got around to that.  We also have a somewhat
> > similar, but older and quite independent, fix to get_scan_count() from
> > Suleiman, which I'd meant to send along at the same time: I'll dig that
> > one out tomorrow or the day after.

I've sent that one out now in a new thread
https://lkml.org/lkml/2014/3/15/168
and also let's tie these together with Hannes's
https://lkml.org/lkml/2014/3/14/277

> > 
> 
> I ran a battery of page reclaim related tests against it on top of
> 3.14-rc6. Workloads showed small improvements in their absolute performance
> but actual IO behaviour looked much better in some tests.  This is the
> iostats summary for the test that showed the biggest difference -- dd of
> a large file on ext3.
> 
>  	                3.14.0-rc6	3.14.0-rc6
> 	                   vanilla	proportional-v1r1
> Mean	sda-avgqz 	1045.64		224.18	
> Mean	sda-await 	2120.12		506.77	
> Mean	sda-r_await	18.61		19.78	
> Mean	sda-w_await	11089.60	2126.35	
> Max 	sda-avgqz 	2294.39		787.13	
> Max 	sda-await 	7074.79		2371.67	
> Max 	sda-r_await	503.00		414.00	
> Max 	sda-w_await	35721.93	7249.84	
> 
> Not all workloads benefitted. The same workload on ext4 showed no useful
> difference. btrfs looks like
> 
>  	             3.14.0-rc6	3.14.0-rc6
> 	               vanilla	proportional-v1r1
> Mean	sda-avgqz 	762.69		650.39	
> Mean	sda-await 	2438.46		2495.15	
> Mean	sda-r_await	44.18		47.20	
> Mean	sda-w_await	6109.19		5139.86	
> Max 	sda-avgqz 	2203.50		1870.78	
> Max 	sda-await 	7098.26		6847.21	
> Max 	sda-r_await	63.02		156.00	
> Max 	sda-w_await	19921.70	11085.13	
> 
> Better but not as dramatically so. I didn't analyse why. A workload that
> had a large anonymous mapping with large amounts of IO in the background
> did not show any regressions, so based on that and the fact that the patch looks
> ok, here goes nothing;
> 
> Acked-by: Mel Gorman <mgorman@suse.de>

Big thank you, Mel, for doing so much work on it, and so very quickly.
I get quite lost in the numbers myself: I'm much more convinced of it
by your numbers and ack.

> 
> You say it's already been tested for months but it would be nice if the
> workload that generated this thread was also tested.

Yes indeed: Yuanhan, do you have time to try this patch for your
testcase?  I'm hoping it will prove at least as effective as your
own suggested patch, but please let us know what you find - thanks.

> Regrettably I'm not
> going to have the chance to setup and do it myself for some time.

Understood, I'll continue to Cc you, but not expecting more.

Hugh


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-03-16  3:56       ` Hugh Dickins
@ 2014-03-18  6:38         ` Yuanhan Liu
  2014-03-19  3:20           ` Hugh Dickins
  0 siblings, 1 reply; 10+ messages in thread
From: Yuanhan Liu @ 2014-03-18  6:38 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Mel Gorman, Andrew Morton, Johannes Weiner, Suleiman Souhlal,
	linux-mm, linux-kernel, Yuanhan Liu

On Sat, Mar 15, 2014 at 08:56:10PM -0700, Hugh Dickins wrote:
> On Fri, 14 Mar 2014, Mel Gorman wrote:
> > On Thu, Mar 13, 2014 at 05:44:57AM -0700, Hugh Dickins wrote:
> > > On Wed, 12 Mar 2014, Mel Gorman wrote:
> > > > On Tue, Feb 18, 2014 at 04:01:22PM +0800, Yuanhan Liu wrote:
... snip ...

> > > I missed Yuanhan's mail, but seeing your reply reminds me of another
> > > issue with that proportionality patch - or perhaps more thought would
> > > show them to be two sides of the same issue, with just one fix required.
> > > Let me throw our patch into the cauldron.
> > > 
> > > [PATCH] mm: revisit shrink_lruvec's attempt at proportionality
> > > 
> > > We have a memcg reclaim test which exerts a certain amount of pressure,
> > > and expects to see a certain range of page reclaim in response.  It's a
> > > very wide range allowed, but the test repeatably failed on v3.11 onwards,
> > > because reclaim goes wild and frees up almost everything.
> > > 
> > > This wild behaviour bisects to Mel's "scan_adjusted" commit e82e0561dae9
> > > "mm: vmscan: obey proportional scanning requirements for kswapd".  That
> > > attempts to achieve proportionality between anon and file lrus: to the
> > > extent that once one of those is empty, it then tries to empty the other.
> > > Stop that.
> > > 
> > > Signed-off-by: Hugh Dickins <hughd@google.com>
> > > ---
> > > 
> > > We've been running happily with this for months; but all that time it's
> > > been on my TODO list with a "needs more thought" tag before we could
> > > upstream it, and I never got around to that.  We also have a somewhat
> > > similar, but older and quite independent, fix to get_scan_count() from
> > > Suleiman, which I'd meant to send along at the same time: I'll dig that
> > > one out tomorrow or the day after.
> 
> I've sent that one out now in a new thread
> https://lkml.org/lkml/2014/3/15/168
> and also let's tie these together with Hannes's
> https://lkml.org/lkml/2014/3/14/277
> 
> > > 
> > 
> > I ran a battery of page reclaim related tests against it on top of
> > 3.14-rc6. Workloads showed small improvements in their absolute performance
> > but actual IO behaviour looked much better in some tests.  This is the
> > iostats summary for the test that showed the biggest difference -- dd of
> > a large file on ext3.
> > 
> >  	                3.14.0-rc6	3.14.0-rc6
> > 	                   vanilla	proportional-v1r1
> > Mean	sda-avgqz 	1045.64		224.18	
> > Mean	sda-await 	2120.12		506.77	
> > Mean	sda-r_await	18.61		19.78	
> > Mean	sda-w_await	11089.60	2126.35	
> > Max 	sda-avgqz 	2294.39		787.13	
> > Max 	sda-await 	7074.79		2371.67	
> > Max 	sda-r_await	503.00		414.00	
> > Max 	sda-w_await	35721.93	7249.84	
> > 
> > Not all workloads benefitted. The same workload on ext4 showed no useful
> > difference. btrfs looks like
> > 
> >  	             3.14.0-rc6	3.14.0-rc6
> > 	               vanilla	proportional-v1r1
> > Mean	sda-avgqz 	762.69		650.39	
> > Mean	sda-await 	2438.46		2495.15	
> > Mean	sda-r_await	44.18		47.20	
> > Mean	sda-w_await	6109.19		5139.86	
> > Max 	sda-avgqz 	2203.50		1870.78	
> > Max 	sda-await 	7098.26		6847.21	
> > Max 	sda-r_await	63.02		156.00	
> > Max 	sda-w_await	19921.70	11085.13	
> > 
> > Better but not as dramatically so. I didn't analyse why. A workload that
> > had a large anonymous mapping with large amounts of IO in the background
> > did not show any regressions, so based on that and the fact that the patch looks
> > ok, here goes nothing;
> > 
> > Acked-by: Mel Gorman <mgorman@suse.de>
> 
> Big thank you, Mel, for doing so much work on it, and so very quickly.
> I get quite lost in the numbers myself: I'm much more convinced of it
> by your numbers and ack.
> 
> > 
> > You say it's already been tested for months but it would be nice if the
> > workload that generated this thread was also tested.
> 
> Yes indeed: Yuanhan, do you have time to try this patch for your
> testcase?  I'm hoping it will prove at least as effective as your
> own suggested patch, but please let us know what you find - thanks.

Hi Hugh,

Sure, and sorry to tell you that this patch halves performance again in
this testcase, from an average of 60 MB/s down to 30 MB/s.

Moreover, the dd throughput for each process was steady before; with this
patch applied it's quite bumpy, ranging from 20 MB/s to 40 MB/s, which
gives an average of 30 MB/s:

    11327188992 bytes (11 GB) copied, 300.014 s, 37.8 MB/s
    1809373+0 records in
    1809372+0 records out
    7411187712 bytes (7.4 GB) copied, 300.008 s, 24.7 MB/s
    3068285+0 records in
    3068284+0 records out
    12567691264 bytes (13 GB) copied, 300.001 s, 41.9 MB/s
    1883877+0 records in
    1883876+0 records out
    7716356096 bytes (7.7 GB) copied, 300.002 s, 25.7 MB/s
    1807674+0 records in
    1807673+0 records out
    7404228608 bytes (7.4 GB) copied, 300.024 s, 24.7 MB/s
    1796473+0 records in
    1796472+0 records out
    7358349312 bytes (7.4 GB) copied, 300.008 s, 24.5 MB/s
    1905655+0 records in
    1905654+0 records out
    7805558784 bytes (7.8 GB) copied, 300.016 s, 26.0 MB/s
    2819168+0 records in
    2819167+0 records out
    11547308032 bytes (12 GB) copied, 300.025 s, 38.5 MB/s
    1848381+0 records in
    1848380+0 records out
    7570964480 bytes (7.6 GB) copied, 300.005 s, 25.2 MB/s
    3023133+0 records in
    3023132+0 records out
    12382748672 bytes (12 GB) copied, 300.024 s, 41.3 MB/s
    1714585+0 records in
    1714584+0 records out
    7022936064 bytes (7.0 GB) copied, 300.011 s, 23.4 MB/s
    1835132+0 records in
    1835131+0 records out
    7516696576 bytes (7.5 GB) copied, 299.998 s, 25.1 MB/s
    1733341+0 records in
    

	--yliu


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-03-18  6:38         ` Yuanhan Liu
@ 2014-03-19  3:20           ` Hugh Dickins
  0 siblings, 0 replies; 10+ messages in thread
From: Hugh Dickins @ 2014-03-19  3:20 UTC (permalink / raw)
  To: Yuanhan Liu
  Cc: Hugh Dickins, Mel Gorman, Andrew Morton, Johannes Weiner,
	Suleiman Souhlal, linux-mm, linux-kernel

On Tue, 18 Mar 2014, Yuanhan Liu wrote:
> On Sat, Mar 15, 2014 at 08:56:10PM -0700, Hugh Dickins wrote:
> > On Fri, 14 Mar 2014, Mel Gorman wrote:
> > > 
> > > You say it's already been tested for months but it would be nice if the
> > > workload that generated this thread was also tested.
> > 
> > Yes indeed: Yuanhan, do you have time to try this patch for your
> > testcase?  I'm hoping it will prove at least as effective as your
> > own suggested patch, but please let us know what you find - thanks.
> 
> Hi Hugh,
> 
> Sure, and sorry to tell you that this patch halves performance again in
> this testcase, from an average of 60 MB/s down to 30 MB/s.

Thanks a lot for trying it out.  I had been hoping that everything
would be wonderful, and I wouldn't have to think at all about what's
going on.  You have made me sad :( but I can't blame your honesty!

I'll have to think a little after all, about your test, and Mel's
pertinent questions: I'll come back to you, nothing to say right now.

Hugh

> 
> Moreover, the dd throughput for each process was steady before; with this
> patch applied it's quite bumpy, ranging from 20 MB/s to 40 MB/s, which
> gives an average of 30 MB/s:
> 
>     11327188992 bytes (11 GB) copied, 300.014 s, 37.8 MB/s
>     1809373+0 records in
>     1809372+0 records out
>     7411187712 bytes (7.4 GB) copied, 300.008 s, 24.7 MB/s
>     3068285+0 records in
>     3068284+0 records out
>     12567691264 bytes (13 GB) copied, 300.001 s, 41.9 MB/s
>     1883877+0 records in
>     1883876+0 records out
>     7716356096 bytes (7.7 GB) copied, 300.002 s, 25.7 MB/s
>     1807674+0 records in
>     1807673+0 records out
>     7404228608 bytes (7.4 GB) copied, 300.024 s, 24.7 MB/s
>     1796473+0 records in
>     1796472+0 records out
>     7358349312 bytes (7.4 GB) copied, 300.008 s, 24.5 MB/s
>     1905655+0 records in
>     1905654+0 records out
>     7805558784 bytes (7.8 GB) copied, 300.016 s, 26.0 MB/s
>     2819168+0 records in
>     2819167+0 records out
>     11547308032 bytes (12 GB) copied, 300.025 s, 38.5 MB/s
>     1848381+0 records in
>     1848380+0 records out
>     7570964480 bytes (7.6 GB) copied, 300.005 s, 25.2 MB/s
>     3023133+0 records in
>     3023132+0 records out
>     12382748672 bytes (12 GB) copied, 300.024 s, 41.3 MB/s
>     1714585+0 records in
>     1714584+0 records out
>     7022936064 bytes (7.0 GB) copied, 300.011 s, 23.4 MB/s
>     1835132+0 records in
>     1835131+0 records out
>     7516696576 bytes (7.5 GB) copied, 299.998 s, 25.1 MB/s
>     1733341+0 records in
>     
> 
> 	--yliu
> 


* Re: performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd")
  2014-02-18  8:01 performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd") Yuanhan Liu
  2014-03-07  8:22 ` Yuanhan Liu
  2014-03-12 16:54 ` Mel Gorman
@ 2014-03-20 10:03 ` Bob Liu
  2 siblings, 0 replies; 10+ messages in thread
From: Bob Liu @ 2014-03-20 10:03 UTC (permalink / raw)
  To: Yuanhan Liu; +Cc: Mel Gorman, linux-mm, linux-kernel



On 02/18/2014 04:01 PM, Yuanhan Liu wrote:
> Hi,
> 
> Commit e82e0561("mm: vmscan: obey proportional scanning requirements for
> kswapd") caused a big performance regression(73%) for vm-scalability/
> lru-file-readonce testcase on a system with 256G memory without swap.
> 
> That testcase simply looks like this:
>      truncate -s 1T /tmp/vm-scalability.img
>      mkfs.xfs -q /tmp/vm-scalability.img
>      mount -o loop /tmp/vm-scalability.img /tmp/vm-scalability
> 
>      SPARESE_FILE="/tmp/vm-scalability/sparse-lru-file-readonce"
>      for i in `seq 1 120`; do
>          truncate $SPARESE_FILE-$i -s 36G
>          timeout --foreground -s INT 300 dd bs=4k if=$SPARESE_FILE-$i of=/dev/null
>      done
> 
>      wait
> 
> Actually, it's not the newly added code (obeying proportional scanning)
> in that commit that caused the regression. Instead, it's the following
> change:
> +
> +               if (nr_reclaimed < nr_to_reclaim || scan_adjusted)
> +                       continue;
> +
> 
> 
> -               if (nr_reclaimed >= nr_to_reclaim &&
> -                   sc->priority < DEF_PRIORITY)
> +               if (global_reclaim(sc) && !current_is_kswapd())
>                         break;
> 
> The difference is that, before this commit, we might reclaim more than
> requested in the first reclaim round (sc->priority == DEF_PRIORITY).
> 

From my understanding, I also think we used to reclaim more memory if
sc->priority == DEF_PRIORITY. See the while loop:

while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
                                       nr[LRU_INACTIVE_FILE]) {

For kswapd, the loop will continue until nr[LRU_INACTIVE_ANON],
nr[LRU_ACTIVE_FILE] and nr[LRU_INACTIVE_FILE] become zero.

But in commit e82e0561("mm: vmscan: obey proportional scanning
requirements for kswapd"), nr[lru] was set to 0.

/* Stop scanning the smaller of the LRU */
nr[lru] = 0;
nr[lru + LRU_ACTIVE] = 0;

And the other LRU's scan count is also recalculated; as a result, the
total scan count in this round may be less than with the original code.
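
To illustrate (a simplified userspace sketch of that adjustment, not the
exact kernel code, and the numbers in main() are made up), the
recalculation works roughly like this:

#include <stdio.h>

enum lru { INACTIVE_ANON, ACTIVE_ANON, INACTIVE_FILE, ACTIVE_FILE, NR_LRU };

/*
 * Sketch of the proportional adjustment added by e82e0561: nr[] is what is
 * still left to scan, targets[] the original goals from get_scan_count().
 */
static void adjust_scan(unsigned long nr[NR_LRU],
			const unsigned long targets[NR_LRU])
{
	unsigned long nr_anon = nr[INACTIVE_ANON] + nr[ACTIVE_ANON];
	unsigned long nr_file = nr[INACTIVE_FILE] + nr[ACTIVE_FILE];
	unsigned long percentage;
	int lru, i;

	if (nr_file > nr_anon) {
		/* anon has less left: stop scanning it entirely ... */
		unsigned long target = targets[INACTIVE_ANON] +
				       targets[ACTIVE_ANON] + 1;
		percentage = nr_anon * 100 / target;
		nr[INACTIVE_ANON] = nr[ACTIVE_ANON] = 0;
		lru = INACTIVE_FILE;
	} else {
		unsigned long target = targets[INACTIVE_FILE] +
				       targets[ACTIVE_FILE] + 1;
		percentage = nr_file * 100 / target;
		nr[INACTIVE_FILE] = nr[ACTIVE_FILE] = 0;
		lru = INACTIVE_ANON;
	}

	/* ... and cut the other side back to the same fraction of its target. */
	for (i = 0; i < 2; i++, lru++) {
		unsigned long scanned = targets[lru] - nr[lru];

		nr[lru] = targets[lru] * (100 - percentage) / 100;
		nr[lru] -= scanned < nr[lru] ? scanned : nr[lru];
	}
}

int main(void)
{
	/* made-up numbers: anon target 100, file target 400; 60 anon and
	 * 360 file pages were still left to scan when nr_to_reclaim was met */
	unsigned long targets[NR_LRU] = { 100, 0, 400, 0 };
	unsigned long nr[NR_LRU]      = {  60, 0, 360, 0 };

	adjust_scan(nr, targets);
	printf("left to scan: anon %lu, file %lu\n",
	       nr[INACTIVE_ANON] + nr[ACTIVE_ANON],
	       nr[INACTIVE_FILE] + nr[ACTIVE_FILE]);
	return 0;
}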

So I think this change is reasonable, as it makes the behaviour the same
as before (and there is no performance drop).

-- 
Regards,
-Bob


Thread overview: 10+ messages
2014-02-18  8:01 performance regression due to commit e82e0561("mm: vmscan: obey proportional scanning requirements for kswapd") Yuanhan Liu
2014-03-07  8:22 ` Yuanhan Liu
2014-03-12 16:54 ` Mel Gorman
2014-03-13 12:44   ` Hugh Dickins
2014-03-14 14:21     ` Mel Gorman
2014-03-16  3:56       ` Hugh Dickins
2014-03-18  6:38         ` Yuanhan Liu
2014-03-19  3:20           ` Hugh Dickins
2014-03-14  4:54   ` Yuanhan Liu
2014-03-20 10:03 ` Bob Liu
