[performance regression, bisected] scheduler: should_we_balance() kills filesystem performance

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance
@ 2013-09-10  4:02 Dave Chinner
  2013-09-10  4:47 ` Joonsoo Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2013-09-10  4:02 UTC (permalink / raw)
  To: linux-kernel; +Cc: Joonsoo Kim, Paul Turner, Peter Zijlstra, Ingo Molnar

Hi folks,

I just updated my performance test VM to the current 3.12-git
tree after the XFS dev branch was merged. The first test I ran
which was a 16-way concurrent fsmark test to create lots of files
gave me a number about 30% lower than I expected - ~180k files/s
when I was expecting somewhere around 250k files/s.

I did a bisect, and the bisect landed on this commit:

commit 23f0d2093c789e612185180c468fa09063834e87
Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date:   Tue Aug 6 17:36:42 2013 +0900

    sched: Factor out code to should_we_balance()

    Now checking whether this cpu is appropriate to balance or not
    is embedded into update_sg_lb_stats() and this checking has no direct
    relationship to this function. There is not enough reason to place
    this checking at update_sg_lb_stats(), except saving one iteration
    for sched_group_cpus.
....

Now, i couldn't revert that patch by itself, but I reverted the
series of about 10 scheduler patches in that series total from a
current TOT and the regression went away. Hence I'm pretty confident
that the this is the patch causing the issue as i've verified it in
more than one way and the difference between "good" and "bad" was
signficantlt greater than the variance of the test (1.5-2 stddev
difference).

In more detail:

			v4 filesystem		v5 filesystem
3.11+xfsdev:		220k files/s		225k files/s
3.12-git		180k files/s		185k files/s
3.12-git-revert		245k files/s		247k files/s

The test vm is a 16p/16GB RAM VM, with a sparse 100TB filesystem
image sitting on a 4-way RAID0 SSD array formatted with XFS and the
image file is accessed by virtio+direct IO. The fsmark command line
is:

time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
        -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
        -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
        -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
        -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
        -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
        -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
        -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
        -d  /mnt/scratch/14  -d  /mnt/scratch/15 \
        | tee >(stats --trim-outliers | tail -1 1>&2)

The workload on XFS runs to almost being CPU bound - the effect of
the above patch was that there was a lot of idle time left in the
system. The workload consumed the same amount of user and system
CPU, just instantaneous CPU usage was reduced by 20-30% and the
elaspsed time was increased by 20-30%.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance
  2013-09-10  4:02 [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance Dave Chinner
@ 2013-09-10  4:47 ` Joonsoo Kim
  2013-09-10  6:15   ` Dave Chinner
  2013-09-10  8:06   ` [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Joonsoo Kim @ 2013-09-10  4:47 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, Paul Turner, Peter Zijlstra, Ingo Molnar

On Tue, Sep 10, 2013 at 02:02:54PM +1000, Dave Chinner wrote:
> Hi folks,
> 
> I just updated my performance test VM to the current 3.12-git
> tree after the XFS dev branch was merged. The first test I ran
> which was a 16-way concurrent fsmark test to create lots of files
> gave me a number about 30% lower than I expected - ~180k files/s
> when I was expecting somewhere around 250k files/s.
> 
> I did a bisect, and the bisect landed on this commit:
> 
> commit 23f0d2093c789e612185180c468fa09063834e87
> Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Date:   Tue Aug 6 17:36:42 2013 +0900
> 
>     sched: Factor out code to should_we_balance()
>     
>     Now checking whether this cpu is appropriate to balance or not
>     is embedded into update_sg_lb_stats() and this checking has no direct
>     relationship to this function. There is not enough reason to place
>     this checking at update_sg_lb_stats(), except saving one iteration
>     for sched_group_cpus.
> ....
> 
> Now, i couldn't revert that patch by itself, but I reverted the
> series of about 10 scheduler patches in that series total from a
> current TOT and the regression went away. Hence I'm pretty confident
> that the this is the patch causing the issue as i've verified it in
> more than one way and the difference between "good" and "bad" was
> signficantlt greater than the variance of the test (1.5-2 stddev
> difference).
> 
> In more detail:
> 
> 			v4 filesystem		v5 filesystem
> 3.11+xfsdev:		220k files/s		225k files/s
> 3.12-git		180k files/s		185k files/s
> 3.12-git-revert		245k files/s		247k files/s
> 
> The test vm is a 16p/16GB RAM VM, with a sparse 100TB filesystem
> image sitting on a 4-way RAID0 SSD array formatted with XFS and the
> image file is accessed by virtio+direct IO. The fsmark command line
> is:
> 
> time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
>         -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
>         -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
>         -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
>         -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
>         -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
>         -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
>         -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
>         -d  /mnt/scratch/14  -d  /mnt/scratch/15 \
>         | tee >(stats --trim-outliers | tail -1 1>&2)
> 
> The workload on XFS runs to almost being CPU bound - the effect of
> the above patch was that there was a lot of idle time left in the
> system. The workload consumed the same amount of user and system
> CPU, just instantaneous CPU usage was reduced by 20-30% and the
> elaspsed time was increased by 20-30%.

Hello, Dave.

Now, I look again this patch and find one mistake.
If we find that we are appropriate cpu for balancing, should_we_balance()
should return 1. But current code doesn't do so. This correspond with
your observation that a lot of idle time left.

Could you re-test your benchmark with below?

Thanks.

------------------->8-------------------------
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f0a5e6..9b3fe1c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5151,7 +5151,7 @@ static int should_we_balance(struct lb_env *env)
         * First idle cpu or the first cpu(busiest) in this sched group
         * is eligible for doing load balancing at this and above domains.
         */
-       return balance_cpu != env->dst_cpu;
+       return balance_cpu == env->dst_cpu;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance
  2013-09-10  4:47 ` Joonsoo Kim
@ 2013-09-10  6:15   ` Dave Chinner
  2013-09-10  6:54     ` Joonsoo Kim
  2013-09-10  8:06   ` [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance Peter Zijlstra
  1 sibling, 1 reply; 6+ messages in thread
From: Dave Chinner @ 2013-09-10  6:15 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel, Paul Turner, Peter Zijlstra, Ingo Molnar

On Tue, Sep 10, 2013 at 01:47:59PM +0900, Joonsoo Kim wrote:
> On Tue, Sep 10, 2013 at 02:02:54PM +1000, Dave Chinner wrote:
> > Hi folks,
> > 
> > I just updated my performance test VM to the current 3.12-git
> > tree after the XFS dev branch was merged. The first test I ran
> > which was a 16-way concurrent fsmark test to create lots of files
> > gave me a number about 30% lower than I expected - ~180k files/s
> > when I was expecting somewhere around 250k files/s.
> > 
> > I did a bisect, and the bisect landed on this commit:
> > 
> > commit 23f0d2093c789e612185180c468fa09063834e87
> > Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Date:   Tue Aug 6 17:36:42 2013 +0900
> > 
> >     sched: Factor out code to should_we_balance()
.....
> > 
> > 			v4 filesystem		v5 filesystem
> > 3.11+xfsdev:		220k files/s		225k files/s
> > 3.12-git		180k files/s		185k files/s
> > 3.12-git-revert		245k files/s		247k files/s
> > 
> > The test vm is a 16p/16GB RAM VM, with a sparse 100TB filesystem
> > image sitting on a 4-way RAID0 SSD array formatted with XFS and the
> > image file is accessed by virtio+direct IO. The fsmark command line
> > is:
> > 
> > time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
> >         -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
> >         -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
> >         -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
> >         -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
> >         -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
> >         -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
> >         -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
> >         -d  /mnt/scratch/14  -d  /mnt/scratch/15 \
> >         | tee >(stats --trim-outliers | tail -1 1>&2)
> > 
> > The workload on XFS runs to almost being CPU bound - the effect of
> > the above patch was that there was a lot of idle time left in the
> > system. The workload consumed the same amount of user and system
> > CPU, just instantaneous CPU usage was reduced by 20-30% and the
> > elaspsed time was increased by 20-30%.
> 
> Hello, Dave.
> 
> Now, I look again this patch and find one mistake.
> If we find that we are appropriate cpu for balancing, should_we_balance()
> should return 1. But current code doesn't do so. This correspond with
> your observation that a lot of idle time left.
> 
> Could you re-test your benchmark with below?

Sure. It looks like your patch fixes the problem:

			v4 filesystem		v5 filesystem
3.11+xfsdev:		220k files/s		225k files/s
3.12-git		180k files/s		185k files/s
3.12-git-revert		245k files/s		247k files/s
3.12-git-fix		249k files/s		248k files/s

Thanks for the quick turnaround :)

Tested-by: Dave Chinner <dchinner@redhat.com>

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance
  2013-09-10  6:15   ` Dave Chinner
@ 2013-09-10  6:54     ` Joonsoo Kim
  2013-09-10  7:25       ` [tip:sched/urgent] sched: Fix load balancing performance regression in should_we_balance() tip-bot for Joonsoo Kim
  0 siblings, 1 reply; 6+ messages in thread
From: Joonsoo Kim @ 2013-09-10  6:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-kernel, Paul Turner, Peter Zijlstra, Ingo Molnar

On Tue, Sep 10, 2013 at 04:15:20PM +1000, Dave Chinner wrote:
> On Tue, Sep 10, 2013 at 01:47:59PM +0900, Joonsoo Kim wrote:
> > On Tue, Sep 10, 2013 at 02:02:54PM +1000, Dave Chinner wrote:
> > > Hi folks,
> > > 
> > > I just updated my performance test VM to the current 3.12-git
> > > tree after the XFS dev branch was merged. The first test I ran
> > > which was a 16-way concurrent fsmark test to create lots of files
> > > gave me a number about 30% lower than I expected - ~180k files/s
> > > when I was expecting somewhere around 250k files/s.
> > > 
> > > I did a bisect, and the bisect landed on this commit:
> > > 
> > > commit 23f0d2093c789e612185180c468fa09063834e87
> > > Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > Date:   Tue Aug 6 17:36:42 2013 +0900
> > > 
> > >     sched: Factor out code to should_we_balance()
> .....
> > > 
> > > 			v4 filesystem		v5 filesystem
> > > 3.11+xfsdev:		220k files/s		225k files/s
> > > 3.12-git		180k files/s		185k files/s
> > > 3.12-git-revert		245k files/s		247k files/s
> > > 
> > > The test vm is a 16p/16GB RAM VM, with a sparse 100TB filesystem
> > > image sitting on a 4-way RAID0 SSD array formatted with XFS and the
> > > image file is accessed by virtio+direct IO. The fsmark command line
> > > is:
> > > 
> > > time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
> > >         -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
> > >         -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
> > >         -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
> > >         -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
> > >         -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
> > >         -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
> > >         -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
> > >         -d  /mnt/scratch/14  -d  /mnt/scratch/15 \
> > >         | tee >(stats --trim-outliers | tail -1 1>&2)
> > > 
> > > The workload on XFS runs to almost being CPU bound - the effect of
> > > the above patch was that there was a lot of idle time left in the
> > > system. The workload consumed the same amount of user and system
> > > CPU, just instantaneous CPU usage was reduced by 20-30% and the
> > > elaspsed time was increased by 20-30%.
> > 
> > Hello, Dave.
> > 
> > Now, I look again this patch and find one mistake.
> > If we find that we are appropriate cpu for balancing, should_we_balance()
> > should return 1. But current code doesn't do so. This correspond with
> > your observation that a lot of idle time left.
> > 
> > Could you re-test your benchmark with below?
> 
> Sure. It looks like your patch fixes the problem:
> 
> 			v4 filesystem		v5 filesystem
> 3.11+xfsdev:		220k files/s		225k files/s
> 3.12-git		180k files/s		185k files/s
> 3.12-git-revert		245k files/s		247k files/s
> 3.12-git-fix		249k files/s		248k files/s
> 
> Thanks for the quick turnaround :)
> 
> Tested-by: Dave Chinner <dchinner@redhat.com>
> 

Thanks for the quick turnaround, too. :)

Hello, Ingo.

I attach the formatted patch with proper SOBs and commit message.
Please merge this to fix above problem.

Thanks.

--------------------->8-------------------------
>From cf8ca492c2206e72d91ca0a1f6c59c5436132c61 Mon Sep 17 00:00:00 2001
From: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date: Tue, 10 Sep 2013 15:28:10 +0900
Subject: [PATCH] sched: return 1 if this cpu is proper for balancing in
 should_we_balance()

Commit 23f0d20('sched: Factor out code to should_we_balance()') introduces
should_we_balance() function. This function should return 1 if this cpu is
appropriate for balancing. But current code doesn't do so. When this
happens, it returns 0, instead of 1.

This introduces performance regression which Dave Chinner reports.

                        v4 filesystem           v5 filesystem
3.11+xfsdev:            220k files/s            225k files/s
3.12-git                180k files/s            185k files/s
3.12-git-revert         245k files/s            247k files/s

You can find more detailed information in below link.
https://lkml.org/lkml/2013/9/10/1

This patch corrects the return value of should_we_balance() function
as orignally intended.

With this patch, Dave Chinner said that regression are gone.

                        v4 filesystem           v5 filesystem
3.11+xfsdev:            220k files/s            225k files/s
3.12-git                180k files/s            185k files/s
3.12-git-revert         245k files/s            247k files/s
3.12-git-fix            249k files/s            248k files/s

Reported-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f0a5e6..9b3fe1c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5151,7 +5151,7 @@ static int should_we_balance(struct lb_env *env)
 	 * First idle cpu or the first cpu(busiest) in this sched group
 	 * is eligible for doing load balancing at this and above domains.
 	 */
-	return balance_cpu != env->dst_cpu;
+	return balance_cpu == env->dst_cpu;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [tip:sched/urgent] sched: Fix load balancing performance regression in should_we_balance()
  2013-09-10  6:54     ` Joonsoo Kim
@ 2013-09-10  7:25       ` tip-bot for Joonsoo Kim
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Joonsoo Kim @ 2013-09-10  7:25 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, peterz, pjt, dchinner, tglx,
	iamjoonsoo.kim, david

Commit-ID:  b0cff9d88ce2f3030f73138078c5b1019f17e1cc
Gitweb:     http://git.kernel.org/tip/b0cff9d88ce2f3030f73138078c5b1019f17e1cc
Author:     Joonsoo Kim <iamjoonsoo.kim@lge.com>
AuthorDate: Tue, 10 Sep 2013 15:54:49 +0900
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 10 Sep 2013 09:20:42 +0200

sched: Fix load balancing performance regression in should_we_balance()

Commit 23f0d20 ("sched: Factor out code to should_we_balance()")
introduces the should_we_balance() function.  This function should
return 1 if this cpu is appropriate for balancing. But the newly
introduced code doesn't do so, it returns 0 instead of 1.

This introduces performance regression, reported by Dave Chinner:

                        v4 filesystem           v5 filesystem
3.11+xfsdev:            220k files/s            225k files/s
3.12-git                180k files/s            185k files/s
3.12-git-revert         245k files/s            247k files/s

You can find more detailed information at:

  https://lkml.org/lkml/2013/9/10/1

This patch corrects the return value of should_we_balance()
function as orignally intended.

With this patch, Dave Chinner reports that the regression is gone:

                        v4 filesystem           v5 filesystem
3.11+xfsdev:            220k files/s            225k files/s
3.12-git                180k files/s            185k files/s
3.12-git-revert         245k files/s            247k files/s
3.12-git-fix            249k files/s            248k files/s

Reported-by: Dave Chinner <dchinner@redhat.com>
Tested-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Link: http://lkml.kernel.org/r/20130910065448.GA20368@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/fair.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7f0a5e6..9b3fe1c 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -5151,7 +5151,7 @@ static int should_we_balance(struct lb_env *env)
 	 * First idle cpu or the first cpu(busiest) in this sched group
 	 * is eligible for doing load balancing at this and above domains.
 	 */
-	return balance_cpu != env->dst_cpu;
+	return balance_cpu == env->dst_cpu;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance
  2013-09-10  4:47 ` Joonsoo Kim
  2013-09-10  6:15   ` Dave Chinner
@ 2013-09-10  8:06   ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2013-09-10  8:06 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Dave Chinner, linux-kernel, Paul Turner, Ingo Molnar

On Tue, Sep 10, 2013 at 01:47:59PM +0900, Joonsoo Kim wrote:
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 7f0a5e6..9b3fe1c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -5151,7 +5151,7 @@ static int should_we_balance(struct lb_env *env)
>          * First idle cpu or the first cpu(busiest) in this sched group
>          * is eligible for doing load balancing at this and above domains.
>          */
> -       return balance_cpu != env->dst_cpu;
> +       return balance_cpu == env->dst_cpu;
>  }

Duh.. /me kicks himself for not seeing that.

Thanks!

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-09-10  8:06 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-09-10  4:02 [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance Dave Chinner
2013-09-10  4:47 ` Joonsoo Kim
2013-09-10  6:15   ` Dave Chinner
2013-09-10  6:54     ` Joonsoo Kim
2013-09-10  7:25       ` [tip:sched/urgent] sched: Fix load balancing performance regression in should_we_balance() tip-bot for Joonsoo Kim
2013-09-10  8:06   ` [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox