public inbox for linux-kernel@vger.kernel.org
* [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance
@ 2013-09-10  4:02 Dave Chinner
From: Dave Chinner @ 2013-09-10  4:02 UTC
  To: linux-kernel; +Cc: Joonsoo Kim, Paul Turner, Peter Zijlstra, Ingo Molnar

Hi folks,

I just updated my performance test VM to the current 3.12-git
tree after the XFS dev branch was merged. The first test I ran,
a 16-way concurrent fsmark test that creates lots of files, gave
me a number about 30% lower than I expected: ~180k files/s
when I was expecting somewhere around 250k files/s.

I did a bisect, and the bisect landed on this commit:

commit 23f0d2093c789e612185180c468fa09063834e87
Author: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Date:   Tue Aug 6 17:36:42 2013 +0900

    sched: Factor out code to should_we_balance()
    
    Now checking whether this cpu is appropriate to balance or not
    is embedded into update_sg_lb_stats() and this checking has no direct
    relationship to this function. There is not enough reason to place
    this checking at update_sg_lb_stats(), except saving one iteration
    for sched_group_cpus.
....

Now, I couldn't revert that patch by itself, but I reverted the
whole series of about 10 scheduler patches it belongs to from a
current TOT and the regression went away. Hence I'm pretty confident
that this is the patch causing the issue, as I've verified it in
more than one way and the difference between "good" and "bad" was
significantly greater than the variance of the test (1.5-2 stddev
difference).
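
One way to express "difference relative to the variance of the test"
is to divide the gap between the mean throughputs by the run-to-run
standard deviation. The sketch below assumes that reading of the
statement above. The function name is hypothetical and the sample
values are toy numbers, not measurements from this report.

```python
from statistics import mean, stdev


def separation_in_stddevs(good, bad):
    """Gap between the two sample means, in units of the larger sample stddev."""
    spread = max(stdev(good), stdev(bad))
    return (mean(good) - mean(bad)) / spread


# Toy per-run values for illustration only (NOT data from the report).
good_runs = [10, 12, 14]
bad_runs = [4, 6, 8]
print(separation_in_stddevs(good_runs, bad_runs))
```

Using the larger of the two standard deviations is a conservative
choice: it keeps a noisy "bad" kernel from looking more clearly
separated than it is.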

In more detail:

                        v4 filesystem   v5 filesystem
3.11+xfsdev:            220k files/s    225k files/s
3.12-git:               180k files/s    185k files/s
3.12-git-revert:        245k files/s    247k files/s
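
To put the table in relative terms, here is a small sketch that
computes how far 3.12-git falls below each of the other two kernels.
The throughput values are copied from the table; the dictionary
layout and the helper name percent_change are only illustrative.

```python
# Throughput in thousands of files/s, copied from the table above.
results = {
    "3.11+xfsdev": {"v4": 220, "v5": 225},
    "3.12-git": {"v4": 180, "v5": 185},
    "3.12-git-revert": {"v4": 245, "v5": 247},
}


def percent_change(baseline, measured):
    """Signed change of `measured` relative to `baseline`, in percent."""
    return 100.0 * (measured - baseline) / baseline


for fs in ("v4", "v5"):
    bad = results["3.12-git"][fs]
    for ref in ("3.11+xfsdev", "3.12-git-revert"):
        delta = percent_change(results[ref][fs], bad)
        print(f"{fs}: 3.12-git vs {ref}: {delta:+.1f}%")
```

By this calculation 3.12-git is roughly 18% slower than 3.11+xfsdev
and roughly 25-27% slower than 3.12-git with the series reverted.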

The test vm is a 16p/16GB RAM VM, with a sparse 100TB filesystem
image sitting on a 4-way RAID0 SSD array formatted with XFS and the
image file is accessed by virtio+direct IO. The fsmark command line
is:

time ./fs_mark  -D  10000  -S0  -n  100000  -s  0  -L  32 \
        -d  /mnt/scratch/0  -d  /mnt/scratch/1 \
        -d  /mnt/scratch/2  -d  /mnt/scratch/3 \
        -d  /mnt/scratch/4  -d  /mnt/scratch/5 \
        -d  /mnt/scratch/6  -d  /mnt/scratch/7 \
        -d  /mnt/scratch/8  -d  /mnt/scratch/9 \
        -d  /mnt/scratch/10  -d  /mnt/scratch/11 \
        -d  /mnt/scratch/12  -d  /mnt/scratch/13 \
        -d  /mnt/scratch/14  -d  /mnt/scratch/15 \
        | tee >(stats --trim-outliers | tail -1 1>&2)
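
The sixteen -d arguments follow a simple pattern, one directory per
worker. As a convenience, the sketch below rebuilds the same
fs_mark argument list programmatically. The function name
fs_mark_argv and its parameters are hypothetical; the flags and
paths are taken from the command above. It only constructs the
argument list and leaves out the `time` prefix and the
`tee >(stats ...)` pipeline.

```python
def fs_mark_argv(n_dirs=16, root="/mnt/scratch"):
    """Build the fs_mark argument list with one -d directory per worker."""
    argv = ["./fs_mark", "-D", "10000", "-S0", "-n", "100000",
            "-s", "0", "-L", "32"]
    for i in range(n_dirs):
        argv += ["-d", f"{root}/{i}"]
    return argv


print(" ".join(fs_mark_argv()))
```

Changing n_dirs is then enough to try the same workload at a
different concurrency level.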

The workload on XFS runs almost CPU bound - the effect of
the above patch was that there was a lot of idle time left in the
system. The workload consumed the same amount of user and system
CPU, but instantaneous CPU usage was reduced by 20-30% and the
elapsed time increased by 20-30%.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 6+ messages
2013-09-10  4:02 [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance Dave Chinner
2013-09-10  4:47 ` Joonsoo Kim
2013-09-10  6:15   ` Dave Chinner
2013-09-10  6:54     ` Joonsoo Kim
2013-09-10  7:25       ` [tip:sched/urgent] sched: Fix load balancing performance regression in should_we_balance() tip-bot for Joonsoo Kim
2013-09-10  8:06   ` [performance regression, bisected] scheduler: should_we_balance() kills filesystem performance Peter Zijlstra
