RE: Regression caused by using node_to_bdi()

linux-kernel.vger.kernel.org archive mirror

From: Zhao Lei <zhaolei@cn.fujitsu.com>
To: "'Christoph Hellwig'" <hch@lst.de>, "'Jan Kara'" <jack@suse.cz>
Cc: "'Tejun Heo'" <tj@kernel.org>, "'Jens Axboe'" <axboe@fb.com>,
	"'LKML'" <linux-kernel@vger.kernel.org>
Subject: RE: Regression caused by using node_to_bdi()
Date: Wed, 1 Apr 2015 17:56:53 +0800
Message-ID: <027501d06c62$2eea3480$8cbe9d80$@cn.fujitsu.com>
In-Reply-To: 

Hi, Christoph

> From: Zhao Lei [mailto:zhaolei@cn.fujitsu.com]
> Sent: Monday, March 09, 2015 10:47 AM
> To: 'Christoph Hellwig'; 'Jan Kara'
> Cc: 'Tejun Heo'; 'Jens Axboe'
> Subject: RE: Regression caused by using node_to_bdi()
> 
> Hi, Christoph and Jan
> 
> > From: 'Christoph Hellwig' [mailto:hch@lst.de]
> > Sent: Sunday, March 08, 2015 11:34 PM
> > To: Jan Kara
> > Cc: Zhao Lei; 'Christoph Hellwig'; 'Tejun Heo'; 'Jens Axboe'
> > Subject: Re: Regression caused by using node_to_bdi()
> >
> > On Sun, Mar 08, 2015 at 11:29:16AM +0100, Jan Kara wrote:
> > >   Frankly, I doubt the cost of inode_to_bdi() is the reason for the
> > > slowdown here. If I read the numbers right, the throughput dropped
> > > from 135 MB/s on average to 130 MB/s on average. Such a load is hardly
> > > going to saturate the CPU enough for additional cycles in
> > > inode_to_bdi() to matter.
> > > A load like this is completely I/O bound unless you have a really
> > > fast drive (doing GB/s).  What are the throughput numbers just before
> > > / after this commit?
> 
> Here is the performance data from the bisect, before and after this patch:

What is your opinion on this regression?
Please let me know if you need additional tests or results from my environment.

Thanks
Zhaolei

> 
> io_speed:
> commit                  valcnt  avg      range              diff   stdev  cv
> v3.19-rc5_00005_495a27  10      137.409  [134.820,139.000]  3.10%  1.574  1.15%
> v3.19-rc5_00006_26ff13  10      136.534  [132.390,139.500]  5.37%  2.659  1.95%
> v3.19-rc5_00007_de1414  10      130.358  [129.070,132.150]  2.39%  1.120  0.86%  <- *this patch*
> v3.19-rc5_00008_b83ae6  10      129.549  [129.200,129.910]  0.55%  0.241  0.19%
> v3.19-rc5_00011_c4db59  10      130.033  [129.050,131.620]  1.99%  0.854  0.66%
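One way to check whether the step at this patch is larger than run-to-run noise, using only the avg/stdev/valcnt summaries from the bisect, is Welch's t statistic. This is a sketch added for illustration, not something run in the thread; it assumes the reported stdev is a sample standard deviation and the ten runs per commit are independent, and the helper names are hypothetical.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic for mean_a - mean_b, computed from summary statistics only."""
    # Standard error of the difference; assumes sd_* are sample standard deviations.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_a - mean_b) / se


def welch_df(sd_a, n_a, sd_b, n_b):
    """Welch-Satterthwaite approximation of the degrees of freedom."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    return (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))


# (avg, stdev, valcnt) copied from the io_speed lines of the bisect.
runs = {
    "00005_495a27": (137.409, 1.574, 10),
    "00006_26ff13": (136.534, 2.659, 10),
    "00007_de1414": (130.358, 1.120, 10),  # first commit with the patch
}

for a, b in [("00005_495a27", "00006_26ff13"), ("00006_26ff13", "00007_de1414")]:
    (ma, sa, na), (mb, sb, nb) = runs[a], runs[b]
    t = welch_t(ma, sa, na, mb, sb, nb)
    df = welch_df(sa, na, sb, nb)
    drop = 100.0 * (ma - mb) / ma
    print(f"{a} -> {b}: drop={drop:.2f}%  t={t:.2f}  df={df:.1f}")
```

On these numbers the two pre-patch commits differ by t of roughly 0.9, while the step across the patch (about a 4.5% drop) gives t of roughly 6.8 with about 12 degrees of freedom, well past the two-sided 0.1% critical value of about 4.3. That only says the drop is reproducible in this setup; it does not by itself say whether the CPU cost of inode_to_bdi() is the cause.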
> 
> 
> > What is the CPU load while the benchmark is running?
> >
> I didn't record the CPU load during testing; I'll do so if it is needed for debugging.
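If CPU load does need to be recorded alongside the benchmark, a minimal approach is to sample the aggregate "cpu" line of /proc/stat twice and compare the deltas (running `vmstat 1` and reading its us/sy/id/wa columns gives similar information). The sketch below is an illustration, not the tooling used here; it assumes iowait should count as idle (the CPU is waiting on the disk, not computing), and the function names are hypothetical.

```python
import time


def parse_cpu_line(line):
    """Return (idle_ticks, total_ticks) from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    ticks = [int(v) for v in fields[1:]]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # iowait is counted as idle here: the CPU was waiting on I/O, not computing.
    idle = ticks[3] + (ticks[4] if len(ticks) > 4 else 0)
    # guest/guest_nice are already included in user/nice, so sum only the first 8.
    total = sum(ticks[:8])
    return idle, total


def cpu_busy_fraction(before, after):
    """Fraction of CPU time that was neither idle nor iowait between two samples."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    elapsed = total1 - total0
    if elapsed <= 0:
        raise ValueError("samples are not increasing")
    return 1.0 - (idle1 - idle0) / elapsed


def sample_cpu_busy(interval_s=1.0, path="/proc/stat"):
    """Read /proc/stat twice, interval_s apart, and return the busy fraction."""
    def read_first_line():
        with open(path) as f:
            return f.readline()

    before = read_first_line()
    time.sleep(interval_s)
    return cpu_busy_fraction(before, read_first_line())
```

Calling sample_cpu_busy() in a loop while sysbench runs would show whether the 2-core machine is anywhere near CPU-saturated, which is the question Jan raised.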
> 
> Here is one of the sysbench logs:
> 
> sysbench 0.4.12:  multi-threaded system evaluation benchmark
> 
> 1 files, 4194304Kb each, 4096Mb total
> Creating files for the test...
> sysbench 0.4.12:  multi-threaded system evaluation benchmark
> 
> Running the test with following options:
> Number of threads: 1
> 
> Extra file open flags: 0
> 1 files, 4Gb each
> 4Gb total file size
> Block size 32Kb
> Using synchronous I/O mode
> Doing sequential write (creation) test
> Threads started!
> Done.
> 
> Operations performed:  0 Read, 131072 Write, 0 Other = 131072 Total
> Read 0b  Written 4Gb  Total transferred 4Gb  (132.15Mb/sec)
>  4228.75 Requests/sec executed
> 
> Test execution summary:
>     total time:                          30.9955s
>     total number of events:              131072
>     total time taken by event execution: 30.8731
>     per-request statistics:
>          min:                                  0.01ms
>          avg:                                  0.24ms
>          max:                                 30.80ms
>          approx.  95 percentile:               0.03ms
> 
> Threads fairness:
>     events (avg/stddev):           131072.0000/0.00
>     execution time (avg/stddev):   30.8731/0.00
> 
> sysbench 0.4.12:  multi-threaded system evaluation benchmark
> 
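To re-derive the per-commit summaries independently from raw logs like this one, a small parser can pull the throughput figure out of each run and summarize it in the same fields as the io_speed lines. The tool that produced io_speed is not shown, so its field definitions here are inferred: diff = (max - min) / min and cv = stdev / avg reproduce every diff and cv value in the bisect results, while stdev is assumed to be a sample standard deviation. All names are hypothetical.

```python
import re
import statistics

# Matches e.g. "Total transferred 4Gb  (132.15Mb/sec)" in sysbench 0.4.12 fileio output.
THROUGHPUT_RE = re.compile(r"Total transferred\s+\S+\s+\(([\d.]+)Mb/sec\)")


def parse_throughput(log_text):
    """Extract the throughput figure (sysbench's Mb/sec) from one run's log."""
    match = THROUGHPUT_RE.search(log_text)
    if match is None:
        raise ValueError("no 'Total transferred' line found")
    return float(match.group(1))


def range_diff_pct(lo, hi):
    """Spread of the range relative to its minimum, in percent (inferred definition)."""
    return 100.0 * (hi - lo) / lo


def summarize(values):
    """Summary in the spirit of the io_speed lines; needs at least two values."""
    avg = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation: an assumption
    lo, hi = min(values), max(values)
    return {
        "valcnt": len(values),
        "avg": avg,
        "range": (lo, hi),
        "diff_pct": range_diff_pct(lo, hi),
        "stdev": sd,
        "cv_pct": 100.0 * sd / avg,
    }
```

Feeding it the ten logs per commit would make it easy to rerun the comparison after changes such as an inlined inode_to_bdi().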
> 
> > How much memory does the machine have?
> >
> 2 GB of memory, a 2-core machine; the test runs on a 1 TB SATA disk.
> 
> [root@btrfs test_nosync_32768__sync_1_seqwr_4G_btrfs_1]# cat /proc/meminfo
> MemTotal:        2015812 kB
> MemFree:          627416 kB
> MemAvailable:    1755488 kB
> Buffers:          345876 kB
> Cached:           772788 kB
> SwapCached:            0 kB
> Active:           848864 kB
> Inactive:         320044 kB
> Active(anon):      54128 kB
> Inactive(anon):     5080 kB
> Active(file):     794736 kB
> Inactive(file):   314964 kB
> Unevictable:           0 kB
> Mlocked:               0 kB
> SwapTotal:             0 kB
> SwapFree:              0 kB
> Dirty:                 0 kB
> Writeback:             0 kB
> AnonPages:         50140 kB
> Mapped:            41636 kB
> Shmem:              8984 kB
> Slab:             200312 kB
> SReclaimable:     187308 kB
> SUnreclaim:        13004 kB
> KernelStack:        1728 kB
> PageTables:         4056 kB
> NFS_Unstable:          0 kB
> Bounce:                0 kB
> WritebackTmp:          0 kB
> CommitLimit:     1007904 kB
> Committed_AS:     205956 kB
> VmallocTotal:   34359738367 kB
> VmallocUsed:      539968 kB
> VmallocChunk:   34359195223 kB
> HardwareCorrupted:     0 kB
> AnonHugePages:      6144 kB
> HugePages_Total:       0
> HugePages_Free:        0
> HugePages_Rsvd:        0
> HugePages_Surp:        0
> Hugepagesize:       2048 kB
> DirectMap4k:       61056 kB
> DirectMap2M:     2000896 kB
> [root@btrfs test_nosync_32768__sync_1_seqwr_4G_btrfs_1]# cat /proc/cpuinfo
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
> stepping        : 10
> microcode       : 0xa0b
> cpu MHz         : 1603.000
> cache size      : 3072 KB
> physical id     : 0
> siblings        : 2
> core id         : 0
> cpu cores       : 2
> apicid          : 0
> initial apicid  : 0
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 13
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64
> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
> tpr_shadow vnmi flexpriority
> bugs            :
> bogomips        : 5851.89
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
> 
> processor       : 1
> vendor_id       : GenuineIntel
> cpu family      : 6
> model           : 23
> model name      : Intel(R) Core(TM)2 Duo CPU     E7500  @ 2.93GHz
> stepping        : 10
> microcode       : 0xa0b
> cpu MHz         : 1603.000
> cache size      : 3072 KB
> physical id     : 0
> siblings        : 2
> core id         : 1
> cpu cores       : 2
> apicid          : 1
> initial apicid  : 1
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 13
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
> cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm
> constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64
> monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm dtherm
> tpr_shadow vnmi flexpriority
> bugs            :
> bogomips        : 5851.89
> clflush size    : 64
> cache_alignment : 64
> address sizes   : 36 bits physical, 48 bits virtual
> power management:
> 
> [root@btrfs test_nosync_32768__sync_1_seqwr_4G_btrfs_1]#
> 
> Please let me know if you are interested in more information or further tests.
> 
> Thanks
> Zhaolei
> 
> > I remember an issue a few years ago where simply reverting a patch that
> > uninlined the rw_sem code fixed a buffered I/O performance regression
> > when using Samba on a very low-end ARM device, so everything is possible.
> >
> > I'd still like to ensure the numbers are reproducible in this case
> > first, and look at all the information Jan asked for.  As a next step
> > we could then look at using an inline version to check if that helps.
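The question of whether extra cycles in inode_to_bdi() could matter can be put in rough per-request terms with the numbers in this message. This is a back-of-the-envelope sketch, not a measurement: it assumes sysbench's "Mb" and "Kb" are binary units (the log's 4228.75 requests/s of 32 KiB each works out to about 132.15 MiB/s, matching the reported figure), uses the bisect averages just before and at the patch, and converts time to cycles at both the nominal 2.93 GHz and the 1603 MHz shown in the cpuinfo snapshot.

```python
def request_time_us(throughput_mib_s, block_kib=32):
    """Mean wall-clock time per block-sized write at a given throughput, in microseconds."""
    # Assumes sysbench's "Mb" and "Kb" are binary units (MiB, KiB).
    return block_kib * 1024 / (throughput_mib_s * 1024 * 1024) * 1e6


# Consistency check against the sysbench log: 4228.75 requests/s of 32 KiB each.
logged_mib_s = 4228.75 * 32 / 1024  # about 132.15, matching "(132.15Mb/sec)"

before_us = request_time_us(136.534)  # avg of v3.19-rc5_00006_26ff13
after_us = request_time_us(130.358)   # avg of v3.19-rc5_00007_de1414 (this patch)
extra_us = after_us - before_us

# Cycles this extra time represents at the nominal 2.93 GHz and at the 1603 MHz
# the cpuinfo snapshot reported (frequency scaling appears to be active).
for ghz in (2.93, 1.603):
    print(f"{extra_us:.2f} us per write ~= {extra_us * ghz * 1000:.0f} cycles at {ghz} GHz")
```

On these assumptions each 32 KiB write takes about 10.8 us longer after the patch, on the order of 17,000 to 32,000 cycles. How many times inode_to_bdi() runs per write was not measured, so this frames the question rather than answering it.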


Thread overview: 14+ messages
     [not found] <003f01d057c6$eb48c3e0$c1da4ba0$@cn.fujitsu.com>
     [not found] ` <20150308102916.GD3743@quack.suse.cz>
     [not found]   ` <20150308153423.GA24154@lst.de>
2015-04-01  9:56     ` Zhao Lei [this message]
2015-04-10 11:25 Regression caused by using node_to_bdi() Zhao Lei
2015-04-12 11:33 ` Boaz Harrosh
2015-04-12 14:39   ` Boaz Harrosh
2015-04-13  1:20     ` Zhao Lei
2015-04-13  7:00     ` Zhao Lei
2015-04-13 10:22     ` Zhao Lei
2015-04-13 12:31       ` Boaz Harrosh
2015-04-14 12:14         ` Zhao Lei
2015-04-13 12:21   ` Jan Kara
2015-04-13 12:44     ` Boaz Harrosh
2015-04-13 17:32 ` 'Christoph Hellwig'
2015-04-14 12:27   ` Zhao Lei
  -- strict thread matches above, loose matches on Subject: below --
2015-03-06  4:37 Zhao Lei
