* md raid performance with 3-18-rc3
@ 2014-11-24  8:10 Manish Awasthi
  2014-11-25  2:37 ` NeilBrown
  0 siblings, 1 reply; 8+ messages in thread
From: Manish Awasthi @ 2014-11-24  8:10 UTC (permalink / raw)
  To: linux-raid

Hi,

We benchmarked the md raid driver performance on the 3.18-rc3 kernel and 
compared the results with those of 3.6.11. The goal of this exercise is to 
understand whether the multithreaded raid driver has any performance 
benefit over the single-threaded driver in 3.6.11. Here are some details 
about the setup:

System: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz, 4 cores (8 threads), 
8GB RAM.
Setup: 3 SSDs in a raid5 array
Test tool: iozone (only read/re-read and write/re-write tested), blocksize: 
4k-64k, filesize: 1Gig to 200Gig

The comparison covered the data transfer speed in kBytes/sec and the CPU 
utilization, both as reported by iozone.
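
Roughly, the array creation and a representative iozone invocation looked 
like the following (device names and exact option values are illustrative, 
not the precise commands used):

  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
  mkfs.ext4 /dev/md0
  mount /dev/md0 /mnt/test
  # -i 0 = write/re-write, -i 1 = read/re-read
  iozone -i 0 -i 1 -r 64k -s 8g -f /mnt/test/iozone.tmp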

Overall, raid on 3.18.0-rc3 performed much worse than raid on 3.6.11.

Read/Write: raid on 3.18.0-rc3 operated at almost half the speed of raid 
on 3.6.11.

CPU utilization: With md raid on 3.18.0-rc3, the CPU utilization on WRITE 
operations was less than half that of md raid on 3.6.11. However, for READ 
operations, 3.18.0-rc3 had higher CPU utilization than 3.6.11.

Also, I noticed that increasing the number of CPU cores in the system 
decreases the raid throughput with 3.18.0-rc3.

I do have detailed logs of the comparison, but I'm not sure whether I 
should send them to this mailing list.

If my observations align with someone else's, then what is the real gain 
from multithreaded raid?

Manish


* Re: md raid performance with 3-18-rc3
  2014-11-24  8:10 md raid performance with 3-18-rc3 Manish Awasthi
@ 2014-11-25  2:37 ` NeilBrown
       [not found]   ` <54758B3B.5080907@caviumnetworks.com>
  0 siblings, 1 reply; 8+ messages in thread
From: NeilBrown @ 2014-11-25  2:37 UTC (permalink / raw)
  To: Manish Awasthi; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2737 bytes --]

On Mon, 24 Nov 2014 13:40:06 +0530 Manish Awasthi
<manish.awasthi@caviumnetworks.com> wrote:

> Hi,
> 
> We benchmarked the md raid driver performance on 3-18-rc3 kernel and 
> compared the results with that of 3.6.11. The reason for this exercise 
> is to understand if multithreaded raid driver has any performance 
> benefits over 3.6.11 which is single threaded. Here are some details 
> about the setup

Thanks for doing this!!!! I love it when people report test results.


> 
> System: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz 4 cores (8threads), 
> 8GB RAM.
> Setup: 3 SSDs create a raid5 array
> test tool: iozone (only read/re-read, write/re-write tested), blocksize: 
> 4k-64k, filesize: 1Gig to 200Gig
> 
> Comparison was done for speed of data transfer in kBytes/sec and also 
> the CPU utilization as reported by iozone.
> 
> raid on 3.18.0-rc3 performed much worse than raid on 3.6.11.
> 
> Read/Write: raid on 3.18.0-rc3 operated at almost half the speed of raid 
> on 3.6.11

That really isn't very good.... Can you try some of the kernels in between
and see if there was a single point where performance dropped, or if there
were several steps?
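
One way to do that (just a sketch, assuming you build from a mainline git 
tree and reuse an existing .config) is to step through the tagged releases 
between the two kernels:

  git checkout v3.10          # then v3.12, v3.14, v3.16, ...
  make olddefconfig && make -j$(nproc)
  make modules_install install
  # reboot into the new kernel and rerun the same iozone workload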


> 
> CPU Utilization: With md raid on 3.18.0-rc3, the CPU utilization was 
> less than half of md raid on 3.6.11 on WRITE operations. However, for 
> READ operations, 3.18.0-rc3 had more CPU utilization than 3.6.11.

Can you use "perf" to determine where the extra time is going?

  perf record
  run test
  stop perf
  perf report

or something like that.
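
For example (a rough sketch; the exact events and sampling window are up 
to you):

  # system-wide sampling with call graphs while the iozone run is in progress
  perf record -a -g -- sleep 60
  perf report --sort comm,dso,symbol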

> 
> Also, I noticed that scaling up the CPU cores of the system scales down 
> the raid throughput with 3.18.0-rc3.

This is by writing numbers to "group_thread_cnt" ??? Can you provide a simple
table comparing thread count to throughput?  Or maybe a graph.  I love
graphs :-)


> 
> I do have detailed logs of the comparison but I'm not sure I should send 
> those on this mailing list.

A few megabytes?  Yes.  100Meg?  No.

If you could put them on a website somewhere that I can browse or download
I'll try to have a look.

> 
> If my observation aligns with someone else's, then what is really the 
> gain with multithreaded raid.

Some testing shows real improvements.  Obviously we cannot test everything
and I'm very glad to have extra testing from other people.
If we can quantify the regressions and confirm exactly when they occurred, we
can start looking for a solution.

Thanks a lot!

NeilBrown


> 
> Manish




* Re: md raid performance with 3-18-rc3
       [not found]   ` <54758B3B.5080907@caviumnetworks.com>
@ 2014-12-03  5:19     ` NeilBrown
  2014-12-03  6:21     ` NeilBrown
  1 sibling, 0 replies; 8+ messages in thread
From: NeilBrown @ 2014-12-03  5:19 UTC (permalink / raw)
  To: Manish Awasthi; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4767 bytes --]

On Wed, 26 Nov 2014 13:41:39 +0530 Manish Awasthi
<manish.awasthi@caviumnetworks.com> wrote:

> 
> On 11/25/2014 08:07 AM, NeilBrown wrote:
> > On Mon, 24 Nov 2014 13:40:06 +0530 Manish Awasthi
> > <manish.awasthi@caviumnetworks.com> wrote:
> >
> >> Hi,
> >>
> >> We benchmarked the md raid driver performance on 3-18-rc3 kernel and
> >> compared the results with that of 3.6.11. The reason for this exercise
> >> is to understand if multithreaded raid driver has any performance
> >> benefits over 3.6.11 which is single threaded. Here are some details
> >> about the setup
> > Thanks for doing this!!!! I love it when people report test results.
> >
> >
> >> System: Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz 4 cores (8threads),
> >> 8GB RAM.
> >> Setup: 3 SSDs create a raid5 array
> >> test tool: iozone (only read/re-read, write/re-write tested), blocksize:
> >> 4k-64k, filesize: 1Gig to 200Gig
> >>
> >> Comparison was done for speed of data transfer in kBytes/sec and also
> >> the CPU utilization as reported by iozone.
> >>
> >> raid on 3.18.0-rc3 performed much worse than raid on 3.6.11.
> >>
> >> Read/Write: raid on 3.18.0-rc3 operated at almost half the speed of raid
> >> on 3.6.11
> > That really isn't very good.... Can you try some of the kernels in between
> > and see if there was a single point where performance dropped, or if there
> > were several steps?
> Can you give me a starting point for when multithread support for raid 
> was added? That would be a good place to begin: I have a benchmark on 
> 3.6.11, and I'd like to go to the first kernel that supports 
> multithreaded raid and take it from there.

Multi-thread support appeared in 3.12.
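
An easy way to check whether a given kernel has it (a sketch; substitute 
your actual md device for md0) is to look for the sysfs attribute that 
controls it:

  ls /sys/block/md0/md/group_thread_cnt   # present from 3.12 onwards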


> >
> >
> >> CPU Utilization: With md raid on 3.18.0-rc3, the CPU utilization was
> >> less than half of md raid on 3.6.11 on WRITE operations. However, for
> >> READ operations, 3.18-0.rc3 had more CPU utilization than 3.6.11.
> > Can you use "perf" to determine where the extra time is going?
> >
> >    perf record
> >    run test
> >    stop perf
> >    perf report
> >
> > or something like that.
> I can do this, but as I mentioned below, it's better if I first 
> understand all the possible tweaks that can be applied to get optimal 
> results, unless of course you expect 3.18.0 to perform better than 
> 3.6.11 even in the default case without any tweaks.

I have no particular expectations.  I like to see concrete measurements and
then try to interpret them.

I prefer to compare default settings (no tweaks) in the first instance, 
because that is what most people will be using.


> >
> >> Also, I noticed that scaling up the CPU cores of the system scales down
> >> the raid througput with 3.18.0-rc3.
> > This is by writing numbers to "group_thread_cnt" ??? Can you provide a simple
> > table comparing thread count to throughput?  Or maybe a graph.  I love
> > graphs :-)
> I did not tweak anything on the 3.18.0 kernel. I assumed all the 
> required support is built in and did not go into the depths of the code, 
> as we're still at a nascent stage of comparing data across specific 
> kernel versions. Can you point me to some documentation describing 
> tweaks like "group_thread_cnt"?

Multi-threading is disabled by default, so if you haven't explicitly enabled
it, then it cannot be affecting your performance.

If you
  echo 8 > /sys/block/mdXXX/md/group_thread_cnt

it will use 8 threads to perform 'xor' calculations and submit IO requests.
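
A quick way to see how the thread count affects your workload (a sketch 
only; substitute your actual md device for md0) is to sweep the setting 
and rerun the same test at each value:

  for n in 0 1 2 4 8; do
      echo $n > /sys/block/md0/md/group_thread_cnt
      # rerun the same iozone workload here and record the throughput
  done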


> >
> >> I do have detailed logs of the comparison but I'm not sure I should send
> >> those on this mailing list.
> > A few megabytes?  Yes.  100Meg?  No.
> 
> Whatever comparison data I have is attached; I have consolidated it 
> from the log files into Excel. See if this helps.

Thanks.  I'll have a look at the tables and see if anything looks interesting.

Thanks,
NeilBrown



> >
> > If you could put them on a website somewhere that I can browse or download
> > I'll try to have a look.
> >
> >> If my observation aligns with someone else's, then what is really the
> >> gain with multithreaded raid.
> > Some testing shows real improvements.  Obviously we cannot test everything
> > and I'm very glad to have extra testing from other people.
> > If we can quantify the regressions and confirm exactly when they occurred, we
> > can start looking for a solution.
> >
> > Thanks a lot!
> >
> > NeilBrown
> >
> >
> >> Manish
> 




* Re: md raid performance with 3-18-rc3
       [not found]   ` <54758B3B.5080907@caviumnetworks.com>
  2014-12-03  5:19     ` NeilBrown
@ 2014-12-03  6:21     ` NeilBrown
       [not found]       ` <5486B15C.8060109@caviumnetworks.com>
  1 sibling, 1 reply; 8+ messages in thread
From: NeilBrown @ 2014-12-03  6:21 UTC (permalink / raw)
  To: Manish Awasthi; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1768 bytes --]

On Wed, 26 Nov 2014 13:41:39 +0530 Manish Awasthi
<manish.awasthi@caviumnetworks.com> wrote:

> Whatever comparison data I have is attached; I have consolidated it 
> from the log files into Excel. See if this helps.

raid_3_18_performance.xls shows read throughput to be consistently 20% down
on 3.18 compared to 3.6.11.

Writes are a few percent better for 4G/8G files, 20% better for 16G/32G
files, and unchanged above that.
Given that you have 8G of RAM, that seems like it could be some change in
caching behaviour, and not necessarily a change in RAID behaviour.

The CPU utilization roughly follows the throughput: 40% higher when write
throughput is 20% better.
Could you check whether the value of /proc/sys/vm/dirty_ratio is the same
for both tests?  That number has changed occasionally and could affect these
tests.
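
Something like this on each kernel would be enough to capture it (just a 
suggestion):

  grep . /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio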


The second file, 3SSDs-perf-2-Cores-3.18-rc1, has the "change" numbers
negative where I expected positive, i.e. negative means an increase.

Writes consistently have higher CPU utilisation.
Reads consistently have much lower CPU utilization.

I don't know what that means ... it might not mean anything.

Could you please run the tests on the two kernels *without* RAID, i.e.
directly on a single SSD?  That will give us a baseline for what changes are
caused by other parts of the kernel (filesystem, block layer, MM, etc).  Then
we can see how much change RAID5 is contributing.
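
A minimal version of that baseline run might look like this (device and 
mount point names are placeholders):

  mkfs.ext4 /dev/sdb            # one of the SSDs, outside the md array
  mount /dev/sdb /mnt/baseline
  iozone -i 0 -i 1 -r 64k -s 8g -f /mnt/baseline/iozone.tmp

Repeat on both kernels and compare against the RAID5 numbers.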

The third file, 3SSDs-perf-4Core.xls, seems to show significantly reduced
throughput across the board.
CPU utilization is lower (better) for writes, but worse for reads.  That is
the reverse of what the second file shows.

I might try running some tests across a set of kernel versions and see what I
can come up with.

NeilBrown



* Re: md raid performance with 3-18-rc3
       [not found]       ` <5486B15C.8060109@caviumnetworks.com>
@ 2014-12-09  8:24         ` Manish Awasthi
  2014-12-09  8:26           ` Manish Awasthi
  0 siblings, 1 reply; 8+ messages in thread
From: Manish Awasthi @ 2014-12-09  8:24 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

resending:

  dirty_ratio is the same for both kernels:
>
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 20
> vm.dirty_writeback_centisecs = 500
>
>
> I re-ran the tests on the same set of kernels, without enabling 
> multithread support on 3.18, and measured a few things with perf.
>
> perf-stat-<kernel>.txt: the test ran for some time while various 
> parameters were measured.
>
> Meanwhile I'm also running the complete test under perf record. I'll 
> share the results soon.
>
> Manish
>
> On 12/03/2014 11:51 AM, NeilBrown wrote:
>> On Wed, 26 Nov 2014 13:41:39 +0530 Manish Awasthi
>> <manish.awasthi@caviumnetworks.com>  wrote:
>>
>>> Whatever data I have on comparison is attached, I have consolidated this
>>> from log files to excel. See if this helps.
>> raid_3_18_performance.xls shows read throughput to be consistently 20% down
>> on 3.18 compared to 3.6.11.
>>
>> Writes are a few percent better for 4G/8G files, 20% better for 16G/32G files.
>> unchanged above that.
>> Given that you have 8G of RAM, that seems like it could be some change in
>> caching behaviour, and not necessarily a change in RAID behaviour.
>>
>> The CPU utilization roughly follows the throughput: 40% higher when write
>> throughput is 20% better.
>> Could you check if the value of /proc/sys/vm/dirty_ratio is the same for both
>> tests.  That number has changed occasionally and could affect these tests.
>>
>>
>> The second file, 3SSDs-perf-2-Cores-3.18-rc1 has the "change" numbers
>> negative where I expected positive.. i.e. negative mean an increase.
>>
>> Writes consistently have higher CPU utilisation.
>> Reads consistently have much lower CPU utilization.
>>
>> I don't know what that means ... it might not mean anything.
>>
>> Could you please run the tests between the two kernels *with* RAID.  i.e.
>> directly on an SSD.  That will give us a baseline for what changes are caused
>> by other parts of the kernel (filesystem, block layer, MM, etc).  Then we can
>> see how much change RAID5 is contributing.
>>
>> The third file, 3SSDs-perf-4Core.xls seems to show significantly reduced
>> throughput across the board.
>> CPU utilization is less (better) for writes, but worse for reads.  That is
>> the reverse of what the second file shows.
>>
>> I might try running some tests across a set of kernel versions and see what I
>> can come up with.
>>
>> NeilBrown
>



* Re: md raid performance with 3-18-rc3
  2014-12-09  8:24         ` Manish Awasthi
@ 2014-12-09  8:26           ` Manish Awasthi
       [not found]             ` <5487FD79.7000002@caviumnetworks.com>
  0 siblings, 1 reply; 8+ messages in thread
From: Manish Awasthi @ 2014-12-09  8:26 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2666 bytes --]

this time with attachment:

manish
On 12/09/2014 01:54 PM, Manish Awasthi wrote:
> resending:
>
>  dirty_ratio same for both the kernels.
>>
>> vm.dirty_background_bytes = 0
>> vm.dirty_background_ratio = 10
>> vm.dirty_bytes = 0
>> vm.dirty_expire_centisecs = 3000
>> vm.dirty_ratio = 20
>> vm.dirty_writeback_centisecs = 500
>>
>>
>> I re-ran the tests with the same set of kernel without enabling 
>> multithread support on 3.18 and measured a few things with perf.
>>
>> perf-stat-<kernel>.txt: test ran for some time and measured various 
>> parameters.
>>
>> Meanwhile I'm also running complete test under perf record. I'll 
>> share the results soon.
>>
>> Manish
>>
>> On 12/03/2014 11:51 AM, NeilBrown wrote:
>>> On Wed, 26 Nov 2014 13:41:39 +0530 Manish Awasthi
>>> <manish.awasthi@caviumnetworks.com>  wrote:
>>>
>>>> Whatever data I have on comparison is attached, I have consolidated 
>>>> this
>>>> from log files to excel. See if this helps.
>>> raid_3_18_performance.xls shows read throughput to be consistently 
>>> 20% down
>>> on 3.18 compared to 3.6.11.
>>>
>>> Writes are a few percent better for 4G/8G files, 20% better for 
>>> 16G/32G files.
>>> unchanged above that.
>>> Given that you have 8G of RAM, that seems like it could be some 
>>> change in
>>> caching behaviour, and not necessarily a change in RAID behaviour.
>>>
>>> The CPU utilization roughly follows the throughput: 40% higher when 
>>> write
>>> throughput is 20% better.
>>> Could you check if the value of /proc/sys/vm/dirty_ratio is the same 
>>> for both
>>> tests.  That number has changed occasionally and could affect these 
>>> tests.
>>>
>>>
>>> The second file, 3SSDs-perf-2-Cores-3.18-rc1 has the "change" numbers
>>> negative where I expected positive.. i.e. negative mean an increase.
>>>
>>> Writes consistently have higher CPU utilisation.
>>> Reads consistently have much lower CPU utilization.
>>>
>>> I don't know what that means ... it might not mean anything.
>>>
>>> Could you please run the tests between the two kernels *with* RAID.  
>>> i.e.
>>> directly on an SSD.  That will give us a baseline for what changes 
>>> are caused
>>> by other parts of the kernel (filesystem, block layer, MM, etc).  
>>> Then we can
>>> see how much change RAID5 is contributing.
>>>
>>> The third file, 3SSDs-perf-4Core.xls seems to show significantly 
>>> reduced
>>> throughput across the board.
>>> CPU utilization is less (better) for writes, but worse for reads.  
>>> That is
>>> the reverse of what the second file shows.
>>>
>>> I might try running some tests across a set of kernel versions and 
>>> see what I
>>> can come up with.
>>>
>>> NeilBrown
>>
>


[-- Attachment #2: perf-stat-3.6.11.txt --]
[-- Type: text/plain, Size: 3546 bytes --]

perf stat on md125_raid5 -- kernel 3.6.11

# perf stat -p 2613 -e cycles,instructions,cache-references,cache-misses,branches,branch-misses,bus-cycles,stalled-cycles-frontend,ref-cycles,cpu-clock,task-clock,faults,context-switches,cpu-migrations,minor-faults,major-faults,alignment-faults,emulation-faults,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,LLC-loads,LLC-stores,LLC-prefetches,dTLB-load-misses,dTLB-store-misses,iTLB-loads,iTLB-load-misses,branch-loads,branch-load-misses
^C 
 Performance counter stats for process id '2613':

   103,200,677,721      cycles                    #    2.848 GHz                     [22.72%]
    69,669,813,983      instructions              #    0.68  insns per cycle        
                                                  #    1.07  stalled cycles per insn [27.26%]
     2,668,465,769      cache-references          #   73.648 M/sec                   [27.35%]
     1,408,493,680      cache-misses              #   52.783 % of all cache refs     [27.17%]
    13,609,211,321      branches                  #  375.607 M/sec                   [27.19%]
       121,593,598      branch-misses             #    0.89% of all branches         [27.32%]
     3,420,725,359      bus-cycles                #   94.410 M/sec                   [18.07%]
    74,362,368,252      stalled-cycles-frontend   #   72.06% frontend cycles idle    [18.16%]
   112,553,945,650      ref-cycles                # 3106.427 M/sec                   [22.76%]
      36233.766411      cpu-clock (msec)                                            
      36232.605499      task-clock (msec)         #    0.181 CPUs utilized          
                 0      faults                    #    0.000 K/sec                  
           442,885      context-switches          #    0.012 M/sec                  
             9,646      cpu-migrations            #    0.266 K/sec                  
                 0      minor-faults              #    0.000 K/sec                  
                 0      major-faults              #    0.000 K/sec                  
                 0      alignment-faults          #    0.000 K/sec                  
                 0      emulation-faults          #    0.000 K/sec                  
     3,188,865,936      L1-dcache-load-misses     #   88.011 M/sec                   [22.96%]
     1,658,831,957      L1-dcache-store-misses    #   45.783 M/sec                   [22.89%]
       338,744,029      L1-dcache-prefetch-misses #    9.349 M/sec                   [23.04%]
       445,066,995      L1-icache-load-misses     #   12.284 M/sec                   [22.99%]
     1,578,067,225      LLC-loads                 #   43.554 M/sec                   [18.19%]
     1,317,822,999      LLC-stores                #   36.371 M/sec                   [18.23%]
       798,004,610      LLC-prefetches            #   22.024 M/sec                   [ 9.09%]
                 0      dTLB-load-misses          #    0.000 K/sec                   [13.52%]
         7,633,236      dTLB-store-misses         #    0.211 M/sec                   [18.03%]
        10,024,464      iTLB-loads                #    0.277 M/sec                   [17.92%]
         3,157,141      iTLB-load-misses          #   31.49% of all iTLB cache hits  [18.12%]
    13,616,857,645      branch-loads              #  375.818 M/sec                   [18.16%]
       119,250,450      branch-load-misses        #    3.291 M/sec                   [18.14%]

     200.190181623 seconds time elapsed




[-- Attachment #3: perf-stat-3.18.txt --]
[-- Type: text/plain, Size: 3543 bytes --]

perf stat on md125_raid5 -- kernel 3.18

# perf stat -p 2778 -e cycles,instructions,cache-references,cache-misses,branches,branch-misses,bus-cycles,stalled-cycles-frontend,ref-cycles,cpu-clock,task-clock,faults,context-switches,cpu-migrations,minor-faults,major-faults,alignment-faults,emulation-faults,L1-dcache-load-misses,L1-dcache-store-misses,L1-dcache-prefetch-misses,L1-icache-load-misses,LLC-loads,LLC-stores,LLC-prefetches,dTLB-load-misses,dTLB-store-misses,iTLB-loads,iTLB-load-misses,branch-loads,branch-load-misses
^C
 Performance counter stats for process id '2778':

   191,212,778,981      cycles                    #    2.942 GHz                     [22.99%]
   160,318,628,367      instructions              #    0.84  insns per cycle        
                                                  #    0.77  stalled cycles per insn [27.49%]
     3,800,688,695      cache-references          #   58.485 M/sec                   [27.40%]
     1,418,431,693      cache-misses              #   37.320 % of all cache refs     [27.27%]
    33,635,552,951      branches                  #  517.586 M/sec                   [27.12%]
       352,264,516      branch-misses             #    1.05% of all branches         [27.19%]
     6,035,806,867      bus-cycles                #   92.879 M/sec                   [18.21%]
   122,980,401,285      stalled-cycles-frontend   #   64.32% frontend cycles idle    [18.16%]
   197,829,618,312      ref-cycles                # 3044.216 M/sec                   [22.72%]
      65039.738267      cpu-clock (msec)                                            
      64985.415568      task-clock (msec)         #    0.186 CPUs utilized          
                 0      faults                    #    0.000 K/sec                  
         3,437,945      context-switches          #    0.053 M/sec                  
               237      cpu-migrations            #    0.004 K/sec                  
                 0      minor-faults              #    0.000 K/sec                  
                 0      major-faults              #    0.000 K/sec                  
                 0      alignment-faults          #    0.000 K/sec                  
                 0      emulation-faults          #    0.000 K/sec                  
     5,329,711,939      L1-dcache-load-misses     #   82.014 M/sec                   [22.83%]
     2,138,400,107      L1-dcache-store-misses    #   32.906 M/sec                   [22.52%]
       667,646,968      L1-dcache-prefetch-misses #   10.274 M/sec                   [22.48%]
     2,259,425,830      L1-icache-load-misses     #   34.768 M/sec                   [22.45%]
     2,090,596,777      LLC-loads                 #   32.170 M/sec                   [17.93%]
     1,679,287,271      LLC-stores                #   25.841 M/sec                   [18.04%]
     1,120,086,147      LLC-prefetches            #   17.236 M/sec                   [ 9.09%]
       465,142,622      dTLB-load-misses          #    7.158 M/sec                   [13.69%]
        26,672,298      dTLB-store-misses         #    0.410 M/sec                   [18.26%]
        66,723,475      iTLB-loads                #    1.027 M/sec                   [18.37%]
         9,736,729      iTLB-load-misses          #   14.59% of all iTLB cache hits  [18.43%]
    33,238,082,664      branch-loads              #  511.470 M/sec                   [18.44%]
       346,025,993      branch-load-misses        #    5.325 M/sec                   [18.46%]

     348.946853958 seconds time elapsed





* Re: md raid performance with 3-18-rc3
       [not found]             ` <5487FD79.7000002@caviumnetworks.com>
@ 2015-01-06  9:49               ` Manish Awasthi
  2015-01-07 10:52                 ` Manish Awasthi
  0 siblings, 1 reply; 8+ messages in thread
From: Manish Awasthi @ 2015-01-06  9:49 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi Neil,

Any findings on some of the logs I shared earlier?

Thanks in advance for your reply. I'm having trouble booting the 3.12 
kernel; I should sort it out soon and come back with results.

Manish

On 12/10/2014 01:29 PM, Manish Awasthi wrote:
> Here is the perf report for the tests run on 3.6.11 and 3.18. 
> Comparing the two results, it just appears that raid in the older 
> version is busier than it is in the latest version. I will also monitor 
> the system activity via `perf top` now. Also, I should be back with 
> results on 3.12 by the weekend.
>
> Manish
>
> On 12/09/2014 01:56 PM, Manish Awasthi wrote:
>> this time with attachment:
>>
>> manish
>> On 12/09/2014 01:54 PM, Manish Awasthi wrote:
>>> resending:
>>>
>>>  dirty_ratio same for both the kernels.
>>>>
>>>> vm.dirty_background_bytes = 0
>>>> vm.dirty_background_ratio = 10
>>>> vm.dirty_bytes = 0
>>>> vm.dirty_expire_centisecs = 3000
>>>> vm.dirty_ratio = 20
>>>> vm.dirty_writeback_centisecs = 500
>>>>
>>>>
>>>> I re-ran the tests with the same set of kernel without enabling 
>>>> multithread support on 3.18 and measured a few things with perf.
>>>>
>>>> perf-stat-<kernel>.txt: test ran for some time and measured various 
>>>> parameters.
>>>>
>>>> Meanwhile I'm also running complete test under perf record. I'll 
>>>> share the results soon.
>>>>
>>>> Manish
>>>>
>>>> On 12/03/2014 11:51 AM, NeilBrown wrote:
>>>>> On Wed, 26 Nov 2014 13:41:39 +0530 Manish Awasthi
>>>>> <manish.awasthi@caviumnetworks.com>  wrote:
>>>>>
>>>>>> Whatever data I have on comparison is attached, I have 
>>>>>> consolidated this
>>>>>> from log files to excel. See if this helps.
>>>>> raid_3_18_performance.xls shows read throughput to be consistently 
>>>>> 20% down
>>>>> on 3.18 compared to 3.6.11.
>>>>>
>>>>> Writes are a few percent better for 4G/8G files, 20% better for 
>>>>> 16G/32G files.
>>>>> unchanged above that.
>>>>> Given that you have 8G of RAM, that seems like it could be some 
>>>>> change in
>>>>> caching behaviour, and not necessarily a change in RAID behaviour.
>>>>>
>>>>> The CPU utilization roughly follows the throughput: 40% higher 
>>>>> when write
>>>>> throughput is 20% better.
>>>>> Could you check if the value of /proc/sys/vm/dirty_ratio is the 
>>>>> same for both
>>>>> tests.  That number has changed occasionally and could affect 
>>>>> these tests.
>>>>>
>>>>>
>>>>> The second file, 3SSDs-perf-2-Cores-3.18-rc1 has the "change" numbers
>>>>> negative where I expected positive.. i.e. negative mean an increase.
>>>>>
>>>>> Writes consistently have higher CPU utilisation.
>>>>> Reads consistently have much lower CPU utilization.
>>>>>
>>>>> I don't know what that means ... it might not mean anything.
>>>>>
>>>>> Could you please run the tests between the two kernels *with* 
>>>>> RAID.  i.e.
>>>>> directly on an SSD.  That will give us a baseline for what changes 
>>>>> are caused
>>>>> by other parts of the kernel (filesystem, block layer, MM, etc).  
>>>>> Then we can
>>>>> see how much change RAID5 is contributing.
>>>>>
>>>>> The third file, 3SSDs-perf-4Core.xls seems to show significantly 
>>>>> reduced
>>>>> throughput across the board.
>>>>> CPU utilization is less (better) for writes, but worse for reads.  
>>>>> That is
>>>>> the reverse of what the second file shows.
>>>>>
>>>>> I might try running some tests across a set of kernel versions and 
>>>>> see what I
>>>>> can come up with.
>>>>>
>>>>> NeilBrown
>>>>
>>>
>>
>



* Re: md raid performance with 3-18-rc3
  2015-01-06  9:49               ` Manish Awasthi
@ 2015-01-07 10:52                 ` Manish Awasthi
  0 siblings, 0 replies; 8+ messages in thread
From: Manish Awasthi @ 2015-01-07 10:52 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4060 bytes --]

Here are the results with the 3.12 kernel. Apart from the better CPU 
utilization for writes, the rest of the numbers are lower than those of 
3.6.11. Please let me know if there is any particular area I should check. 
Since 3.12 introduced multithreaded raid for the first time, does it make 
sense to go further back in kernel revisions to rule multithread support 
in or out as the cause of the throughput degradation?


Manish

On 01/06/2015 03:19 PM, Manish Awasthi wrote:
> Hi Neil,
>
> Any findings on some of the logs I shared earlier?
>
> Thanks in advance for reply. I'm having trouble booting 3.12 kernel, 
> should probably sort it out soon and come back with results.
>
> Manish
>
> On 12/10/2014 01:29 PM, Manish Awasthi wrote:
>> Here is the perf report for the tests run on 3.6-11 and 3.18. 
>> Comparing both the results, it just appears that raid in the older 
>> version is busier than it is with the latest version. I will also 
>> monitor the system activity via `perf top` now. Also, I should be 
>> back with results on 3.12 by the weekend
>>
>> Manish
>>
>> On 12/09/2014 01:56 PM, Manish Awasthi wrote:
>>> this time with attachment:
>>>
>>> manish
>>> On 12/09/2014 01:54 PM, Manish Awasthi wrote:
>>>> resending:
>>>>
>>>>  dirty_ratio same for both the kernels.
>>>>>
>>>>> vm.dirty_background_bytes = 0
>>>>> vm.dirty_background_ratio = 10
>>>>> vm.dirty_bytes = 0
>>>>> vm.dirty_expire_centisecs = 3000
>>>>> vm.dirty_ratio = 20
>>>>> vm.dirty_writeback_centisecs = 500
>>>>>
>>>>>
>>>>> I re-ran the tests with the same set of kernel without enabling 
>>>>> multithread support on 3.18 and measured a few things with perf.
>>>>>
>>>>> perf-stat-<kernel>.txt: test ran for some time and measured 
>>>>> various parameters.
>>>>>
>>>>> Meanwhile I'm also running complete test under perf record. I'll 
>>>>> share the results soon.
>>>>>
>>>>> Manish
>>>>>
>>>>> On 12/03/2014 11:51 AM, NeilBrown wrote:
>>>>>> On Wed, 26 Nov 2014 13:41:39 +0530 Manish Awasthi
>>>>>> <manish.awasthi@caviumnetworks.com>  wrote:
>>>>>>
>>>>>>> Whatever data I have on comparison is attached, I have 
>>>>>>> consolidated this
>>>>>>> from log files to excel. See if this helps.
>>>>>> raid_3_18_performance.xls shows read throughput to be 
>>>>>> consistently 20% down
>>>>>> on 3.18 compared to 3.6.11.
>>>>>>
>>>>>> Writes are a few percent better for 4G/8G files, 20% better for 
>>>>>> 16G/32G files.
>>>>>> unchanged above that.
>>>>>> Given that you have 8G of RAM, that seems like it could be some 
>>>>>> change in
>>>>>> caching behaviour, and not necessarily a change in RAID behaviour.
>>>>>>
>>>>>> The CPU utilization roughly follows the throughput: 40% higher 
>>>>>> when write
>>>>>> throughput is 20% better.
>>>>>> Could you check if the value of /proc/sys/vm/dirty_ratio is the 
>>>>>> same for both
>>>>>> tests.  That number has changed occasionally and could affect 
>>>>>> these tests.
>>>>>>
>>>>>>
>>>>>> The second file, 3SSDs-perf-2-Cores-3.18-rc1 has the "change" 
>>>>>> numbers
>>>>>> negative where I expected positive.. i.e. negative mean an increase.
>>>>>>
>>>>>> Writes consistently have higher CPU utilisation.
>>>>>> Reads consistently have much lower CPU utilization.
>>>>>>
>>>>>> I don't know what that means ... it might not mean anything.
>>>>>>
>>>>>> Could you please run the tests between the two kernels *with* 
>>>>>> RAID.  i.e.
>>>>>> directly on an SSD.  That will give us a baseline for what 
>>>>>> changes are caused
>>>>>> by other parts of the kernel (filesystem, block layer, MM, etc).  
>>>>>> Then we can
>>>>>> see how much change RAID5 is contributing.
>>>>>>
>>>>>> The third file, 3SSDs-perf-4Core.xls seems to show significantly 
>>>>>> reduced
>>>>>> throughput across the board.
>>>>>> CPU utilization is less (better) for writes, but worse for 
>>>>>> reads.  That is
>>>>>> the reverse of what the second file shows.
>>>>>>
>>>>>> I might try running some tests across a set of kernel versions 
>>>>>> and see what I
>>>>>> can come up with.
>>>>>>
>>>>>> NeilBrown
>>>>>
>>>>
>>>
>>
>


[-- Attachment #2: 3.6.11-vs-3.12.xls --]
[-- Type: application/vnd.ms-excel, Size: 29696 bytes --]

