From mboxrd@z Thu Jan  1 00:00:00 1970
From: Steven Haigh <netwiz@crc.id.au>
Subject: Re: Xen disk write slowness in kernel 3.8.x
Date: Thu, 04 Apr 2013 01:48:26 +1100
Message-ID: <515C413A.2090808@crc.id.au>
References: <515B8E7B.8070104@crc.id.au>
	<9F2C4E7DFB7839489C89757A66C5AD620899E9@LONPEX01CL03.citrite.net>
	<515C1592.1090102@crc.id.au>
	<9F2C4E7DFB7839489C89757A66C5AD62089BC8@LONPEX01CL03.citrite.net>
	<515C2E21.2010903@crc.id.au>
	<9F2C4E7DFB7839489C89757A66C5AD62089EDF@LONPEX01CL03.citrite.net>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; Format="flowed"
Content-Transfer-Encoding: 7bit
Return-path: <xen-devel-bounces@lists.xen.org>
In-Reply-To: <9F2C4E7DFB7839489C89757A66C5AD62089EDF@LONPEX01CL03.citrite.net>
List-Unsubscribe: <http://lists.xen.org/cgi-bin/mailman/options/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=unsubscribe>
List-Post: <mailto:xen-devel@lists.xen.org>
List-Help: <mailto:xen-devel-request@lists.xen.org?subject=help>
List-Subscribe: <http://lists.xen.org/cgi-bin/mailman/listinfo/xen-devel>,
	<mailto:xen-devel-request@lists.xen.org?subject=subscribe>
Sender: xen-devel-bounces@lists.xen.org
Errors-To: xen-devel-bounces@lists.xen.org
To: Felipe Franciosi <felipe.franciosi@citrix.com>
Cc: "xen-devel@lists.xen.org" <xen-devel@lists.xen.org>
List-Id: xen-devel@lists.xenproject.org

On 04/04/13 01:30, Felipe Franciosi wrote:
> I think you should use i/oflag=direct on your "dd"s to bypass the buffer cache in the guest. Might provide more consistent results.

Ah - you are correct. I usually don't forget that. This has however 
discovered something interesting...

Entirely from the DomU:
# dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 
oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 45.2753 s, 94.9 MB/s
# rm /mnt/fileshare/output.zero
# dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 80.5808 s, 53.3 MB/s

I did this 3-4 times with the same trend...

# dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 
oflag=direct
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 40.693 s, 106 MB/s
# rm /mnt/fileshare/output.zero
# sync
# dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 82.1511 s, 52.3 MB/s

So, when the DomU is using write buffers, write speed is halved. This I 
don't understand.

I haven't tweaked the array at all yet (as I've just been experimenting 
with this problem) - so with a bit of time, I can probably even increase 
its write speed some more - but without this speed decrease solved, it 
isn't really worth it yet...

> The result of oprofile is intriguing. Did you have a chance to try Roger's persistent grant implementation?
>
> He mentions it on his 08/03 e-mail. It's here:
> git://xenbits.xen.org/people/royger/linux.git xen-block-indirect

I can't say I have - Although I package Xen for EL6, we're getting to 
the edge of my knowledge here.

>
> Felipe
>
> -----Original Message-----
> From: Steven Haigh [mailto:netwiz@crc.id.au]
> Sent: 03 April 2013 14:27
> To: xen-devel@lists.xen.org
> Cc: Felipe Franciosi
> Subject: Re: Xen disk write slowness in kernel 3.8.x
>
> On 03/04/13 23:29, Felipe Franciosi wrote:
>> Do you know the size of your requests?
>> You used "iostat -m". Perhaps "iostat -xm" could be more meaningful as it will tell you the average request size in sectors.
>
> Good call. I started just a sequential write with dd from /dev/zero on the DomU:
> # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 91.269 s, 47.1 MB/s
>
> iostat -xm shows:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>              0.11    0.00    2.71    0.23    0.56   96.38
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
> avgrq-sz avgqu-sz   await  svctm  %util
> sdc            1208.40  5337.80  150.00  645.20     5.35    23.39
> 74.00     6.47    8.15   0.91  72.72
> sdf            1199.80  5350.80  148.00  643.40     5.30    23.34
> 74.12     6.88    8.68   0.96  76.06
> sdd            1203.40  5304.60  148.80  638.60     5.28    23.21
> 74.11     5.42    6.88   0.78  61.38
> sde            1213.80  5382.40  148.40  652.40     5.37    23.59
> 74.08     6.40    8.00   0.94  75.20
> md3               0.00     0.00    1.60 1312.20     0.05    42.68
> 66.60     0.00    0.00   0.00   0.00
>
> Shutting down the DomU, mounting /dev/md3 in and doing this directly from the Dom0 shows:
> # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 44.2801 s, 97.0 MB/s
>
> details from iostat -xm:
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>              0.00    0.00    2.16    0.00    0.57   97.27
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s
> avgrq-sz avgqu-sz   await  svctm  %util
> sdc            1175.40  5220.60  143.40  633.60     5.13    22.85
> 73.76     6.01    7.73   0.87  67.60
> sdf            1175.20  5154.00  147.40  624.40     5.14    22.55
> 73.46     6.87    8.88   0.97  74.90
> sdd            1183.40  5133.80  145.40  625.20     5.19    22.50
> 73.60     5.19    6.73   0.77  59.60
> sde            1176.40  5229.60  146.00  637.00     5.16    22.99
> 73.62     7.39    9.51   0.99  77.90
> md3               0.00     0.00    0.60 1277.40     0.02    41.56
> 66.63     0.00    0.00   0.00   0.00
>
> This about ties in with what bonnie++ gets as the write speed:
> Version  1.96       ------Sequential Output------ --Sequential Input-
> --Random-
> Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP
> /sec %CP
> xenhost.lan.crc. 2G   635  91 97526  15 67864  16   952  97 295833  35
> 409.6  10
>
> bonnie++ maxes out at about the same as dd in the DomU.
>
>> I still didn't understand your environment. I think this is the first time you mention NFS/SMB.
>
> /dev/md3 is passed through as a phy device to a DomU. This DomU runs both samba and NFS shares. It doesn't seem to matter what method is the source of the writes to the drive (dd, bonnie++, sending a file via NFS or Samba), the max write speed seems to be bottlenecking at ~50Mb/sec - which I believe is purely the write speed to the array via the DomU.
>
>> Weren't you testing copies from /dev/zero to your device?
>
> I have tested many ways - this being just one of them.
>
>> I'd also recommend you run "oprofile" to see if there are any time sinks that we are overlooking. You can find out how to use it here:
>> http://wiki.xen.org/wiki/Xen_Profiling:_oprofile_and_perf
>
> I ran 'perf top' on the Dom0, then started a write (dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096) on the DomU. This is what I'd see as the relevant output:
>
> Samples: 574K of event 'cpu-clock', Event count (approx.): 60028
>    88.37%  [kernel]             [k] xen_hypercall_sched_op
>     4.73%  [kernel]             [k] xen_hypercall_xen_version
>     1.00%  [kernel]             [k] xen_hypercall_grant_table_op
>     0.99%  [raid456]            [k] handle_stripe
>     0.77%  [raid6_pq]           [k] raid6_sse24_gen_syndrome
>     0.66%  [raid456]            [k] ops_run_io
>     0.57%  [kernel]             [k] memcpy
>     0.21%  [kernel]             [k] xen_restore_fl_direct
>     0.18%  [raid456]            [k] raid5_end_write_request
>     0.18%  [raid456]            [k] __raid_run_ops
>     0.14%  [kernel]             [k] xen_hypercall_event_channel_op
>     0.11%  [kernel]             [k] get_phys_to_machine
>     0.09%  [raid456]            [k] schedule_reconstruction
>
> Now repeated, with the same on the DomU:
> Samples: 300K of event 'cpu-clock', Event count (approx.): 84845
>
>    97.63%  [kernel]            [k] hypercall_page
>     0.44%  [kernel]            [k] copy_user_generic_string
>     0.41%  [kernel]            [k] __clear_user
>     0.08%  [kernel]            [k] __wake_up_bit
>     0.07%  [kernel]            [k] xen_restore_fl_direct
>     0.07%  [kernel]            [k] __mem_cgroup_commit_charge
>
> Now, interestingly enough... The root drive (xvda) of the DomU in question is on a separate RAID1. This is passed as an LV.
>
> # dd if=/dev/zero of=/output.bin bs=1M count=4096
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 69.7036 s, 61.6 MB/s
>
> Then from the Dom0 to the same RAID1:
> # dd if=/dev/zero of=output.zero bs=1M count=4096
> 4096+0 records in
> 4096+0 records out
> 4294967296 bytes (4.3 GB) copied, 48.4407 s, 88.7 MB/s
>
> I don't really know what else I can do to try and see where the slowdown is here - I am open to suggestions though :)
>
> --
> Steven Haigh
>
> Email: netwiz@crc.id.au
> Web: https://www.crc.id.au
> Phone: (03) 9001 6090 - 0412 935 897
> Fax: (03) 8338 0299
>


-- 
Steven Haigh

Email: netwiz@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897
Fax: (03) 8338 0299