From mboxrd@z Thu Jan 1 00:00:00 1970 From: Steven Haigh Subject: Re: Xen disk write slowness in kernel 3.8.x Date: Thu, 04 Apr 2013 01:48:26 +1100 Message-ID: <515C413A.2090808@crc.id.au> References: <515B8E7B.8070104@crc.id.au> <9F2C4E7DFB7839489C89757A66C5AD620899E9@LONPEX01CL03.citrite.net> <515C1592.1090102@crc.id.au> <9F2C4E7DFB7839489C89757A66C5AD62089BC8@LONPEX01CL03.citrite.net> <515C2E21.2010903@crc.id.au> <9F2C4E7DFB7839489C89757A66C5AD62089EDF@LONPEX01CL03.citrite.net> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; Format="flowed" Content-Transfer-Encoding: 7bit Return-path: In-Reply-To: <9F2C4E7DFB7839489C89757A66C5AD62089EDF@LONPEX01CL03.citrite.net> List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xen.org Errors-To: xen-devel-bounces@lists.xen.org To: Felipe Franciosi Cc: "xen-devel@lists.xen.org" List-Id: xen-devel@lists.xenproject.org On 04/04/13 01:30, Felipe Franciosi wrote: > I think you should use i/oflag=direct on your "dd"s to bypass the buffer cache in the guest. Might provide more consistent results. Ah - you are correct. I usually don't forget that. This has however discovered something interesting... Entirely from the DomU: # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 oflag=direct 4096+0 records in 4096+0 records out 4294967296 bytes (4.3 GB) copied, 45.2753 s, 94.9 MB/s # rm /mnt/fileshare/output.zero # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes (4.3 GB) copied, 80.5808 s, 53.3 MB/s I did this 3-4 times with the same trend... # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 oflag=direct 4096+0 records in 4096+0 records out 4294967296 bytes (4.3 GB) copied, 40.693 s, 106 MB/s # rm /mnt/fileshare/output.zero # sync # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 4096+0 records in 4096+0 records out 4294967296 bytes (4.3 GB) copied, 82.1511 s, 52.3 MB/s So, when the DomU is using write buffers, write speed is halved. This I don't understand. I haven't tweaked the array at all yet (as I've just been experimenting with this problem) - so with a bit of time, I can probably even increase its write speed some more - but without this speed decrease solved, it isn't really worth it yet... > The result of oprofile is intriguing. Did you have a chance to try Roger's persistent grant implementation? > > He mentions it on his 08/03 e-mail. It's here: > git://xenbits.xen.org/people/royger/linux.git xen-block-indirect I can't say I have - Although I package Xen for EL6, we're getting to the edge of my knowledge here. > > Felipe > > -----Original Message----- > From: Steven Haigh [mailto:netwiz@crc.id.au] > Sent: 03 April 2013 14:27 > To: xen-devel@lists.xen.org > Cc: Felipe Franciosi > Subject: Re: Xen disk write slowness in kernel 3.8.x > > On 03/04/13 23:29, Felipe Franciosi wrote: >> Do you know the size of your requests? >> You used "iostat -m". Perhaps "iostat -xm" could be more meaningful as it will tell you the average request size in sectors. > > Good call. I started just a sequential write with dd from /dev/zero on the DomU: > # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 > 4096+0 records in > 4096+0 records out > 4294967296 bytes (4.3 GB) copied, 91.269 s, 47.1 MB/s > > iostat -xm shows: > avg-cpu: %user %nice %system %iowait %steal %idle > 0.11 0.00 2.71 0.23 0.56 96.38 > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > avgrq-sz avgqu-sz await svctm %util > sdc 1208.40 5337.80 150.00 645.20 5.35 23.39 > 74.00 6.47 8.15 0.91 72.72 > sdf 1199.80 5350.80 148.00 643.40 5.30 23.34 > 74.12 6.88 8.68 0.96 76.06 > sdd 1203.40 5304.60 148.80 638.60 5.28 23.21 > 74.11 5.42 6.88 0.78 61.38 > sde 1213.80 5382.40 148.40 652.40 5.37 23.59 > 74.08 6.40 8.00 0.94 75.20 > md3 0.00 0.00 1.60 1312.20 0.05 42.68 > 66.60 0.00 0.00 0.00 0.00 > > Shutting down the DomU, mounting /dev/md3 in and doing this directly from the Dom0 shows: > # dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096 > 4096+0 records in > 4096+0 records out > 4294967296 bytes (4.3 GB) copied, 44.2801 s, 97.0 MB/s > > details from iostat -xm: > avg-cpu: %user %nice %system %iowait %steal %idle > 0.00 0.00 2.16 0.00 0.57 97.27 > > Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s > avgrq-sz avgqu-sz await svctm %util > sdc 1175.40 5220.60 143.40 633.60 5.13 22.85 > 73.76 6.01 7.73 0.87 67.60 > sdf 1175.20 5154.00 147.40 624.40 5.14 22.55 > 73.46 6.87 8.88 0.97 74.90 > sdd 1183.40 5133.80 145.40 625.20 5.19 22.50 > 73.60 5.19 6.73 0.77 59.60 > sde 1176.40 5229.60 146.00 637.00 5.16 22.99 > 73.62 7.39 9.51 0.99 77.90 > md3 0.00 0.00 0.60 1277.40 0.02 41.56 > 66.63 0.00 0.00 0.00 0.00 > > This about ties in with what bonnie++ gets as the write speed: > Version 1.96 ------Sequential Output------ --Sequential Input- > --Random- > Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- > --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP > /sec %CP > xenhost.lan.crc. 2G 635 91 97526 15 67864 16 952 97 295833 35 > 409.6 10 > > bonnie++ maxes out at about the same as dd in the DomU. > >> I still didn't understand your environment. I think this is the first time you mention NFS/SMB. > > /dev/md3 is passed through as a phy device to a DomU. This DomU runs both samba and NFS shares. It doesn't seem to matter what method is the source of the writes to the drive (dd, bonnie++, sending a file via NFS or Samba), the max write speed seems to be bottlenecking at ~50Mb/sec - which I believe is purely the write speed to the array via the DomU. > >> Weren't you testing copies from /dev/zero to your device? > > I have tested many ways - this being just one of them. > >> I'd also recommend you run "oprofile" to see if there are any time sinks that we are overlooking. You can find out how to use it here: >> http://wiki.xen.org/wiki/Xen_Profiling:_oprofile_and_perf > > I ran 'perf top' on the Dom0, then started a write (dd if=/dev/zero of=/mnt/fileshare/output.zero bs=1M count=4096) on the DomU. This is what I'd see as the relevant output: > > Samples: 574K of event 'cpu-clock', Event count (approx.): 60028 > 88.37% [kernel] [k] xen_hypercall_sched_op > 4.73% [kernel] [k] xen_hypercall_xen_version > 1.00% [kernel] [k] xen_hypercall_grant_table_op > 0.99% [raid456] [k] handle_stripe > 0.77% [raid6_pq] [k] raid6_sse24_gen_syndrome > 0.66% [raid456] [k] ops_run_io > 0.57% [kernel] [k] memcpy > 0.21% [kernel] [k] xen_restore_fl_direct > 0.18% [raid456] [k] raid5_end_write_request > 0.18% [raid456] [k] __raid_run_ops > 0.14% [kernel] [k] xen_hypercall_event_channel_op > 0.11% [kernel] [k] get_phys_to_machine > 0.09% [raid456] [k] schedule_reconstruction > > Now repeated, with the same on the DomU: > Samples: 300K of event 'cpu-clock', Event count (approx.): 84845 > > 97.63% [kernel] [k] hypercall_page > 0.44% [kernel] [k] copy_user_generic_string > 0.41% [kernel] [k] __clear_user > 0.08% [kernel] [k] __wake_up_bit > 0.07% [kernel] [k] xen_restore_fl_direct > 0.07% [kernel] [k] __mem_cgroup_commit_charge > > Now, interestingly enough... The root drive (xvda) of the DomU in question is on a separate RAID1. This is passed as an LV. > > # dd if=/dev/zero of=/output.bin bs=1M count=4096 > 4096+0 records in > 4096+0 records out > 4294967296 bytes (4.3 GB) copied, 69.7036 s, 61.6 MB/s > > Then from the Dom0 to the same RAID1: > # dd if=/dev/zero of=output.zero bs=1M count=4096 > 4096+0 records in > 4096+0 records out > 4294967296 bytes (4.3 GB) copied, 48.4407 s, 88.7 MB/s > > I don't really know what else I can do to try and see where the slowdown is here - I am open to suggestions though :) > > -- > Steven Haigh > > Email: netwiz@crc.id.au > Web: https://www.crc.id.au > Phone: (03) 9001 6090 - 0412 935 897 > Fax: (03) 8338 0299 > -- Steven Haigh Email: netwiz@crc.id.au Web: https://www.crc.id.au Phone: (03) 9001 6090 - 0412 935 897 Fax: (03) 8338 0299