From: Konrad Rzeszutek Wilk
Subject: Re: IO speed limited by size of IO request (for RBD driver)
Date: Fri, 24 May 2013 10:29:38 -0400
Message-ID: <20130524142938.GI3900@phenom.dumpdata.com>
In-Reply-To: <44C85321-F0BF-45DF-AD44-F02EE9A2391B@citrix.com>
References: <517EC975.7030807@crc.id.au> <517ECE64.6000503@crc.id.au>
 <9F2C4E7DFB7839489C89757A66C5AD620E57EA@LONPEX01CL03.citrite.net>
 <518A0AB8.90506@crc.id.au> <518A0DC8.4080501@citrix.com>
 <518A29DA.3080501@crc.id.au> <518A2CB3.7090106@citrix.com>
 <9F2C4E7DFB7839489C89757A66C5AD620F1713@LONPEX01CL03.citrite.net>
 <20130522201308.GB12372@phenom.dumpdata.com>
 <44C85321-F0BF-45DF-AD44-F02EE9A2391B@citrix.com>
To: Felipe Franciosi
Cc: Roger Pau Monne, Steven Haigh, xen-devel@lists.xen.org
List-Id: xen-devel@lists.xenproject.org

On Thu, May 23, 2013 at 07:22:27AM +0000, Felipe Franciosi wrote:
> 
> On 22 May 2013, at 21:13, "Konrad Rzeszutek Wilk" wrote:
> 
> > On Wed, May 08, 2013 at 11:14:26AM +0000, Felipe Franciosi wrote:
> >> Although we didn't "prove" it properly, I think it is worth mentioning that this boils down to what we originally thought it was:
> >> Steven's environment is writing to a filesystem in the guest. On top of that, it's using the guest's buffer cache to do the writes.
> > 
> > If he is using O_DIRECT it bypasses the cache in the guest.
> 
> Certainly, but the issues were when _not_ using O_DIRECT.

I am confused. Does feature-indirect-descriptor make things worse or better
when not using O_DIRECT? Or is there no difference when combining !O_DIRECT
with feature-indirect-descriptor?

> 
> F
> 
> > 
> >> This means that we cannot (easily?) control how the cache and the fs are flushing these writes through blkfront/blkback.

echo 3 > /proc/..something/drop_cache does it?

> >> 
> >> In other words, it's very likely that it generates a workload that simply doesn't perform well on the "stock" PV protocol.

'fio' is an excellent tool to run the tests without using the cache.

> >> This is a good example of how indirect descriptors help (remembering Roger and I were struggling to find use cases where indirect descriptors showed a substantial gain).

You mean using O_DIRECT? Yes, all tests that involve any I/O should use O_DIRECT.
Otherwise they are misleading. And my understanding from this thread is that
Steven did that and found:

 a) without feature-indirect-descriptor, the I/O was sucky;
 b) with the initial feature-indirect-descriptor, the I/O was less sucky;
 c) with feature-indirect-descriptor plus a tweak to the frontend for how many
    segments to use, the I/O was the same as on bare metal.

Sorry about being so verbose here - I feel that I am missing something and I
am not exactly sure what it is. Could you please enlighten me?
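For reference, a minimal sketch of the kind of test meant here: flush the guest page cache via /proc/sys/vm/drop_caches and then do a direct sequential write with fio. The job name, target path and sizes below are placeholders, not taken from Steven's setup:

    # drop the guest page cache so buffered results are not skewed by earlier runs
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # sequential 1M writes with O_DIRECT, bypassing the guest page cache entirely
    fio --name=seqwrite --filename=/mnt/test/fio.dat --rw=write \
        --bs=1M --size=2G --direct=1 --ioengine=libaio --iodepth=16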
> >> 
> >> Cheers,
> >> Felipe
> >> 
> >> -----Original Message-----
> >> From: Roger Pau Monne
> >> Sent: 08 May 2013 11:45
> >> To: Steven Haigh
> >> Cc: Felipe Franciosi; xen-devel@lists.xen.org
> >> Subject: Re: IO speed limited by size of IO request (for RBD driver)
> >> 
> >> On 08/05/13 12:32, Steven Haigh wrote:
> >>> On 8/05/2013 6:33 PM, Roger Pau Monné wrote:
> >>>> On 08/05/13 10:20, Steven Haigh wrote:
> >>>>> On 30/04/2013 8:07 PM, Felipe Franciosi wrote:
> >>>>>> I noticed you copied your results from "dd", but I didn't see any conclusions drawn from the experiment.
> >>>>>> 
> >>>>>> Did I understand it wrong, or do you now have comparable performance on dom0 and domU when using DIRECT?
> >>>>>> 
> >>>>>> domU:
> >>>>>> # dd if=/dev/zero of=output.zero bs=1M count=2048 oflag=direct
> >>>>>> 2048+0 records in
> >>>>>> 2048+0 records out
> >>>>>> 2147483648 bytes (2.1 GB) copied, 25.4705 s, 84.3 MB/s
> >>>>>> 
> >>>>>> dom0:
> >>>>>> # dd if=/dev/zero of=output.zero bs=1M count=2048 oflag=direct
> >>>>>> 2048+0 records in
> >>>>>> 2048+0 records out
> >>>>>> 2147483648 bytes (2.1 GB) copied, 24.8914 s, 86.3 MB/s
> >>>>>> 
> >>>>>> 
> >>>>>> I think that if the performance differs when NOT using DIRECT, the issue must be related to the way your guest is flushing the cache. This must be generating a workload that doesn't perform well on Xen's PV protocol.
> >>>>> 
> >>>>> Just wondering if there is any further input on this... While DIRECT
> >>>>> writes are as good as can be expected, NON-DIRECT writes in certain
> >>>>> cases (specifically with an mdadm RAID in the Dom0) suffer roughly
> >>>>> a 50% loss in throughput...
> >>>>> 
> >>>>> The hard part is that this is the default mode of writing!
> >>>> 
> >>>> As another test with indirect descriptors, could you change
> >>>> xen_blkif_max_segments in xen-blkfront.c to 128 (it is 32 by
> >>>> default), recompile the DomU kernel and see if that helps?
> >>> 
> >>> Ok, here we go.... compiled as 3.8.0-2 with the above change. 3.8.0-2
> >>> is running on both the Dom0 and DomU.
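For anyone reproducing this, the change being tested is roughly the following one-liner against the patched frontend, followed by a domU kernel rebuild. This assumes the constant is declared as "xen_blkif_max_segments = 32;" as in the later mainline xen-blkfront.c; adjust to your tree:

    # bump the frontend segment limit from 32 to 128 in the indirect-descriptor series
    sed -i 's/xen_blkif_max_segments = 32;/xen_blkif_max_segments = 128;/' \
        drivers/block/xen-blkfront.c
    make -j"$(nproc)" bzImage modules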
> >>> 
> >>> # dd if=/dev/zero of=output.zero bs=1M count=2048
> >>> 2048+0 records in
> >>> 2048+0 records out
> >>> 2147483648 bytes (2.1 GB) copied, 22.1703 s, 96.9 MB/s
> >>> 
> >>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >>>            0.34    0.00   17.10    0.00    0.23   82.33
> >>> 
> >>> Device:  rrqm/s    wrqm/s    r/s      w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm   %util
> >>> sdd      980.97  11936.47  53.11   429.78   4.00  48.77    223.81     12.75  26.10   2.11  101.79
> >>> sdc      872.71  11957.87  45.98   435.67   3.55  49.30    224.71     13.77  28.43   2.11  101.49
> >>> sde      949.26  11981.88  51.30   429.33   3.91  48.90    225.03     21.29  43.91   2.27  109.08
> >>> sdf      915.52  11968.52  48.58   428.88   3.73  48.92    225.84     21.44  44.68   2.27  108.56
> >>> md2        0.00      0.00   0.00  1155.61   0.00  97.51    172.80      0.00   0.00   0.00    0.00
> >>> 
> >>> # dd if=/dev/zero of=output.zero bs=1M count=2048 oflag=direct
> >>> 2048+0 records in
> >>> 2048+0 records out
> >>> 2147483648 bytes (2.1 GB) copied, 25.3708 s, 84.6 MB/s
> >>> 
> >>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
> >>>            0.11    0.00   13.92    0.00    0.22   85.75
> >>> 
> >>> Device:  rrqm/s    wrqm/s    r/s     w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
> >>> sdd        0.00  13986.08   0.00  263.20   0.00  55.76    433.87      0.43   1.63   1.07  28.27
> >>> sdc      202.10  13741.55   6.52  256.57   0.81  54.77    432.65      0.50   1.88   1.25  32.78
> >>> sde       47.96  11437.57   1.55  261.77   0.19  45.79    357.63      0.80   3.02   1.85  48.60
> >>> sdf     2233.37  11756.13  71.93  191.38   8.99  46.80    433.90      1.49   5.66   3.27  86.15
> >>> md2        0.00      0.00   0.00  731.93   0.00  91.49    256.00      0.00   0.00   0.00   0.00
> >>> 
> >>> Now this is pretty much exactly what I would expect the system to do...
> >>> ~96MB/sec buffered, and 85MB/sec direct.
> >> 
> >> I'm sorry to be such a PITA, but could you also try with 64? If we have to increase the maximum number of indirect descriptors, I would like to set it to the lowest value that provides good performance, to prevent using too much memory.
> >> 
> >>> So - it turns out that xen_blkif_max_segments at 32 is a killer in the
> >>> DomU. Now it makes me wonder what we can do about this in kernels that
> >>> don't have your series of patches against them - and also about the
> >>> backend stuff in 3.8.x etc.
> >> 
> >> There isn't much we can do regarding kernels without indirect descriptors; there's no easy way to increase the number of segments in a request.
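In case it helps with further testing: with the indirect-descriptor patches applied, the backend advertises its segment limit in xenstore, so one way to check what a given dom0/domU pair actually negotiated is something like the following. The backend domain is assumed to be 0, <domid> and <devid> are placeholders, and the exact key name may differ between revisions of the patch series:

    # list the backend vbd node for the guest's disk and look for the indirect-segment keys
    xenstore-ls /local/domain/0/backend/vbd/<domid>/<devid> | grep -i indirect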