From: Max Reitz
To: Vladimir Sementsov-Ogievskiy, qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, den@openvz.org
Date: Mon, 20 Aug 2018 18:39:26 +0200
Message-ID: <60e47db0-873a-56e0-4c28-faa44896526f@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 0/7] qcow2: async handling of fragmented io
References: <20180807174311.32454-1-vsementsov@virtuozzo.com> <13910182-771b-c5dc-26a7-0958a7241fe8@redhat.com> <6c318533-dc87-daeb-1fe8-6b11b0cbec8d@virtuozzo.com> <747506be-aceb-0ab8-a4ee-c79f9a6b929a@redhat.com>

On 2018-08-20 18:33, Vladimir Sementsov-Ogievskiy wrote:
> 17.08.2018 22:34, Max Reitz wrote:
>> On 2018-08-16 15:58, Vladimir Sementsov-Ogievskiy wrote:
>>> 16.08.2018 03:51, Max Reitz wrote:
>>>> On 2018-08-07 19:43, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Hi all!
>>>>>
>>>>> Here is an asynchronous scheme for handling fragmented qcow2
>>>>> reads and writes. Both the qcow2 read and write functions loop
>>>>> through sequential portions of data. The series aims to
>>>>> parallelize these loop iterations.
>>>>>
>>>>> It improves performance for fragmented qcow2 images. I've tested
>>>>> it as follows:
>>>>>
>>>>> I have four 4G qcow2 images (with the default 64k block size) on
>>>>> my ssd disk:
>>>>> t-seq.qcow2 - sequentially written qcow2 image
>>>>> t-reverse.qcow2 - filled by writing 64k portions from end to the start
>>>>> t-rand.qcow2 - filled by writing 64k portions (aligned) in random order
>>>>> t-part-rand.qcow2 - filled by shuffling the order of 64k writes in 1m clusters
>>>>> (see source code of image generation in the end for details)
>>>>>
>>>>> and the test (sequential io by 1mb chunks):
>>>>>
>>>>> test write:
>>>>>     for t in /ssd/t-*; \
>>>>>         do sync; echo 1 > /proc/sys/vm/drop_caches; echo ===  $t  ===; \
>>>>>         ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none -w $t; \
>>>>>     done
>>>>>
>>>>> test read (same, just drop the -w parameter):
>>>>>     for t in /ssd/t-*; \
>>>>>         do sync; echo 1 > /proc/sys/vm/drop_caches; echo ===  $t  ===; \
>>>>>         ./qemu-img bench -c 4096 -d 1 -f qcow2 -n -s 1m -t none $t; \
>>>>>     done
>>>>>
>>>>> short info about parameters:
>>>>>   -w - do writes (otherwise do reads)
>>>>>   -c - count of blocks
>>>>>   -s - block size
>>>>>   -t none - disable cache
>>>>>   -n - native aio
>>>>>   -d 1 - don't use the parallel requests provided by qemu-img bench itself
>>>> Hm, actually, why not?  And how does a guest behave?
>>>>
>>>> If parallel requests on an SSD perform better, wouldn't a guest issue
>>>> parallel requests to the virtual device and thus to qcow2 anyway?
>>> The guest knows nothing about qcow2 fragmentation, so this kind of
>>> "asynchronization" can only be done at the qcow2 level.
>> Hm, yes.  I'm sorry, but without having looked closer at the series
>> (which is why I'm sorry in advance), I would suspect that the
>> performance improvement comes from us being able to send parallel
>> requests to an SSD.
>>
>> So if you send large requests to an SSD, you may either send them in
>> parallel or sequentially, it doesn't matter.  But for small requests,
>> it's better to send them in parallel so the SSD always has requests in
>> its queue.
>>
>> I would think this is where the performance improvement comes from.  But
>> I would also think that a guest OS knows this and it would also send
>> many requests in parallel so the virtual block device never runs out of
>> requests.
>>
>>> However, if the guest does async io and sends a lot of parallel
>>> requests, it behaves like qemu-img without the -d 1 option, and in
>>> that case parallel loop iterations in qcow2 don't make as much sense.
>>> Still, I think that async parallel requests are better in general
>>> than sequential ones, because if the device has some unused
>>> opportunity for parallelization, it will be utilized.
>> I agree that it probably doesn't make things worse performance-wise, but
>> it's always added complexity (see the diffstat), which is why I'm just
>> routinely asking how useful it is in practice.
:-)
>>
>> Anyway, I suspect there are indeed cases where a guest doesn't send many
>> requests in parallel but it makes sense for the qcow2 driver to
>> parallelize it.  That would be mainly when the guest reads seemingly
>> sequential data that is then fragmented in the qcow2 file.  So basically
>> what your benchmark is testing. :-)
>>
>> Then, the guest could assume that there is no sense in parallelizing it
>> because the latency from the device is large enough, whereas in qemu
>> itself we always run dry and wait for different parts of the single
>> large request to finish.  So, yes, in that case, parallelization that's
>> internal to qcow2 would make sense.
>>
>> Now another question is, does this negatively impact devices where
>> seeking is slow, i.e. HDDs?  Unfortunately I'm not home right now, so I
>> don't have access to an HDD to test myself...
>
> hdd:
>
> +-----------+-----------+----------+-----------+----------+
> |   file    | wr before | wr after | rd before | rd after |
> +-----------+-----------+----------+-----------+----------+
> | seq       |    39.821 |   40.513 |    38.600 |   38.916 |
> | reverse   |    60.320 |   57.902 |    98.223 |  111.717 |
> | rand      |   614.826 |  580.452 |   672.600 |  465.120 |
> | part-rand |    52.311 |   52.450 |    37.663 |   37.989 |
> +-----------+-----------+----------+-----------+----------+
>
> hmm. 10% degradation in the "reverse" case, strange magic...  However,
> reverse is nearly impossible in practice.

I tend to agree.  It's faster for random, and that's what matters more.
(Distinguishing between the cases in qcow2 seems like not so good of an
idea, and making it user-configurable is probably pointless because no one
will change the default.)

Max