From: Stefan Hajnoczi
Date: Fri, 27 Jun 2014 14:01:29 +0200
Subject: Re: [Qemu-devel] [regression] dataplane: throughout -40% by commit 580b6b2aa2
Message-ID: <20140627120129.GO12061@stefanha-thinkpad.muc.redhat.com>
To: Ming Lei
Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel@nongnu.org, "Michael S. Tsirkin"

On Thu, Jun 26, 2014 at 11:14:16PM +0800, Ming Lei wrote:
> Hi Stefan,
>
> I found that VM block I/O throughput is decreased by more than 40%
> on my laptop, and it looks much worse in my server environment.
> It is caused by your commit 580b6b2aa2:
>
>     dataplane: use the QEMU block layer for I/O
>
> I ran fio with the config below to test random reads:
>
> [global]
> direct=1
> size=4G
> bsrange=4k-4k
> timeout=20
> numjobs=4
> ioengine=libaio
> iodepth=64
> filename=/dev/vdc
> group_reporting=1
>
> [f]
> rw=randread
>
> Along with the throughput drop, latency improved a little.
>
> With this commit, the I/O batches submitted to the host become much
> smaller than before, and more io_submit() calls need to be made to the
> kernel, which means the effective iodepth may become much lower.
>
> I am not surprised by the result, since I previously compared VM I/O
> performance between qemu and lkvm, which has no big qemu lock problem
> and handles I/O in a dedicated thread; lkvm's block I/O throughput is
> still much worse than qemu's, because lkvm doesn't submit block I/O in
> batches the way the previous dataplane code did, IMO.
>
> But now that you have changed the way I/O is submitted, could you share
> the motivation for the change? Is the throughput drop expected?

Thanks for reporting this. A 40% drop is a serious regression.

We were expecting some regression, since the custom Linux AIO codepath
has been replaced with the QEMU block layer (which offers features like
image formats, snapshots, and I/O throttling).

Let me know if you get stuck working on a patch.

Implementing batching sounds like a good idea. I never measured the
impact when I wrote the ioq code; it just seemed like a natural way to
structure it. Hopefully the 40% number is purely due to batching and we
can get most of the performance back.
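To make the batching point concrete, here is a minimal, self-contained
sketch (plain libaio, not QEMU or the old ioq code; the queue depth,
block size and /dev/vdc path just mirror the fio job above) of what
"submit in batches" means at the syscall level: queue a set of iocbs and
flush them with a single io_submit(), instead of one io_submit() per
request.

    /*
     * Hypothetical standalone example of batched Linux AIO submission.
     * Build with: gcc -O2 -o batch batch.c -laio
     */
    #define _GNU_SOURCE             /* for O_DIRECT */
    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define QUEUE_DEPTH 64          /* matches iodepth=64 in the fio job */
    #define BLOCK_SIZE  4096        /* matches bsrange=4k-4k */

    int main(void)
    {
        io_context_t ctx;
        struct iocb iocbs[QUEUE_DEPTH];
        struct iocb *batch[QUEUE_DEPTH];
        struct io_event events[QUEUE_DEPTH];
        void *bufs[QUEUE_DEPTH];
        int fd, i, ret;

        fd = open("/dev/vdc", O_RDONLY | O_DIRECT); /* device from the fio job */
        if (fd < 0) {
            perror("open");
            return 1;
        }

        memset(&ctx, 0, sizeof(ctx));
        if (io_setup(QUEUE_DEPTH, &ctx) < 0) {
            perror("io_setup");
            return 1;
        }

        /* Queue up QUEUE_DEPTH reads without submitting them yet. */
        for (i = 0; i < QUEUE_DEPTH; i++) {
            if (posix_memalign(&bufs[i], BLOCK_SIZE, BLOCK_SIZE)) {
                return 1;
            }
            io_prep_pread(&iocbs[i], fd, bufs[i], BLOCK_SIZE,
                          (long long)i * BLOCK_SIZE);
            batch[i] = &iocbs[i];
        }

        /* One syscall submits the whole batch, keeping the queue full. */
        ret = io_submit(ctx, QUEUE_DEPTH, batch);
        if (ret != QUEUE_DEPTH) {
            fprintf(stderr, "io_submit returned %d\n", ret);
            return 1;
        }

        /* Reap all completions. */
        ret = io_getevents(ctx, QUEUE_DEPTH, QUEUE_DEPTH, events, NULL);
        printf("completed %d requests\n", ret);

        io_destroy(ctx);
        close(fd);
        return 0;
    }

The old dataplane path worked roughly along these lines; going through
the block layer currently ends up issuing much smaller batches, which is
consistent with the lower effective iodepth described above.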
Stefan