Date: Tue, 5 Aug 2014 14:48:15 +0100
From: Stefan Hajnoczi
To: Ming Lei
Cc: Kevin Wolf, Peter Maydell, Fam Zheng, "Michael S. Tsirkin", qemu-devel, Paolo Bonzini
Subject: Re: [Qemu-devel] [PATCH v1 00/17] dataplane: optimization and multi virtqueue support
Message-ID: <20140805134815.GD12251@stefanha-thinkpad.redhat.com>
References: <1407209598-2572-1-git-send-email-ming.lei@canonical.com> <20140805094844.GF4391@noname.str.redhat.com>

On Tue, Aug 05, 2014 at 06:00:22PM +0800, Ming Lei wrote:
> On Tue, Aug 5, 2014 at 5:48 PM, Kevin Wolf wrote:
> > On 05.08.2014 at 05:33, Ming Lei wrote:
> >> Hi,
> >>
> >> These patches bring up below 4 changes:
> >> - introduce object allocation pool and apply it to
> >> virtio-blk dataplane for improving its performance
> >>
> >> - introduce selective coroutine bypass mechanism
> >> for improving performance of virtio-blk dataplane with
> >> raw format image
> >
> > Before applying any bypassing patches, I think we should understand in
> > detail where we are losing performance with coroutines enabled.
>
> From the below profiling data, CPU becomes slow to run instructions
> with coroutine, and CPU dcache miss is increased so it is very
> likely caused by switching stack frequently.
>
> http://marc.info/?l=qemu-devel&m=140679721126306&w=2
>
> http://pastebin.com/ae0vnQ6V

I have been wondering how to prove that the root cause is the ucontext
coroutine mechanism (stack switching). Here is an idea:

Hack your "bypass" code path to run the request inside a coroutine.
That way you can compare "bypass without coroutine" against "bypass with
coroutine".

Right now I think there are doubts because the bypass code path is
indeed a different (and not 100% correct) code path. So this approach
might prove that the coroutines are adding the overhead and not
something that you bypassed.
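To make the shape of that experiment concrete, here is a rough
standalone sketch. It is not QEMU code and does not use QEMU's coroutine
API. All names in it (handle_request, run_direct, run_in_coroutine,
time_run, CO_STACK_SIZE) are made up for illustration, and it assumes
Linux with glibc's <ucontext.h>. The same handler runs either directly
or inside a ucontext coroutine that yields after each request, so the
only difference between the two variants is the stack switch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <ucontext.h>

#define CO_STACK_SIZE (64 * 1024)   /* arbitrary size chosen for this sketch */

static ucontext_t caller_ctx, co_ctx;
static volatile uint64_t completed;  /* requests handled so far */

/* Hypothetical stand-in for the work done by the bypass code path. */
static void handle_request(void)
{
    completed++;
}

/* Coroutine body: handle one request per entry, then yield to the caller. */
static void co_entry(void)
{
    for (;;) {
        handle_request();
        swapcontext(&co_ctx, &caller_ctx);
    }
}

/* Variant A, "bypass without coroutine": call the handler directly. */
uint64_t run_direct(uint64_t n)
{
    completed = 0;
    for (uint64_t i = 0; i < n; i++) {
        handle_request();
    }
    return completed;
}

/* Variant B, "bypass with coroutine": same handler, reached through a
 * stack switch. Returns the number of requests handled, or 0 if the
 * coroutine could not be set up. */
uint64_t run_in_coroutine(uint64_t n)
{
    void *stack = malloc(CO_STACK_SIZE);
    if (stack == NULL || getcontext(&co_ctx) != 0) {
        free(stack);
        return 0;
    }
    co_ctx.uc_stack.ss_sp = stack;
    co_ctx.uc_stack.ss_size = CO_STACK_SIZE;
    co_ctx.uc_link = &caller_ctx;
    makecontext(&co_ctx, co_entry, 0);

    completed = 0;
    for (uint64_t i = 0; i < n; i++) {
        swapcontext(&caller_ctx, &co_ctx);  /* enter; returns on yield */
    }
    assert(completed == n);  /* each entry handled exactly one request */
    free(stack);             /* the coroutine is never resumed again */
    return completed;
}

/* CPU time in seconds spent in fn(n). */
double time_run(uint64_t (*fn)(uint64_t), uint64_t n)
{
    clock_t start = clock();
    fn(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_run(run_direct, n) and time_run(run_in_coroutine, n) with
the same large n gives the two numbers to compare. Only the method
carries over to the real test: on glibc, swapcontext() typically also
saves and restores the signal mask, so the absolute cost measured by
this toy is not representative of the dataplane code.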

Stefan