From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:36985)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XCmSv-000238-4G for qemu-devel@nongnu.org;
	Thu, 31 Jul 2014 05:16:07 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1XCmSo-00070r-WD for qemu-devel@nongnu.org;
	Thu, 31 Jul 2014 05:16:01 -0400
Received: from mx1.redhat.com ([209.132.183.28]:42312)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1XCmSo-00070Q-KK for qemu-devel@nongnu.org;
	Thu, 31 Jul 2014 05:15:54 -0400
Message-ID: <53DA0940.2030207@redhat.com>
Date: Thu, 31 Jul 2014 11:15:44 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <1406720388-18671-1-git-send-email-ming.lei@canonical.com>
	<1406720388-18671-2-git-send-email-ming.lei@canonical.com>
	<53D8F6F0.7040106@redhat.com> <53D981C0.4030708@redhat.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 01/15] qemu coroutine: support bypass mode
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Ming Lei
Cc: Kevin Wolf , Peter Maydell , Fam Zheng , "Michael S. Tsirkin" ,
	qemu-devel , Stefan Hajnoczi

On 31/07/2014 10:59, Ming Lei wrote:
>> > No guesses please. Actually that's also my guess, but since you are
>> > submitting the patch you must do better and show profiles where stack
>> > switching disappears after the patches.
> Below are the hardware events reported by 'perf stat' when running the
> fio randread benchmark for 2min in a VM (single vq, 2 jobs):
>
> sudo ~/bin/perf stat -e
> L1-dcache-loads,L1-dcache-load-misses,cpu-cycles,instructions,branch-instructions,branch-misses,branch-loads,branch-load-misses,dTLB-loads,dTLB-load-misses
> ./nqemu-start-mq 4 1
>
> 1), without bypassing coroutine, by forcing 's->raw_format' to
> false, see patch 5/15
>
> - throughput: 95K
> 232,564,905,115 instructions
> 161.991075781 seconds time elapsed
>
>
> 2), with bypassing coroutine
> - throughput: 115K
> 255,526,629,881 instructions
> 162.333465490 seconds time elapsed

Ok, so you are saving about 10% of the instructions per iop: before,
232G / 95K = 2.45M instructions/iop; after, 255G / 115K = 2.22M
instructions/iop.

That's not small, and it's a good thing for CPU utilization even if you
were not increasing iops. On top of this, can you provide the stack
traces to see the difference in the profiles?

Paolo
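
For reference, the arithmetic above can be checked with a short Python
sketch. It assumes the throughput figures are IOPS with "K" meaning 1000,
and the names RUNS, instructions_per_iops and instructions_per_request are
made up for illustration. The per-request estimate further assumes the
benchmark ran at the quoted rate for the whole elapsed time, which
overstates the request count if 'perf stat' also covered VM startup, so
treat it as a rough bound rather than a measurement.

```python
# Counters copied from the two 'perf stat' runs quoted in this message.
RUNS = {
    "coroutine": {"instructions": 232_564_905_115, "iops": 95_000,
                  "elapsed_s": 161.991075781},
    "bypass": {"instructions": 255_526_629_881, "iops": 115_000,
               "elapsed_s": 162.333465490},
}


def instructions_per_iops(run):
    """Total instructions divided by the IOPS rate (the ratio used above)."""
    return run["instructions"] / run["iops"]


def instructions_per_request(run):
    """Rough per-request cost, ASSUMING requests ~= iops * elapsed seconds."""
    return run["instructions"] / (run["iops"] * run["elapsed_s"])


before = instructions_per_iops(RUNS["coroutine"])
after = instructions_per_iops(RUNS["bypass"])
saving = 1 - after / before  # fraction of instructions saved per unit of IOPS

print(f"before: {before / 1e6:.2f}M  after: {after / 1e6:.2f}M  "
      f"saving: {saving:.1%}")
for name, run in RUNS.items():
    print(f"{name}: ~{instructions_per_request(run):,.0f} instructions/request")
```

Because both runs lasted about 162 seconds, dividing by the IOPS rate alone
is enough to compare them; the relative saving comes out a little under 10%
either way.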