From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51702) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Y49gS-0004zc-R0 for qemu-devel@nongnu.org; Thu, 25 Dec 2014 09:46:37 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1Y49gP-0002jc-KG for qemu-devel@nongnu.org; Thu, 25 Dec 2014 09:46:36 -0500 Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:55131 helo=mx01.kamp.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1Y49gP-0002jN-A6 for qemu-devel@nongnu.org; Thu, 25 Dec 2014 09:46:33 -0500 Message-ID: <549C2341.2060909@kamp.de> Date: Thu, 25 Dec 2014 15:46:25 +0100 From: Peter Lieven MIME-Version: 1.0 References: <1418743287-20467-1-git-send-email-pl@kamp.de> <20141216154832.GB3301@noname.str.redhat.com> <5490571B.8070207@kamp.de> <20141218103445.GA25902@noname.redhat.com> In-Reply-To: <20141218103445.GA25902@noname.redhat.com> Content-Type: text/plain; charset=windows-1252 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH V2 0/4] *virtio-blk: add multiread support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: famz@redhat.com, benoit@irqsave.net, ming.lei@canonical.com, armbru@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com, mreitz@redhat.com Am 18.12.2014 um 11:34 schrieb Kevin Wolf: > Am 16.12.2014 um 17:00 hat Peter Lieven geschrieben: >> On 16.12.2014 16:48, Kevin Wolf wrote: >>> Am 16.12.2014 um 16:21 hat Peter Lieven geschrieben: >>>> this series adds the long missing multiread support to virtio-blk. >>>> >>>> some remarks: >>>> - i introduced rd_merged and wr_merged block accounting stats to >>>> blockstats as a generic interface which can be set from any >>>> driver that will introduce multirequest merging in the future. >>>> - the knob to disable request merging is not yet there. I would >>>> add it to the device properties also as a generic interface >>>> to have the same switch for any driver that might introduce >>>> request merging in the future. As there has been no knob in >>>> the past I would post this as a seperate series as it needs >>>> some mangling in parameter parsing which might lead to further >>>> discussions. >>>> - the old multiwrite interface is still there and might be removed. >>>> >>>> v1->v2: >>>> - add overflow checking for nb_sectors [Kevin] >>>> - do not change the name of the macro of max mergable requests. [Fam] >>> Diff to v1 looks good. Now I just need to check what it does to the >>> performance. Did you run any benchmarks yourself? >> I ran several installs of Debian/Ubuntu, Booting of Windows and Linux >> systems. I looked at rd_total_time_ns and wr_total_time_ns and saw >> no increase. Ofter I even saw even a decrease. >> >> {rd,wr}_total_time_ns measures the time from virtio_blk_handle_request >> to virtio_blk_rw_complete. So it seems to be a good indicator for the time >> spent with I/O. >> >> What you could to is post it on the top of your fio testing stack and >> look at the throughput. Sequential Reads should be faster. The rest >> not worse. > So I finally ran some fio benchmark on the series. The result for small > sequential reads (4k) is quite noisy, but it seems to be improved a bit. > Larger sequenial reads (64k) and random reads seem to be mostly > unaffected. > > For writes, however, I can see a degradation. Perhaps running multiple > jobs in parallel means that we don't detect and merge sequential > requests any more when they are interleaved with another sequential job. > Or do you have an idea what else could have changed for writes? I tried to digged a little more into this topic and maybe found whats going on. If I a right you are using Kernel >= 3.17 in the guest for your tests? Here are my test results of 4k sequential writes under a Linux 3.13 guest. Master: virtio-fio: rd_bytes=856064 wr_bytes=34359738368 rd_operations=209 wr_operations=8388608 flush_operations=0 wr_total_time_ns=980610962941 rd_total_time_ns=5656675 flush_total_time_ns=0 rd_merged=0 wr_merged=6533338 Multiread_v2: virtio-fio: rd_bytes=856064 wr_bytes=34359738368 rd_operations=209 wr_operations=8388608 flush_operations=0 wr_total_time_ns=558830918737 rd_total_time_ns=6159151 flush_total_time_ns=0 rd_merged=0 wr_merged=6266824 As you can see the number of merged requests is in the same order, but the wr_total_time_ns is heavily improved! What happened between Linux 3.13 and 3.17 is that Ming introduced the Multiqeue feature into the virtio-blk kernel code. The blk-mq developers intentionally set the number of hw_queues to 4 in Kernel 3.13 for virtio-blk. With the introduction of the Multiqeue feature in 3.17 the number of hw_queues is set the the number of virtqueues. If multique is unsupported the number is 1 and thus the number of hw_queues is also 1. So all the requests from all fio processes go into the same hw_queue and this seems to break the performance. The requests are heavily interleaved this way and as I only merge strictly sequential requests the old implementation which sorts requests wins. But it uses twice the computation time for this. I think this needs to be fixed in the virtio-blk kernel code and if we introduce multiqueue for virtio-blk into qemu, we should set the number of virtqueues to at least 4. Maybe the number of cpus would also be a good choice?! Peter