Message-ID: <53FD701B.9030402@windriver.com>
Date: Tue, 26 Aug 2014 23:43:55 -0600
From: Chris Friesen
Subject: Re: [Qemu-devel] is there a limit on the number of in-flight I/O operations?
In-Reply-To: <53FBAF8A.3050005@windriver.com>
To: Benoît Canet
Cc: Paolo Bonzini, qemu-devel@nongnu.org

On 08/25/2014 03:50 PM, Chris Friesen wrote:
> I think I might have a glimmering of what's going on.  Someone please
> correct me if I get something wrong.
>
> I think that VIRTIO_PCI_QUEUE_MAX doesn't really mean anything with
> respect to max inflight operations, and neither does virtio-blk calling
> virtio_add_queue() with a queue size of 128.
>
> I think what's happening is that virtio_blk_handle_output() spins,
> pulling data off the 128-entry queue and calling
> virtio_blk_handle_request().  At that point the queue entry can be
> reused, so the queue size isn't really relevant.
>
> In virtio_blk_handle_write() we add the request to a MultiReqBuffer and
> every 32 writes we call virtio_submit_multiwrite(), which calls down
> into bdrv_aio_multiwrite().  That tries to merge requests and then for
> each resulting request calls bdrv_aio_writev(), which ends up calling
> qemu_rbd_aio_writev(), which calls rbd_start_aio().
>
> rbd_start_aio() allocates a buffer and converts from iovec to a single
> buffer.  This buffer stays allocated until the request is acked, which
> is where the bulk of the memory overhead with rbd comes from (has
> anyone considered adding iovec support to rbd to avoid this extra copy?).
>
> The only limit I see in the whole call chain from
> virtio_blk_handle_request() on down is the call to
> bdrv_io_limits_intercept() in bdrv_co_do_writev().  However, that
> doesn't provide any limit on the absolute number of inflight
> operations, only on operations/sec.  If the Ceph server cluster can't
> keep up with the aggregate load, the number of inflight operations can
> still grow indefinitely.
>
> Chris

I was a bit concerned that I'd need to extend the I/O throttling code to
support a limit on total inflight bytes, but it doesn't look like that
will be necessary.

It seems that using mallopt() to set the trim/mmap thresholds to 128K is
enough to minimize the increase in RSS and also drop it back down after
an I/O burst.
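Roughly, the tuning amounts to something like the sketch below (a
minimal sketch only, assuming the glibc allocator; the helper name and
call site are illustrative, not the actual change):

#include <malloc.h>

/* Illustrative sketch: shrink glibc malloc's trim/mmap thresholds so
 * that freed I/O bounce buffers are returned to the kernel promptly
 * instead of sitting in the arena and inflating RSS. */
static void limit_malloc_rss(void)
{
    /* Release free heap space above 128K back to the kernel. */
    mallopt(M_TRIM_THRESHOLD, 128 * 1024);
    /* Serve allocations of 128K or more via mmap() so they are
     * unmapped immediately when freed. */
    mallopt(M_MMAP_THRESHOLD, 128 * 1024);
}

The idea is to call something like that once early at startup, before
the I/O load kicks in; 128K matches the threshold mentioned above.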
For now this looks like it should be sufficient for our purposes.  I'm
actually a bit surprised I didn't have to go lower, but it seems to work
for both "dd" and dbench test cases, so we'll give it a try.

Chris