From: Anthony Liguori
To: Avi Kivity
Cc: Kevin Wolf, qemu-devel
Date: Tue, 14 Sep 2010 10:25:24 -0500
Subject: Re: [Qemu-devel] qcow2 performance plan
Message-ID: <4C8F93E4.80008@codemonkey.ws>
In-Reply-To: <4C8F7394.8060802@redhat.com>
References: <4C8F7394.8060802@redhat.com>

On 09/14/2010 08:07 AM, Avi Kivity wrote:
> Here's a draft of a plan that should improve qcow2 performance. It's
> written in wiki syntax for eventual upload to wiki.qemu.org; lines
> starting with # are numbered lists, not comments.
>
> = Basics =
>
> At the minimum level, no operation should block the main thread. This
> could be done in two ways: extending the state machine so that each
> blocking operation can be performed asynchronously (bdrv_aio_*)
> or by threading: each new operation is handed off to a worker thread.
> Since a full state machine is prohibitively complex, this document
> will discuss threading.
>
> == Basic threading strategy ==
>
> A first iteration of qcow2 threading adds a single mutex to an image.
> The existing qcow2 code is then executed within a worker thread,
> acquiring the mutex before starting any operation and releasing it
> after completion. Concurrent operations will simply block until the
> operation is complete. For operations which are already asynchronous,
> the blocking time will be negligible since the code will call
> bdrv_aio_{read,write} and return, releasing the mutex.
> The immediate benefit is that currently blocking operations no longer
> block the main thread; instead they just block the block operation,
> which is blocking anyway.
>
> == Eliminating the threading penalty ==
>
> We can eliminate pointless context switches by using the worker thread
> context we're in to issue the I/O. This is trivial for synchronous calls
> (bdrv_read and bdrv_write); we simply issue the I/O
> from the same thread we're currently in. The underlying raw block format
> driver threading code needs to recognize we're in a worker thread
> context so it doesn't need to use a worker thread of its own; perhaps
> using a thread variable to see if it is in the main thread or an I/O
> worker thread.
>
> For asynchronous operations, this is harder. We may add a
> bdrv_queue_aio_read and bdrv_queue_aio_write to replace a
>
>     bdrv_aio_read()
>     mutex_unlock(bs.mutex)
>     return;
>
> sequence. Alternatively, we can just eliminate asynchronous calls. To
> retain concurrency we drop the mutex while performing the operation,
> and convert a bdrv_aio_read to:
>
>     mutex_unlock(bs.mutex)
>     bdrv_read()
>     mutex_lock(bs.mutex)

The incremental version of this is hard for me to understand.
bdrv_read() may be implemented in terms of bdrv_aio_read() +
qemu_io_wait(), which dispatches bottom halves. This is done through a
shared resource, so if you allow bdrv_read() to be called in parallel,
there's a very real possibility that you'll get corruption of that
shared resource.

You'd have to first instrument bdrv_read() to be re-entrant by acquiring
bs.mutex in every bdrv_read() caller.
You would then need to modify the file protocol so that it could safely
be called in parallel.

IOW, you've got to make the whole block layer thread safe before you can
begin to make qcow2 thread safe.

Regards,

Anthony Liguori