From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=36980 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OvXz5-000153-Dj
	for qemu-devel@nongnu.org; Tue, 14 Sep 2010 12:03:55 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from <stefanha@gmail.com>) id 1OvXz0-0007O7-3Z
	for qemu-devel@nongnu.org; Tue, 14 Sep 2010 12:03:51 -0400
Received: from mail-pw0-f45.google.com ([209.85.160.45]:56187)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from <stefanha@gmail.com>) id 1OvXyz-0007Nu-S4
	for qemu-devel@nongnu.org; Tue, 14 Sep 2010 12:03:46 -0400
Received: by pwj4 with SMTP id 4so2566313pwj.4
	for <qemu-devel@nongnu.org>; Tue, 14 Sep 2010 09:03:44 -0700 (PDT)
MIME-Version: 1.0
In-Reply-To: <4C8F9920.7070908@redhat.com>
References: <4C8F7394.8060802@redhat.com> <4C8F7BE4.5010102@codemonkey.ws>
	<4C8F9087.2050005@redhat.com> <4C8F92D9.2000908@codemonkey.ws>
	<4C8F9920.7070908@redhat.com>
Date: Tue, 14 Sep 2010 17:03:43 +0100
Message-ID: <AANLkTimtONED4Spt-i1A4bHdAfivxo3MhuECk-NAHtk3@mail.gmail.com>
Subject: Re: [Qemu-devel] qcow2 performance plan
From: Stefan Hajnoczi <stefanha@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Avi Kivity <avi@redhat.com>, qemu-devel <qemu-devel@nongnu.org>

On Tue, Sep 14, 2010 at 4:47 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> On 14.09.2010 17:20, Anthony Liguori wrote:
>> On 09/14/2010 10:11 AM, Kevin Wolf wrote:
>>> On 14.09.2010 15:43, Anthony Liguori wrote:
>>>
>>>> Hi Avi,
>>>>
>>>> On 09/14/2010 08:07 AM, Avi Kivity wrote:
>>>>
>>>>>   Here's a draft of a plan that should improve qcow2 performance.  It's
>>>>> written in wiki syntax for eventual upload to wiki.qemu.org; lines
>>>>> starting with # are numbered lists, not comments.
>>>>>
>>>> Thanks for putting this together.  I think it's really useful to think
>>>> through the problem before anyone jumps in and starts coding.
>>>>
>>>>
>>>>> = Basics =
>>>>>
>>>>> At the minimum level, no operation should block the main thread.  This
>>>>> could be done in two ways: extending the state machine so that each
>>>>> blocking operation can be performed asynchronously
>>>>> (<code>bdrv_aio_*</code>)
>>>>> or by threading: each new operation is handed off to a worker thread.
>>>>> Since a full state machine is prohibitively complex, this document
>>>>> will discuss threading.
>>>>>
>>>> There's two distinct requirements that must be satisfied by a fast block
>>>> device.  The device must have fast implementations of aio functions and
>>>> it must support concurrent request processing.
>>>>
>>>> If an aio function blocks in the process of submitting the request, it's
>>>> by definition slow.  But even if you make the aio functions fast, you
>>>> still need to be able to support concurrent request processing in order
>>>> to achieve high throughput.
>>>>
>>>> I'm not going to comment in depth on your threading proposal.  When it
>>>> comes to adding concurrency, I think any approach will require a rewrite
>>>> of the qcow2 code and if the author of that rewrite is more comfortable
>>>> implementing concurrency with threads than with a state machine, I'm
>>>> happy with a threaded implementation.
>>>>
>>>> I'd suggest avoiding hyperbole like "a full state machine is
>>>> prohibitively complex".  QED is a full state machine.  qcow2 adds a
>>>> number of additional states because of the additional metadata and sync
>>>> operations but it's not an exponential increase in complexity.
>>>>
>>> It will be quite some additional states that qcow2 brings in, but I
>>> suspect the really hard thing is getting the dependencies between
>>> requests right.
>>>
>>> I just had a look at how QED is doing this, and it seems to take the
>>> easy solution, namely allowing only one allocation at the same time.
>>
>> One L2 allocation, not cluster allocations.  You can allocate multiple
>> clusters concurrently and you can read/write L2s concurrently.
>>
>> Since L2 allocation only happens every 2GB, it's a rare event.
>
> Then your state machine is too complicated for me to understand. :-)
>
> Let me try to chase function pointers for a simple cluster allocation:
>
> bdrv_qed_aio_writev
> qed_aio_setup
> qed_aio_next_io
> qed_find_cluster
> qed_read_l2_table
> ...
> qed_find_cluster_cb
>
> This function contains the code to check if the cluster is already
> allocated, right?
>
>    n = qed_count_contiguous_clusters(s, request->l2_table->table,
>                                      index, n, &offset);
>    ret = offset ? QED_CLUSTER_FOUND : QED_CLUSTER_L2;
>
> The callback called from there is qed_aio_write_data(..., ret =
> QED_CLUSTER_L2, ...) which means
>
>    bool need_alloc = ret != QED_CLUSTER_FOUND;
>    /* Freeze this request if another allocating write is in progress */
>    if (need_alloc) {
>    ...
>
> So where did I start to follow the path of a L2 table allocation instead
> of a simple cluster allocation?

qed_aio_write_main() writes the main body of data into the cluster.
Then it decides whether to update/allocate L2 tables if this is an
allocating write.

qed_aio_write_l2_update() is the function that gets called to touch
the L2 table (it also handles the allocation case).
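
As a rough illustration of that ordering, here is a toy model. It is not
the actual QED code: every name prefixed with toy_ is hypothetical, and the
real path is asynchronous and callback-driven, whereas this sketch is
synchronous for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lookup outcomes; these are NOT the real QED constants. */
enum toy_lookup {
    TOY_FOUND,       /* cluster already allocated */
    TOY_L2_PRESENT,  /* L2 table exists, cluster not yet allocated */
    TOY_L2_MISSING   /* no L2 table yet, so one must be allocated too */
};

/* Hypothetical per-request state, tracking which steps have run. */
struct toy_request {
    enum toy_lookup lookup;
    bool data_written;
    bool l2_allocated;
    bool l2_updated;
};

/* Touch the L2 table; also handles the case where it must be allocated. */
static void toy_write_l2_update(struct toy_request *req)
{
    /* In this model, metadata is only touched after the data is written. */
    assert(req->data_written);

    if (req->lookup == TOY_L2_MISSING) {
        req->l2_allocated = true;
    }
    req->l2_updated = true;
}

/* Write the main body of data, then decide whether L2 work is needed. */
static void toy_write_main(struct toy_request *req)
{
    req->data_written = true;

    /* Only an allocating write goes on to update or allocate L2 tables. */
    if (req->lookup != TOY_FOUND) {
        toy_write_l2_update(req);
    }
}
```

In this model a write to an already allocated cluster stops after the data
write, while an allocating write continues into the L2 update step, which
is also where a missing L2 table would get allocated.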

Stefan