From: Max Reitz
Date: Sat, 04 Oct 2014 23:28:22 +0200
Message-ID: <54306676.1050205@redhat.com>
In-Reply-To: <1412182919-9550-11-git-send-email-stefanha@redhat.com>
References: <1412182919-9550-1-git-send-email-stefanha@redhat.com>
 <1412182919-9550-11-git-send-email-stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 10/11] block: let commit blockjob run in BDS AioContext
To: Stefan Hajnoczi, qemu-devel@nongnu.org
Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng

On 01.10.2014 19:01, Stefan Hajnoczi wrote:
> The commit block job must run in the BlockDriverState AioContext so that
> it works with dataplane.
>
> Acquire the AioContext in blockdev.c so starting the block job is safe.
> One detail here is that the bdrv_drain_all() must be moved inside the
> aio_context_acquire() region so requests cannot sneak in between the
> drain and acquire.

Hm, I see the intent, but in patch 5 you said bdrv_drain_all() should
never be called outside of the main loop (at least that's how it
appeared to me).

Wouldn't it be enough to use bdrv_drain() on the source BDS, like in
patch 9?
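That is, something along these lines (only a sketch of the ordering I
mean, not code from this patch):

    AioContext *aio_context = bdrv_get_aio_context(bs);

    aio_context_acquire(aio_context);

    /* drain inside the acquired region so no request can sneak in
     * between the drain and the lock */
    bdrv_drain(bs);

    /* ... set up and start the commit block job ... */

    aio_context_release(aio_context);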
> The completion code in block/commit.c must perform backing chain
> manipulation and bdrv_reopen() from the main loop. Use
> block_job_defer_to_main_loop() to achieve that.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  block/commit.c | 72 +++++++++++++++++++++++++++++++++++++---------------------
>  blockdev.c     | 29 +++++++++++++++--------
>  2 files changed, 66 insertions(+), 35 deletions(-)
>
> diff --git a/block/commit.c b/block/commit.c
> index 91517d3..0fd05dc 100644
> --- a/block/commit.c
> +++ b/block/commit.c
> @@ -60,17 +60,50 @@ static int coroutine_fn commit_populate(BlockDriverState *bs,
>      return 0;
>  }
>  
> -static void coroutine_fn commit_run(void *opaque)
> +typedef struct {
> +    int ret;
> +} CommitCompleteData;
> +
> +static void commit_complete(BlockJob *job, void *opaque)
>  {
> -    CommitBlockJob *s = opaque;
> +    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
> +    CommitCompleteData *data = opaque;
>      BlockDriverState *active = s->active;
>      BlockDriverState *top = s->top;
>      BlockDriverState *base = s->base;
>      BlockDriverState *overlay_bs;
> +    int ret = data->ret;
> +
> +    if (!block_job_is_cancelled(&s->common) && ret == 0) {
> +        /* success */
> +        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
> +    }
> +
> +    /* restore base open flags here if appropriate (e.g., change the base back
> +     * to r/o). These reopens do not need to be atomic, since we won't abort
> +     * even on failure here */
> +    if (s->base_flags != bdrv_get_flags(base)) {
> +        bdrv_reopen(base, s->base_flags, NULL);
> +    }
> +    overlay_bs = bdrv_find_overlay(active, top);
> +    if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
> +        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
> +    }
> +    g_free(s->backing_file_str);
> +    block_job_completed(&s->common, ret);
> +    g_free(data);
> +}
> +
> +static void coroutine_fn commit_run(void *opaque)
> +{
> +    CommitBlockJob *s = opaque;
> +    CommitCompleteData *data;
> +    BlockDriverState *top = s->top;
> +    BlockDriverState *base = s->base;
>      int64_t sector_num, end;
>      int ret = 0;
>      int n = 0;
> -    void *buf;
> +    void *buf = NULL;
>      int bytes_written = 0;
>      int64_t base_len;
>  
> @@ -78,18 +111,18 @@ static void coroutine_fn commit_run(void *opaque)
>  
>  
>      if (s->common.len < 0) {
> -        goto exit_restore_reopen;
> +        goto out;
>      }
>  
>      ret = base_len = bdrv_getlength(base);
>      if (base_len < 0) {
> -        goto exit_restore_reopen;
> +        goto out;
>      }
>  
>      if (base_len < s->common.len) {
>          ret = bdrv_truncate(base, s->common.len);
>          if (ret) {
> -            goto exit_restore_reopen;
> +            goto out;
>          }
>      }
>  
> @@ -128,7 +161,7 @@ wait:
>          if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
>              s->on_error == BLOCKDEV_ON_ERROR_REPORT||
>              (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC && ret == -ENOSPC)) {
> -            goto exit_free_buf;
> +            goto out;
>          } else {
>              n = 0;
>              continue;
> @@ -140,27 +173,14 @@ wait:
>  
>      ret = 0;
>  
> -    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> -        /* success */
> -        ret = bdrv_drop_intermediate(active, top, base, s->backing_file_str);
> +out:
> +    if (buf) {
> +        qemu_vfree(buf);
>      }

Is this new condition really necessary? I'd expect qemu_vfree(NULL) to
be fine. However, it won't hurt, so:

Reviewed-by: Max Reitz

A general question regarding the assertions here and in patch 8: I
tried to break them, but I couldn't find a way. What I tried was
creating two devices in different threads, each with just one qcow2
BDS behind it, and then attaching one of those qcow2 BDSs to the other
as a backing file. I could not find a way to do that, but I guess it
is something we might want to support in the future.

Can we actually be sure that all of the BDSs in one tree are always
running in the same AIO context? Are we already enforcing this?

Furthermore, basically all the calls to acquire an AIO context are of
the form "aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);". It is *extremely* unlikely, if
possible at all, but couldn't another thread change the BDS's AIO
context after the first function has returned and before the lock is
acquired? If that is really the case, I think we should have some
atomic bdrv_acquire_aio_context() function (rough sketch below).

Max
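P.S.: Roughly what I have in mind -- hypothetical, of course, since no
such function exists yet, and it assumes a BDS's AioContext can only be
switched while the current context's lock is held:

    AioContext *bdrv_acquire_aio_context(BlockDriverState *bs)
    {
        for (;;) {
            AioContext *ctx = bdrv_get_aio_context(bs);

            aio_context_acquire(ctx);
            if (ctx == bdrv_get_aio_context(bs)) {
                /* still this BDS's context; it cannot be switched away
                 * now that we hold the lock */
                return ctx;
            }
            /* the BDS moved to another context between the lookup and
             * the lock; drop the stale lock and retry */
            aio_context_release(ctx);
        }
    }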