From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:51603)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1V49oi-0000Yf-2R
	for qemu-devel@nongnu.org; Tue, 30 Jul 2013 09:18:25 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1V49oc-0007EB-4D
	for qemu-devel@nongnu.org; Tue, 30 Jul 2013 09:18:20 -0400
Received: from mx1.redhat.com ([209.132.183.28]:41268)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1V49ob-0007E4-Rx
	for qemu-devel@nongnu.org; Tue, 30 Jul 2013 09:18:14 -0400
Message-ID: <51F7BCFE.40808@redhat.com>
Date: Tue, 30 Jul 2013 15:17:50 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1375168668-7109-1-git-send-email-famz@redhat.com>
	<1375168668-7109-2-git-send-email-famz@redhat.com>
	<20130730125300.GB7239@stefanha-thinkpad.redhat.com>
In-Reply-To: <20130730125300.GB7239@stefanha-thinkpad.redhat.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active
 image
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: kwolf@redhat.com, jcody@redhat.com, Fam Zheng <famz@redhat.com>, qemu-devel@nongnu.org, stefanha@redhat.com

Il 30/07/2013 14:53, Stefan Hajnoczi ha scritto:
> On Tue, Jul 30, 2013 at 03:17:47PM +0800, Fam Zheng wrote:
>>      for (sector_num = 0; sector_num < end; sector_num += n) {
>> -        uint64_t delay_ns = 0;
>> -        bool copy;
>>  
>> -wait:
>> -        /* Note that even when no rate limit is applied we need to yield
>> -         * with no pending I/O here so that bdrv_drain_all() returns.
>> -         */
>> -        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
>> -        if (block_job_is_cancelled(&s->common)) {
>> -            break;
>> -        }
>>          /* Copy if allocated above the base */
>>          ret = bdrv_co_is_allocated_above(top, base, sector_num,
>> -                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
>> +                                         COMMIT_BUFFER_SECTORS,
>>                                           &n);
>> -        copy = (ret == 1);
>> -        trace_commit_one_iteration(s, sector_num, n, ret);
>> -        if (copy) {
>> -            if (s->common.speed) {
>> -                delay_ns = ratelimit_calculate_delay(&s->limit, n);
>> -                if (delay_ns > 0) {
>> -                    goto wait;
>> -                }
>> +        if (ret) {
>> +            bdrv_set_dirty(top, sector_num, n);
>> +        }
> 
> This could take a while on a big image.  You need sleep/cancel here like
> the other blockjob loops have.
> 
> I think error handling isn't sufficient here.  If
> bdrv_co_is_allocated_above() fails you need to exit (if n becomes 0 on
> error, then this is an infinite loop!).
> 
>> +        bdrv_dirty_iter_init(s->top, &hbi);
>> +        for (next_dirty = hbitmap_iter_next(&hbi);
>> +                next_dirty >= 0;
>> +                next_dirty = hbitmap_iter_next(&hbi)) {
>> +            sector_num = next_dirty;
>> +            if (block_job_is_cancelled(&s->common)) {
>> +                goto exit;
>> +            }
>> +            delay_ns = ratelimit_calculate_delay(&s->limit,
>> +                                                 COMMIT_BUFFER_SECTORS);
>> +            /* Note that even when no rate limit is applied we need to yield
>> +             * with no pending I/O here so that bdrv_drain_all() returns.
>> +             */
>> +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
>> +            trace_commit_one_iteration(s, sector_num,
>> +                                       COMMIT_BUFFER_SECTORS, ret);
>> +            ret = commit_populate(top, base, sector_num,
>> +                                  COMMIT_BUFFER_SECTORS, buf);
> 
> Can we be sure that a guest write during commit_populate()...
> 
>> +            if (ret < 0) {
>> +                if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
>> +                    s->on_error == BLOCKDEV_ON_ERROR_REPORT ||
>> +                    (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC &&
>> +                         ret == -ENOSPC)) {
>> +                    goto exit;
>> +                } else {
>> +                    continue;
>> +                }
>>              }
>> +            /* Publish progress */
>> +            s->common.offset += COMMIT_BUFFER_BYTES;
>> +            bdrv_reset_dirty(top, sector_num, COMMIT_BUFFER_SECTORS);
> 
> ...sets the dirty but *after* us?  Otherwise there is a race condition
> where guest writes fail to be copied into the base image.
> 
> I think the answer is "no" since commit_populate() performs two separate
> blocking operations: bdrv_read(top) followed by bdrv_write(base).  Now
> imagine the guest does bdrv_aio_writev(top) after we complete
> bdrv_read(top) but before we do bdrv_reset_dirty().

Indeed, reset must always come _before_ reads (and set comes always
after writes).

Paolo