From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:51603) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1V49oi-0000Yf-2R for qemu-devel@nongnu.org; Tue, 30 Jul 2013 09:18:25 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1V49oc-0007EB-4D for qemu-devel@nongnu.org; Tue, 30 Jul 2013 09:18:20 -0400 Received: from mx1.redhat.com ([209.132.183.28]:41268) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1V49ob-0007E4-Rx for qemu-devel@nongnu.org; Tue, 30 Jul 2013 09:18:14 -0400 Message-ID: <51F7BCFE.40808@redhat.com> Date: Tue, 30 Jul 2013 15:17:50 +0200 From: Paolo Bonzini MIME-Version: 1.0 References: <1375168668-7109-1-git-send-email-famz@redhat.com> <1375168668-7109-2-git-send-email-famz@redhat.com> <20130730125300.GB7239@stefanha-thinkpad.redhat.com> In-Reply-To: <20130730125300.GB7239@stefanha-thinkpad.redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi Cc: kwolf@redhat.com, jcody@redhat.com, Fam Zheng , qemu-devel@nongnu.org, stefanha@redhat.com Il 30/07/2013 14:53, Stefan Hajnoczi ha scritto: > On Tue, Jul 30, 2013 at 03:17:47PM +0800, Fam Zheng wrote: >> for (sector_num = 0; sector_num < end; sector_num += n) { >> - uint64_t delay_ns = 0; >> - bool copy; >> >> -wait: >> - /* Note that even when no rate limit is applied we need to yield >> - * with no pending I/O here so that bdrv_drain_all() returns. >> - */ >> - block_job_sleep_ns(&s->common, rt_clock, delay_ns); >> - if (block_job_is_cancelled(&s->common)) { >> - break; >> - } >> /* Copy if allocated above the base */ >> ret = bdrv_co_is_allocated_above(top, base, sector_num, >> - COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE, >> + COMMIT_BUFFER_SECTORS, >> &n); >> - copy = (ret == 1); >> - trace_commit_one_iteration(s, sector_num, n, ret); >> - if (copy) { >> - if (s->common.speed) { >> - delay_ns = ratelimit_calculate_delay(&s->limit, n); >> - if (delay_ns > 0) { >> - goto wait; >> - } >> + if (ret) { >> + bdrv_set_dirty(top, sector_num, n); >> + } > > This could take a while on a big image. You need sleep/cancel here like > the other blockjob loops have. > > I think error handling isn't sufficient here. If > bdrv_co_is_allocated_above() fails you need to exit (if n becomes 0 on > error, then this is an infinite loop!). > >> + bdrv_dirty_iter_init(s->top, &hbi); >> + for (next_dirty = hbitmap_iter_next(&hbi); >> + next_dirty >= 0; >> + next_dirty = hbitmap_iter_next(&hbi)) { >> + sector_num = next_dirty; >> + if (block_job_is_cancelled(&s->common)) { >> + goto exit; >> + } >> + delay_ns = ratelimit_calculate_delay(&s->limit, >> + COMMIT_BUFFER_SECTORS); >> + /* Note that even when no rate limit is applied we need to yield >> + * with no pending I/O here so that bdrv_drain_all() returns. >> + */ >> + block_job_sleep_ns(&s->common, rt_clock, delay_ns); >> + trace_commit_one_iteration(s, sector_num, >> + COMMIT_BUFFER_SECTORS, ret); >> + ret = commit_populate(top, base, sector_num, >> + COMMIT_BUFFER_SECTORS, buf); > > Can we be sure that a guest write during commit_populate()... > >> + if (ret < 0) { >> + if (s->on_error == BLOCKDEV_ON_ERROR_STOP || >> + s->on_error == BLOCKDEV_ON_ERROR_REPORT || >> + (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC && >> + ret == -ENOSPC)) { >> + goto exit; >> + } else { >> + continue; >> + } >> } >> + /* Publish progress */ >> + s->common.offset += COMMIT_BUFFER_BYTES; >> + bdrv_reset_dirty(top, sector_num, COMMIT_BUFFER_SECTORS); > > ...sets the dirty but *after* us? Otherwise there is a race condition > where guest writes fail to be copied into the base image. > > I think the answer is "no" since commit_populate() performs two separate > blocking operations: bdrv_read(top) followed by bdrv_write(base). Now > imagine the guest does bdrv_aio_writev(top) after we complete > bdrv_read(top) but before we do bdrv_reset_dirty(). Indeed, reset must always come _before_ reads (and set comes always after writes). Paolo