From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:57952)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1VC3Zi-0006kJ-H7 for qemu-devel@nongnu.org; Wed, 21 Aug 2013 04:15:40 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from )
	id 1VC3ZZ-0005qJ-VL for qemu-devel@nongnu.org; Wed, 21 Aug 2013 04:15:30 -0400
Received: from mail-ee0-x233.google.com ([2a00:1450:4013:c00::233]:50914)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1VC3ZZ-0005pw-MC for qemu-devel@nongnu.org; Wed, 21 Aug 2013 04:15:21 -0400
Received: by mail-ee0-f51.google.com with SMTP id c1so60262eek.24
	for ; Wed, 21 Aug 2013 01:15:21 -0700 (PDT)
Sender: Paolo Bonzini
Message-ID: <521476ED.5030308@redhat.com>
Date: Wed, 21 Aug 2013 10:14:37 +0200
From: Paolo Bonzini
MIME-Version: 1.0
References: <1377023667-20256-1-git-send-email-charlie@ctshepherd.com> <5213D628.4030409@redhat.com> <5213F37C.8090300@ctshepherd.com>
In-Reply-To: <5213F37C.8090300@ctshepherd.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 1/2] Make cow_co_is_allocated and cow_update_bitmap more efficient
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Charlie Shepherd
Cc: kwolf@redhat.com, stefanha@gmail.com, gabriel@kerneis.info, qemu-devel@nongnu.org

On 21/08/2013 00:53, Charlie Shepherd wrote:
> On 20/08/2013 21:48, Paolo Bonzini wrote:
>> On 20/08/2013 20:34, Charlie Shepherd wrote:
>>>   /* Return true if first block has been changed (ie.
>>> current version is
>>> @@ -146,40 +114,82 @@ static inline int is_bit_set(BlockDriverState
>>> *bs, int64_t bitnum)
>>>   static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
>>>           int64_t sector_num, int nb_sectors, int *num_same)
>>>   {
>>> -    int changed;
>>> +    int ret, changed;
>>> +    uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
>>> +
>>> +    int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
>>> +    int remaining = sector_num - init_bits;
>>> +    int full_bytes = remaining / 8;
>>> +    int trail = remaining % 8;
>>> +
>>> +    int len = !!init_bits + full_bytes + !!trail;
>>> +    uint8_t bitmap[len];
>> This is a basically unbounded allocation on the stack. You should split
>> this in smaller ranges using the "num_same" argument, which is what I
>> did in my patch.
>
> So if I understand your patch correctly, you read the next 512 bytes
> (ie, one BDRV_SECTOR_SIZE) after offset into bitmap? Is this guaranteed
> to be safe (like if the file isn't that long)?

Yes, because the bitmap always comes before the data, and is always
rounded up to a full sector size so that the data stays aligned:

    bitmap_size = ((bs->total_sectors + 7) >> 3) + sizeof(cow_header);
    s->cow_sectors_offset = (bitmap_size + 511) & ~511;

> What if nb_sectors > 512 * 8?

For cow_co_is_allocated, you have the luxury of returning information
for fewer than nb_sectors sectors. That is, you can set *num_same to a
smaller value than nb_sectors, even if sector_num + *num_same has the
same state as the [sector_num, sector_num + *num_same) range. It will
cause extra calls to is_allocated in the callers, but that's it.

> I think it's best to use your version of cow_co_is_allocated(), but
> those are the questions that come to mind when trying to convert the
> stack allocation in cow_update_bitmap()

Good point.
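
To make the two answers above concrete, here is a rough standalone sketch, not the code from either patch. The helper names (data_offset, test_bit, run_length) and the header_size parameter are made up for illustration, and the LSB-first bit order within each byte is an assumption. It shows the rounding of the bitmap to a sector boundary, and how *num_same can be clamped to whatever fits in one 512-byte bitmap chunk.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* stands in for BDRV_SECTOR_SIZE */

/* Hypothetical helper: byte offset of the first data sector. The bitmap
 * (one bit per sector) follows the header and is padded to a sector
 * boundary, mirroring the two lines quoted in the mail. */
static uint64_t data_offset(uint64_t total_sectors, uint64_t header_size)
{
    uint64_t bitmap_size = ((total_sectors + 7) >> 3) + header_size;
    return (bitmap_size + SECTOR_SIZE - 1) & ~(uint64_t)(SECTOR_SIZE - 1);
}

/* Test one bit of an in-memory bitmap chunk (assumed LSB first per byte). */
static int test_bit(const uint8_t *chunk, int bitnum)
{
    return (chunk[bitnum / 8] >> (bitnum % 8)) & 1;
}

/* Hypothetical helper: return the state of the bit at first_bit and store
 * in *num_same how many consecutive bits share it. The scan never looks
 * past chunk_bits, so *num_same may be smaller than nb_sectors; the
 * caller simply asks again for the rest. */
static int run_length(const uint8_t *chunk, int first_bit, int chunk_bits,
                      int nb_sectors, int *num_same)
{
    assert(first_bit >= 0 && first_bit < chunk_bits && nb_sectors > 0);

    int limit = chunk_bits - first_bit;
    if (nb_sectors < limit) {
        limit = nb_sectors;
    }

    int state = test_bit(chunk, first_bit);
    int n = 1;
    while (n < limit && test_bit(chunk, first_bit + n) == state) {
        n++;
    }
    *num_same = n;
    return state;
}
```

With a fixed-size chunk buffer the stack usage is bounded regardless of nb_sectors; the price is the extra round trips through the caller mentioned above.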
>>> +    ret = bdrv_pread(bs->file, offset, buf, len);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +
>>> +    /* Do sector_num -> nearest byte boundary */
>>> +    if (init_bits) {
>>> +        /* This sets the highest init_bits bits in the byte */
>>> +        uint8_t bits = ((1 << init_bits) - 1) << (8 - init_bits);
>>> +        buf[0] |= bits;
>>> +    }
>>> +
>>> +    if (full_bytes) {
>>> +        memset(&buf[!!init_bits], ~0, full_bytes);
>>> +    }
>>> +
>>> +    /* Set the trailing bits in the final byte */
>>> +    if (trail) {
>>> +        /* This sets the lowest trail bits in the byte */
>>> +        uint8_t bits = (1 << trail) - 1;
>>> +        buf[len - 1] |= bits;
>>> +    }
>> ... and you should also check if there is a change in the bits, and skip
>> the flush if there is no change. Flushing a multi-megabyte write is
>> very expensive. It basically makes format=cow as slow as
>> format=raw,cache=writethrough.
>
> So if ORing the allocation makes no difference, don't flush?

Yep! This means if an image is fully COW-ed, there will be no extra
flush (only extra reads). I already did this, but in a very inefficient
way because each bit would be read separately.

Paolo

>
> Charlie
>>> +    ret = bdrv_pwrite(bs->file, offset, buf, len);
>>> +    if (ret < 0) {
>>> +        return ret;
>>>      }
>>> -    return error;
>>> +    return 0;
>>>  }
>>>  static int coroutine_fn cow_read(BlockDriverState *bs, int64_t
>>> sector_num,
>>> @@ -237,6 +247,13 @@ static int cow_write(BlockDriverState *bs,
>>> int64_t sector_num,
>>>          return ret;
>>>      }
>>> +    /* We need to flush the data before writing the metadata so
>>> that there is
>>> +     * no chance of metadata referring to data that doesn't exist. */
>>> +    ret = bdrv_flush(bs->file);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>> See above about this flush.
>>
>> Paolo
>>
>>>      return cow_update_bitmap(bs, sector_num, nb_sectors);
>>>  }
>>>
>
>
>
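
As an illustration of the "skip the flush if nothing changed" point, here is a minimal in-memory sketch, not the code from either patch. The function name set_bits_report_change is made up, and the LSB-first bit order within each byte is an assumption. It ORs a range of bits into a buffer one byte at a time and reports whether any bit was actually flipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: set bits [first_bit, first_bit + nb_bits) in buf
 * (assumed LSB first within each byte). Returns true only if at least
 * one of those bits was clear before the call. */
static bool set_bits_report_change(uint8_t *buf, int first_bit, int nb_bits)
{
    assert(first_bit >= 0 && nb_bits >= 0);

    bool changed = false;
    int bit = first_bit;
    int end = first_bit + nb_bits;

    while (bit < end) {
        int in_byte = bit % 8;     /* position inside the current byte */
        int count = 8 - in_byte;   /* bits left in the current byte */
        if (count > end - bit) {
            count = end - bit;
        }

        /* count consecutive ones, shifted up to start at in_byte */
        uint8_t mask = (uint8_t)(((1u << count) - 1u) << in_byte);
        uint8_t *p = &buf[bit / 8];

        if ((*p & mask) != mask) {
            *p |= mask;
            changed = true;
        }
        bit += count;
    }
    return changed;
}
```

A caller along the lines of cow_update_bitmap() could then read the relevant bitmap bytes, call this helper, and only flush the data and write the bitmap back when it returns true. For a fully COW-ed image that path would cost reads but no flush.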