From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:52440) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VButd-0000O2-Pq for qemu-devel@nongnu.org; Tue, 20 Aug 2013 18:59:37 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1VButW-0004HA-Fb for qemu-devel@nongnu.org; Tue, 20 Aug 2013 18:59:29 -0400
Received: from mail6.webfaction.com ([74.55.86.74]:46278 helo=smtp.webfaction.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1VBuqV-0003FF-Gr for qemu-devel@nongnu.org; Tue, 20 Aug 2013 18:56:15 -0400
Message-ID: <5213F37C.8090300@ctshepherd.com>
Date: Tue, 20 Aug 2013 23:53:48 +0100
From: Charlie Shepherd
MIME-Version: 1.0
References: <1377023667-20256-1-git-send-email-charlie@ctshepherd.com> <5213D628.4030409@redhat.com>
In-Reply-To: <5213D628.4030409@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 1/2] Make cow_co_is_allocated and cow_update_bitmap more efficient
To: Paolo Bonzini
Cc: kwolf@redhat.com, stefanha@gmail.com, gabriel@kerneis.info, qemu-devel@nongnu.org

On 20/08/2013 21:48, Paolo Bonzini wrote:
> On 20/08/2013 20:34, Charlie Shepherd wrote:
>>  /* Return true if first block has been changed (ie. current version is
>> @@ -146,40 +114,82 @@ static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum)
>>  static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
>>          int64_t sector_num, int nb_sectors, int *num_same)
>>  {
>> -    int changed;
>> +    int ret, changed;
>> +    uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
>> +
>> +    int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
>> +    int remaining = sector_num - init_bits;
>> +    int full_bytes = remaining / 8;
>> +    int trail = remaining % 8;
>> +
>> +    int len = !!init_bits + full_bytes + !!trail;
>> +    uint8_t bitmap[len];
> This is a basically unbounded allocation on the stack. You should split
> this in smaller ranges using the "num_same" argument, which is what I
> did in my patch.

So if I understand your patch correctly, you read the next 512 bytes
(i.e. one BDRV_SECTOR_SIZE) after offset into bitmap? Is this guaranteed
to be safe, for example if the file isn't that long? What if
nb_sectors > 512 * 8?

I think it's best to use your version of cow_co_is_allocated(), but those
are the questions that come to mind when trying to convert the stack
allocation in cow_update_bitmap().

>>      if (nb_sectors == 0) {
>> -        *num_same = nb_sectors;
>> -        return 0;
>> +        *num_same = nb_sectors;
>> +        return 0;
>>      }
>>
>> -    changed = is_bit_set(bs, sector_num);
>> -    if (changed < 0) {
>> -        return 0; /* XXX: how to return I/O errors? */
>> +    ret = bdrv_pread(bs->file, offset, bitmap, len);
>> +    if (ret < 0) {
>> +        return ret;
>>      }
>>
>> +    changed = cow_test_bit(sector_num, bitmap);
>>      for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> -        if (is_bit_set(bs, sector_num + *num_same) != changed)
>> -            break;
>> +        if (cow_test_bit(sector_num + *num_same, bitmap) != changed) {
>> +            break;
>> +        }
>>      }
>>
>>      return changed;
>>  }
>>
>> +/* Set the bits from sector_num to sector_num + nb_sectors in the bitmap of
>> + * bs->file. */
>>  static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num,
>>          int nb_sectors)
>>  {
>> -    int error = 0;
>> -    int i;
>> +    int ret;
>> +    uint64_t offset = sizeof(struct cow_header_v2) + sector_num / 8;
>>
>> -    for (i = 0; i < nb_sectors; i++) {
>> -        error = cow_set_bit(bs, sector_num + i);
>> -        if (error) {
>> -            break;
>> -        }
>> +    int init_bits = (sector_num % 8) ? (8 - (sector_num % 8)) : 0;
>> +    int remaining = sector_num - init_bits;
>> +    int full_bytes = remaining / 8;
>> +    int trail = remaining % 8;
>> +
>> +    int len = !!init_bits + full_bytes + !!trail;
>> +    uint8_t buf[len];
> Here your patch has indeed an improvement over mine. However, this is
> another basically unbounded allocation on the stack. You should split
> bitmap updates in smaller parts (doing 512-byte aligned writes is fine,
> each covers 2MB in the file and writes this big are very rare!).
>
>> +    ret = bdrv_pread(bs->file, offset, buf, len);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    /* Do sector_num -> nearest byte boundary */
>> +    if (init_bits) {
>> +        /* This sets the highest init_bits bits in the byte */
>> +        uint8_t bits = ((1 << init_bits) - 1) << (8 - init_bits);
>> +        buf[0] |= bits;
>> +    }
>> +
>> +    if (full_bytes) {
>> +        memset(&buf[!!init_bits], ~0, full_bytes);
>> +    }
>> +
>> +    /* Set the trailing bits in the final byte */
>> +    if (trail) {
>> +        /* This sets the lowest trail bits in the byte */
>> +        uint8_t bits = (1 << trail) - 1;
>> +        buf[len - 1] |= bits;
>> +    }
> ... and you should also check if there is a change in the bits, and skip
> the flush if there is no change. Flushing a multi-megabyte write is
> very expensive. It basically makes format=cow as slow as
> format=raw,cache=writethrough.

So if ORing in the new bits makes no difference to the bitmap, don't flush?

Charlie

>> +    ret = bdrv_pwrite(bs->file, offset, buf, len);
>> +    if (ret < 0) {
>> +        return ret;
>>      }
>>
>> -    return error;
>> +    return 0;
>>  }
>>
>>  static int coroutine_fn cow_read(BlockDriverState *bs, int64_t sector_num,
>> @@ -237,6 +247,13 @@ static int cow_write(BlockDriverState *bs, int64_t sector_num,
>>          return ret;
>>      }
>>
>> +    /* We need to flush the data before writing the metadata so that there is
>> +     * no chance of metadata referring to data that doesn't exist. */
>> +    ret = bdrv_flush(bs->file);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
> See above about this flush.
>
> Paolo
>
>>      return cow_update_bitmap(bs, sector_num, nb_sectors);
>>  }
>>
>>
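
For reference, the two review suggestions in this message (walk the bitmap in bounded 512-byte chunks rather than a variable-length stack array, and skip the write and flush when no bit actually changes) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from either patch: set_bits() and update_bitmap_chunked() are hypothetical names, an in-memory array stands in for bs->file (so memcpy() replaces bdrv_pread()/bdrv_pwrite()), and the LSB-first bit order within a byte is inferred from the masks in the quoted hunk.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 512-byte bitmap chunk tracks 4096 sectors, i.e. 2 MB of data. */
#define CHUNK_BYTES 512u

/* Hypothetical helper: set bits [first, first + count) in buf,
 * LSB-first within each byte (assumed layout).
 * Returns true if at least one bit was previously clear. */
static bool set_bits(uint8_t *buf, uint64_t first, uint64_t count)
{
    bool changed = false;

    for (uint64_t bit = first; bit < first + count; bit++) {
        uint8_t mask = (uint8_t)(1u << (bit % 8));
        if (!(buf[bit / 8] & mask)) {
            buf[bit / 8] |= mask;
            changed = true;
        }
    }
    return changed;
}

/* Hypothetical stand-in for cow_update_bitmap(). "file" is an in-memory
 * bitmap of file_len bytes. Returns the number of chunks written back,
 * or -1 if the sector range falls outside the bitmap. */
static int update_bitmap_chunked(uint8_t *file, size_t file_len,
                                 uint64_t sector_num, uint64_t nb_sectors)
{
    const uint64_t bits_per_chunk = CHUNK_BYTES * 8;
    uint64_t end = sector_num + nb_sectors;
    int writes = 0;

    if (end > (uint64_t)file_len * 8) {
        return -1;
    }

    while (sector_num < end) {
        uint8_t chunk[CHUNK_BYTES];   /* bounded, fixed-size stack buffer */
        uint64_t chunk_start = sector_num - sector_num % bits_per_chunk;
        uint64_t chunk_stop = chunk_start + bits_per_chunk;
        uint64_t stop = end < chunk_stop ? end : chunk_stop;
        size_t offset = (size_t)(chunk_start / 8);
        size_t len = file_len - offset < CHUNK_BYTES ? file_len - offset
                                                     : CHUNK_BYTES;

        memcpy(chunk, file + offset, len);        /* "read" the chunk */
        if (set_bits(chunk, sector_num - chunk_start, stop - sector_num)) {
            memcpy(file + offset, chunk, len);    /* "write" only on change */
            writes++;
        }
        sector_num = stop;
    }
    return writes;
}
```

On a zeroed bitmap, update_bitmap_chunked(file, sizeof(file), 3, 10) reports one chunk written; repeating the same call reports zero, which is the case where the caller could skip the flush altogether.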