Date: Thu, 7 Mar 2013 11:05:13 -0500
From: Jeff Cody
Message-ID: <20130307160513.GG22782@localhost.localdomain>
References: <7aabf75e26ea1d60102fd0e1adfcd104aa3689c8.1362580930.git.jcody@redhat.com> <20130307155905.GE27175@stefanha-thinkpad.redhat.com>
In-Reply-To: <20130307155905.GE27175@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] [PATCH 7/7] block: add write support for VHDX images
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, sw@weilnetz.de, qemu-devel@nongnu.org

On Thu, Mar 07, 2013 at 04:59:05PM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 06, 2013 at 09:48:43AM -0500, Jeff Cody wrote:
> > @@ -958,12 +960,150 @@ exit:
> >      return ret;
> >  }
> >
> > +/*
> > + * Allocate a new payload block at the end of the file.
> > + *
> > + * Allocation will happen at 1MB alignment inside the file
> > + *
> > + * Returns the file offset start of the new payload block
> > + */
> > +static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
> > +                               uint64_t *new_offset)
> > +{
> > +    *new_offset = bdrv_getlength(bs->file);
> > +
> > +    /* per the spec, the address for a block is in units of 1MB */
> > +    if (*new_offset % (1024*1024)) {
> > +        *new_offset = ((*new_offset >> 20) + 1) << 20; /* round up to 1MB */
> > +    }
> > +    return bdrv_truncate(bs->file, *new_offset + s->block_size);
> > +}
>
> Is it necessary to resize the file? Why not just write at EOF to grow
> it?
>

After the recent bdrv_truncate() discussion, I think that may be best.

> > +/*
> > + * Update the BAT tablet entry with the new file offset, and the new entry
> > + * state */
> > +static int vhdx_update_bat_table_entry(BlockDriverState *bs, BDRVVHDXState *s,
> > +                                       vhdx_sector_info *sinfo, int state)
> > +{
> > +    uint64_t bat_tmp;
> > +    uint64_t bat_entry_offset;
> > +
> > +    /* The BAT entry is a uint64, with 44 bits for the file offset in units of
> > +     * 1MB, and 3 bits for the block state.
> > +     */
> > +    s->bat[sinfo->bat_idx] = ((sinfo->file_offset>>20) <<
> > +                              VHDX_BAT_FILE_OFF_BITS);
> > +
> > +    s->bat[sinfo->bat_idx] |= state & VHDX_BAT_STATE_BIT_MASK;
> >
> > +    bat_tmp = cpu_to_le64(s->bat[sinfo->bat_idx]);
> > +    bat_entry_offset = s->bat_offset + sinfo->bat_idx * sizeof(vhdx_bat_entry);
> > +
> > +    return bdrv_pwrite_sync(bs->file, bat_entry_offset, &bat_tmp,
> > +                            sizeof(vhdx_bat_entry));
> > +}
> >
> >  static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
> >                                         int nb_sectors, QEMUIOVector *qiov)
> >  {
> > -    return -ENOTSUP;
> > +    int ret = -ENOTSUP;
> > +    BDRVVHDXState *s = bs->opaque;
> > +    vhdx_sector_info sinfo;
> > +    uint64_t bytes_done = 0;
> > +    QEMUIOVector hd_qiov;
> > +
> > +    qemu_iovec_init(&hd_qiov, qiov->niov);
> > +
> > +    qemu_co_mutex_lock(&s->lock);
> > +
> > +    /* Per the spec, on the first write of guest-visible data to the file the
> > +     * data write guid must be updated in the header */
> > +    if (s->first_visible_write) {
> > +        s->first_visible_write = false;
> > +        vhdx_update_headers(bs, s, true);
> > +    }
> > +
> > +    while (nb_sectors > 0) {
> > +        if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
> > +            /* not supported yet */
> > +            ret = -ENOTSUP;
> > +            goto exit;
> > +        } else {
> > +            vhdx_block_translate(s, sector_num, nb_sectors, &sinfo);
> > +
> > +            qemu_iovec_reset(&hd_qiov);
> > +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done, sinfo.bytes_avail);
> > +            /* check the payload block state */
> > +            switch (s->bat[sinfo.bat_idx] & VHDX_BAT_STATE_BIT_MASK) {
> > +            case PAYLOAD_BLOCK_ZERO:
> > +                /* in this case, we need to preserve zero writes for
> > +                 * data that is not part of this write, so we must pad
> > +                 * the rest of the buffer to zeroes */
> > +
> > +                /* if we are on a posix system with ftruncate() that extends
> > +                 * a file, then it is zero-filled for us.
> > +                 * On Win32, the raw
> > +                 * layer uses SetFilePointer and SetFileEnd, which does not
> > +                 * zero fill AFAIK */
> > +
> > +                /* TODO: queue another write of zero buffers if the host OS does
> > +                 * not zero-fill on file extension */
> > +
> > +                /* fall through */
> > +            case PAYLOAD_BLOCK_NOT_PRESENT: /* fall through */
> > +            case PAYLOAD_BLOCK_UNMAPPED: /* fall through */
> > +            case PAYLOAD_BLOCK_UNDEFINED: /* fall through */
> > +                ret = vhdx_allocate_block(bs, s, &sinfo.file_offset);
> > +                if (ret < 0) {
> > +                    goto exit;
> > +                }
> > +                /* once we support differencing files, this may also be
> > +                 * partially present */
> > +                /* update block state to the newly specified state */
> > +                ret = vhdx_update_bat_table_entry(bs, s, &sinfo,
> > +                                                  PAYLOAD_BLOCK_FULL_PRESENT);
> > +                if (ret < 0) {
> > +                    goto exit;
> > +                }
>
> The BAT table entry must be written after data is already on disk.
> Otherwise a crash partway through would leave the BAT table pointing to
> undefined data.
>
> Do you need to use the log to ensure this?

Yes, indeed. Log support is what I am working on now: all metadata and
BAT writes are supposed to go through the log, for exactly this reason.
The next version will have log support, so I will use it for the BAT
writes (and for any other metadata writes, except the header).
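
For illustration, the ordering constraint Stefan describes (payload data
durable before the BAT entry that points at it) can be sketched outside
QEMU with plain POSIX calls. This is only a sketch under stated
assumptions. It is not the patch's code and not the log-based approach
planned for the next version. The helper names, MIB, BAT_OFFSET_SHIFT = 20
and the 3-bit BAT_STATE_MASK are hypothetical stand-ins for the driver's
VHDX_BAT_FILE_OFF_BITS and VHDX_BAT_STATE_BIT_MASK, and the entry is
written in host byte order where the driver uses cpu_to_le64().

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* exposes pwrite() and fsync() */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* All three constants are assumptions made for this sketch. */
#define MIB              (UINT64_C(1) << 20)
#define BAT_STATE_MASK   UINT64_C(0x7)  /* low 3 bits hold the block state */
#define BAT_OFFSET_SHIFT 20             /* offset field starts at bit 20 */

/* Round a file offset up to the next 1MB boundary (no-op if aligned). */
uint64_t round_up_1mb(uint64_t off)
{
    return (off + MIB - 1) & ~(MIB - 1);
}

/* Pack a 1MB-aligned file offset and a block state into one BAT entry. */
uint64_t pack_bat_entry(uint64_t file_offset, unsigned state)
{
    assert((file_offset & (MIB - 1)) == 0);
    return ((file_offset >> 20) << BAT_OFFSET_SHIFT) |
           ((uint64_t)state & BAT_STATE_MASK);
}

/* Write len bytes at off, retrying short writes. Returns 0 or -1. */
int pwrite_all(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    return 0;
}

/*
 * Write the payload, flush it, and only then publish the BAT entry.
 * A crash before the second write leaves the old entry in place, so the
 * BAT never points at data that did not reach the disk.
 */
int write_block_then_bat(int fd, uint64_t data_off, const void *data,
                         size_t len, uint64_t bat_entry_off, unsigned state)
{
    /* host byte order here; the on-disk format is little-endian */
    uint64_t entry = pack_bat_entry(data_off, state);

    if (pwrite_all(fd, data, len, (off_t)data_off) < 0 || fsync(fd) < 0) {
        return -1;
    }
    if (pwrite_all(fd, &entry, sizeof(entry), (off_t)bat_entry_off) < 0 ||
        fsync(fd) < 0) {
        return -1;
    }
    return 0;
}
```

A caller would pick data_off = round_up_1mb(current file length), as
vhdx_allocate_block() does, and pass the entry's position in the on-disk
BAT as bat_entry_off. Two flushes per allocation are costly, which is one
reason to route these updates through the log instead.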