From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36119) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YlDj2-00086s-I5 for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:47:18 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YlDiy-0005BN-EP for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:47:16 -0400 Received: from relay.parallels.com ([195.214.232.42]:60745) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YlDiy-0005B7-5I for qemu-devel@nongnu.org; Thu, 23 Apr 2015 05:47:12 -0400 Message-ID: <5538BF94.9080904@openvz.org> Date: Thu, 23 Apr 2015 12:47:00 +0300 From: "Denis V. Lunev" MIME-Version: 1.0 References: <1426069701-1405-1-git-send-email-den@openvz.org> <1426069701-1405-9-git-send-email-den@openvz.org> <20150422130855.GI27617@stefanha-thinkpad.redhat.com> <55379F36.3060906@openvz.org> <20150423092031.GC8811@stefanha-thinkpad.redhat.com> <20150423093223.GC5289@noname.redhat.com> In-Reply-To: <20150423093223.GC5289@noname.redhat.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 08/27] block/parallels: _co_writev callback for Parallels format List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf , Stefan Hajnoczi Cc: Stefan Hajnoczi , qemu-devel@nongnu.org On 23/04/15 12:32, Kevin Wolf wrote: > Am 23.04.2015 um 11:20 hat Stefan Hajnoczi geschrieben: >> On Wed, Apr 22, 2015 at 04:16:38PM +0300, Denis V. Lunev wrote: >>> On 22/04/15 16:08, Stefan Hajnoczi wrote: >>>> On Wed, Mar 11, 2015 at 01:28:02PM +0300, Denis V. Lunev wrote: >>>>> +static int64_t allocate_cluster(BlockDriverState *bs, int64_t sector_num) >>>>> +{ >>>>> + BDRVParallelsState *s = bs->opaque; >>>>> + uint32_t idx, offset, tmp; >>>>> + int64_t pos; >>>>> + int ret; >>>>> + >>>>> + idx = sector_num / s->tracks; >>>>> + offset = sector_num % s->tracks; >>>>> + >>>>> + if (idx >= s->catalog_size) { >>>>> + return -EINVAL; >>>>> + } >>>>> + if (s->catalog_bitmap[idx] != 0) { >>>>> + return (uint64_t)s->catalog_bitmap[idx] * s->off_multiplier + offset; >>>>> + } >>>>> + >>>>> + pos = bdrv_getlength(bs->file) >> BDRV_SECTOR_BITS; >>>>> + bdrv_truncate(bs->file, (pos + s->tracks) << BDRV_SECTOR_BITS); >>>>> + s->catalog_bitmap[idx] = pos / s->off_multiplier; >>>>> + >>>>> + tmp = cpu_to_le32(s->catalog_bitmap[idx]); >>>>> + >>>>> + ret = bdrv_pwrite_sync(bs->file, >>>>> + sizeof(ParallelsHeader) + idx * sizeof(tmp), &tmp, sizeof(tmp)); >>>> What is the purpose of the sync? >>> This is necessary to preserve image consistency on crash from >>> my point of view. There is no check consistency at the moment. >>> The sync will be removed later when proper crash detection >>> code will be added (patches 19, 20, 21) >> Let's look at possible orderings in case of failure: >> >> A. BAT update >> B. Data write >> >> This sync enforces A, B ordering. If we can see B, then A must also >> have happened thanks to the sync. >> >> But A, B ordering is too conservative. Imagine B, A ordering and the >> failure where we crash before A. It means we wrote the data but never >> linked it into the BAT. >> >> What happens in that case? We've leaked a cluster in the underlying >> image file but it doesn't corrupt the visible disk from the guest >> point-of-view. >> >> Because your implementation uses truncate to extend the file size before >> A, even the A, B failure case results in a leaked cluster. So the B, A >> case is not worse in any way. >> >> Why do other image formats sync cluster allocation updates? Because >> they support backing files and in that case an A, B ordering results in >> data corruption so they enforce B, A ordering (the opposite of what >> you're trying to do!). >> >> The reason why A, B ordering results in data corruption when backing >> files are in use is because the guest's write request might touch only a >> subset of the cluster (a couple of sectors out of the whole cluster). >> So the guest needs to copy the remaining sectors from the backing file. >> If there is a dangling BAT entry like in the A, B failure case, then the >> guest will see a zeroed cluster instead of the contents of the backing >> file. This is a data corruption, but only if a backing file is being >> used! >> >> So the sync is not necessary, both A, B and B, A ordering work for >> block/parallels.c. > Actually, I suspect this means that the parallels driver is restricted > to protocols with bdrv_has_zero_init() == true, otherwise zeros can turn > into random data (which means that it can't work e.g. directly on host > block devices). > > Do we enforce this? > > Kevin this is fixed in the patch 26 when the code is replaced with + if (s->data_end + s->tracks > pos) { + int ret; + if (s->prealloc_mode == PRL_PREALLOC_MODE_FALLOCATE) { + ret = bdrv_write_zeroes(bs->file, s->data_end, + s->prealloc_size, 0); + } else { + ret = bdrv_truncate(bs->file, + (s->data_end + s->prealloc_size) << BDRV_SECTOR_BITS); + } + if (ret < 0) { + return ret; + } + } on a default path, but you are correct. Some checking is necessary to be on a safe side. Den