From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:34950) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b7Mze-0004FX-I0 for qemu-devel@nongnu.org; Mon, 30 May 2016 09:12:31 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1b7Mza-0005V0-EV for qemu-devel@nongnu.org; Mon, 30 May 2016 09:12:30 -0400 Received: from mail-db3on0131.outbound.protection.outlook.com ([157.55.234.131]:59715 helo=emea01-db3-obe.outbound.protection.outlook.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1b7MzZ-0005Uo-RK for qemu-devel@nongnu.org; Mon, 30 May 2016 09:12:26 -0400 References: <1463229957-14253-1-git-send-email-den@openvz.org> <1463229957-14253-3-git-send-email-den@openvz.org> <20160527173335.GF27946@stefanha-x1.localdomain> <574C0404.9040507@virtuozzo.com> From: Pavel Butsykin Message-ID: <574C38D9.8020709@virtuozzo.com> Date: Mon, 30 May 2016 15:58:01 +0300 MIME-Version: 1.0 In-Reply-To: <574C0404.9040507@virtuozzo.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 02/10] qcow2: add qcow2_co_write_compressed List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Stefan Hajnoczi , "Denis V. Lunev" Cc: Kevin Wolf , Jeff Cody , qemu-devel@nongnu.org, Markus Armbruster , Stefan Hajnoczi , John Snow On 30.05.2016 12:12, Pavel Butsykin wrote: > On 27.05.2016 20:33, Stefan Hajnoczi wrote: >> On Sat, May 14, 2016 at 03:45:50PM +0300, Denis V. Lunev wrote: >>> + qemu_co_mutex_lock(&s->lock); >>> + cluster_offset = \ >>> + qcow2_alloc_compressed_cluster_offset(bs, sector_num << 9, >>> out_len); >> >> The backslash isn't necessary for wrapping lines in C. This kind of >> thing is only necessary in languages like Python where the grammar is >> whitespace sensistive. >> >> The C compiler is happy with an arbitrary amount of whitespace >> (newlines) in the middle of a statement. The backslash in C is handled >> by the preprocessor: it joins the line. That's useful for macro >> definitions where you need to tell the preprocessor that several lines >> belong to one macro definition. But it's not needed for normal C code. >> > Thanks for the explanation, but the backslash is used more for the > person as a marker a line break. The current coding style misses this > point, but I can remove the backslash, because I don't think it's > something important :) > >>> + if (!cluster_offset) { >>> + qemu_co_mutex_unlock(&s->lock); >>> + ret = -EIO; >>> + goto fail; >>> + } >>> + cluster_offset &= s->cluster_offset_mask; >>> >>> - BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED); >>> - ret = bdrv_pwrite(bs->file->bs, cluster_offset, out_buf, >>> out_len); >>> - if (ret < 0) { >>> - goto fail; >>> - } >>> + ret = qcow2_pre_write_overlap_check(bs, 0, cluster_offset, >>> out_len); >>> + qemu_co_mutex_unlock(&s->lock); >>> + if (ret < 0) { >>> + goto fail; >>> } >>> >>> + iov = (struct iovec) { >>> + .iov_base = out_buf, >>> + .iov_len = out_len, >>> + }; >>> + qemu_iovec_init_external(&hd_qiov, &iov, 1); >>> + >>> + BLKDBG_EVENT(bs->file, BLKDBG_WRITE_COMPRESSED); >>> + ret = bdrv_co_pwritev(bs->file->bs, cluster_offset, out_len, >>> &hd_qiov, 0); >> >> There is a race condition here: >> >> If the newly allocated cluster is only partially filled by compressed >> data then qcow2_alloc_compressed_cluster_offset() remembers that more >> bytes are still available in the cluster. The >> qcow2_alloc_compressed_cluster_offset() caller will continue filling the >> same cluster. >> >> Imagine two compressed writes running at the same time. Write A >> allocates just a few bytes so write B shares a sector with the first >> write: Sorry, but it seems this will never happen, because the second write will not pass this check: uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs, uint64_t offset, int compressed_size) { ... /* Compression can't overwrite anything. Fail if the cluster was already * allocated. */ cluster_offset = be64_to_cpu(l2_table[l2_index]); if (cluster_offset & L2E_OFFSET_MASK) { qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table); return 0; } ... As you can see we can't do the compressed write in the already allocated cluster. >> >> Sector 1 >> |AAABBBBBBBBB| >> >> The race condition is that bdrv_co_pwritev() uses read-modify-write (a >> bounce buffer). If both requests call bdrv_co_pwritev() around the same >> time then the following could happen: >> >> Sector 1 >> |000BBBBBBBBB| >> >> or: >> >> Sector 1 >> |AAA000000000| >> >> It's necessary to hold s->lock around the compressed data write to avoid >> this race condition. >> > I agree, there is really a race.. Thank you, this is a very good point! > >