From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:55086) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1fOk9r-0001Fq-3q for qemu-devel@nongnu.org; Fri, 01 Jun 2018 09:31:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1fOk9p-0000s5-7n for qemu-devel@nongnu.org; Fri, 01 Jun 2018 09:31:55 -0400 References: <1527801443-30620-1-git-send-email-ari@tuxera.com> From: Ari Sundholm Message-ID: <8ccfcf94-2282-b56b-4b79-41c05551d26d@tuxera.com> Date: Fri, 1 Jun 2018 16:31:43 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset="utf-8"; format=flowed Content-Language: en-US Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 1/5] block: Add blklogwrites List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Eric Blake , qemu-block@nongnu.org Cc: qemu-devel@nongnu.org, Aapo Vienamo On 06/01/2018 03:26 PM, Eric Blake wrote: > On 05/31/2018 04:17 PM, Ari Sundholm wrote: >> From: Aapo Vienamo >> >> Implements a block device write logging system, similar to Linux kerne= l >=20 > [meta-comment] >=20 > Your patch threading is awkward - you forgot a cover letter, and then=20 > you did deep threading (every message in-reply-to the previous) instead= =20 > of shallow threading (all messages in-reply-to only the 0/5 cover=20 > letter).=C2=A0 This makes it harder for automated tooling to recognize = your=20 > series properly. >=20 > More patch submission hints at: > http://wiki.qemu.org/Contribute/SubmitAPatch >=20 Thank you, I will improve this in the next version. I was not aware that=20 the threading choice breaks the tooling. I should definitely have=20 included a cover letter and perhaps moved some of the comment in the=20 first patch there, with more explanation about some of the choices made. >> device mapper dm-log-writes. The write operations that are performed >> on a block device are logged to a file or another block device. The >> write log format is identical to the dm-log-writes format. Currently, >> log markers are not supported. >> >> This functionality can be used for fail-safe and fs consistency >> testing. By implementing it in qemu, tests utilizing write logs can be >> be used to test non-Linux drivers and older kernels. >> >> The implementation is based on the blkverify and blkdebug block driver= s. >> >> Signed-off-by: Aapo Vienamo >> Signed-off-by: Ari Sundholm >> --- >> =C2=A0 block.c=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0 6 - >> =C2=A0 block/Makefile.objs=C2=A0=C2=A0 |=C2=A0=C2=A0 1 + >> =C2=A0 block/blklogwrites.c=C2=A0 | 441=20 >> ++++++++++++++++++++++++++++++++++++++++++++++++++ >> =C2=A0 include/block/block.h |=C2=A0=C2=A0 7 + >> =C2=A0 4 files changed, 449 insertions(+), 6 deletions(-) >> =C2=A0 create mode 100644 block/blklogwrites.c >=20 > Without a cover letter, I'll have to read the entire series to see if=20 > there is something obviously missing.=C2=A0 For example, I don't yet kn= ow if=20 > a later patch adds to qapi/block-core.json to allow creation of this ne= w=20 > block device from QMP. >=20 >> >> diff --git a/block.c b/block.c >> index 501b64c..c8cffe1 100644 >> --- a/block.c >> +++ b/block.c >> @@ -1914,12 +1914,6 @@ int bdrv_child_try_set_perm(BdrvChild *c,=20 >> uint64_t perm, uint64_t shared, >> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return 0; >> =C2=A0 } >> -#define DEFAULT_PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \ >> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | BLK_PERM_WRITE \ >> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | BLK_PERM_WRITE_UNCHANG= ED \ >> -=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | BLK_PERM_RESIZE) >> -#define DEFAULT_PERM_UNCHANGED (BLK_PERM_ALL &=20 >> ~DEFAULT_PERM_PASSTHROUGH) >> - >=20 > Your commit message didn't mention why this code motion was necessary;=20 > in fact, if it IS necessary, it might be better to split it off into a=20 > separate patch. >=20 This was just to allow using these constants outside of block.c. Will=20 split this off into a separate patch, as requested. >> +++ b/block/blklogwrites.c >> @@ -0,0 +1,441 @@ >> +/* >> + * Write logging blk driver based on blkverify and blkdebug. >> + * >> + * Copyright (c) 2017 Tuomas Tynkkynen >> + * Copyright (c) 2018 Aapo Vienamo >> + * Copyright (c) 2018 Ari Sundholm >> + * >> + * This work is licensed under the terms of the GNU GPL, version 2 or= =20 >> later. >> + * See the COPYING file in the top-level directory. >> + */ >> + >> +#include "qemu/osdep.h" >> +#include "qapi/error.h" >> +#include "qemu/sockets.h" /* for EINPROGRESS on Windows */ >> +#include "block/block_int.h" >> +#include "qapi/qmp/qdict.h" >> +#include "qapi/qmp/qstring.h" >> +#include "qemu/cutils.h" >> +#include "qemu/option.h" >> + >> +/* Disk format stuff - taken from Linux drivers/md/dm-log-writes.c */ >=20 > Your copyright claims GPLv2+; but drivers/md/dm-log-writes.c states onl= y=20 > "* This file is released under the GPL." - I guess that means you're=20 > okay (parts of Linux are GPLv2-only, which we can't copy into a GPLv2+=20 > file). >=20 I consulted legal counsel on this part, and it is our opinion that the=20 constants and structs taken from the Linux kernel are insignificant and=20 required for compatibility, so it should be OK to license the file as=20 GPLv2+ in any case. Just to confirm, do you see this differently? >> + >> +/* Valid blk_log_writes filenames look like: >> + *=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 blk_log_writes:path/to/raw_image:pat= h/to/logfile */ >> +static void blk_log_writes_parse_filename(const char *filename, QDict= =20 >> *options, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 Error **errp) >=20 > Do we still want to support yet another legacy syntax addition?=C2=A0 I= 'd=20 > rather just require the user to pass in a valid --blockdev specificatio= n=20 > (which implies that you MUST support a QMP description), and skip the=20 > legacy syntax altogether. >=20 I believe this was inherited from blkverify, which was used as a basis.=20 I will have to do some research into how to do this properly for this=20 kind of a driver which requires multiple filenames. Any pointers would=20 be appreciated. >> + >> +static QemuOptsList runtime_opts =3D { >> +=C2=A0=C2=A0=C2=A0 .name =3D "blk_log_writes", >> +=C2=A0=C2=A0=C2=A0 .head =3D QTAILQ_HEAD_INITIALIZER(runtime_opts.hea= d), >> +=C2=A0=C2=A0=C2=A0 .desc =3D { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 .n= ame =3D "x-raw", >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 .t= ype =3D QEMU_OPT_STRING, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 .h= elp =3D "[internal use only, will be removed]", >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 }, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 .n= ame =3D "x-log", >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 .t= ype =3D QEMU_OPT_STRING, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 .h= elp =3D "[internal use only, will be removed]", >=20 > And the fact that you are adding internal options from the getgo points= =20 > to the fact that we really want a QMP interface. >=20 Will try to find out how to do this properly. >=20 >> +static void blk_log_writes_refresh_limits(BlockDriverState *bs, Error= =20 >> **errp) >> +{ >> +=C2=A0=C2=A0=C2=A0 if (bs->bl.request_alignment < BDRV_SECTOR_SIZE) { >=20 > Why do you need sector alignment?=C2=A0 And in what cases would=20 > bs->bl.request_alignment be larger than sector alignment (does the bloc= k=20 > layer take into account if your log file and/or child BDS already have = a=20 > larger alignment?) >=20 Sector alignment is needed because the write logging takes place at=20 sector granularity. Each write log entry contains the sector offset and=20 the number of sectors the write spanned. I'll try to make this clearer=20 in the next version. >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bs->bl.request_alignment =3D= BDRV_SECTOR_SIZE; >> + >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (bs->bl.pdiscard_alignm= ent && >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0 bs->bl.pdiscard_alignment < bs->bl.request_alignmen= t) >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 bs= ->bl.pdiscard_alignment =3D bs->bl.request_alignment; >=20 > Every 'if' MUST use {}.=C2=A0 scripts/checkpatch.pl should have flagged= this. >=20 Thanks, will fix this and other instances. >> +static void coroutine_fn blk_log_writes_co_do_log(void *opaque) >> +{ >=20 >> + >> +=C2=A0=C2=A0=C2=A0 lr->r->done++; >> +=C2=A0=C2=A0=C2=A0 qemu_coroutine_enter_if_inactive(lr->r->co); >> +} >> + >> +static void blk_log_writes_co_do_file(void *opaque) >> +{ >> +=C2=A0=C2=A0=C2=A0 BlkLogWritesFileReq *fr =3D opaque; >> + >> +=C2=A0=C2=A0=C2=A0 fr->file_ret =3D fr->func(fr); >> + >> +=C2=A0=C2=A0=C2=A0 fr->r->done++; >=20 > Two non-atomic increments... >=20 >> +=C2=A0=C2=A0=C2=A0 qemu_coroutine_enter_if_inactive(fr->r->co); >> +} >> + >> +static int coroutine_fn >> +blk_log_writes_co_log(BlockDriverState *bs, uint64_t offset, uint64_t= =20 >> bytes, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 QEMUIOVector *q= iov, int flags, >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 int (*file_func= )(BlkLogWritesFileReq *r), >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 uint64_t entry_= flags) >> +{ >=20 >> +=C2=A0=C2=A0=C2=A0 qemu_coroutine_enter(co_file); >> +=C2=A0=C2=A0=C2=A0 qemu_coroutine_enter(co_log); >> + >> +=C2=A0=C2=A0=C2=A0 while (r.done < 2) { >> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 qemu_coroutine_yield(); >> +=C2=A0=C2=A0=C2=A0 } >=20 > ...used as the condition for waiting.=C2=A0 Since the point of coroutin= es is=20 > to allow (restricted) parallel operation, there's a chance that the=20 > coroutine implementation can be utilizing parallel threads; if that's=20 > the case, then on the rare race when both threads try to increment at=20 > near the same time, they can both read 0 and write 1, at which point=20 > this wait loop would be an infinite loop.=C2=A0 You're probably better = off=20 > using atomics (even if I'm wrong about coroutines being able to race=20 > each other on the increment, as the other point of coroutines is that=20 > they provide restricted parallelism where you can also implement them i= n=20 > only a single thread because of well-defined yield points). >=20 Inherited from blkverify, I believe. Will fix. >=20 >> +static BlockDriver bdrv_blk_log_writes =3D { >> +=C2=A0=C2=A0=C2=A0 .format_name=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D "blk_log_writes", >> +=C2=A0=C2=A0=C2=A0 .protocol_name=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 =3D "blk_log_writes", >> +=C2=A0=C2=A0=C2=A0 .instance_size=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 =3D sizeof(BDRVBlkLogWritesState), >> + >> +=C2=A0=C2=A0=C2=A0 .bdrv_parse_filename=C2=A0=C2=A0=C2=A0 =3D blk_log= _writes_parse_filename, >> +=C2=A0=C2=A0=C2=A0 .bdrv_file_open=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 =3D blk_log_writes_open, >> +=C2=A0=C2=A0=C2=A0 .bdrv_close=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D blk_log_writes_close, >> +=C2=A0=C2=A0=C2=A0 .bdrv_getlength=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 =3D blk_log_writes_getlength, >> +=C2=A0=C2=A0=C2=A0 .bdrv_refresh_filename=C2=A0 =3D blk_log_writes_re= fresh_filename, >> +=C2=A0=C2=A0=C2=A0 .bdrv_child_perm=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 =3D blk_log_writes_child_perm, >> +=C2=A0=C2=A0=C2=A0 .bdrv_refresh_limits=C2=A0=C2=A0=C2=A0 =3D blk_log= _writes_refresh_limits, >> + >> +=C2=A0=C2=A0=C2=A0 .bdrv_co_preadv=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0 =3D blk_log_writes_co_preadv, >> +=C2=A0=C2=A0=C2=A0 .bdrv_co_pwritev=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 =3D blk_log_writes_co_pwritev, >> +=C2=A0=C2=A0=C2=A0 .bdrv_co_flush_to_disk=C2=A0 =3D blk_log_writes_co= _flush_to_disk, >> +=C2=A0=C2=A0=C2=A0 .bdrv_co_pdiscard=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 =3D blk_log_writes_co_pdiscard, >> + >> +=C2=A0=C2=A0=C2=A0 .is_filter=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D true, >> +}; >=20 > No .bdrv_co_pwrite_zeroes support?=C2=A0 While the block layer will emu= late=20 > zero support on top of .bdrv_co_pwritev, that is probably a performance= =20 > killer compared to you implementing it directly. >=20 I see. I'll have to consider the implications with regard to accurate=20 write logging to see if this is feasible and desirable. > Also, you probably want .bdrv_co_block_status =3D=20 > bdrv_co_block_status_from_file, so that the block layer sees the=20 > underlying device's block status rather than treating the entire device= =20 > as non-sparse. >=20 Thank you, will look into this. Best regards, Ari Sundholm ari@tuxera.com