From: Sam Li <faithilikerun@gmail.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
Dmitry Fomichev <dmitry.fomichev@wdc.com>,
Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
qemu block <qemu-block@nongnu.org>,
Hanna Reitz <hreitz@redhat.com>, Hannes Reinecke <hare@suse.de>,
Damien Le Moal <damien.lemoal@opensource.wdc.com>
Subject: Re: [RFC v2] Adding block layer APIs resembling Linux ZoneBlockDevice ioctls.
Date: Sat, 25 Jun 2022 00:10:54 +0800 [thread overview]
Message-ID: <CAAAx-8+eBON8=bKXVkdpaT8_indgpdp8JPLaXAyWQE4phFE2mw@mail.gmail.com> (raw)
In-Reply-To: <YrXdJQe+KiRcM5RN@stefanha-x1.localdomain>
Stefan Hajnoczi <stefanha@redhat.com> 于2022年6月24日周五 23:50写道:
>
> On Fri, Jun 24, 2022 at 11:14:32AM +0800, Sam Li wrote:
> > Hi Stefan,
> >
> > Stefan Hajnoczi <stefanha@redhat.com> 于2022年6月20日周一 15:55写道:
> > >
> > > On Mon, Jun 20, 2022 at 11:36:11AM +0800, Sam Li wrote:
> > >
> > > Hi Sam,
> > > Is this version 2 of "[RFC v1] Add support for zoned device"? Please
> > > keep the email subject line the same (except for "v2", "v3", etc) so
> > > that it's clear which patch series this new version replaces.
> > >
> > > > Fix some mistakes before. It can report a range of zones now.
> > >
> > > This looks like the description of what changed compared to v1. Please
> > > put the changelog below "---" in the future. When patch emails are
> > > merged by git-am(1) it keeps the text above "---" and discards the text
> > > below "---". The changelog is usually no longer useful once the patches
> > > are merged, so it should be located below the "---" line.
> > >
> > > The text above the "---" is the commit description (an explanation of
> > > why this commit is necessary). In this case the commit description
> > > should explain that this patch adds .bdrv_co_zone_report() and
> > > .bdrv_co_zone_mgmt() to BlockDriver so that zoned block devices can be
> > > supported.
> > >
> > > >
> > > > Signed-off-by: Sam Li <faithilikerun@gmail.com>
> > > > ---
> > > > block/block-backend.c | 22 ++++
> > > > block/coroutines.h | 5 +
> > > > block/file-posix.c | 182 ++++++++++++++++++++++++++++++
> > > > block/io.c | 23 ++++
> > > > include/block/block-common.h | 43 ++++++-
> > > > include/block/block-io.h | 13 +++
> > > > include/block/block_int-common.h | 20 ++++
> > > > qemu-io-cmds.c | 118 +++++++++++++++++++
> > > > tests/qemu-iotests/tests/zoned.sh | 52 +++++++++
> > > > 9 files changed, 477 insertions(+), 1 deletion(-)
> > > > create mode 100644 tests/qemu-iotests/tests/zoned.sh
> > > >
> > > > diff --git a/block/block-backend.c b/block/block-backend.c
> > > > index e0e1aff4b1..20248e4a35 100644
> > > > --- a/block/block-backend.c
> > > > +++ b/block/block-backend.c
> > > > @@ -104,6 +104,8 @@ typedef struct BlockBackendAIOCB {
> > > > int ret;
> > > > } BlockBackendAIOCB;
> > > >
> > > > +
> > > > +
> > >
> > > Please avoid whitespace changes in code that is otherwise untouched by
> > > your patch. Code changes can cause merge conflicts and they make it
> > > harder to use git-annotate(1), so only changes that are necessary should
> > > be included in a patch.
> > >
> > > > static const AIOCBInfo block_backend_aiocb_info = {
> > > > .get_aio_context = blk_aiocb_get_aio_context,
> > > > .aiocb_size = sizeof(BlockBackendAIOCB),
> > > > @@ -1810,6 +1812,25 @@ int blk_flush(BlockBackend *blk)
> > > > return ret;
> > > > }
> > > >
> > >
> > > Please add a documentation comment for blk_co_zone_report() that
> > > explains how to use the functions and the purpose of the arguments. For
> > > example, does offset have to be the first byte in a zone or can it be
> > > any byte offset? What are the alignment requirements of offset and len?
> > > Why is nr_zones a pointer?
> > >
> > > > +int blk_co_zone_report(BlockBackend *blk, int64_t offset, int64_t len,
> > >
> > > Functions that run in coroutine context must be labeled with
> > > coroutine_fn:
> > >
> > > int coroutine_fn blk_co_zone_report(...)
> > >
> > > This tells humans and tools that the function can only be called from a
> > > coroutine. There is a blog post about coroutines in QEMU here:
> > > https://blog.vmsplice.net/2014/01/coroutines-in-qemu-basics.html
> > >
> > > > + int64_t *nr_zones,
> > > > + struct BlockZoneDescriptor *zones)
> > >
> > > QEMU coding style uses typedefs when defining structs, so "struct
> > > BlockZoneDescriptor *zones" should be written as "BlockZoneDescriptor
> > > *zones".
> > >
> > > > +{
> > > > + int ret;
> > >
> > > This function is called from the I/O code path, please mark it with:
> > >
> > > IO_CODE();
> > >
> > > From include/block/block-io.h:
> > >
> > > * I/O API functions. These functions are thread-safe, and therefore
> > > * can run in any thread as long as the thread has called
> > > * aio_context_acquire/release().
> > > *
> > > * These functions can only call functions from I/O and Common categories,
> > > * but can be invoked by GS, "I/O or GS" and I/O APIs.
> > > *
> > > * All functions in this category must use the macro
> > > * IO_CODE();
> > > * to catch when they are accidentally called by the wrong API.
> > >
> > > > + ret = bdrv_co_zone_report(blk->root->bs, offset, len, nr_zones, zones);
> > >
> > > Please add blk_inc_in_flight(blk) and blk_dec_in_flight(blk) around this
> > > function call to ensure that zone report requests finish before I/O is
> > > drained (see bdrv_drained_begin()). This is necessary so that it's
> > > possible to wait for I/O requests, including zone report, to complete.
> > >
> > > Similar to blk_co_do_preadv() we need blk_wait_while_drained(blk),
> > > blk_check_byte_request(), and bdrv_inc_in_flight(bs) before calling
> > > bdrv_co_zone_report(). bdrv_dec_in_flight(bs) needs to be called after
> > > bdrv_co_zone_report() returns.
> > >
> > After adding similar structure to blk_co_do_preadv(), zone operation
> > command will always fail at blk_wait_while_drained(blk) because
> > blk->inflight <= 0. Would it be ok to just remove
> > blk_wait_while_drained?
>
> Are you hitting the assertion in
> block/block-backend.c:blk_wait_while_drained()?
>
> assert(blk->in_flight > 0);
>
> If yes, then there is a bug in the code. You need to make sure that
> blk_inc_in_flight() is called before blk_wait_while_drained().
>
Right! I didn't add blk_inc_in/dec_flight() because similar
blockdriver functions in file-posix.c don't use blk_inc_in_flight much
and the location would be wrong.
> > > > + BLK_ZT_CONV = BLK_ZONE_TYPE_CONVENTIONAL,
> > > > + BLK_ZT_SWR = BLK_ZONE_TYPE_SEQWRITE_REQ,
> > > > + BLK_ZT_SWP = BLK_ZONE_TYPE_SEQWRITE_PREF,
> > > > +};
> > > > +
> > > > +enum zone_cond {
> > > > + BLK_ZS_NOT_WP = BLK_ZONE_COND_NOT_WP,
> > > > + BLK_ZS_EMPTY = BLK_ZONE_COND_EMPTY,
> > > > + BLK_ZS_IOPEN = BLK_ZONE_COND_IMP_OPEN,
> > > > + BLK_ZS_EOPEN = BLK_ZONE_COND_EXP_OPEN,
> > > > + BLK_ZS_CLOSED = BLK_ZONE_COND_CLOSED,
> > > > + BLK_ZS_RDONLY = BLK_ZONE_COND_READONLY,
> > > > + BLK_ZS_FULL = BLK_ZONE_COND_FULL,
> > > > + BLK_ZS_OFFLINE = BLK_ZONE_COND_OFFLINE,
> > > > +};
> > >
> > > This 1:1 correspondence with Linux constants could make the code a
> > > little harder to port.
> > >
> > > Maybe QEMU's block layer should define its own numeric constants so the
> > > code doesn't rely on operating system-specific headers.
> > > block/file-posix.c #ifdef __linux__ code can be responsible for
> > > converting Linux-specific constants to QEMU constants (and the 1:1
> > > mapping can be used there).
> > >
> > Can we define those constants in block-common.h? Because
> > BlockZoneDescriptor requires zone_condition, zone_type defined and
> > BlockZoneDesicriptor are used in header files and qemu-io
> > sub-commands. If we use #ifdef __linux__ in block-common.h, it can be
> > responsible for converting Linux constants instead.
> >
> > Thanks for reviewing! If there is any problem, please let me know.
>
> I suggest defining the constants in block-common.h. #ifdef __linux__ is
> not necessary in block-common.h because the constants should just be an
> enum with BLK_ZS_NOT_WP = 0 and so on (no need for Linux headers).
>
> In block/file-posix.c you can define a helper function inside #ifdef
> __linux__ that does something like:
>
> BlkZoneCond zone_cond_from_linux(enum blk_zone_cond val)
> {
> switch (val) {
> case BLK_ZONE_COND_NOT_WP:
> return BLK_ZS_NOT_WP;
> ...
> }
>
> The code in block/file-posix.c should call this helper to convert from
> Linux values to QEMU values.
>
> This way the QEMU block layer does not use Linux constants and compiles
> on non-Linux machines.
>
Thanks!
> Stefan
next prev parent reply other threads:[~2022-06-24 16:14 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-06-20 3:36 [RFC v2] Adding block layer APIs resembling Linux ZoneBlockDevice ioctls Sam Li
2022-06-20 7:55 ` Stefan Hajnoczi
2022-06-20 10:11 ` Damien Le Moal
2022-06-20 11:28 ` Stefan Hajnoczi
2022-06-24 3:14 ` Sam Li
2022-06-24 15:49 ` Stefan Hajnoczi
2022-06-24 16:10 ` Sam Li [this message]
2022-06-27 23:13 ` Damien Le Moal
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAAAx-8+eBON8=bKXVkdpaT8_indgpdp8JPLaXAyWQE4phFE2mw@mail.gmail.com' \
--to=faithilikerun@gmail.com \
--cc=damien.lemoal@opensource.wdc.com \
--cc=dmitry.fomichev@wdc.com \
--cc=fam@euphon.net \
--cc=hare@suse.de \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).