Re: [RFC v2] Adding block layer APIs resembling Linux ZoneBlockDevice ioctls.

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
To: Stefan Hajnoczi <stefanha@redhat.com>, Sam Li <faithilikerun@gmail.com>
Cc: qemu-devel <qemu-devel@nongnu.org>,
	Dmitry Fomichev <dmitry.fomichev@wdc.com>,
	Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	qemu block <qemu-block@nongnu.org>,
	Hanna Reitz <hreitz@redhat.com>, Hannes Reinecke <hare@suse.de>
Subject: Re: [RFC v2] Adding block layer APIs resembling Linux ZoneBlockDevice ioctls.
Date: Tue, 28 Jun 2022 08:13:52 +0900	[thread overview]
Message-ID: <1b8eeb70-8068-da8b-191a-fd603cd2d08b@opensource.wdc.com> (raw)
In-Reply-To: <YrXdJQe+KiRcM5RN@stefanha-x1.localdomain>

On 6/25/22 00:49, Stefan Hajnoczi wrote:
> On Fri, Jun 24, 2022 at 11:14:32AM +0800, Sam Li wrote:
>> Hi Stefan,
>>
>> Stefan Hajnoczi <stefanha@redhat.com> 于2022年6月20日周一 15:55写道：
>>>
>>> On Mon, Jun 20, 2022 at 11:36:11AM +0800, Sam Li wrote:
>>>
>>> Hi Sam,
>>> Is this version 2 of "[RFC v1] Add support for zoned device"? Please
>>> keep the email subject line the same (except for "v2", "v3", etc) so
>>> that it's clear which patch series this new version replaces.
>>>
>>>> Fix some mistakes before. It can report a range of zones now.
>>>
>>> This looks like the description of what changed compared to v1. Please
>>> put the changelog below "---" in the future. When patch emails are
>>> merged by git-am(1) it keeps the text above "---" and discards the text
>>> below "---". The changelog is usually no longer useful once the patches
>>> are merged, so it should be located below the "---" line.
>>>
>>> The text above the "---" is the commit description (an explanation of
>>> why this commit is necessary). In this case the commit description
>>> should explain that this patch adds .bdrv_co_zone_report() and
>>> .bdrv_co_zone_mgmt() to BlockDriver so that zoned block devices can be
>>> supported.
>>>
>>>>
>>>> Signed-off-by: Sam Li <faithilikerun@gmail.com>
>>>> ---
>>>>  block/block-backend.c             |  22 ++++
>>>>  block/coroutines.h                |   5 +
>>>>  block/file-posix.c                | 182 ++++++++++++++++++++++++++++++
>>>>  block/io.c                        |  23 ++++
>>>>  include/block/block-common.h      |  43 ++++++-
>>>>  include/block/block-io.h          |  13 +++
>>>>  include/block/block_int-common.h  |  20 ++++
>>>>  qemu-io-cmds.c                    | 118 +++++++++++++++++++
>>>>  tests/qemu-iotests/tests/zoned.sh |  52 +++++++++
>>>>  9 files changed, 477 insertions(+), 1 deletion(-)
>>>>  create mode 100644 tests/qemu-iotests/tests/zoned.sh
>>>>
>>>> diff --git a/block/block-backend.c b/block/block-backend.c
>>>> index e0e1aff4b1..20248e4a35 100644
>>>> --- a/block/block-backend.c
>>>> +++ b/block/block-backend.c
>>>> @@ -104,6 +104,8 @@ typedef struct BlockBackendAIOCB {
>>>>      int ret;
>>>>  } BlockBackendAIOCB;
>>>>
>>>> +
>>>> +
>>>
>>> Please avoid whitespace changes in code that is otherwise untouched by
>>> your patch. Code changes can cause merge conflicts and they make it
>>> harder to use git-annotate(1), so only changes that are necessary should
>>> be included in a patch.
>>>
>>>>  static const AIOCBInfo block_backend_aiocb_info = {
>>>>      .get_aio_context = blk_aiocb_get_aio_context,
>>>>      .aiocb_size = sizeof(BlockBackendAIOCB),
>>>> @@ -1810,6 +1812,25 @@ int blk_flush(BlockBackend *blk)
>>>>      return ret;
>>>>  }
>>>>
>>>
>>> Please add a documentation comment for blk_co_zone_report() that
>>> explains how to use the functions and the purpose of the arguments. For
>>> example, does offset have to be the first byte in a zone or can it be
>>> any byte offset? What are the alignment requirements of offset and len?
>>> Why is nr_zones a pointer?
>>>
>>>> +int blk_co_zone_report(BlockBackend *blk, int64_t offset, int64_t len,
>>>
>>> Functions that run in coroutine context must be labeled with
>>> coroutine_fn:
>>>
>>>     int coroutine_fn blk_co_zone_report(...)
>>>
>>> This tells humans and tools that the function can only be called from a
>>> coroutine. There is a blog post about coroutines in QEMU here:
>>> https://blog.vmsplice.net/2014/01/coroutines-in-qemu-basics.html
>>>
>>>> +                       int64_t *nr_zones,
>>>> +                       struct BlockZoneDescriptor *zones)
>>>
>>> QEMU coding style uses typedefs when defining structs, so "struct
>>> BlockZoneDescriptor *zones" should be written as "BlockZoneDescriptor
>>> *zones".
>>>
>>>> +{
>>>> +    int ret;
>>>
>>> This function is called from the I/O code path, please mark it with:
>>>
>>>   IO_CODE();
>>>
>>> From include/block/block-io.h:
>>>
>>>   * I/O API functions. These functions are thread-safe, and therefore
>>>   * can run in any thread as long as the thread has called
>>>   * aio_context_acquire/release().
>>>   *
>>>   * These functions can only call functions from I/O and Common categories,
>>>   * but can be invoked by GS, "I/O or GS" and I/O APIs.
>>>   *
>>>   * All functions in this category must use the macro
>>>   * IO_CODE();
>>>   * to catch when they are accidentally called by the wrong API.
>>>
>>>> +    ret = bdrv_co_zone_report(blk->root->bs, offset, len, nr_zones, zones);
>>>
>>> Please add blk_inc_in_flight(blk) and blk_dec_in_flight(blk) around this
>>> function call to ensure that zone report requests finish before I/O is
>>> drained (see bdrv_drained_begin()). This is necessary so that it's
>>> possible to wait for I/O requests, including zone report, to complete.
>>>
>>> Similar to blk_co_do_preadv() we need blk_wait_while_drained(blk),
>>> blk_check_byte_request(), and bdrv_inc_in_flight(bs) before calling
>>> bdrv_co_zone_report(). bdrv_dec_in_flight(bs) needs to be called after
>>> bdrv_co_zone_report() returns.
>>>
>> After adding similar structure to blk_co_do_preadv(), zone operation
>> command will always fail at blk_wait_while_drained(blk) because
>> blk->inflight <= 0. Would it be ok to just remove
>> blk_wait_while_drained?
> 
> Are you hitting the assertion in
> block/block-backend.c:blk_wait_while_drained()?
> 
>   assert(blk->in_flight > 0);
> 
> If yes, then there is a bug in the code. You need to make sure that
> blk_inc_in_flight() is called before blk_wait_while_drained().
> 
>>>> +    BLK_ZT_CONV = BLK_ZONE_TYPE_CONVENTIONAL,
>>>> +    BLK_ZT_SWR = BLK_ZONE_TYPE_SEQWRITE_REQ,
>>>> +    BLK_ZT_SWP = BLK_ZONE_TYPE_SEQWRITE_PREF,
>>>> +};
>>>> +
>>>> +enum zone_cond {
>>>> +    BLK_ZS_NOT_WP = BLK_ZONE_COND_NOT_WP,
>>>> +    BLK_ZS_EMPTY = BLK_ZONE_COND_EMPTY,
>>>> +    BLK_ZS_IOPEN = BLK_ZONE_COND_IMP_OPEN,
>>>> +    BLK_ZS_EOPEN = BLK_ZONE_COND_EXP_OPEN,
>>>> +    BLK_ZS_CLOSED = BLK_ZONE_COND_CLOSED,
>>>> +    BLK_ZS_RDONLY = BLK_ZONE_COND_READONLY,
>>>> +    BLK_ZS_FULL = BLK_ZONE_COND_FULL,
>>>> +    BLK_ZS_OFFLINE = BLK_ZONE_COND_OFFLINE,
>>>> +};
>>>
>>> This 1:1 correspondence with Linux constants could make the code a
>>> little harder to port.
>>>
>>> Maybe QEMU's block layer should define its own numeric constants so the
>>> code doesn't rely on operating system-specific headers.
>>> block/file-posix.c #ifdef __linux__ code can be responsible for
>>> converting Linux-specific constants to QEMU constants (and the 1:1
>>> mapping can be used there).
>>>
>> Can we define those constants in block-common.h? Because
>> BlockZoneDescriptor requires zone_condition, zone_type defined and
>> BlockZoneDesicriptor are used in header files and qemu-io
>> sub-commands. If we use #ifdef __linux__ in block-common.h, it can be
>> responsible for converting Linux constants instead.
>>
>> Thanks for reviewing! If there is any problem, please let me know.
> 
> I suggest defining the constants in block-common.h. #ifdef __linux__ is
> not necessary in block-common.h because the constants should just be an
> enum with BLK_ZS_NOT_WP = 0 and so on (no need for Linux headers).
> 
> In block/file-posix.c you can define a helper function inside #ifdef
> __linux__ that does something like:
> 
>   BlkZoneCond zone_cond_from_linux(enum blk_zone_cond val)
>   {
>       switch (val) {
>       case BLK_ZONE_COND_NOT_WP:
>           return BLK_ZS_NOT_WP;
>       ...
>   }
> 
> The code in block/file-posix.c should call this helper to convert from
> Linux values to QEMU values.

Given that the entire zone API is Linux specific (as far as I know), we do
not need to have these helpers: the entire code for zones needs to be
under a #ifdef __linux__.

But the conversion from Linux struct blk_zone to qemu zone descriptor
still needs to be done. And the perfect place to do this is the
parse_zone() function. There, we can add:

	switch (blkz->cond) {
	case BLK_ZONE_COND_NOT_WP:
		zone->cond = BLK_ZS_NOT_WP;
		break;
	...
	}

And same for zone type. That will also allow checking the values returned
by Linux. ZBC-2 defines more zone types and zone conditions than currently
defined in /usr/include/linux/blkzoned.h. If these new zone
types/conditions ever get supported by Linux, qemu can catch the values it
does not support and reject the drive.

> 
> This way the QEMU block layer does not use Linux constants and compiles
> on non-Linux machines.
> 
> Stefan


-- 
Damien Le Moal
Western Digital Research

     prev parent reply	other threads:[~2022-06-27 23:15 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-20  3:36 [RFC v2] Adding block layer APIs resembling Linux ZoneBlockDevice ioctls Sam Li
2022-06-20  7:55 ` Stefan Hajnoczi
2022-06-20 10:11   ` Damien Le Moal
2022-06-20 11:28     ` Stefan Hajnoczi
2022-06-24  3:14   ` Sam Li
2022-06-24 15:49     ` Stefan Hajnoczi
2022-06-24 16:10       ` Sam Li
2022-06-27 23:13       ` Damien Le Moal [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1b8eeb70-8068-da8b-191a-fd603cd2d08b@opensource.wdc.com \
    --to=damien.lemoal@opensource.wdc.com \
    --cc=dmitry.fomichev@wdc.com \
    --cc=faithilikerun@gmail.com \
    --cc=fam@euphon.net \
    --cc=hare@suse.de \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).