qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Damien Le Moal <damien.lemoal@opensource.wdc.com>
To: Sam Li <faithilikerun@gmail.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Hannes Reinecke <hare@suse.de>, Hanna Reitz <hreitz@redhat.com>,
	Dmitry Fomichev <dmitry.fomichev@wdc.com>,
	Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	qemu block <qemu-block@nongnu.org>
Subject: Re: [RFC v3 1/5] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls.
Date: Wed, 29 Jun 2022 10:43:13 +0900	[thread overview]
Message-ID: <20a3234d-eb6b-ee21-95d5-5ce18aa6c822@opensource.wdc.com> (raw)
In-Reply-To: <CAAAx-8+6q9zLGO2Xzi9JaNFgkpHn0-eQyB8GijGx53zbFtsDCQ@mail.gmail.com>

On 6/28/22 19:23, Sam Li wrote:
> Damien Le Moal <damien.lemoal@opensource.wdc.com> 于2022年6月28日周二 17:05写道:
>>
>> On 6/28/22 17:05, Sam Li wrote:
>>> Stefan Hajnoczi <stefanha@redhat.com> 于2022年6月28日周二 14:52写道:
>>>>
>>>> On Mon, Jun 27, 2022 at 08:19:13AM +0800, Sam Li wrote:
>>>>> diff --git a/block/block-backend.c b/block/block-backend.c
>>>>> index e0e1aff4b1..786f964d02 100644
>>>>> --- a/block/block-backend.c
>>>>> +++ b/block/block-backend.c
>>>>> @@ -1810,6 +1810,62 @@ int blk_flush(BlockBackend *blk)
>>>>>      return ret;
>>>>>  }
>>>>>
>>>>> +/*
>>>>> + * Return zone_report from BlockDriver. Offset can be any number within
>>>>> + * the zone size. No alignment for offset and len.
>>>>
>>>> What is the purpose of len? Is it the maximum number of zones to return
>>>> in nr_zones[]?
>>>
>>> len is actually not used in bdrv_co_zone_report. It is needed by
>>> blk_check_byte_request.
>>
>> There is no IO buffer being passed. Only an array of zone descriptors and
>> an in-out number of zones. len is definitely not needed. Not sure what
>> blk_check_byte_request() is intended to check though.
> 
> Can I just remove len argument and blk_check_byte_request() if it's not used?

If it is unused, yes, you must remove it.

> 
>>
>>>
>>>> How does the caller know how many zones were returned?
>>>
>>> nr_zones represents IN maximum and OUT actual. The caller will know by
>>> nr_zones which is changed in bdrv_co_zone_report. I'll add it in the
>>> comments.
>>>
>>>>
>>>>> + */
>>>>> +int coroutine_fn blk_co_zone_report(BlockBackend *blk, int64_t offset,
>>>>> +                       int64_t len, int64_t *nr_zones,
>>>>> +                       BlockZoneDescriptor *zones)
>>>>> +{
>>>>> +    int ret;
>>>>> +    BlockDriverState *bs;
>>>>> +    IO_CODE();
>>>>> +
>>>>> +    blk_inc_in_flight(blk); /* increase before waiting */
>>>>> +    blk_wait_while_drained(blk);
>>>>> +    bs = blk_bs(blk);
>>>>> +
>>>>> +    ret = blk_check_byte_request(blk, offset, len);
>>>>> +    if (ret < 0) {
>>>>> +        return ret;
>>>>> +    }
>>>>> +
>>>>> +    bdrv_inc_in_flight(bs);
>>>>
>>>> The bdrv_inc/dec_in_flight() call should be inside
>>>> bdrv_co_zone_report(). See bdrv_co_ioctl() for an example.
>>>>
>>>>> +    ret = bdrv_co_zone_report(blk->root->bs, offset, len,
>>>>> +                              nr_zones, zones);
>>>>> +    bdrv_dec_in_flight(bs);
>>>>> +    blk_dec_in_flight(blk);
>>>>> +    return ret;
>>>>> +}
>>>>> +
>>>>> +/*
>>>>> + * Return zone_mgmt from BlockDriver.
>>>>
>>>> Maybe this should be:
>>>>
>>>>   Send a zone management command.
>>>
>>> Yes, it's more accurate.
>>>
>>>>
>>>>> @@ -216,6 +217,11 @@ typedef struct RawPosixAIOData {
>>>>>              PreallocMode prealloc;
>>>>>              Error **errp;
>>>>>          } truncate;
>>>>> +        struct {
>>>>> +            int64_t *nr_zones;
>>>>> +            BlockZoneDescriptor *zones;
>>>>> +        } zone_report;
>>>>> +        zone_op op;
>>>>
>>>> It's cleaner to put op inside a struct zone_mgmt so its purpose is
>>>> self-explanatory:
>>>>
>>>>   struct {
>>>>       zone_op op;
>>>>   } zone_mgmt;
>>>>
>>>>> +static int handle_aiocb_zone_report(void *opaque) {
>>>>> +    RawPosixAIOData *aiocb = opaque;
>>>>> +    int fd = aiocb->aio_fildes;
>>>>> +    int64_t *nr_zones = aiocb->zone_report.nr_zones;
>>>>> +    BlockZoneDescriptor *zones = aiocb->zone_report.zones;
>>>>> +    int64_t offset = aiocb->aio_offset;
>>>>> +    int64_t len = aiocb->aio_nbytes;
>>
>> Since you have the zone array and number of zones (size of that array) I
>> really do not see why you need len.
>>
>>>>> +
>>>>> +    struct blk_zone *blkz;
>>>>> +    int64_t rep_size, nrz;
>>>>> +    int ret, n = 0, i = 0;
>>>>> +
>>>>> +    nrz = *nr_zones;
>>>>> +    if (len == -1) {
>>>>> +        return -errno;
>>>>
>>>> Where is errno set? Should this be an errno constant instead like
>>>> -EINVAL?
>>>
>>> That's right! Noted.
>>>
>>>>
>>>>> +    }
>>>>> +    rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
>>>>> +    g_autofree struct blk_zone_report *rep = g_new(struct blk_zone_report, nrz);
>>>>
>>>> g_new() looks incorrect. There should be 1 struct blk_zone_report
>>>> followed by nrz struct blk_zone structs. Please use g_malloc(rep_size)
>>>> instead.
>>>
>>> Yes! However, it still has a memory leak error when using g_autofree
>>> && g_malloc.
>>
>> That may be because you are changing the value of the rep pointer while
>> parsing the report ?
> 
> I am not sure it is the case. Can you show me some way to find the problem?

Not sure. I never used this g_malloc()/g_autofree() before so not sure how
it works. It may be that g_autofree() work only with g_new() ?
Could you try separating the declaration and allocation ? e.g.

Declare at the beginning of the function:
g_autofree struct blk_zone_report *rep = NULL;

And then when needed do:

rep_size = sizeof(struct blk_zone_report) + nrz * sizeof(struct blk_zone);
rep = g_malloc(rep_size);

> 
> Thanks for reviewing!
> 
> 
>>
>>>
>>>>
>>>>> +    offset = offset / 512; /* get the unit of the start sector: sector size is 512 bytes. */
>>>>> +    printf("start to report zone with offset: 0x%lx\n", offset);
>>>>> +
>>>>> +    blkz = (struct blk_zone *)(rep + 1);
>>>>> +    while (n < nrz) {
>>>>> +        memset(rep, 0, rep_size);
>>>>> +        rep->sector = offset;
>>>>> +        rep->nr_zones = nrz;
>>>>
>>>> What prevents zones[] overflowing? nrz isn't being reduced in the loop,
>>>> so maybe the rep->nr_zones return value will eventually exceed the
>>>> number of elements still available in zones[n..]?
>>>
>>> I suppose the number of zones[] is restricted in the subsequent for
>>> loop where zones[] copy one zone at a time as n increases. Even if
>>> rep->zones exceeds the available room in zones[], the extra zone will
>>> not be copied.
>>>
>>>>
>>>>> +static int handle_aiocb_zone_mgmt(void *opaque) {
>>>>> +    RawPosixAIOData *aiocb = opaque;
>>>>> +    int fd = aiocb->aio_fildes;
>>>>> +    int64_t offset = aiocb->aio_offset;
>>>>> +    int64_t len = aiocb->aio_nbytes;
>>>>> +    zone_op op = aiocb->op;
>>>>> +
>>>>> +    struct blk_zone_range range;
>>>>> +    const char *ioctl_name;
>>>>> +    unsigned long ioctl_op;
>>>>> +    int64_t zone_size;
>>>>> +    int64_t zone_size_mask;
>>>>> +    int ret;
>>>>> +
>>>>> +    ret = ioctl(fd, BLKGETZONESZ, &zone_size);
>>>>
>>>> Can this value be stored in bs (maybe in BlockLimits) to avoid calling
>>>> ioctl(BLKGETZONESZ) each time?
>>>
>>> Yes, zone_size is in the zbd_dev field. I'll update BlockLimits after
>>> I think through the configurations. In addition, it's a temporary
>>> approach. It is substituted by get_sysfs_long_val now.
>>>
>>>>
>>>>> +    if (ret) {
>>>>> +        return -1;
>>>>
>>>> The return value should be a negative errno. -1 is -EPERM but it's
>>>> probably not that error code you wanted. This should be:
>>>>
>>>>   return -errno;
>>>>
>>>>> +    }
>>>>> +
>>>>> +    zone_size_mask = zone_size - 1;
>>>>> +    if (offset & zone_size_mask) {
>>>>> +        error_report("offset is not the start of a zone");
>>>>> +        return -1;
>>>>
>>>> return -EINVAL;
>>>>
>>>>> +    }
>>>>> +
>>>>> +    if (len & zone_size_mask) {
>>>>> +        error_report("len is not aligned to zones");
>>>>> +        return -1;
>>>>
>>>> return -EINVAL;
>>>>
>>>>> +static int coroutine_fn raw_co_zone_report(BlockDriverState *bs, int64_t offset,
>>>>> +        int64_t len, int64_t *nr_zones,
>>>>> +        BlockZoneDescriptor *zones) {
>>>>> +    BDRVRawState *s = bs->opaque;
>>>>> +    RawPosixAIOData acb;
>>>>> +
>>>>> +    acb = (RawPosixAIOData) {
>>>>> +        .bs         = bs,
>>>>> +        .aio_fildes = s->fd,
>>>>> +        .aio_type   = QEMU_AIO_IOCTL,
>>>>> +        .aio_offset = offset,
>>>>> +        .aio_nbytes = len,
>>>>> +        .zone_report    = {
>>>>> +                .nr_zones       = nr_zones,
>>>>> +                .zones          = zones,
>>>>> +                },
>>>>
>>>> Indentation is off here. Please use 4-space indentation.
>>> Noted!
>>>
>>> Thanks for reviewing!
>>>
>>> Sam
>>
>>
>> --
>> Damien Le Moal
>> Western Digital Research


-- 
Damien Le Moal
Western Digital Research


  reply	other threads:[~2022-06-29  1:44 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-27  0:19 [RFC v3 0/5] *** Add support for zoned device *** Sam Li
2022-06-27  0:19 ` [RFC v3 1/5] block: add block layer APIs resembling Linux ZonedBlockDevice ioctls Sam Li
2022-06-27  7:21   ` Hannes Reinecke
2022-06-27  7:45     ` Sam Li
2022-06-27 14:15   ` Stefan Hajnoczi
2022-06-28  8:05     ` Sam Li
2022-06-28  9:05       ` Damien Le Moal
2022-06-28 10:23         ` Sam Li
2022-06-29  1:43           ` Damien Le Moal [this message]
2022-06-29  1:50             ` Sam Li
2022-06-29  2:32               ` Damien Le Moal
2022-06-29  2:35                 ` Sam Li
2022-06-27  0:19 ` [RFC v3 2/5] qemu-io: add zoned block device operations Sam Li
2022-06-27  7:30   ` Hannes Reinecke
2022-06-27  8:31     ` Sam Li
2022-06-28  7:56   ` Stefan Hajnoczi
2022-06-28  9:02     ` Sam Li
2022-06-28  9:11     ` Damien Le Moal
2022-06-28 10:27       ` Sam Li
2022-06-27  0:19 ` [RFC v3 3/5] file-posix: introduce get_sysfs_long_val for zoned device information Sam Li
2022-06-27  7:31   ` Hannes Reinecke
2022-06-27  8:32     ` Sam Li
2022-06-28  8:09   ` Stefan Hajnoczi
2022-06-28  9:18     ` Sam Li
2022-06-27  0:19 ` [RFC v3 4/5] file-posix: introduce get_sysfs_str_val for device zoned model Sam Li
2022-06-27  7:41   ` Hannes Reinecke
2022-06-27  8:44     ` Sam Li
2022-06-27  0:19 ` [RFC v3 5/5] qemu-iotests: add zone operation tests Sam Li
2022-06-27  7:42   ` Hannes Reinecke
2022-06-28  8:19   ` Stefan Hajnoczi
2022-06-28  9:34     ` Sam Li
2022-06-30 10:11       ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20a3234d-eb6b-ee21-95d5-5ce18aa6c822@opensource.wdc.com \
    --to=damien.lemoal@opensource.wdc.com \
    --cc=dmitry.fomichev@wdc.com \
    --cc=faithilikerun@gmail.com \
    --cc=fam@euphon.net \
    --cc=hare@suse.de \
    --cc=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).