From: Dmitry Fomichev <Dmitry.Fomichev@wdc.com>
To: "pankydev8@gmail.com" <pankydev8@gmail.com>,
"damien.lemoal@opensource.wdc.com"
<damien.lemoal@opensource.wdc.com>
Cc: "faithilikerun@gmail.com" <faithilikerun@gmail.com>,
"hare@suse.de" <hare@suse.de>,
"virtio-dev@lists.oasis-open.org"
<virtio-dev@lists.oasis-open.org>,
"p.raghav@samsung.com" <p.raghav@samsung.com>,
"javier.gonz@samsung.com" <javier.gonz@samsung.com>,
"axboe@kernel.dk" <axboe@kernel.dk>,
"k.jensen@samsung.com" <k.jensen@samsung.com>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"stefanha@gmail.com" <stefanha@gmail.com>
Subject: Re: [PATCH 3/3] virtio-blk: add support for zoned block devices
Date: Tue, 20 Sep 2022 01:51:07 +0000 [thread overview]
Message-ID: <79186306eecace92e779dee7b2d4a86a38b5dca4.camel@wdc.com> (raw)
In-Reply-To: <44dfb47f-a59f-b236-03bf-5d25d7f206b5@opensource.wdc.com>
Hi Pankaj, Damien,
On Tue, 2022-09-20 at 08:59 +0900, Damien Le Moal wrote:
> On 9/19/22 22:41, Pankaj Raghav wrote:
> > Hi Dmitry,
> >
> > On Sun, Sep 18, 2022 at 10:29:21PM -0400, Dmitry Fomichev wrote:
> > > The zone-specific code in the patch is heavily influenced by NVMe ZNS
> > > code in drivers/nvme/host/zns.c, but it is simpler because the proposed
> > > virtio ZBD draft only covers the zoned device features that are
> > > relevant to the zoned functionality provided by Linux block layer.
> > >
> > There is a parallel work going on to support non-po2 zone sizes in Linux
> > block layer and drivers[1]. I don't see any reason why we shouldn't make
> > the calculations generic here instead of putting the constraint on zone
> > sectors to be po2 as the virtio spec also supports it.
>
> That series is not upstream, so implementing against would not be the
> correct approach, especially given that this would also impact qemu code.
>
I am aware about the effort to support non-^2 zone sizes in the kernel and this
activity actually made me drop the ^2 zone size requirement from the virtio-zbd
specification. I think that the best way to add non-^2 zone size support to this
driver could be a follow up patch to this series. This way, we won't rely on the
code that is not yet merged upstream.
<there is one more comment about your proposed changes below>
> >
> > I took a quick look, and changing the calculations from po2 specific to
> > generic will not be in the hot path and can be trivially changed. I have
> > suggested the changes inline to make the virtio blk driver zone size
> > agnostic. I haven't tested the changes but it is very
> > similar to the changes I did in the drivers/nvme/host/zns.c in my patch
> > series[2].
> >
> > [1]
> > https://lore.kernel.org/linux-block/20220912082204.51189-1-p.raghav@samsung.com/
> > [2]
> > https://lore.kernel.org/linux-block/20220912082204.51189-6-p.raghav@samsung.com/
> >
> > > Co-developed-by: Stefan Hajnoczi <stefanha@gmail.com>
> > > Signed-off-by: Stefan Hajnoczi <stefanha@gmail.com>
> > > Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
> > > ---
> > > drivers/block/virtio_blk.c | 381 ++++++++++++++++++++++++++++++--
> > > include/uapi/linux/virtio_blk.h | 106 +++++++++
> > > 2 files changed, 469 insertions(+), 18 deletions(-)
> > >
> > <snip>
> > > +#ifdef CONFIG_BLK_DEV_ZONED
> > > +static void *virtblk_alloc_report_buffer(struct virtio_blk *vblk,
> > > + unsigned int nr_zones,
> > > + unsigned int zone_sectors,
> > > + size_t *buflen)
> > > +{
> > > + struct request_queue *q = vblk->disk->queue;
> > > + size_t bufsize;
> > > + void *buf;
> > > +
> > - nr_zones = min_t(unsigned int, nr_zones,
> > - get_capacity(vblk->disk) >> ilog2(zone_sectors));
> >
> > + nr_zones = min_t(unsigned int, nr_zones,
> > + div64_u64(get_capacity(vblk->disk), zone_sectors));
> >
> > > +
> > > + bufsize = sizeof(struct virtio_blk_zone_report) +
> > > + nr_zones * sizeof(struct virtio_blk_zone_descriptor);
> > > + bufsize = min_t(size_t, bufsize,
> > > + queue_max_hw_sectors(q) << SECTOR_SHIFT);
> > > + bufsize = min_t(size_t, bufsize, queue_max_segments(q) <<
> > > PAGE_SHIFT);
> > > +
> > > + while (bufsize >= sizeof(struct virtio_blk_zone_report)) {
> > > + buf = __vmalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
> > > + if (buf) {
> > > + *buflen = bufsize;
> > > + return buf;
> > > + }
> > > + bufsize >>= 1;
> > > + }
> > > +
> > > + return NULL;
> > > +}
> > > +
> > > +static int virtblk_submit_zone_report(struct virtio_blk *vblk,
> > > + char *report_buf, size_t report_len,
> > > + sector_t sector)
> > > +{
> > > + struct request_queue *q = vblk->disk->queue;
> > > + struct request *req;
> > > + struct virtblk_req *vbr;
> > > + int err;
> > > +
> > > + req = blk_mq_alloc_request(q, REQ_OP_DRV_IN, 0);
> > > + if (IS_ERR(req))
> > > + return PTR_ERR(req);
> > > +
> > > + vbr = blk_mq_rq_to_pdu(req);
> > > + vbr->in_hdr_len = sizeof(vbr->status);
> > > + vbr->out_hdr.type = cpu_to_virtio32(vblk->vdev,
> > > VIRTIO_BLK_T_ZONE_REPORT);
> > > + vbr->out_hdr.sector = cpu_to_virtio64(vblk->vdev, sector);
> > > +
> > > + err = blk_rq_map_kern(q, req, report_buf, report_len, GFP_KERNEL);
> > > + if (err)
> > > + goto out;
> > > +
> > > + blk_execute_rq(req, false);
> > > + err = blk_status_to_errno(virtblk_result(vbr->status));
> > > +out:return -ENODEV;
> > > + blk_mq_free_request(req);
> > > + return err;
> > > +}
> > > +
> > > +static int virtblk_parse_zone(struct virtio_blk *vblk,
> > > + struct virtio_blk_zone_descriptor *entry,
> > > + unsigned int idx, unsigned int zone_sectors,
> > > + report_zones_cb cb, void *data)
> > > +{
> > > + struct blk_zone zone = { };
> > > +
> > > + if (entry->z_type != VIRTIO_BLK_ZT_SWR &&
> > > + entry->z_type != VIRTIO_BLK_ZT_SWP &&
> > > + entry->z_type != VIRTIO_BLK_ZT_CONV) {
> > > + dev_err(&vblk->vdev->dev, "invalid zone type %#x\n",
> > > + entry->z_type);
> > > + return -EINVAL;
> > > + }
> > > +
> > > + zone.type = entry->z_type;
> > > + zone.cond = entry->z_state;
> > > + zone.len = zone_sectors;
> > > + zone.capacity = le64_to_cpu(entry->z_cap);
> > > + zone.start = le64_to_cpu(entry->z_start);
> > > + if (zone.cond == BLK_ZONE_COND_FULL)
> > > + zone.wp = zone.start + zone.len;
> > > + else
> > > + zone.wp = le64_to_cpu(entry->z_wp);
> > > +
> > > + return cb(&zone, idx, data);
> > > +}
> > > +
> > > +static int virtblk_report_zones(struct gendisk *disk, sector_t sector,
> > > + unsigned int nr_zones, report_zones_cb cb,
> > > + void *data)
> > > +{
> > > + struct virtio_blk *vblk = disk->private_data;
> > > + struct virtio_blk_zone_report *report;
> > > + unsigned int zone_sectors = vblk->zone_sectors;
> > > + unsigned int nz, i;
> > > + int ret, zone_idx = 0;
> > > + size_t buflen;
> > + u64 remainder;
> > > +
> > > + if (WARN_ON_ONCE(!vblk->zone_sectors))
> > > + return -EOPNOTSUPP;
> > > +
> > > + report = virtblk_alloc_report_buffer(vblk, nr_zones,
> > > + zone_sectors, &buflen);
> > > + if (!report)
> > > + return -ENOMEM;
> > > +
> > - sector &= ~(zone_sectors - 1);
> >
> > + div64_u64_rem(sector, zone_sectors, &remainder);
> > + sector -= remainder;
> > > + while (zone_idx < nr_zones && sector < get_capacity(vblk->disk)) {
> > > + memset(report, 0, buflen);
> > > +
> > > + ret = virtblk_submit_zone_report(vblk, (char *)report,
> > > + buflen, sector);
> > > + if (ret) {
> > > + if (ret > 0)
> > > + ret = -EIO;
> > > + goto out_free;
> > > + }
> > > +
> > > + nz = min((unsigned int)le64_to_cpu(report->nr_zones),
> > > nr_zones);
> > > + if (!nz)
> > > + break;
> > > +
> > > + for (i = 0; i < nz && zone_idx < nr_zones; i++) {
> > > + ret = virtblk_parse_zone(vblk, &report->zones[i],
> > > + zone_idx, zone_sectors,
> > > cb, data);
> > > + if (ret)
> > > + goto out_free;
> > > + zone_idx++;
> > > + }
> > > +
> > > + sector += zone_sectors * nz;
> > > + }
> > > +
> > > + if (zone_idx > 0)
> > > + ret = zone_idx;
> > > + else
> > > + ret = -EINVAL;
> > > +out_free:
> > > + kvfree(report);
> > > + return ret;
> > > +}
> > > +
> > <snip>
> > > +static int virtblk_probe_zoned_device(struct virtio_device *vdev,
> > > + struct virtio_blk *vblk,
> > > + struct request_queue *q)
> > > +{
> > <snip>
> > > + blk_queue_physical_block_size(q, le32_to_cpu(v));
> > > + blk_queue_io_min(q, le32_to_cpu(v));
> > > +
> > > + dev_dbg(&vdev->dev, "write granularity = %u\n", le32_to_cpu(v));
> > > +
> > - /*
> > - * virtio ZBD specification doesn't require zones to be a power of
> > - * two sectors in size, but the code in this driver expects that.
> > - */
> > - virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors, &v);
> > - if (v == 0 || !is_power_of_2(v)) {
> > - dev_err(&vdev->dev,
> > - "zoned device with non power of two zone size %u\n",
> > v);
> > - return -ENODEV;
> > - }
This part can't be omitted entirely because it contains the call to read
zone_sectors value from virtio-blk configuration space. This code instead could
be changed to something like
virtio_cread(vdev, struct virtio_blk_config, zoned.zone_sectors, &v);
if (v == 0) {
dev_err(&vdev->dev,
"zoned device with zero zone size %u\n", v);
return -ENODEV;
}
DF
> > > +
> > > + dev_dbg(&vdev->dev, "zone sectors = %u\n", le32_to_cpu(v));
> > > + vblk->zone_sectors = le32_to_cpu(v);
> > > +
> > > + if (virtio_has_feature(vdev, VIRTIO_BLK_F_DISCARD)) {
> > > + dev_warn(&vblk->vdev->dev,
> > > + "ignoring negotiated F_DISCARD for zoned
> > > device\n");
> > > + blk_queue_max_discard_sectors(q, 0);
> > > + }
> > > +
> > > + ret = blk_revalidate_disk_zones(vblk->disk, NULL);
> > > + if (!ret) {
> > > + virtio_cread(vdev, struct virtio_blk_config,
> > > + zoned.max_append_sectors, &v);
> > > + if (!v) {
> > > + dev_warn(&vdev->dev, "zero max_append_sectors
> > > reported\n");
> > > + return -ENODEV;
> > > + }
> > > + blk_queue_max_zone_append_sectors(q, le32_to_cpu(v));
> > > + dev_dbg(&vdev->dev, "max append sectors = %u\n",
> > > le32_to_cpu(v));
> > > +
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> >
>
next prev parent reply other threads:[~2022-09-20 1:51 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-09-19 2:29 [PATCH 0/3] virtio-blk: support zoned block devices Dmitry Fomichev
2022-09-19 2:29 ` [PATCH 1/3] virtio-blk: use a helper to handle request queuing errors Dmitry Fomichev
2022-09-19 2:29 ` [PATCH 2/3] virtio-blk: add a placeholder for secure erase config Dmitry Fomichev
2022-09-19 2:29 ` [PATCH 3/3] virtio-blk: add support for zoned block devices Dmitry Fomichev
2022-09-19 13:41 ` Pankaj Raghav
2022-09-19 23:59 ` Damien Le Moal
2022-09-20 1:51 ` Dmitry Fomichev [this message]
2022-09-20 8:04 ` Pankaj Raghav
2022-09-20 7:43 ` [PATCH 0/3] virtio-blk: support " Christoph Hellwig
2022-09-20 10:41 ` Stefan Hajnoczi
2022-09-21 2:33 ` Dmitry Fomichev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=79186306eecace92e779dee7b2d4a86a38b5dca4.camel@wdc.com \
--to=dmitry.fomichev@wdc.com \
--cc=axboe@kernel.dk \
--cc=damien.lemoal@opensource.wdc.com \
--cc=faithilikerun@gmail.com \
--cc=hare@suse.de \
--cc=javier.gonz@samsung.com \
--cc=k.jensen@samsung.com \
--cc=linux-block@vger.kernel.org \
--cc=p.raghav@samsung.com \
--cc=pankydev8@gmail.com \
--cc=stefanha@gmail.com \
--cc=virtio-dev@lists.oasis-open.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox