From: "Michael S. Tsirkin" <mst@redhat.com>
To: Stefano Garzarella <sgarzare@redhat.com>
Cc: qemu-devel@nongnu.org, Kevin Wolf <kwolf@redhat.com>,
Eduardo Habkost <ehabkost@redhat.com>,
Laurent Vivier <lvivier@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Max Reitz <mreitz@redhat.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
Thomas Huth <thuth@redhat.com>,
qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v2 3/5] virtio-blk: add DISCARD and WRITE ZEROES features
Date: Thu, 31 Jan 2019 11:04:10 -0500 [thread overview]
Message-ID: <20190131103056-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <20190131151914.164903-4-sgarzare@redhat.com>
On Thu, Jan 31, 2019 at 04:19:12PM +0100, Stefano Garzarella wrote:
> This patch adds the support of DISCARD and WRITE ZEROES commands,
> that have been introduced in the virtio-blk protocol to have
> better performance when using SSD backend.
>
> We support only one segment per request since multiple segments
> are not widely used and there are no userspace APIs that allow
> applications to submit multiple segments in a single call.
>
> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
This does not seem to match the spec clarifications
that Stefan Hajnoczi has posted on
the virtio TC.
I think this can go any of the following ways:
- we agree that a request should have at most one segment
no padding allowed
- we agree that a request should have at most one segment
we require padding to 512 bytes
- we agree that a request should have at most one segment
we also support padding to 512 bytes
- we agree that a request should have at most one segment
we also support arbitrary padding
- we agree that a request can have any # of segments
I would also need your feedback on whether all this
is a material change to 1.1 public review according to the oasis definition:
"Material Change" is any change to the content of a Work Product that
that would require a compliant application or implementation to be
modified or rewritten in order to remain compliant or which adds new
features or otherwise expands the scope of the work product.
> ---
> hw/block/virtio-blk.c | 173 +++++++++++++++++++++++++++++++++
> include/hw/virtio/virtio-blk.h | 2 +
> 2 files changed, 175 insertions(+)
>
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 542ec52536..34ee676895 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -147,6 +147,30 @@ out:
> aio_context_release(blk_get_aio_context(s->conf.conf.blk));
> }
>
> +static void virtio_blk_discard_wzeroes_complete(void *opaque, int ret)
> +{
> + VirtIOBlockReq *req = opaque;
> + VirtIOBlock *s = req->dev;
> + bool is_wzeroes = (virtio_ldl_p(VIRTIO_DEVICE(req->dev), &req->out.type) &
> + ~VIRTIO_BLK_T_BARRIER) == VIRTIO_BLK_T_WRITE_ZEROES;
> +
> + aio_context_acquire(blk_get_aio_context(s->conf.conf.blk));
> + if (ret) {
> + if (virtio_blk_handle_rw_error(req, -ret, 0, is_wzeroes)) {
> + goto out;
> + }
> + }
> +
> + virtio_blk_req_complete(req, VIRTIO_BLK_S_OK);
> + if (is_wzeroes) {
> + block_acct_done(blk_get_stats(req->dev->blk), &req->acct);
> + }
> + virtio_blk_free_request(req);
> +
> +out:
> + aio_context_release(blk_get_aio_context(s->conf.conf.blk));
> +}
> +
> #ifdef __linux__
>
> typedef struct {
> @@ -480,6 +504,82 @@ static bool virtio_blk_sect_range_ok(VirtIOBlock *dev,
> return true;
> }
>
> +static uint8_t virtio_blk_handle_dwz(VirtIOBlockReq *req, bool is_wzeroes,
> + struct virtio_blk_discard_write_zeroes *dwz_hdr)
> +{
> + VirtIOBlock *s = req->dev;
> + uint64_t sector;
> + uint32_t num_sectors, flags;
> + uint8_t err_status;
> + int bytes;
> +
> + sector = virtio_ldq_p(VIRTIO_DEVICE(req->dev), &dwz_hdr->sector);
> + num_sectors = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &dwz_hdr->num_sectors);
> + flags = virtio_ldl_p(VIRTIO_DEVICE(req->dev), &dwz_hdr->flags);
> +
> + /*
> + * dwz_max_sectors is at most BDRV_REQUEST_MAX_SECTORS, this check
> + * make us sure that "num_sectors << BDRV_SECTOR_BITS" can fit in
> + * the integer variable.
> + */
> + if (unlikely(num_sectors > s->conf.dwz_max_sectors)) {
> + err_status = VIRTIO_BLK_S_IOERR;
> + goto err;
> + }
> +
> + bytes = num_sectors << BDRV_SECTOR_BITS;
> +
> + if (unlikely(!virtio_blk_sect_range_ok(req->dev, sector, bytes))) {
> + err_status = VIRTIO_BLK_S_IOERR;
> + goto err;
> + }
> +
> + /*
> + * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for discard
> + * and write zeroes commands if any unknown flag is set.
> + */
> + if (unlikely(flags & ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
> + err_status = VIRTIO_BLK_S_UNSUPP;
> + goto err;
> + }
> +
> + if (is_wzeroes) { /* VIRTIO_BLK_T_WRITE_ZEROES */
> + int blk_aio_flags = 0;
> +
> + if (s->conf.wz_may_unmap &&
> + flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP) {
> + blk_aio_flags |= BDRV_REQ_MAY_UNMAP;
> + }
> +
> + block_acct_start(blk_get_stats(req->dev->blk), &req->acct, bytes,
> + BLOCK_ACCT_WRITE);
> +
> + blk_aio_pwrite_zeroes(req->dev->blk, sector << BDRV_SECTOR_BITS,
> + bytes, blk_aio_flags,
> + virtio_blk_discard_wzeroes_complete, req);
> + } else { /* VIRTIO_BLK_T_DISCARD */
> + /*
> + * The device MUST set the status byte to VIRTIO_BLK_S_UNSUPP for
> + * discard commands if the unmap flag is set.
> + */
> + if (unlikely(flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)) {
> + err_status = VIRTIO_BLK_S_UNSUPP;
> + goto err;
> + }
> +
> + blk_aio_pdiscard(req->dev->blk, sector << BDRV_SECTOR_BITS, bytes,
> + virtio_blk_discard_wzeroes_complete, req);
> + }
> +
> + return VIRTIO_BLK_S_OK;
> +
> +err:
> + if (is_wzeroes) {
> + block_acct_invalid(blk_get_stats(req->dev->blk), BLOCK_ACCT_WRITE);
> + }
> + return err_status;
> +}
> +
> static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> {
> uint32_t type;
> @@ -586,6 +686,45 @@ static int virtio_blk_handle_request(VirtIOBlockReq *req, MultiReqBuffer *mrb)
> virtio_blk_free_request(req);
> break;
> }
> + /*
> + * VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES are defined with
> + * VIRTIO_BLK_T_OUT flag set. We masked this flag in the switch statement,
> + * so we must mask it for these requests, then we will check if it is set.
> + */
> + case VIRTIO_BLK_T_DISCARD & ~VIRTIO_BLK_T_OUT:
> + case VIRTIO_BLK_T_WRITE_ZEROES & ~VIRTIO_BLK_T_OUT:
> + {
> + struct virtio_blk_discard_write_zeroes dwz_hdr;
> + size_t out_len = iov_size(out_iov, out_num);
> + bool is_wzeroes = (type & ~VIRTIO_BLK_T_BARRIER) ==
> + VIRTIO_BLK_T_WRITE_ZEROES;
> + uint8_t err_status;
> +
> + /*
> + * Unsupported if VIRTIO_BLK_T_OUT is not set or the request contains
> + * more than one segment.
> + */
> + if (unlikely(!(type & VIRTIO_BLK_T_OUT) ||
> + out_len > sizeof(dwz_hdr))) {
> + virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
> + virtio_blk_free_request(req);
> + return 0;
> + }
> +
> + if (unlikely(iov_to_buf(out_iov, out_num, 0, &dwz_hdr,
> + sizeof(dwz_hdr)) != sizeof(dwz_hdr))) {
> + virtio_error(vdev, "virtio-blk discard/wzeroes header too short");
> + return -1;
> + }
> +
> + err_status = virtio_blk_handle_dwz(req, is_wzeroes, &dwz_hdr);
> + if (err_status != VIRTIO_BLK_S_OK) {
> + virtio_blk_req_complete(req, err_status);
> + virtio_blk_free_request(req);
> + }
> +
> + break;
> + }
> default:
> virtio_blk_req_complete(req, VIRTIO_BLK_S_UNSUPP);
> virtio_blk_free_request(req);
> @@ -765,6 +904,22 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
> blkcfg.alignment_offset = 0;
> blkcfg.wce = blk_enable_write_cache(s->blk);
> virtio_stw_p(vdev, &blkcfg.num_queues, s->conf.num_queues);
> + if (s->conf.discard_wzeroes) {
> + virtio_stl_p(vdev, &blkcfg.max_discard_sectors,
> + s->conf.dwz_max_sectors);
> + virtio_stl_p(vdev, &blkcfg.discard_sector_alignment,
> + blk_size >> BDRV_SECTOR_BITS);
> + virtio_stl_p(vdev, &blkcfg.max_write_zeroes_sectors,
> + s->conf.dwz_max_sectors);
> + blkcfg.write_zeroes_may_unmap = s->conf.wz_may_unmap;
> + /*
> + * We support only one segment per request since multiple segments
> + * are not widely used and there are no userspace APIs that allow
> + * applications to submit multiple segments in a single call.
> + */
> + virtio_stl_p(vdev, &blkcfg.max_discard_seg, 1);
> + virtio_stl_p(vdev, &blkcfg.max_write_zeroes_seg, 1);
> + }
> memcpy(config, &blkcfg, sizeof(struct virtio_blk_config));
> }
>
> @@ -811,6 +966,10 @@ static uint64_t virtio_blk_get_features(VirtIODevice *vdev, uint64_t features,
> if (s->conf.num_queues > 1) {
> virtio_add_feature(&features, VIRTIO_BLK_F_MQ);
> }
> + if (s->conf.discard_wzeroes) {
> + virtio_add_feature(&features, VIRTIO_BLK_F_DISCARD);
> + virtio_add_feature(&features, VIRTIO_BLK_F_WRITE_ZEROES);
> + }
>
> return features;
> }
> @@ -956,6 +1115,16 @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp)
> return;
> }
>
> + if (conf->discard_wzeroes) {
> + if (!conf->dwz_max_sectors ||
> + conf->dwz_max_sectors > BDRV_REQUEST_MAX_SECTORS) {
> + error_setg(errp, "invalid dwz-max-sectors property (%" PRIu32 "), "
> + "must be between 1 and %lu",
> + conf->dwz_max_sectors, BDRV_REQUEST_MAX_SECTORS);
> + return;
> + }
> + }
> +
> virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
> sizeof(struct virtio_blk_config));
>
> @@ -1028,6 +1197,10 @@ static Property virtio_blk_properties[] = {
> IOThread *),
> DEFINE_PROP_BIT("discard-wzeroes", VirtIOBlock, conf.discard_wzeroes, 0,
> true),
> + DEFINE_PROP_UINT32("discard-wzeroes-max-sectors", VirtIOBlock,
> + conf.dwz_max_sectors, BDRV_REQUEST_MAX_SECTORS),
> + DEFINE_PROP_BIT("wzeroes-may-unmap", VirtIOBlock, conf.wz_may_unmap, 0,
> + true),
> DEFINE_PROP_END_OF_LIST(),
> };
>
> diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
> index c336afb4cd..4e9d4434ff 100644
> --- a/include/hw/virtio/virtio-blk.h
> +++ b/include/hw/virtio/virtio-blk.h
> @@ -41,6 +41,8 @@ struct VirtIOBlkConf
> uint16_t num_queues;
> uint16_t queue_size;
> uint32_t discard_wzeroes;
> + uint32_t dwz_max_sectors;
> + uint32_t wz_may_unmap;
> };
>
> struct VirtIOBlockDataPlane;
> --
> 2.20.1
next prev parent reply other threads:[~2019-01-31 16:04 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-01-31 15:19 [Qemu-devel] [PATCH v2 0/5] virtio-blk: add DISCARD and WRITE ZEROES features Stefano Garzarella
2019-01-31 15:19 ` [Qemu-devel] [PATCH v2 1/5] virtio-blk: add acct_failed param to virtio_blk_handle_rw_error() Stefano Garzarella
2019-02-01 5:15 ` Stefan Hajnoczi
2019-01-31 15:19 ` [Qemu-devel] [PATCH v2 2/5] virtio-blk: add "discard-wzeroes" boolean property Stefano Garzarella
2019-01-31 15:40 ` Dr. David Alan Gilbert
2019-01-31 15:50 ` Stefano Garzarella
2019-01-31 15:59 ` Dr. David Alan Gilbert
2019-01-31 16:43 ` Michael S. Tsirkin
2019-01-31 17:37 ` Stefano Garzarella
2019-02-05 20:54 ` Michael S. Tsirkin
2019-02-01 4:29 ` Stefan Hajnoczi
2019-02-01 9:09 ` Stefano Garzarella
2019-02-01 10:06 ` Stefan Hajnoczi
2019-02-01 15:16 ` Michael S. Tsirkin
2019-02-01 17:18 ` Stefano Garzarella
2019-02-04 3:33 ` Stefan Hajnoczi
2019-02-04 10:16 ` Stefano Garzarella
2019-02-04 13:37 ` Michael S. Tsirkin
2019-02-04 15:38 ` Stefano Garzarella
2019-01-31 15:19 ` [Qemu-devel] [PATCH v2 3/5] virtio-blk: add DISCARD and WRITE ZEROES features Stefano Garzarella
2019-01-31 16:04 ` Michael S. Tsirkin [this message]
2019-01-31 17:01 ` Stefano Garzarella
2019-01-31 17:13 ` Michael S. Tsirkin
2019-02-01 4:58 ` Stefan Hajnoczi
2019-02-01 9:54 ` Stefano Garzarella
2019-02-01 10:08 ` Stefan Hajnoczi
2019-01-31 15:19 ` [Qemu-devel] [PATCH v2 4/5] tests/virtio-blk: change assert on data_size in virtio_blk_request() Stefano Garzarella
2019-02-01 5:04 ` Stefan Hajnoczi
2019-01-31 15:19 ` [Qemu-devel] [PATCH v2 5/5] tests/virtio-blk: add test for WRITE_ZEROES command Stefano Garzarella
2019-02-01 5:15 ` Stefan Hajnoczi
2019-01-31 17:15 ` [Qemu-devel] [PATCH v2 0/5] virtio-blk: add DISCARD and WRITE ZEROES features Michael S. Tsirkin
2019-01-31 17:38 ` Stefano Garzarella
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190131103056-mutt-send-email-mst@kernel.org \
--to=mst@redhat.com \
--cc=dgilbert@redhat.com \
--cc=ehabkost@redhat.com \
--cc=kwolf@redhat.com \
--cc=lvivier@redhat.com \
--cc=marcel.apfelbaum@gmail.com \
--cc=mreitz@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=sgarzare@redhat.com \
--cc=stefanha@redhat.com \
--cc=thuth@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.