Re: [Qemu-devel] [PATCH 3/5] block: Switch transfer length bounds to byte-based

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Kevin Wolf <kwolf@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org, mreitz@redhat.com, qemu-block@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Fam Zheng <famz@redhat.com>,
	Ronnie Sahlberg <ronniesahlberg@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Lieven <pl@kamp.de>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 3/5] block: Switch transfer length bounds to byte-based
Date: Tue, 7 Jun 2016 14:45:22 +0200	[thread overview]
Message-ID: <20160607124522.GH4684@noname.str.redhat.com> (raw)
In-Reply-To: <1464973388-15821-4-git-send-email-eblake@redhat.com>

Am 03.06.2016 um 19:03 hat Eric Blake geschrieben:
> Sector-based limits are awkward to think about; in our on-going
> quest to move to byte-based interfaces, convert max_transfer_length
> and opt_transfer_length.  Rename them (dropping the _length suffix)
> so that the compiler will help us catch the change in semantics
> across any rebased code.  Improve the documentation, and guarantee
> that blk_get_max_transfer() returns a non-zero value.  Use unsigned
> values, so that we don't have to worry about negative values and
> so that bit-twiddling is easier; however, we are still constrained
> by 2^31 of signed int in most APIs.
> 
> Of note: the iscsi calculations use a 64-bit number internally,
> but the final result is guaranteed to fit in 32 bits.  NBD is
> fixed to advertise the maximum length of 32M that the qemu and
> kernel servers enforce, rather than a number of sectors that
> would overflow int when converted back to bytes.  scsi-generic
> now advertises a maximum always, rather than just when the
> underlying device advertised a maximum.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

> @@ -1177,7 +1176,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
> 
>          if (ret == -ENOTSUP) {
>              /* Fall back to bounce buffer if write zeroes is unsupported */
> -            int max_xfer_len = MIN_NON_ZERO(bs->bl.max_transfer_length,
> +            int max_xfer_len = MIN_NON_ZERO(bs->bl.max_transfer,
>                                              MAX_WRITE_ZEROES_BOUNCE_BUFFER);

You could consider renaming the variable to max_transfer to keep it
consistent with the BlockLimits field name and to make it even more
obvious that you converted all uses.

>              BdrvRequestFlags write_flags = flags & ~BDRV_REQ_ZERO_WRITE;
> 
> @@ -1188,7 +1187,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>                  write_flags &= ~BDRV_REQ_FUA;
>                  need_flush = true;
>              }
> -            num = MIN(num, max_xfer_len << BDRV_SECTOR_BITS);
> +            num = MIN(num, max_xfer_len);
>              iov.iov_len = num;
>              if (iov.iov_base == NULL) {
>                  iov.iov_base = qemu_try_blockalign(bs, num);
> @@ -1205,7 +1204,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>              /* Keep bounce buffer around if it is big enough for all
>               * all future requests.
>               */
> -            if (num < max_xfer_len << BDRV_SECTOR_BITS) {
> +            if (num < max_xfer_len) {
>                  qemu_vfree(iov.iov_base);
>                  iov.iov_base = NULL;
>              }
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 7e78ade..c5855be 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -473,9 +473,10 @@ iscsi_co_writev_flags(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
>          return -EINVAL;
>      }
> 
> -    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
> +    if (bs->bl.max_transfer &&
> +        nb_sectors << BDRV_SECTOR_BITS > bs->bl.max_transfer) {
>          error_report("iSCSI Error: Write of %d sectors exceeds max_xfer_len "
> -                     "of %d sectors", nb_sectors, bs->bl.max_transfer_length);
> +                     "of %" PRIu32 " bytes", nb_sectors, bs->bl.max_transfer);
>          return -EINVAL;
>      }
> 
> @@ -650,9 +651,10 @@ static int coroutine_fn iscsi_co_readv(BlockDriverState *bs,
>          return -EINVAL;
>      }
> 
> -    if (bs->bl.max_transfer_length && nb_sectors > bs->bl.max_transfer_length) {
> +    if (bs->bl.max_transfer &&
> +        nb_sectors << BDRV_SECTOR_BITS > bs->bl.max_transfer) {
>          error_report("iSCSI Error: Read of %d sectors exceeds max_xfer_len "
> -                     "of %d sectors", nb_sectors, bs->bl.max_transfer_length);
> +                     "of %" PRIu32 " bytes", nb_sectors, bs->bl.max_transfer);
>          return -EINVAL;
>      }
> 
> @@ -1706,13 +1708,14 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>       * iscsi_open(): iscsi targets don't change their limits. */
> 
>      IscsiLun *iscsilun = bs->opaque;
> -    uint32_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;
> +    uint64_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;
> 
>      if (iscsilun->bl.max_xfer_len) {
>          max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
>      }
> 
> -    bs->bl.max_transfer_length = sector_limits_lun2qemu(max_xfer_len, iscsilun);
> +    bs->bl.max_transfer = MIN(BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS,
> +                              max_xfer_len * iscsilun->block_size);

Why don't you simply use 0 when you defined it as no limit?

If we make some drivers put 0 there and others INT_MAX, chances are that
we'll introduce places where we fail to handle 0 correctly.

>      if (iscsilun->lbp.lbpu) {
>          if (iscsilun->bl.max_unmap < 0xffffffff) {
> @@ -1735,8 +1738,7 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>      } else {
>          bs->bl.pwrite_zeroes_alignment = iscsilun->block_size;
>      }
> -    bs->bl.opt_transfer_length =
> -        sector_limits_lun2qemu(iscsilun->bl.opt_xfer_len, iscsilun);
> +    bs->bl.opt_transfer = iscsilun->bl.opt_xfer_len * iscsilun->block_size;
>  }
> 
>  /* Note that this will not re-establish a connection with an iSCSI target - it
> diff --git a/block/nbd.c b/block/nbd.c
> index 6015e8b..2ce7b4d 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -363,7 +363,7 @@ static int nbd_co_flush(BlockDriverState *bs)
>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>  {
>      bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
> -    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
> +    bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
>  }

This introduces a semantic difference and should therefore be a separate
patch. Here, it should become UINT32_MAX through mechanical conversion.

Or actually, it can't because that's not a power of two. So maybe have
the NBD patch first...

>  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index ce2e20f..f3bd5ef 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -752,7 +752,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>          if (S_ISBLK(st.st_mode)) {
>              int ret = hdev_get_max_transfer_length(s->fd);
>              if (ret >= 0) {
> -                bs->bl.max_transfer_length = ret;
> +                bs->bl.max_transfer = ret << BDRV_SECTOR_BITS;

I assume that we don't care about overflows here because in practice the
values are very low? Should we check (or maybe better just cap) it
anyway?

>              }
>          }
>      }
> diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
> index 284e646..fbaf8f5 100644
> --- a/hw/block/virtio-blk.c
> +++ b/hw/block/virtio-blk.c
> @@ -382,7 +382,7 @@ static int multireq_compare(const void *a, const void *b)
>  void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>  {
>      int i = 0, start = 0, num_reqs = 0, niov = 0, nb_sectors = 0;
> -    int max_xfer_len = 0;
> +    uint32_t max_xfer_len = 0;
>      int64_t sector_num = 0;

Again, consider renaming the variable.

>      if (mrb->num_reqs == 1) {
> @@ -391,8 +391,9 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>          return;
>      }
> 
> -    max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
> -    max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);
> +    max_xfer_len = blk_get_max_transfer(mrb->reqs[0]->dev->blk);
> +    assert(max_xfer_len &&
> +           max_xfer_len >> BDRV_SECTOR_BITS <= BDRV_REQUEST_MAX_SECTORS);

Why can we assert this here? The comment for BlockLimits.max_xfer_len
doesn't document any maximum. Of course, as I already said above, it
might not happen in practice, but INT_MAX + 1 is theoretically valid and
would fail the assertion.

>      qsort(mrb->reqs, mrb->num_reqs, sizeof(*mrb->reqs),
>            &multireq_compare);
> @@ -408,8 +409,9 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>               */
>              if (sector_num + nb_sectors != req->sector_num ||
>                  niov > blk_get_max_iov(blk) - req->qiov.niov ||
> -                req->qiov.size / BDRV_SECTOR_SIZE > max_xfer_len ||
> -                nb_sectors > max_xfer_len - req->qiov.size / BDRV_SECTOR_SIZE) {
> +                req->qiov.size > max_xfer_len ||
> +                nb_sectors > (max_xfer_len -
> +                              req->qiov.size) / BDRV_SECTOR_SIZE) {
>                  submit_requests(blk, mrb, start, num_reqs, niov);
>                  num_reqs = 0;
>              }
> diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
> index 71372a8..c6fef32 100644
> --- a/hw/scsi/scsi-generic.c
> +++ b/hw/scsi/scsi-generic.c
> @@ -225,13 +225,13 @@ static void scsi_read_complete(void * opaque, int ret)
>      if (s->type == TYPE_DISK &&
>          r->req.cmd.buf[0] == INQUIRY &&
>          r->req.cmd.buf[2] == 0xb0) {
> -        uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
> -        if (max_xfer_len) {
> -            stl_be_p(&r->buf[8], max_xfer_len);
> -            /* Also take care of the opt xfer len. */
> -            if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
> -                stl_be_p(&r->buf[12], max_xfer_len);
> -            }
> +        uint32_t max_xfer_len =
> +            blk_get_max_transfer(s->conf.blk) >> BDRV_SECTOR_BITS;
> +
> +        stl_be_p(&r->buf[8], max_xfer_len);
> +        /* Also take care of the opt xfer len. */
> +        if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
> +            stl_be_p(&r->buf[12], max_xfer_len);
>          }
>      }

This is another hidden semantic change. Can we have a separate
scsi-generic patch first that changes the handling for the
max_transfer == 0 case and only then make the change in
blk_get_max_transfer() as pure refactoring without a change in
behaviour?

>      scsi_req_data(&r->req, len);
> diff --git a/qemu-img.c b/qemu-img.c
> index 4b56ad3..577df0d 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -2074,13 +2074,13 @@ static int img_convert(int argc, char **argv)
>      }
>      out_bs = blk_bs(out_blk);
> 
> -    /* increase bufsectors from the default 4096 (2M) if opt_transfer_length
> +    /* increase bufsectors from the default 4096 (2M) if opt_transfer
>       * or discard_alignment of the out_bs is greater. Limit to 32768 (16MB)
>       * as maximum. */
>      bufsectors = MIN(32768,
> -                     MAX(bufsectors, MAX(out_bs->bl.opt_transfer_length,
> -                                         out_bs->bl.discard_alignment))
> -                    );
> +                     MAX(bufsectors,
> +                         MAX(out_bs->bl.opt_transfer >> BDRV_SECTOR_BITS,
> +                             out_bs->bl.discard_alignment)));
> 
>      if (skip_create) {
>          int64_t output_sectors = blk_nb_sectors(out_blk);
> -- 
> 2.5.5

Kevin

next prev parent reply	other threads:[~2016-06-07 12:45 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-03 17:03 [Qemu-devel] [PATCH 0/5] Byte-based block limits Eric Blake
2016-06-03 17:03 ` [Qemu-devel] [PATCH 1/5] block: Tighter assertions on bdrv_aligned_preadv() Eric Blake
2016-06-07 12:15   ` Kevin Wolf
2016-06-03 17:03 ` [Qemu-devel] [PATCH 2/5] block: Honor flags during bdrv_aligned_preadv() Eric Blake
2016-06-07 12:12   ` Kevin Wolf
2016-06-11 21:43     ` Eric Blake
2016-06-03 17:03 ` [Qemu-devel] [PATCH 3/5] block: Switch transfer length bounds to byte-based Eric Blake
2016-06-07 12:45   ` Kevin Wolf [this message]
2016-06-11 22:06     ` Eric Blake
2016-06-14  8:20       ` Kevin Wolf
2016-06-03 17:03 ` [Qemu-devel] [PATCH 4/5] block: Switch discard " Eric Blake
2016-06-07 13:12   ` Kevin Wolf
2016-06-03 17:03 ` [Qemu-devel] [PATCH 5/5] block: Move request_alignment into BlockLimit Eric Blake
2016-06-03 17:49   ` Eric Blake
2016-06-03 21:43     ` Eric Blake
2016-06-07 10:08       ` Kevin Wolf
2016-06-07 11:04         ` Paolo Bonzini
2016-06-07 11:24           ` Kevin Wolf
2016-06-14  4:39         ` Eric Blake
2016-06-14  8:05           ` Kevin Wolf
2016-06-14 14:47             ` Eric Blake
2016-06-14 15:30               ` Kevin Wolf
2016-06-07 13:19   ` Kevin Wolf
2016-06-03 23:06 ` [Qemu-devel] [PATCH 6/5] block: Fix harmless off-by-one in bdrv_aligned_preadv() Eric Blake
2016-06-07 13:47   ` Kevin Wolf
2016-06-03 23:13 ` [Qemu-devel] [PATCH 7/5] block: Refactor zero_beyond_eof hack " Eric Blake
2016-06-07 13:49   ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160607124522.GH4684@noname.str.redhat.com \
    --to=kwolf@redhat.com \
    --cc=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ronniesahlberg@gmail.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).