qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, mreitz@redhat.com, qemu-block@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Fam Zheng <famz@redhat.com>,
	Ronnie Sahlberg <ronniesahlberg@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Lieven <pl@kamp.de>,
	"Michael S. Tsirkin" <mst@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 3/5] block: Switch transfer length bounds to byte-based
Date: Sat, 11 Jun 2016 16:06:00 -0600	[thread overview]
Message-ID: <575C8B48.7030909@redhat.com> (raw)
In-Reply-To: <20160607124522.GH4684@noname.str.redhat.com>

[-- Attachment #1: Type: text/plain, Size: 6887 bytes --]

On 06/07/2016 06:45 AM, Kevin Wolf wrote:
> Am 03.06.2016 um 19:03 hat Eric Blake geschrieben:
>> Sector-based limits are awkward to think about; in our on-going
>> quest to move to byte-based interfaces, convert max_transfer_length
>> and opt_transfer_length.  Rename them (dropping the _length suffix)
>> so that the compiler will help us catch the change in semantics
>> across any rebased code.  Improve the documentation, and guarantee
>> that blk_get_max_transfer() returns a non-zero value.  Use unsigned
>> values, so that we don't have to worry about negative values and
>> so that bit-twiddling is easier; however, we are still constrained
>> by 2^31 of signed int in most APIs.
>>
>> Of note: the iscsi calculations use a 64-bit number internally,
>> but the final result is guaranteed to fit in 32 bits.  NBD is
>> fixed to advertise the maximum length of 32M that the qemu and
>> kernel servers enforce, rather than a number of sectors that
>> would overflow int when converted back to bytes.  scsi-generic
>> now advertises a maximum always, rather than just when the
>> underlying device advertised a maximum.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
>> @@ -1177,7 +1176,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>>
>>          if (ret == -ENOTSUP) {
>>              /* Fall back to bounce buffer if write zeroes is unsupported */
>> -            int max_xfer_len = MIN_NON_ZERO(bs->bl.max_transfer_length,
>> +            int max_xfer_len = MIN_NON_ZERO(bs->bl.max_transfer,
>>                                              MAX_WRITE_ZEROES_BOUNCE_BUFFER);
> 
> You could consider renaming the variable to max_transfer to keep it
> consistent with the BlockLimits field name and to make it even more
> obvious that you converted all uses.

Good idea.

>> @@ -1706,13 +1708,14 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>>       * iscsi_open(): iscsi targets don't change their limits. */
>>
>>      IscsiLun *iscsilun = bs->opaque;
>> -    uint32_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;
>> +    uint64_t max_xfer_len = iscsilun->use_16_for_rw ? 0xffffffff : 0xffff;
>>
>>      if (iscsilun->bl.max_xfer_len) {
>>          max_xfer_len = MIN(max_xfer_len, iscsilun->bl.max_xfer_len);
>>      }
>>
>> -    bs->bl.max_transfer_length = sector_limits_lun2qemu(max_xfer_len, iscsilun);
>> +    bs->bl.max_transfer = MIN(BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS,
>> +                              max_xfer_len * iscsilun->block_size);
> 
> Why don't you simply use 0 when you defined it as no limit?
> 
> If we make some drivers put 0 there and others INT_MAX, chances are that
> we'll introduce places where we fail to handle 0 correctly.

So if I'm understanding correctly, we want something like:

if (max_xfer_len * iscsilun->block_size > INT_MAX) {
    bs->bl.max_transfer = 0;
} else {
    bs->bl.max_transfer = max_xfer_len * iscsilun->block_size;
}

and make sure that 0 continues to mean 'no signed 32-bit limit'.

>> +++ b/block/nbd.c
>> @@ -363,7 +363,7 @@ static int nbd_co_flush(BlockDriverState *bs)
>>  static void nbd_refresh_limits(BlockDriverState *bs, Error **errp)
>>  {
>>      bs->bl.max_discard = UINT32_MAX >> BDRV_SECTOR_BITS;
>> -    bs->bl.max_transfer_length = UINT32_MAX >> BDRV_SECTOR_BITS;
>> +    bs->bl.max_transfer = NBD_MAX_BUFFER_SIZE;
>>  }
> 
> This introduces a semantic difference and should therefore be a separate
> patch. Here, it should become UINT32_MAX through mechanical conversion.
> 
> Or actually, it can't because that's not a power of two. So maybe have
> the NBD patch first...

Will split.  I don't see any problem with a max_transfer that is not a
power of two, as long as it is a multiple of request_alignment (iscsi is
a case in point - the max_transfer can be 0xffff blocks).

> 
>>  static int nbd_co_discard(BlockDriverState *bs, int64_t sector_num,
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index ce2e20f..f3bd5ef 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -752,7 +752,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>>          if (S_ISBLK(st.st_mode)) {
>>              int ret = hdev_get_max_transfer_length(s->fd);
>>              if (ret >= 0) {
>> -                bs->bl.max_transfer_length = ret;
>> +                bs->bl.max_transfer = ret << BDRV_SECTOR_BITS;
> 
> I assume that we don't care about overflows here because in practice the
> values are very low? Should we check (or maybe better just cap) it
> anyway?

Will add a check.

>> @@ -391,8 +391,9 @@ void virtio_blk_submit_multireq(BlockBackend *blk, MultiReqBuffer *mrb)
>>          return;
>>      }
>>
>> -    max_xfer_len = blk_get_max_transfer_length(mrb->reqs[0]->dev->blk);
>> -    max_xfer_len = MIN_NON_ZERO(max_xfer_len, BDRV_REQUEST_MAX_SECTORS);
>> +    max_xfer_len = blk_get_max_transfer(mrb->reqs[0]->dev->blk);
>> +    assert(max_xfer_len &&
>> +           max_xfer_len >> BDRV_SECTOR_BITS <= BDRV_REQUEST_MAX_SECTORS);
> 
> Why can we assert this here? The comment for BlockLimits.max_xfer_len
> doesn't document any maximum. Of course, as I already said above, it
> might not happen in practice, but INT_MAX + 1 is theoretically valid and
> would fail the assertion.

As part of this patch, blk_get_max_transfer() guarantees that its result
is non-zero and no larger than INT_MAX.


>> +++ b/hw/scsi/scsi-generic.c
>> @@ -225,13 +225,13 @@ static void scsi_read_complete(void * opaque, int ret)
>>      if (s->type == TYPE_DISK &&
>>          r->req.cmd.buf[0] == INQUIRY &&
>>          r->req.cmd.buf[2] == 0xb0) {
>> -        uint32_t max_xfer_len = blk_get_max_transfer_length(s->conf.blk);
>> -        if (max_xfer_len) {
>> -            stl_be_p(&r->buf[8], max_xfer_len);
>> -            /* Also take care of the opt xfer len. */
>> -            if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
>> -                stl_be_p(&r->buf[12], max_xfer_len);
>> -            }
>> +        uint32_t max_xfer_len =
>> +            blk_get_max_transfer(s->conf.blk) >> BDRV_SECTOR_BITS;
>> +
>> +        stl_be_p(&r->buf[8], max_xfer_len);
>> +        /* Also take care of the opt xfer len. */
>> +        if (ldl_be_p(&r->buf[12]) > max_xfer_len) {
>> +            stl_be_p(&r->buf[12], max_xfer_len);
>>          }
>>      }
> 
> This is another hidden semantic change. Can we have a separate
> scsi-generic patch first that changes the handling for the
> max_transfer == 0 case and only then make the change in
> blk_get_max_transfer() as pure refactoring without a change in
> behaviour?

Will split.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

  reply	other threads:[~2016-06-11 22:06 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-06-03 17:03 [Qemu-devel] [PATCH 0/5] Byte-based block limits Eric Blake
2016-06-03 17:03 ` [Qemu-devel] [PATCH 1/5] block: Tighter assertions on bdrv_aligned_preadv() Eric Blake
2016-06-07 12:15   ` Kevin Wolf
2016-06-03 17:03 ` [Qemu-devel] [PATCH 2/5] block: Honor flags during bdrv_aligned_preadv() Eric Blake
2016-06-07 12:12   ` Kevin Wolf
2016-06-11 21:43     ` Eric Blake
2016-06-03 17:03 ` [Qemu-devel] [PATCH 3/5] block: Switch transfer length bounds to byte-based Eric Blake
2016-06-07 12:45   ` Kevin Wolf
2016-06-11 22:06     ` Eric Blake [this message]
2016-06-14  8:20       ` Kevin Wolf
2016-06-03 17:03 ` [Qemu-devel] [PATCH 4/5] block: Switch discard " Eric Blake
2016-06-07 13:12   ` Kevin Wolf
2016-06-03 17:03 ` [Qemu-devel] [PATCH 5/5] block: Move request_alignment into BlockLimit Eric Blake
2016-06-03 17:49   ` Eric Blake
2016-06-03 21:43     ` Eric Blake
2016-06-07 10:08       ` Kevin Wolf
2016-06-07 11:04         ` Paolo Bonzini
2016-06-07 11:24           ` Kevin Wolf
2016-06-14  4:39         ` Eric Blake
2016-06-14  8:05           ` Kevin Wolf
2016-06-14 14:47             ` Eric Blake
2016-06-14 15:30               ` Kevin Wolf
2016-06-07 13:19   ` Kevin Wolf
2016-06-03 23:06 ` [Qemu-devel] [PATCH 6/5] block: Fix harmless off-by-one in bdrv_aligned_preadv() Eric Blake
2016-06-07 13:47   ` Kevin Wolf
2016-06-03 23:13 ` [Qemu-devel] [PATCH 7/5] block: Refactor zero_beyond_eof hack " Eric Blake
2016-06-07 13:49   ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=575C8B48.7030909@redhat.com \
    --to=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ronniesahlberg@gmail.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).