Re: [PATCH v5 07/11] block: use int64_t instead of int in driver write_zeroes handlers

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Eric Blake <eblake@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: kwolf@redhat.com, Fam Zheng <fam@euphon.net>,
	"open list:GLUSTER" <integration@gluster.org>,
	Alberto Garcia <berto@igalia.com>,
	Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>,
	qemu-block@nongnu.org, Ronnie Sahlberg <ronniesahlberg@gmail.com>,
	Peter Lieven <pl@kamp.de>,
	qemu-devel@nongnu.org, mreitz@redhat.com, stefanha@redhat.com,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>,
	Ari Sundholm <ari@tuxera.com>
Subject: Re: [PATCH v5 07/11] block: use int64_t instead of int in driver write_zeroes handlers
Date: Fri, 4 Jun 2021 15:09:39 -0500	[thread overview]
Message-ID: <20210604200939.2wcnh2spynqlbepb@redhat.com> (raw)
In-Reply-To: <20210505075001.45041-8-vsementsov@virtuozzo.com>

On Wed, May 05, 2021 at 10:49:57AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
> 
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
> 
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
> 
> So, convert driver write_zeroes handlers bytes parameter to int64_t.
> 
> The only caller of all updated function is bdrv_co_do_pwrite_zeroes().
> 
> bdrv_co_do_pwrite_zeroes() itself is of course OK with widening of
> callee parameter type. Also, bdrv_co_do_pwrite_zeroes()'s
> max_write_zeroes is limited to INT_MAX. So, updated functions all are
> safe, the will not get "bytes" larger than before.

they

> 
> Still, let's look through all updated functions, and add assertions to
> the ones which are actually unprepared to values larger than INT_MAX.
> For these drivers also set explicit max_pwrite_zeroes limit.
> 
> Let's go:
> 
> backup-top: Calls backup_top_cbw() and bdrv_co_pwrite_zeroes, both
>   have 64bit argument.

backup_top_cbw has uint64_t argument (at least on current qemu.git; I
didn't spot if it was changed in the meantime earlier in this series
or if I'm missing review of a prerequisite), but we're safe since the
block layer does not pass in negative values.

> 
> blkdebug: calculations can't overflow, thanks to
>   bdrv_check_qiov_request() in generic layer. rule_check() and
>   bdrv_co_pwrite_zeroes() both have 64bit argument.

rule_check() is another function currently using uint64_t.

> 
> blklogwrites: pass to blk_log_writes_co_log() with 64bit argument.

That, and struct BlkLogWritesFileReq, still use uint64_t.

> 
> blkreply, copy-on-read, filter-compress: pass to bdrv_co_pwrite_zeroes() which is OK

blkreplay

> 
> file-posix: both handler calls raw_do_pwrite_zeroes, which is updated.
>   In raw_do_pwrite_zeroes() calculations are OK due to
>   bdrv_check_qiov_request(), bytes go to RawPosixAIOData::aio_nbytes
>   which is uint64_t.

We also have to check where that uint64_t gets handed;
handle_aiocb_write_zeroes_block() passes a uint64_t[2] to
ioctl(BLKZEROOUT), handle_aiocb_write_zeroes() calls do_fallocate()
which takes off_t (and we compile to always have 64-bit off_t), as
does handle_aiocb_write_zeroes_unmap.  All look safe.

> 
> gluster: bytes go to GlusterAIOCB::size which is int64_t and to
>   glfs_zerofill_async works with off_t.
> 
> iscsi: Aha, here we deal with iscsi_writesame16_task() that has
>   uint32_t num_blocks argument and iscsi_writesame16_task() has

s/16/10/

>   uint16_t argument. Make comments, add assertions and clarify
>   max_pwrite_zeroes calculation.
>   iscsi_allocmap_() functions already has int64_t argument
>   is_byte_request_lun_aligned is simple to update, do it.
> 
> mirror_top: pass to bdrv_mirror_top_do_write which has uint16_t

s/16/64/

>   argument
> 
> nbd: Aha, here we have protocol limitation, and NBDRequest::len is
>   uint32_t. max_pwrite_zeroes is cleanly set to 32bit value, so we are
>   OK for now.
> 
> nvme: Again, protocol limitation. And no inherent limit for
>   write-zeroes at all. But from code that calculates cdw12 it's obvious
>   that we do have limit and alignment. Let's clarify it. Also,
>   obviously the code is not prepared to bytes=0. Let's handle this

to handle bytes=0

>   case too.
>   trace events already 64bit
> 
> qcow2: offset + bytes and alignment still works good (thanks to
>   bdrv_check_qiov_request()), so tail calculation is OK
>   qcow2_subcluster_zeroize() has 64bit argument, should be OK
>   trace events updated
> 
> qed: qed_co_request wants int nb_sectors. Also in code we have size_t
>   used for request length which may be 32bit.. So, let's just keep

Double dot looks odd.

>   INT_MAX as a limit (aligning it down to pwrite_zeroes_alignment) and
>   don't care.

Yeah, even when size_t is 64-bit, qed is not our high priority so
sticking to 32-bit limit encourages people to switch away from qed ;)

> 
> raw-format: Is OK. raw_adjust_offset and bdrv_co_pwrite_zeroes are both
>   64bit.

I probably already mentioned it in earlier reviews, but hopefully by
the end of the series, we clean up raw_adjust_offset() to take
int64_t* instead of uint64_t* to get rid of ugly casts.  Doesn't have
to be done in this patch.

> 
> throttle: Both throttle_group_co_io_limits_intercept() and
>   bdrv_co_pwrite_zeroes() are 64bit.
> 
> vmdk: pass to vmdk_pwritev which is 64bit
> 
> quorum: pass to quorum_co_pwritev() which is 64bit
> 
> preallocated: pass to handle_write() and bdrv_co_pwrite_zeroes(), both

File is named preallocate.c, the 'd' seems odd. Worth sorting this
before qcow2 in the commit message?

>   64bit.
> 
> Hooray!
> 
> At this point all block drivers are prepared to 64bit write-zero
> requests or has explicitly set max_pwrite_zeroes.

are prepared to support 64bit write-zero requests, or have explicitly set

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---

> +++ b/block/iscsi.c
> @@ -1205,15 +1205,16 @@ out_unlock:
>  
>  static int
>  coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
> -                                    int bytes, BdrvRequestFlags flags)
> +                                    int64_t bytes, BdrvRequestFlags flags)
>  {
>      IscsiLun *iscsilun = bs->opaque;
>      struct IscsiTask iTask;
>      uint64_t lba;
> -    uint32_t nb_blocks;
> +    uint64_t nb_blocks;
>      bool use_16_for_ws = iscsilun->use_16_for_rw;
>      int r = 0;
>  
> +
>      if (!is_byte_request_lun_aligned(offset, bytes, iscsilun)) {

Why the added blank line here?

>          return -ENOTSUP;
>      }
> @@ -1250,11 +1251,21 @@ coroutine_fn iscsi_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
>      iscsi_co_init_iscsitask(iscsilun, &iTask);
>  retry:
>      if (use_16_for_ws) {
> +        /*
> +         * iscsi_writesame16_task num_blocks argument is uint32_t. We rely here
> +         * on our max_pwrite_zeroes limit.
> +         */
> +        assert(nb_blocks < UINT32_MAX);
>          iTask.task = iscsi_writesame16_task(iscsilun->iscsi, iscsilun->lun, lba,
>                                              iscsilun->zeroblock, iscsilun->block_size,
>                                              nb_blocks, 0, !!(flags & BDRV_REQ_MAY_UNMAP),
>                                              0, 0, iscsi_co_generic_cb, &iTask);
>      } else {
> +        /*
> +         * iscsi_writesame10_task num_blocks argument is uint16_t. We rely here
> +         * on our max_pwrite_zeroes limit.
> +         */
> +        assert(nb_blocks < UINT16_MAX);
>          iTask.task = iscsi_writesame10_task(iscsilun->iscsi, iscsilun->lun, lba,
>                                              iscsilun->zeroblock, iscsilun->block_size,
>                                              nb_blocks, 0, !!(flags & BDRV_REQ_MAY_UNMAP),

Thanks - these assertions and comments are indeed a lifesaver for
future maintenance.

> @@ -2074,10 +2085,10 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
>          bs->bl.pdiscard_alignment = iscsilun->block_size;
>      }
>  
> -    if (iscsilun->bl.max_ws_len < 0xffffffff / block_size) {
> -        bs->bl.max_pwrite_zeroes =
> -            iscsilun->bl.max_ws_len * iscsilun->block_size;
> -    }
> +    bs->bl.max_pwrite_zeroes =
> +        MIN_NON_ZERO(iscsilun->bl.max_ws_len * iscsilun->block_size,
> +                     max_xfer_len * iscsilun->block_size);

Works because max_xfer_len is 0xffff if we are stuck using
writesame10, or 0xffffffff if we are able to use writesame16.

> +++ b/block/nvme.c
> @@ -1266,19 +1266,29 @@ static coroutine_fn int nvme_co_flush(BlockDriverState *bs)
>  
>  static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
>                                                int64_t offset,
> -                                              int bytes,
> +                                              int64_t bytes,
>                                                BdrvRequestFlags flags)
>  {
>      BDRVNVMeState *s = bs->opaque;
>      NVMeQueuePair *ioq = s->queues[INDEX_IO(0)];
>      NVMeRequest *req;
> -
> -    uint32_t cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
> +    uint32_t cdw12;
>  
>      if (!s->supports_write_zeroes) {
>          return -ENOTSUP;
>      }
>  
> +    if (bytes == 0) {
> +        return 0;
> +    }
> +
> +    cdw12 = ((bytes >> s->blkshift) - 1) & 0xFFFF;
> +    /*
> +     * We should not loose information. pwrite_zeroes_alignment and

lose (this is a common English typo; "lose" rhymes with "use" and is
opposite of "gain", while "loose" rhymes with "goose" and is opposite
of "tighten")

> +++ b/block/qed.c

> @@ -1408,6 +1409,12 @@ static int coroutine_fn bdrv_qed_co_pwrite_zeroes(BlockDriverState *bs,
>       */
>      QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, bytes);
>  
> +    /*
> +     * QED is not prepared for 62bit write-zero requests, so rely on

64bit

> +     * max_pwrite_zeroes.
> +     */
> +    assert(bytes <= INT_MAX);
> +
>      /* Fall back if the request is not aligned */
>      if (qed_offset_into_cluster(s, offset) ||
>          qed_offset_into_cluster(s, bytes)) {

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

next prev parent reply	other threads:[~2021-06-04 20:10 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-05  7:49 [PATCH v5 00/11] 64bit block-layer: part II Vladimir Sementsov-Ogievskiy
2021-05-05  7:49 ` [PATCH v5 01/11] block/io: bring request check to bdrv_co_(read, write)v_vmstate Vladimir Sementsov-Ogievskiy via
2021-05-05  7:49 ` [PATCH v5 02/11] qcow2: check request on vmstate save/load path Vladimir Sementsov-Ogievskiy
2021-05-05  7:49 ` [PATCH v5 03/11] block: use int64_t instead of uint64_t in driver read handlers Vladimir Sementsov-Ogievskiy
2021-05-05  7:49 ` [PATCH v5 04/11] block: use int64_t instead of uint64_t in driver write handlers Vladimir Sementsov-Ogievskiy
2021-05-05  7:49 ` [PATCH v5 05/11] block: use int64_t instead of uint64_t in copy_range driver handlers Vladimir Sementsov-Ogievskiy
2021-05-05  7:49 ` [PATCH v5 06/11] block: make BlockLimits::max_pwrite_zeroes 64bit Vladimir Sementsov-Ogievskiy
2021-05-05  7:49 ` [PATCH v5 07/11] block: use int64_t instead of int in driver write_zeroes handlers Vladimir Sementsov-Ogievskiy
2021-06-04 20:09   ` Eric Blake [this message]
2021-06-05 13:50     ` Vladimir Sementsov-Ogievskiy
2021-05-05  7:49 ` [PATCH v5 08/11] block/io: allow 64bit write-zeroes requests Vladimir Sementsov-Ogievskiy
2021-06-07 15:03   ` Eric Blake
2021-05-05  7:49 ` [PATCH v5 09/11] block: make BlockLimits::max_pdiscard 64bit Vladimir Sementsov-Ogievskiy
2021-06-07 15:05   ` Eric Blake
2021-05-05  7:50 ` [PATCH v5 10/11] block: use int64_t instead of int in driver discard handlers Vladimir Sementsov-Ogievskiy
2021-06-07 18:13   ` Eric Blake
2021-06-08  8:38     ` Vladimir Sementsov-Ogievskiy
2021-05-05  7:50 ` [PATCH v5 11/11] block/io: allow 64bit discard requests Vladimir Sementsov-Ogievskiy
2021-06-07 18:15   ` Eric Blake
2021-05-05  8:06 ` [PATCH v5 00/11] 64bit block-layer: part II no-reply
2021-05-05  8:10   ` Vladimir Sementsov-Ogievskiy
2021-06-04 18:30 ` Eric Blake

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210604200939.2wcnh2spynqlbepb@redhat.com \
    --to=eblake@redhat.com \
    --cc=ari@tuxera.com \
    --cc=berto@igalia.com \
    --cc=fam@euphon.net \
    --cc=integration@gluster.org \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=pavel.dovgaluk@ispras.ru \
    --cc=pbonzini@redhat.com \
    --cc=pl@kamp.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=ronniesahlberg@gmail.com \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).