* [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading @ 2018-07-04 6:13 Fam Zheng 2018-07-04 6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng 2018-07-04 6:13 ` [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points Fam Zheng 0 siblings, 2 replies; 5+ messages in thread From: Fam Zheng @ 2018-07-04 6:13 UTC (permalink / raw) To: qemu-devel Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, Eric Blake, John Snow, Stefan Hajnoczi Qcow2 allocates new clusters after the end of the file. If it is the destinaton of copy offloading, we must adjust dst->bs->total_sectors. Otherwise, further reads will drop to the "beyond EOF" code path and return zeroes, which problem is caught by iotests 222. Follow the logic in the normal write code and update bs->total_sectors after I/O is done. While at it, add a few convenient trace points to aid future debug experiences in the topic. Fam Zheng (2): block: Fix dst total_sectors after copy offloading block: Add copy offloading trace points block/file-posix.c | 2 ++ block/io.c | 6 ++++++ block/iscsi.c | 3 +++ block/trace-events | 6 ++++++ 4 files changed, 17 insertions(+) -- 2.17.1 ^ permalink raw reply [flat|nested] 5+ messages in thread
* [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after copy offloading 2018-07-04 6:13 [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading Fam Zheng @ 2018-07-04 6:13 ` Fam Zheng 2018-07-04 7:44 ` Kevin Wolf 2018-07-04 6:13 ` [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points Fam Zheng 1 sibling, 1 reply; 5+ messages in thread From: Fam Zheng @ 2018-07-04 6:13 UTC (permalink / raw) To: qemu-devel Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, Eric Blake, John Snow, Stefan Hajnoczi This was noticed by the new image fleecing tests case 222. The issue is apparent and we should just do the same right things as in bdrv_aligned_pwritev. Reported-by: John Snow <jsnow@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> --- block/io.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/block/io.c b/block/io.c index 1a2272fad3..8e02f4ab95 100644 --- a/block/io.c +++ b/block/io.c @@ -2945,6 +2945,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, dst, dst_offset, bytes, flags); } + if (ret == 0) { + int64_t end_sector = DIV_ROUND_UP(dst_offset + bytes, BDRV_SECTOR_SIZE); + dst->bs->total_sectors = MAX(dst->bs->total_sectors, end_sector); + } tracked_request_end(&src_req); tracked_request_end(&dst_req); bdrv_dec_in_flight(src->bs); -- 2.17.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after copy offloading 2018-07-04 6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng @ 2018-07-04 7:44 ` Kevin Wolf 2018-07-04 8:42 ` Fam Zheng 0 siblings, 1 reply; 5+ messages in thread From: Kevin Wolf @ 2018-07-04 7:44 UTC (permalink / raw) To: Fam Zheng Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, John Snow, Stefan Hajnoczi Am 04.07.2018 um 08:13 hat Fam Zheng geschrieben: > This was noticed by the new image fleecing tests case 222. The issue is > apparent and we should just do the same right things as in > bdrv_aligned_pwritev. > > Reported-by: John Snow <jsnow@redhat.com> > Signed-off-by: Fam Zheng <famz@redhat.com> > --- a/block/io.c > +++ b/block/io.c > @@ -2945,6 +2945,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, > dst, dst_offset, > bytes, flags); > } > + if (ret == 0) { > + int64_t end_sector = DIV_ROUND_UP(dst_offset + bytes, BDRV_SECTOR_SIZE); > + dst->bs->total_sectors = MAX(dst->bs->total_sectors, end_sector); > + } I think it's time to extract this stuff to a common function. I was already aware that a write request that extends the image isn't behaving exactly the same as bdrv_co_truncate(), and this one is bound to diverge from the other two instances as well. This is what bdrv_aligned_pwritev() does after the write: atomic_inc(&bs->write_gen); bdrv_set_dirty(bs, offset, bytes); stat64_max(&bs->wr_highest_offset, offset + bytes); if (ret >= 0) { bs->total_sectors = MAX(bs->total_sectors, end_sector); ret = 0; } Before the write, it also calls bs->before_write_notifiers. This is what bdrv_co_truncate() does after truncating the image: ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS); if (ret < 0) { error_setg_errno(errp, -ret, "Could not refresh total sector count"); } else { offset = bs->total_sectors * BDRV_SECTOR_SIZE; } bdrv_dirty_bitmap_truncate(bs, offset); bdrv_parent_cb_resize(bs); atomic_inc(&bs->write_gen); This means that we probably have at least the following bugs in bdrv_co_copy_range_internal(): 1. bs->write_gen is not increased, a following flush is ignored 2. Dirty bitmaps are not dirtied 3. Dirty bitmaps are not resized when extending the image 4. bs->wr_highest_offset is not adjusted correctly 5. bs->total_sectors is not resized (the bug this patch fixes) 6. The parent node isn't notified about an image size change Of these, 3. and 6. also seem to be bugs in bdrv_aligned_pwritev(). Kevin ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after copy offloading 2018-07-04 7:44 ` Kevin Wolf @ 2018-07-04 8:42 ` Fam Zheng 0 siblings, 0 replies; 5+ messages in thread From: Fam Zheng @ 2018-07-04 8:42 UTC (permalink / raw) To: Kevin Wolf Cc: qemu-devel, qemu-block, Max Reitz, Eric Blake, John Snow, Stefan Hajnoczi On Wed, 07/04 09:44, Kevin Wolf wrote: > Am 04.07.2018 um 08:13 hat Fam Zheng geschrieben: > > This was noticed by the new image fleecing tests case 222. The issue is > > apparent and we should just do the same right things as in > > bdrv_aligned_pwritev. > > > > Reported-by: John Snow <jsnow@redhat.com> > > Signed-off-by: Fam Zheng <famz@redhat.com> > > > --- a/block/io.c > > +++ b/block/io.c > > @@ -2945,6 +2945,10 @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, > > dst, dst_offset, > > bytes, flags); > > } > > + if (ret == 0) { > > + int64_t end_sector = DIV_ROUND_UP(dst_offset + bytes, BDRV_SECTOR_SIZE); > > + dst->bs->total_sectors = MAX(dst->bs->total_sectors, end_sector); > > + } > > I think it's time to extract this stuff to a common function. I was > already aware that a write request that extends the image isn't behaving > exactly the same as bdrv_co_truncate(), and this one is bound to diverge > from the other two instances as well. > > This is what bdrv_aligned_pwritev() does after the write: > > atomic_inc(&bs->write_gen); > bdrv_set_dirty(bs, offset, bytes); > > stat64_max(&bs->wr_highest_offset, offset + bytes); > > if (ret >= 0) { > bs->total_sectors = MAX(bs->total_sectors, end_sector); > ret = 0; > } > > Before the write, it also calls bs->before_write_notifiers. > > This is what bdrv_co_truncate() does after truncating the image: > > ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS); > if (ret < 0) { > error_setg_errno(errp, -ret, "Could not refresh total sector count"); > } else { > offset = bs->total_sectors * BDRV_SECTOR_SIZE; > } > bdrv_dirty_bitmap_truncate(bs, offset); > bdrv_parent_cb_resize(bs); > atomic_inc(&bs->write_gen); > > This means that we probably have at least the following bugs in > bdrv_co_copy_range_internal(): > > 1. bs->write_gen is not increased, a following flush is ignored > 2. Dirty bitmaps are not dirtied > 3. Dirty bitmaps are not resized when extending the image > 4. bs->wr_highest_offset is not adjusted correctly > 5. bs->total_sectors is not resized (the bug this patch fixes) > 6. The parent node isn't notified about an image size change > > Of these, 3. and 6. also seem to be bugs in bdrv_aligned_pwritev(). > Indeed, great insight. I'll work on v2. Fam ^ permalink raw reply [flat|nested] 5+ messages in thread
* [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points 2018-07-04 6:13 [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading Fam Zheng 2018-07-04 6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng @ 2018-07-04 6:13 ` Fam Zheng 1 sibling, 0 replies; 5+ messages in thread From: Fam Zheng @ 2018-07-04 6:13 UTC (permalink / raw) To: qemu-devel Cc: qemu-block, Fam Zheng, Kevin Wolf, Max Reitz, Eric Blake, John Snow, Stefan Hajnoczi A few trace points that can help reveal what is happening in a copy offloading I/O path. Signed-off-by: Fam Zheng <famz@redhat.com> --- block/file-posix.c | 2 ++ block/io.c | 2 ++ block/iscsi.c | 3 +++ block/trace-events | 6 ++++++ 4 files changed, 13 insertions(+) diff --git a/block/file-posix.c b/block/file-posix.c index 829ee538d8..d3b1609410 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -1488,6 +1488,8 @@ static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb) ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off, aiocb->aio_fd2, &out_off, bytes, 0); + trace_copy_file_range(aiocb->bs, aiocb->aio_fildes, in_off, + aiocb->aio_fd2, out_off, bytes, 0, ret); if (ret == 0) { /* No progress (e.g. when beyond EOF), let the caller fall back to * buffer I/O. */ diff --git a/block/io.c b/block/io.c index 8e02f4ab95..df930b982f 100644 --- a/block/io.c +++ b/block/io.c @@ -2964,6 +2964,7 @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, BdrvRequestFlags flags) { + trace_bdrv_co_copy_range_from(src, src_offset, dst, dst_offset, bytes, flags); return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, bytes, flags, true); } @@ -2976,6 +2977,7 @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, BdrvRequestFlags flags) { + trace_bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags); return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, bytes, flags, false); } diff --git a/block/iscsi.c b/block/iscsi.c index ead2bd5aa7..118555d051 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -44,6 +44,7 @@ #include "qapi/qmp/qstring.h" #include "crypto/secret.h" #include "scsi/utils.h" +#include "trace.h" /* Conflict between scsi/utils.h and libiscsi! :( */ #define SCSI_XFER_NONE ISCSI_XFER_NONE @@ -2396,6 +2397,8 @@ retry: } out_unlock: + + trace_iscsi_xcopy(src_lun, src_offset, dst_lun, dst_offset, bytes, r); g_free(iscsi_task.task); qemu_mutex_unlock(&dst_lun->mutex); g_free(iscsi_task.err_str); diff --git a/block/trace-events b/block/trace-events index c35287b48a..1a25a997f2 100644 --- a/block/trace-events +++ b/block/trace-events @@ -15,6 +15,8 @@ bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x" bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x" bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, int64_t cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %"PRId64 +bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" flags 0x%x" +bdrv_co_copy_range_to(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" flags 0x%x" # block/stream.c stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d" @@ -57,6 +59,7 @@ qmp_block_stream(void *bs, void *job) "bs %p job %p" # block/file-posix.c paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d type %d" paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d" +copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64 # block/qcow2.c qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d" @@ -150,3 +153,6 @@ nvme_free_req_queue_wait(void *q) "q %p" nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s %p cmd %p req %p qiov %p entries %d" nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 0x%"PRIx64 nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] %p pages %d" + +# block/iscsi.c +iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, uint64_t bytes, int ret) "src_lun %p offset %"PRIu64" dst_lun %p offset %"PRIu64" bytes %"PRIu64" ret %d" -- 2.17.1 ^ permalink raw reply related [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-07-04 8:43 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2018-07-04 6:13 [Qemu-devel] [PATCH 0/2] block: Fix dst reading after tail copy offloading Fam Zheng 2018-07-04 6:13 ` [Qemu-devel] [PATCH 1/2] block: Fix dst total_sectors after " Fam Zheng 2018-07-04 7:44 ` Kevin Wolf 2018-07-04 8:42 ` Fam Zheng 2018-07-04 6:13 ` [Qemu-devel] [PATCH 2/2] block: Add copy offloading trace points Fam Zheng
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).