[Qemu-devel] [RFC 0/6] block: generic copy-on-read

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [RFC 0/6] block: generic copy-on-read
@ 2011-10-17 15:47 Stefan Hajnoczi
  2011-10-17 15:47 ` [Qemu-devel] [RFC 1/6] block: add request tracking Stefan Hajnoczi
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi

The new -drive copy-on-read=on|off feature populates the image file with data
from the backing file on read.  This is useful when accessing images backed
over a slow medium (e.g. http over internet).  All read data will be stored in
the local image file so it does not need to be fetched again in the future.

This series is a prerequisite for the image streaming feature, which uses
copy-on-read to populate the image file in the background while the VM is
running.  However, the copy-on-read feature is useful on its own.

Copy-on-read is implemented by checking whether or not data is allocated in the
image file before reading it.  If data is not allocated then it needs to be
read and written back to the image file.

The tricky bit is avoiding races with other I/O requests.  These patches add
request tracking to BlockDriverState so that the list of pending requests is
available.  Copy-on-read prevents races by serializing overlapping requests.

Finally, there is a performance impact when enabling this feature since an
additional write is performed.  Serializing overlapping requests also means
that I/O patterns where multiple requests access the same cluster will see a
loss in parallelism.  Perhaps we can be smarter about preventing corruption in
the future and win back some performance.

Git repo:
http://repo.or.cz/w/qemu/stefanha.git/shortlog/refs/heads/generic-copy-on-read

Stefan Hajnoczi (6):
  block: add request tracking
  block: add bdrv_set_copy_on_read()
  block: wait for overlapping requests
  block: request overlap detection
  block: core copy-on-read logic
  block: add -drive copy-on-read=on|off

 block.c         |  243 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 block.h         |    3 +
 block_int.h     |    9 ++
 blockdev.c      |    6 ++
 hmp-commands.hx |    5 +-
 qemu-config.c   |    4 +
 qemu-options.hx |    9 ++-
 trace-events    |    1 +
 8 files changed, 276 insertions(+), 4 deletions(-)

-- 
1.7.6.3

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 1/6] block: add request tracking
  2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
@ 2011-10-17 15:47 ` Stefan Hajnoczi
  2011-11-02 16:30   ` Kevin Wolf
  2011-11-07 11:00   ` Zhi Yong Wu
  2011-10-17 15:47 ` [Qemu-devel] [RFC 2/6] block: add bdrv_set_copy_on_read() Stefan Hajnoczi
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi

The block layer does not know about pending requests.  This information
is necessary for copy-on-read since overlapping requests must be
serialized to prevent races that corrupt the image.

Add a simple mechanism to enable/disable request tracking.  By default
request tracking is disabled.

The BlockDriverState gets a new tracked_request list field which
contains all pending requests.  Each request is a BdrvTrackedRequest
record with sector_num, nb_sectors, and is_write fields.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c     |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 block_int.h |    8 +++++
 2 files changed, 90 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 9873b57..2d2c62a 100644
--- a/block.c
+++ b/block.c
@@ -978,6 +978,77 @@ void bdrv_commit_all(void)
     }
 }
 
+struct BdrvTrackedRequest {
+    BlockDriverState *bs;
+    int64_t sector_num;
+    int nb_sectors;
+    bool is_write;
+    QLIST_ENTRY(BdrvTrackedRequest) list;
+};
+
+/**
+ * Remove an active request from the tracked requests list
+ *
+ * If req is NULL, no operation is performed.
+ *
+ * This function should be called when a tracked request is completing.
+ */
+static void tracked_request_remove(BdrvTrackedRequest *req)
+{
+    if (req) {
+        QLIST_REMOVE(req, list);
+        g_free(req);
+    }
+}
+
+/**
+ * Add an active request to the tracked requests list
+ *
+ * Return NULL if request tracking is disabled, non-NULL otherwise.
+ */
+static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
+                                               int64_t sector_num,
+                                               int nb_sectors, bool is_write)
+{
+    BdrvTrackedRequest *req;
+
+    if (!bs->request_tracking) {
+        return NULL;
+    }
+
+    req = g_malloc(sizeof(*req));
+    req->bs = bs;
+    req->sector_num = sector_num;
+    req->nb_sectors = nb_sectors;
+    req->is_write = is_write;
+
+    QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
+
+    return req;
+}
+
+/**
+ * Enable tracking of incoming requests
+ *
+ * Request tracking can be safely used by multiple users at the same time,
+ * there is an internal reference count to match start and stop calls.
+ */
+void bdrv_start_request_tracking(BlockDriverState *bs)
+{
+    bs->request_tracking++;
+}
+
+/**
+ * Disable tracking of incoming requests
+ *
+ * Note that in-flight requests are still tracked, this function only stops
+ * tracking incoming requests.
+ */
+void bdrv_stop_request_tracking(BlockDriverState *bs)
+{
+    bs->request_tracking--;
+}
+
 /*
  * Return values:
  * 0        - success
@@ -1262,6 +1333,8 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
     BlockDriver *drv = bs->drv;
+    BdrvTrackedRequest *req;
+    int ret;
 
     if (!drv) {
         return -ENOMEDIUM;
@@ -1270,7 +1343,10 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         return -EIO;
     }
 
-    return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+    req = tracked_request_add(bs, sector_num, nb_sectors, false);
+    ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+    tracked_request_remove(req);
+    return ret;
 }
 
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
@@ -1288,6 +1364,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
 {
     BlockDriver *drv = bs->drv;
+    BdrvTrackedRequest *req;
     int ret;
 
     if (!bs->drv) {
@@ -1300,6 +1377,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EIO;
     }
 
+    req = tracked_request_add(bs, sector_num, nb_sectors, true);
+
     ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
 
     if (bs->dirty_bitmap) {
@@ -1310,6 +1389,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         bs->wr_highest_sector = sector_num + nb_sectors - 1;
     }
 
+    tracked_request_remove(req);
+
     return ret;
 }
 
diff --git a/block_int.h b/block_int.h
index f2f4f2d..87ce8b4 100644
--- a/block_int.h
+++ b/block_int.h
@@ -43,6 +43,8 @@
 #define BLOCK_OPT_PREALLOC      "preallocation"
 #define BLOCK_OPT_SUBFMT        "subformat"
 
+typedef struct BdrvTrackedRequest BdrvTrackedRequest;
+
 typedef struct AIOPool {
     void (*cancel)(BlockDriverAIOCB *acb);
     int aiocb_size;
@@ -206,6 +208,9 @@ struct BlockDriverState {
     int in_use; /* users other than guest access, eg. block migration */
     QTAILQ_ENTRY(BlockDriverState) list;
     void *private;
+
+    int request_tracking;       /* reference count, 0 means off */
+    QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
 };
 
 struct BlockDriverAIOCB {
@@ -216,6 +221,9 @@ struct BlockDriverAIOCB {
     BlockDriverAIOCB *next;
 };
 
+void bdrv_start_request_tracking(BlockDriverState *bs);
+void bdrv_stop_request_tracking(BlockDriverState *bs);
+
 void get_tmp_filename(char *filename, int size);
 
 void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 1/6] block: add request tracking
  2011-10-17 15:47 ` [Qemu-devel] [RFC 1/6] block: add request tracking Stefan Hajnoczi
@ 2011-11-02 16:30   ` Kevin Wolf
  2011-11-03  7:57     ` Stefan Hajnoczi
  2011-11-07 11:00   ` Zhi Yong Wu
  1 sibling, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2011-11-02 16:30 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
> The block layer does not know about pending requests.  This information
> is necessary for copy-on-read since overlapping requests must be
> serialized to prevent races that corrupt the image.
> 
> Add a simple mechanism to enable/disable request tracking.  By default
> request tracking is disabled.
> 
> The BlockDriverState gets a new tracked_request list field which
> contains all pending requests.  Each request is a BdrvTrackedRequest
> record with sector_num, nb_sectors, and is_write fields.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c     |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  block_int.h |    8 +++++
>  2 files changed, 90 insertions(+), 1 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 9873b57..2d2c62a 100644
> --- a/block.c
> +++ b/block.c
> @@ -978,6 +978,77 @@ void bdrv_commit_all(void)
>      }
>  }
>  
> +struct BdrvTrackedRequest {
> +    BlockDriverState *bs;
> +    int64_t sector_num;
> +    int nb_sectors;
> +    bool is_write;
> +    QLIST_ENTRY(BdrvTrackedRequest) list;
> +};
> +
> +/**
> + * Remove an active request from the tracked requests list
> + *
> + * If req is NULL, no operation is performed.
> + *
> + * This function should be called when a tracked request is completing.
> + */
> +static void tracked_request_remove(BdrvTrackedRequest *req)
> +{
> +    if (req) {
> +        QLIST_REMOVE(req, list);
> +        g_free(req);
> +    }
> +}
> +
> +/**
> + * Add an active request to the tracked requests list
> + *
> + * Return NULL if request tracking is disabled, non-NULL otherwise.
> + */
> +static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
> +                                               int64_t sector_num,
> +                                               int nb_sectors, bool is_write)
> +{
> +    BdrvTrackedRequest *req;
> +
> +    if (!bs->request_tracking) {
> +        return NULL;
> +    }
> +
> +    req = g_malloc(sizeof(*req));
> +    req->bs = bs;
> +    req->sector_num = sector_num;
> +    req->nb_sectors = nb_sectors;
> +    req->is_write = is_write;

How about using g_malloc0 or a compound literal assignment for
initialisation, so that we won't get surprises when the struct is extended?

> +
> +    QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
> +
> +    return req;
> +}
> +
> +/**
> + * Enable tracking of incoming requests
> + *
> + * Request tracking can be safely used by multiple users at the same time,
> + * there is an internal reference count to match start and stop calls.
> + */
> +void bdrv_start_request_tracking(BlockDriverState *bs)
> +{
> +    bs->request_tracking++;
> +}
> +
> +/**
> + * Disable tracking of incoming requests
> + *
> + * Note that in-flight requests are still tracked, this function only stops
> + * tracking incoming requests.
> + */
> +void bdrv_stop_request_tracking(BlockDriverState *bs)
> +{
> +    bs->request_tracking--;
> +}
> +
>  /*
>   * Return values:
>   * 0        - success
> @@ -1262,6 +1333,8 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
>  {
>      BlockDriver *drv = bs->drv;
> +    BdrvTrackedRequest *req;
> +    int ret;
>  
>      if (!drv) {
>          return -ENOMEDIUM;
> @@ -1270,7 +1343,10 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>          return -EIO;
>      }
>  
> -    return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
> +    req = tracked_request_add(bs, sector_num, nb_sectors, false);
> +    ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
> +    tracked_request_remove(req);

Hm... Do we actually need to malloc the BdrvTrackedRequest? If all users
are like this (and at least those in this patch are), we can just keep
it on the coroutine stack.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 1/6] block: add request tracking
  2011-11-02 16:30   ` Kevin Wolf
@ 2011-11-03  7:57     ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-11-03  7:57 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi, qemu-devel

On Wed, Nov 2, 2011 at 4:30 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
>> The block layer does not know about pending requests.  This information
>> is necessary for copy-on-read since overlapping requests must be
>> serialized to prevent races that corrupt the image.
>>
>> Add a simple mechanism to enable/disable request tracking.  By default
>> request tracking is disabled.
>>
>> The BlockDriverState gets a new tracked_request list field which
>> contains all pending requests.  Each request is a BdrvTrackedRequest
>> record with sector_num, nb_sectors, and is_write fields.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  block.c     |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  block_int.h |    8 +++++
>>  2 files changed, 90 insertions(+), 1 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 9873b57..2d2c62a 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -978,6 +978,77 @@ void bdrv_commit_all(void)
>>      }
>>  }
>>
>> +struct BdrvTrackedRequest {
>> +    BlockDriverState *bs;
>> +    int64_t sector_num;
>> +    int nb_sectors;
>> +    bool is_write;
>> +    QLIST_ENTRY(BdrvTrackedRequest) list;
>> +};
>> +
>> +/**
>> + * Remove an active request from the tracked requests list
>> + *
>> + * If req is NULL, no operation is performed.
>> + *
>> + * This function should be called when a tracked request is completing.
>> + */
>> +static void tracked_request_remove(BdrvTrackedRequest *req)
>> +{
>> +    if (req) {
>> +        QLIST_REMOVE(req, list);
>> +        g_free(req);
>> +    }
>> +}
>> +
>> +/**
>> + * Add an active request to the tracked requests list
>> + *
>> + * Return NULL if request tracking is disabled, non-NULL otherwise.
>> + */
>> +static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
>> +                                               int64_t sector_num,
>> +                                               int nb_sectors, bool is_write)
>> +{
>> +    BdrvTrackedRequest *req;
>> +
>> +    if (!bs->request_tracking) {
>> +        return NULL;
>> +    }
>> +
>> +    req = g_malloc(sizeof(*req));
>> +    req->bs = bs;
>> +    req->sector_num = sector_num;
>> +    req->nb_sectors = nb_sectors;
>> +    req->is_write = is_write;
>
> How about using g_malloc0 or a compound literal assignment for
> initialisation, so that we won't get surprises when the struct is extended?

Sure.

>> +
>> +    QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>> +
>> +    return req;
>> +}
>> +
>> +/**
>> + * Enable tracking of incoming requests
>> + *
>> + * Request tracking can be safely used by multiple users at the same time,
>> + * there is an internal reference count to match start and stop calls.
>> + */
>> +void bdrv_start_request_tracking(BlockDriverState *bs)
>> +{
>> +    bs->request_tracking++;
>> +}
>> +
>> +/**
>> + * Disable tracking of incoming requests
>> + *
>> + * Note that in-flight requests are still tracked, this function only stops
>> + * tracking incoming requests.
>> + */
>> +void bdrv_stop_request_tracking(BlockDriverState *bs)
>> +{
>> +    bs->request_tracking--;
>> +}
>> +
>>  /*
>>   * Return values:
>>   * 0        - success
>> @@ -1262,6 +1333,8 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
>>  {
>>      BlockDriver *drv = bs->drv;
>> +    BdrvTrackedRequest *req;
>> +    int ret;
>>
>>      if (!drv) {
>>          return -ENOMEDIUM;
>> @@ -1270,7 +1343,10 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>>          return -EIO;
>>      }
>>
>> -    return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
>> +    req = tracked_request_add(bs, sector_num, nb_sectors, false);
>> +    ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
>> +    tracked_request_remove(req);
>
> Hm... Do we actually need to malloc the BdrvTrackedRequest? If all users
> are like this (and at least those in this patch are), we can just keep
> it on the coroutine stack.

Okay.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 1/6] block: add request tracking
  2011-10-17 15:47 ` [Qemu-devel] [RFC 1/6] block: add request tracking Stefan Hajnoczi
  2011-11-02 16:30   ` Kevin Wolf
@ 2011-11-07 11:00   ` Zhi Yong Wu
  2011-11-07 11:41     ` Stefan Hajnoczi
  1 sibling, 1 reply; 30+ messages in thread
From: Zhi Yong Wu @ 2011-11-07 11:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, qemu-devel

On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> The block layer does not know about pending requests.  This information
> is necessary for copy-on-read since overlapping requests must be
> serialized to prevent races that corrupt the image.
>
> Add a simple mechanism to enable/disable request tracking.  By default
> request tracking is disabled.
>
> The BlockDriverState gets a new tracked_request list field which
> contains all pending requests.  Each request is a BdrvTrackedRequest
> record with sector_num, nb_sectors, and is_write fields.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c     |   83 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  block_int.h |    8 +++++
>  2 files changed, 90 insertions(+), 1 deletions(-)
>
> diff --git a/block.c b/block.c
> index 9873b57..2d2c62a 100644
> --- a/block.c
> +++ b/block.c
> @@ -978,6 +978,77 @@ void bdrv_commit_all(void)
>     }
>  }
>
> +struct BdrvTrackedRequest {
> +    BlockDriverState *bs;
> +    int64_t sector_num;
> +    int nb_sectors;
> +    bool is_write;
> +    QLIST_ENTRY(BdrvTrackedRequest) list;
> +};
> +
> +/**
> + * Remove an active request from the tracked requests list
> + *
> + * If req is NULL, no operation is performed.
> + *
> + * This function should be called when a tracked request is completing.
> + */
> +static void tracked_request_remove(BdrvTrackedRequest *req)
> +{
> +    if (req) {
> +        QLIST_REMOVE(req, list);
> +        g_free(req);
> +    }
> +}
> +
> +/**
> + * Add an active request to the tracked requests list
> + *
> + * Return NULL if request tracking is disabled, non-NULL otherwise.
> + */
> +static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
> +                                               int64_t sector_num,
> +                                               int nb_sectors, bool is_write)
> +{
> +    BdrvTrackedRequest *req;
> +
> +    if (!bs->request_tracking) {
> +        return NULL;
> +    }
> +
> +    req = g_malloc(sizeof(*req));
> +    req->bs = bs;
> +    req->sector_num = sector_num;
> +    req->nb_sectors = nb_sectors;
> +    req->is_write = is_write;
> +
> +    QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
> +
> +    return req;
> +}
> +
> +/**
> + * Enable tracking of incoming requests
> + *
> + * Request tracking can be safely used by multiple users at the same time,
> + * there is an internal reference count to match start and stop calls.
> + */
> +void bdrv_start_request_tracking(BlockDriverState *bs)
> +{
> +    bs->request_tracking++;
> +}
> +
> +/**
> + * Disable tracking of incoming requests
> + *
> + * Note that in-flight requests are still tracked, this function only stops
> + * tracking incoming requests.
> + */
> +void bdrv_stop_request_tracking(BlockDriverState *bs)
> +{
> +    bs->request_tracking--;
> +}
I don't understand what the real intention for the above functions is.
IMHO, why can we not drop them?


> +
>  /*
>  * Return values:
>  * 0        - success
> @@ -1262,6 +1333,8 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
>  {
>     BlockDriver *drv = bs->drv;
> +    BdrvTrackedRequest *req;
> +    int ret;
>
>     if (!drv) {
>         return -ENOMEDIUM;
> @@ -1270,7 +1343,10 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>         return -EIO;
>     }
>
> -    return drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
> +    req = tracked_request_add(bs, sector_num, nb_sectors, false);
> +    ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
> +    tracked_request_remove(req);
> +    return ret;
>  }
>
>  int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
> @@ -1288,6 +1364,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
>  {
>     BlockDriver *drv = bs->drv;
> +    BdrvTrackedRequest *req;
>     int ret;
>
>     if (!bs->drv) {
> @@ -1300,6 +1377,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>         return -EIO;
>     }
>
> +    req = tracked_request_add(bs, sector_num, nb_sectors, true);
> +
>     ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
>
>     if (bs->dirty_bitmap) {
> @@ -1310,6 +1389,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
>         bs->wr_highest_sector = sector_num + nb_sectors - 1;
>     }
>
> +    tracked_request_remove(req);
> +
>     return ret;
>  }
>
> diff --git a/block_int.h b/block_int.h
> index f2f4f2d..87ce8b4 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -43,6 +43,8 @@
>  #define BLOCK_OPT_PREALLOC      "preallocation"
>  #define BLOCK_OPT_SUBFMT        "subformat"
>
> +typedef struct BdrvTrackedRequest BdrvTrackedRequest;
> +
>  typedef struct AIOPool {
>     void (*cancel)(BlockDriverAIOCB *acb);
>     int aiocb_size;
> @@ -206,6 +208,9 @@ struct BlockDriverState {
>     int in_use; /* users other than guest access, eg. block migration */
>     QTAILQ_ENTRY(BlockDriverState) list;
>     void *private;
> +
> +    int request_tracking;       /* reference count, 0 means off */
> +    QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
>  };
>
>  struct BlockDriverAIOCB {
> @@ -216,6 +221,9 @@ struct BlockDriverAIOCB {
>     BlockDriverAIOCB *next;
>  };
>
> +void bdrv_start_request_tracking(BlockDriverState *bs);
> +void bdrv_stop_request_tracking(BlockDriverState *bs);
> +
>  void get_tmp_filename(char *filename, int size);
>
>  void *qemu_aio_get(AIOPool *pool, BlockDriverState *bs,
> --
> 1.7.6.3
>
>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 1/6] block: add request tracking
  2011-11-07 11:00   ` Zhi Yong Wu
@ 2011-11-07 11:41     ` Stefan Hajnoczi
  2011-11-08  6:13       ` Zhi Yong Wu
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-11-07 11:41 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi,
	qemu-devel

On Mon, Nov 7, 2011 at 11:00 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com> wrote:
>> +/**
>> + * Enable tracking of incoming requests
>> + *
>> + * Request tracking can be safely used by multiple users at the same time,
>> + * there is an internal reference count to match start and stop calls.
>> + */
>> +void bdrv_start_request_tracking(BlockDriverState *bs)
>> +{
>> +    bs->request_tracking++;
>> +}
>> +
>> +/**
>> + * Disable tracking of incoming requests
>> + *
>> + * Note that in-flight requests are still tracked, this function only stops
>> + * tracking incoming requests.
>> + */
>> +void bdrv_stop_request_tracking(BlockDriverState *bs)
>> +{
>> +    bs->request_tracking--;
>> +}
> I don't understand what the real intention for the above functions is.
> IMHO, why can we not drop them?

I have dropped them after removing the g_malloc() as Kevin suggested.
The idea was to avoid the overhead of request tracking when no feature
is using request tracking.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 1/6] block: add request tracking
  2011-11-07 11:41     ` Stefan Hajnoczi
@ 2011-11-08  6:13       ` Zhi Yong Wu
  0 siblings, 0 replies; 30+ messages in thread
From: Zhi Yong Wu @ 2011-11-08  6:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi,
	qemu-devel

On Mon, Nov 7, 2011 at 7:41 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Nov 7, 2011 at 11:00 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
>> <stefanha@linux.vnet.ibm.com> wrote:
>>> +/**
>>> + * Enable tracking of incoming requests
>>> + *
>>> + * Request tracking can be safely used by multiple users at the same time,
>>> + * there is an internal reference count to match start and stop calls.
>>> + */
>>> +void bdrv_start_request_tracking(BlockDriverState *bs)
>>> +{
>>> +    bs->request_tracking++;
>>> +}
>>> +
>>> +/**
>>> + * Disable tracking of incoming requests
>>> + *
>>> + * Note that in-flight requests are still tracked, this function only stops
>>> + * tracking incoming requests.
>>> + */
>>> +void bdrv_stop_request_tracking(BlockDriverState *bs)
>>> +{
>>> +    bs->request_tracking--;
>>> +}
>> I don't understand what the real intention for the above functions is.
>> IMHO, why can we not drop them?
>
> I have dropped them after removing the g_malloc() as Kevin suggested.
> The idea was to avoid the overhead of request tracking when no feature
> is using request tracking.
Great
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 2/6] block: add bdrv_set_copy_on_read()
  2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
  2011-10-17 15:47 ` [Qemu-devel] [RFC 1/6] block: add request tracking Stefan Hajnoczi
@ 2011-10-17 15:47 ` Stefan Hajnoczi
  2011-11-02 16:36   ` Kevin Wolf
  2011-10-17 15:47 ` [Qemu-devel] [RFC 3/6] block: wait for overlapping requests Stefan Hajnoczi
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi

The bdrv_set_copy_on_read() function can be used to programmatically
enable or disable copy-on-read for a block device.  Later patches add
the actual copy-on-read logic.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c     |   17 +++++++++++++++++
 block.h     |    3 +++
 block_int.h |    1 +
 3 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 2d2c62a..e624ac3 100644
--- a/block.c
+++ b/block.c
@@ -464,6 +464,18 @@ int bdrv_parse_cache_flags(const char *mode, int *flags)
     return 0;
 }
 
+void bdrv_set_copy_on_read(BlockDriverState *bs, int enable)
+{
+    if (bs->copy_on_read != enable) {
+        if (enable) {
+            bdrv_start_request_tracking(bs);
+        } else {
+            bdrv_stop_request_tracking(bs);
+        }
+    }
+    bs->copy_on_read = enable;
+}
+
 /*
  * Common part for opening disk images and files
  */
@@ -483,6 +495,11 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     bs->open_flags = flags;
     bs->buffer_alignment = 512;
 
+    bs->copy_on_read = 0;
+    if (flags & BDRV_O_RDWR) {
+        bdrv_set_copy_on_read(bs, !!(flags & BDRV_O_COPY_ON_READ));
+    }
+
     pstrcpy(bs->filename, sizeof(bs->filename), filename);
 
     if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv)) {
diff --git a/block.h b/block.h
index e77988e..cb9832c 100644
--- a/block.h
+++ b/block.h
@@ -61,6 +61,7 @@ typedef struct BlockDevOps {
 #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
 #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
 #define BDRV_O_NO_FLUSH    0x0200 /* disable flushing on this disk */
+#define BDRV_O_COPY_ON_READ 0x0400 /* copy read backing sectors into image */
 
 #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
 
@@ -291,6 +292,8 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
                       int nr_sectors);
 int64_t bdrv_get_dirty_count(BlockDriverState *bs);
 
+void bdrv_set_copy_on_read(BlockDriverState *bs, int enable);
+
 void bdrv_set_in_use(BlockDriverState *bs, int in_use);
 int bdrv_in_use(BlockDriverState *bs);
 
diff --git a/block_int.h b/block_int.h
index 87ce8b4..8eb4795 100644
--- a/block_int.h
+++ b/block_int.h
@@ -160,6 +160,7 @@ struct BlockDriverState {
     int encrypted; /* if true, the media is encrypted */
     int valid_key; /* if true, a valid encryption key has been set */
     int sg;        /* if true, the device is a /dev/sg* */
+    int copy_on_read; /* if true, copy read backing sectors into image */
 
     BlockDriver *drv; /* NULL means no media */
     void *opaque;
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 2/6] block: add bdrv_set_copy_on_read()
  2011-10-17 15:47 ` [Qemu-devel] [RFC 2/6] block: add bdrv_set_copy_on_read() Stefan Hajnoczi
@ 2011-11-02 16:36   ` Kevin Wolf
  2011-11-03  8:01     ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2011-11-02 16:36 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
> The bdrv_set_copy_on_read() function can be used to programmatically
> enable or disable copy-on-read for a block device.  Later patches add
> the actual copy-on-read logic.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c     |   17 +++++++++++++++++
>  block.h     |    3 +++
>  block_int.h |    1 +
>  3 files changed, 21 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 2d2c62a..e624ac3 100644
> --- a/block.c
> +++ b/block.c
> @@ -464,6 +464,18 @@ int bdrv_parse_cache_flags(const char *mode, int *flags)
>      return 0;
>  }
>  
> +void bdrv_set_copy_on_read(BlockDriverState *bs, int enable)

bool enable

> +{
> +    if (bs->copy_on_read != enable) {
> +        if (enable) {
> +            bdrv_start_request_tracking(bs);
> +        } else {
> +            bdrv_stop_request_tracking(bs);
> +        }
> +    }
> +    bs->copy_on_read = enable;
> +}
> +
>  /*
>   * Common part for opening disk images and files
>   */
> @@ -483,6 +495,11 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
>      bs->open_flags = flags;
>      bs->buffer_alignment = 512;
>  
> +    bs->copy_on_read = 0;

I think it should rather be reset on close. We can assert() it here.

> +    if (flags & BDRV_O_RDWR) {
> +        bdrv_set_copy_on_read(bs, !!(flags & BDRV_O_COPY_ON_READ));
> +    }
> +
>      pstrcpy(bs->filename, sizeof(bs->filename), filename);
>  
>      if (use_bdrv_whitelist && !bdrv_is_whitelisted(drv)) {
> diff --git a/block.h b/block.h
> index e77988e..cb9832c 100644
> --- a/block.h
> +++ b/block.h
> @@ -61,6 +61,7 @@ typedef struct BlockDevOps {
>  #define BDRV_O_NATIVE_AIO  0x0080 /* use native AIO instead of the thread pool */
>  #define BDRV_O_NO_BACKING  0x0100 /* don't open the backing file */
>  #define BDRV_O_NO_FLUSH    0x0200 /* disable flushing on this disk */
> +#define BDRV_O_COPY_ON_READ 0x0400 /* copy read backing sectors into image */
>  
>  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
>  
> @@ -291,6 +292,8 @@ void bdrv_reset_dirty(BlockDriverState *bs, int64_t cur_sector,
>                        int nr_sectors);
>  int64_t bdrv_get_dirty_count(BlockDriverState *bs);
>  
> +void bdrv_set_copy_on_read(BlockDriverState *bs, int enable);
> +
>  void bdrv_set_in_use(BlockDriverState *bs, int in_use);
>  int bdrv_in_use(BlockDriverState *bs);
>  
> diff --git a/block_int.h b/block_int.h
> index 87ce8b4..8eb4795 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -160,6 +160,7 @@ struct BlockDriverState {
>      int encrypted; /* if true, the media is encrypted */
>      int valid_key; /* if true, a valid encryption key has been set */
>      int sg;        /* if true, the device is a /dev/sg* */
> +    int copy_on_read; /* if true, copy read backing sectors into image */

bool.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 2/6] block: add bdrv_set_copy_on_read()
  2011-11-02 16:36   ` Kevin Wolf
@ 2011-11-03  8:01     ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-11-03  8:01 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi, qemu-devel

On Wed, Nov 2, 2011 at 4:36 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
>> The bdrv_set_copy_on_read() function can be used to programmatically
>> enable or disable copy-on-read for a block device.  Later patches add
>> the actual copy-on-read logic.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  block.c     |   17 +++++++++++++++++
>>  block.h     |    3 +++
>>  block_int.h |    1 +
>>  3 files changed, 21 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 2d2c62a..e624ac3 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -464,6 +464,18 @@ int bdrv_parse_cache_flags(const char *mode, int *flags)
>>      return 0;
>>  }
>>
>> +void bdrv_set_copy_on_read(BlockDriverState *bs, int enable)
>
> bool enable

Thanks for pointing this out.

>> +{
>> +    if (bs->copy_on_read != enable) {
>> +        if (enable) {
>> +            bdrv_start_request_tracking(bs);
>> +        } else {
>> +            bdrv_stop_request_tracking(bs);
>> +        }
>> +    }
>> +    bs->copy_on_read = enable;
>> +}
>> +
>>  /*
>>   * Common part for opening disk images and files
>>   */
>> @@ -483,6 +495,11 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
>>      bs->open_flags = flags;
>>      bs->buffer_alignment = 512;
>>
>> +    bs->copy_on_read = 0;
>
> I think it should rather be reset on close. We can assert() it here.

Okay.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 3/6] block: wait for overlapping requests
  2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
  2011-10-17 15:47 ` [Qemu-devel] [RFC 1/6] block: add request tracking Stefan Hajnoczi
  2011-10-17 15:47 ` [Qemu-devel] [RFC 2/6] block: add bdrv_set_copy_on_read() Stefan Hajnoczi
@ 2011-10-17 15:47 ` Stefan Hajnoczi
  2011-10-18 13:48   ` Marcelo Tosatti
  2011-11-03 14:17   ` Kevin Wolf
  2011-10-17 15:47 ` [Qemu-devel] [RFC 4/6] block: request overlap detection Stefan Hajnoczi
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi

When copy-on-read is enabled it is necessary to wait for overlapping
requests before issuing new requests.  This prevents races between the
copy-on-read and a write request.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c |   39 +++++++++++++++++++++++++++++++++++++++
 1 files changed, 39 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index e624ac3..cc3202c 100644
--- a/block.c
+++ b/block.c
@@ -1001,6 +1001,7 @@ struct BdrvTrackedRequest {
     int nb_sectors;
     bool is_write;
     QLIST_ENTRY(BdrvTrackedRequest) list;
+    CoQueue wait_queue; /* coroutines blocked on this request */
 };
 
 /**
@@ -1014,6 +1015,12 @@ static void tracked_request_remove(BdrvTrackedRequest *req)
 {
     if (req) {
         QLIST_REMOVE(req, list);
+
+        /* Wake up all coroutines blocked on this request */
+        while (qemu_co_queue_next(&req->wait_queue)) {
+            /* Do nothing */
+        }
+
         g_free(req);
     }
 }
@@ -1038,12 +1045,36 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
     req->sector_num = sector_num;
     req->nb_sectors = nb_sectors;
     req->is_write = is_write;
+    qemu_co_queue_init(&req->wait_queue);
 
     QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
 
     return req;
 }
 
+static bool tracked_request_overlaps(BdrvTrackedRequest *req,
+                                     int64_t sector_num, int nb_sectors) {
+    return false; /* not yet implemented */
+}
+
+static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors)
+{
+    BdrvTrackedRequest *req;
+    bool retry;
+
+    do {
+        retry = false;
+        QLIST_FOREACH(req, &bs->tracked_requests, list) {
+            if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
+                qemu_co_queue_wait(&req->wait_queue);
+                retry = true;
+                break;
+            }
+        }
+    } while (retry);
+}
+
 /**
  * Enable tracking of incoming requests
  *
@@ -1360,6 +1391,10 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         return -EIO;
     }
 
+    if (bs->copy_on_read) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    }
+
     req = tracked_request_add(bs, sector_num, nb_sectors, false);
     ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
     tracked_request_remove(req);
@@ -1394,6 +1429,10 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EIO;
     }
 
+    if (bs->copy_on_read) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    }
+
     req = tracked_request_add(bs, sector_num, nb_sectors, true);
 
     ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 3/6] block: wait for overlapping requests
  2011-10-17 15:47 ` [Qemu-devel] [RFC 3/6] block: wait for overlapping requests Stefan Hajnoczi
@ 2011-10-18 13:48   ` Marcelo Tosatti
  2011-10-20 17:34     ` Stefan Hajnoczi
  2011-11-03 14:17   ` Kevin Wolf
  1 sibling, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2011-10-18 13:48 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Mon, Oct 17, 2011 at 04:47:29PM +0100, Stefan Hajnoczi wrote:
> When copy-on-read is enabled it is necessary to wait for overlapping
> requests before issuing new requests.  This prevents races between the
> copy-on-read and a write request.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c |   39 +++++++++++++++++++++++++++++++++++++++
>  1 files changed, 39 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index e624ac3..cc3202c 100644
> --- a/block.c
> +++ b/block.c
> @@ -1001,6 +1001,7 @@ struct BdrvTrackedRequest {
>      int nb_sectors;
>      bool is_write;
>      QLIST_ENTRY(BdrvTrackedRequest) list;
> +    CoQueue wait_queue; /* coroutines blocked on this request */
>  };
>  
>  /**
> @@ -1014,6 +1015,12 @@ static void tracked_request_remove(BdrvTrackedRequest *req)
>  {
>      if (req) {
>          QLIST_REMOVE(req, list);
> +
> +        /* Wake up all coroutines blocked on this request */
> +        while (qemu_co_queue_next(&req->wait_queue)) {
> +            /* Do nothing */
> +        }
> +
>          g_free(req);
>      }
>  }
> @@ -1038,12 +1045,36 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
>      req->sector_num = sector_num;
>      req->nb_sectors = nb_sectors;
>      req->is_write = is_write;
> +    qemu_co_queue_init(&req->wait_queue);
>  
>      QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>  
>      return req;
>  }
>  
> +static bool tracked_request_overlaps(BdrvTrackedRequest *req,
> +                                     int64_t sector_num, int nb_sectors) {
> +    return false; /* not yet implemented */
> +}
> +
> +static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors)
> +{
> +    BdrvTrackedRequest *req;
> +    bool retry;
> +
> +    do {
> +        retry = false;
> +        QLIST_FOREACH(req, &bs->tracked_requests, list) {
> +            if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
> +                qemu_co_queue_wait(&req->wait_queue);
> +                retry = true;

What prevents overlapping requests (from waiter criteria) to be inserted
to the queue while there are waiters again?

That is, why is it not possible for a waiter to wait indefinetely?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 3/6] block: wait for overlapping requests
  2011-10-18 13:48   ` Marcelo Tosatti
@ 2011-10-20 17:34     ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-20 17:34 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel

On Tue, Oct 18, 2011 at 6:48 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Mon, Oct 17, 2011 at 04:47:29PM +0100, Stefan Hajnoczi wrote:
>> When copy-on-read is enabled it is necessary to wait for overlapping
>> requests before issuing new requests.  This prevents races between the
>> copy-on-read and a write request.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  block.c |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index e624ac3..cc3202c 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1001,6 +1001,7 @@ struct BdrvTrackedRequest {
>>      int nb_sectors;
>>      bool is_write;
>>      QLIST_ENTRY(BdrvTrackedRequest) list;
>> +    CoQueue wait_queue; /* coroutines blocked on this request */
>>  };
>>
>>  /**
>> @@ -1014,6 +1015,12 @@ static void tracked_request_remove(BdrvTrackedRequest *req)
>>  {
>>      if (req) {
>>          QLIST_REMOVE(req, list);
>> +
>> +        /* Wake up all coroutines blocked on this request */
>> +        while (qemu_co_queue_next(&req->wait_queue)) {
>> +            /* Do nothing */
>> +        }
>> +
>>          g_free(req);
>>      }
>>  }
>> @@ -1038,12 +1045,36 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
>>      req->sector_num = sector_num;
>>      req->nb_sectors = nb_sectors;
>>      req->is_write = is_write;
>> +    qemu_co_queue_init(&req->wait_queue);
>>
>>      QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
>>
>>      return req;
>>  }
>>
>> +static bool tracked_request_overlaps(BdrvTrackedRequest *req,
>> +                                     int64_t sector_num, int nb_sectors) {
>> +    return false; /* not yet implemented */
>> +}
>> +
>> +static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>> +        int64_t sector_num, int nb_sectors)
>> +{
>> +    BdrvTrackedRequest *req;
>> +    bool retry;
>> +
>> +    do {
>> +        retry = false;
>> +        QLIST_FOREACH(req, &bs->tracked_requests, list) {
>> +            if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
>> +                qemu_co_queue_wait(&req->wait_queue);
>> +                retry = true;
>
> What prevents overlapping requests (from waiter criteria) to be inserted
> to the queue while there are waiters again?
>
> That is, why is it not possible for a waiter to wait indefinetely?

CoQueue is FIFO so overlapping requests are processed in order and
will not wait indefinitely.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 3/6] block: wait for overlapping requests
  2011-10-17 15:47 ` [Qemu-devel] [RFC 3/6] block: wait for overlapping requests Stefan Hajnoczi
  2011-10-18 13:48   ` Marcelo Tosatti
@ 2011-11-03 14:17   ` Kevin Wolf
  1 sibling, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2011-11-03 14:17 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
> When copy-on-read is enabled it is necessary to wait for overlapping
> requests before issuing new requests.  This prevents races between the
> copy-on-read and a write request.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>

This doesn't only order guest request against COR requests, but also
makes guest requests wait on each other. It's probably not a big
problem, but if we had to optimise performance with COR later, this is
something to remember.

Doing an optimisation that only requests of different type are ordered
wouldn't be too hard, though maybe avoiding starvation of the other type
could get a bit harder.

Let's leave it as it is for now.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 4/6] block: request overlap detection
  2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2011-10-17 15:47 ` [Qemu-devel] [RFC 3/6] block: wait for overlapping requests Stefan Hajnoczi
@ 2011-10-17 15:47 ` Stefan Hajnoczi
  2011-11-07 11:49   ` Zhi Yong Wu
  2011-10-17 15:47 ` [Qemu-devel] [RFC 5/6] block: core copy-on-read logic Stefan Hajnoczi
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi

Detect overlapping requests and remember to align to cluster boundaries
if the image format uses them.  This assumes that allocating I/O is
performed in cluster granularity - which is true for qcow2, qed, etc.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c |   39 +++++++++++++++++++++++++++++++++++++--
 1 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index cc3202c..0c22741 100644
--- a/block.c
+++ b/block.c
@@ -1052,21 +1052,56 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
     return req;
 }
 
+/**
+ * Round a region to cluster boundaries
+ */
+static void round_to_clusters(BlockDriverState *bs,
+                              int64_t sector_num, int nb_sectors,
+                              int64_t *cluster_sector_num,
+                              int *cluster_nb_sectors)
+{
+    BlockDriverInfo bdi;
+
+    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
+        *cluster_sector_num = sector_num;
+        *cluster_nb_sectors = nb_sectors;
+    } else {
+        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
+        *cluster_sector_num = (sector_num / c) * c;
+        *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
+    }
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
                                      int64_t sector_num, int nb_sectors) {
-    return false; /* not yet implemented */
+    /*        aaaa   bbbb */
+    if (sector_num >= req->sector_num + req->nb_sectors) {
+        return false;
+    }
+    /* bbbb   aaaa        */
+    if (req->sector_num >= sector_num + nb_sectors) {
+        return false;
+    }
+    return true;
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors)
 {
     BdrvTrackedRequest *req;
+    int64_t cluster_sector_num;
+    int cluster_nb_sectors;
     bool retry;
 
+    /* If we touch the same cluster it counts as an overlap */
+    round_to_clusters(bs, sector_num, nb_sectors,
+                      &cluster_sector_num, &cluster_nb_sectors);
+
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
-            if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
+            if (tracked_request_overlaps(req, cluster_sector_num,
+                                         cluster_nb_sectors)) {
                 qemu_co_queue_wait(&req->wait_queue);
                 retry = true;
                 break;
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 4/6] block: request overlap detection
  2011-10-17 15:47 ` [Qemu-devel] [RFC 4/6] block: request overlap detection Stefan Hajnoczi
@ 2011-11-07 11:49   ` Zhi Yong Wu
  2011-11-07 14:37     ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: Zhi Yong Wu @ 2011-11-07 11:49 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, qemu-devel

On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> Detect overlapping requests and remember to align to cluster boundaries
> if the image format uses them.  This assumes that allocating I/O is
> performed in cluster granularity - which is true for qcow2, qed, etc.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c |   39 +++++++++++++++++++++++++++++++++++++--
>  1 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index cc3202c..0c22741 100644
> --- a/block.c
> +++ b/block.c
> @@ -1052,21 +1052,56 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
>     return req;
>  }
>
> +/**
> + * Round a region to cluster boundaries
> + */
> +static void round_to_clusters(BlockDriverState *bs,
> +                              int64_t sector_num, int nb_sectors,
> +                              int64_t *cluster_sector_num,
> +                              int *cluster_nb_sectors)
> +{
> +    BlockDriverInfo bdi;
> +
> +    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
> +        *cluster_sector_num = sector_num;
> +        *cluster_nb_sectors = nb_sectors;
> +    } else {
> +        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
> +        *cluster_sector_num = (sector_num / c) * c;
             I can understand the above formula, but the one below is
very magic. :) and can not be understood by me.
> +        *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;

> +    }
> +}
> +
>  static bool tracked_request_overlaps(BdrvTrackedRequest *req,
>                                      int64_t sector_num, int nb_sectors) {
> -    return false; /* not yet implemented */
> +    /*        aaaa   bbbb */
> +    if (sector_num >= req->sector_num + req->nb_sectors) {
> +        return false;
> +    }
> +    /* bbbb   aaaa        */
> +    if (req->sector_num >= sector_num + nb_sectors) {
> +        return false;
> +    }
> +    return true;
>  }
>
>  static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
>         int64_t sector_num, int nb_sectors)
>  {
>     BdrvTrackedRequest *req;
> +    int64_t cluster_sector_num;
> +    int cluster_nb_sectors;
>     bool retry;
>
> +    /* If we touch the same cluster it counts as an overlap */
> +    round_to_clusters(bs, sector_num, nb_sectors,
> +                      &cluster_sector_num, &cluster_nb_sectors);
> +
>     do {
>         retry = false;
>         QLIST_FOREACH(req, &bs->tracked_requests, list) {
> -            if (tracked_request_overlaps(req, sector_num, nb_sectors)) {
> +            if (tracked_request_overlaps(req, cluster_sector_num,
> +                                         cluster_nb_sectors)) {
>                 qemu_co_queue_wait(&req->wait_queue);
>                 retry = true;
>                 break;
> --
> 1.7.6.3
>
>
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 4/6] block: request overlap detection
  2011-11-07 11:49   ` Zhi Yong Wu
@ 2011-11-07 14:37     ` Stefan Hajnoczi
  2011-11-08  6:34       ` Zhi Yong Wu
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-11-07 14:37 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi,
	qemu-devel

On Mon, Nov 7, 2011 at 11:49 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
> <stefanha@linux.vnet.ibm.com> wrote:
>> Detect overlapping requests and remember to align to cluster boundaries
>> if the image format uses them.  This assumes that allocating I/O is
>> performed in cluster granularity - which is true for qcow2, qed, etc.
>>
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  block.c |   39 +++++++++++++++++++++++++++++++++++++--
>>  1 files changed, 37 insertions(+), 2 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index cc3202c..0c22741 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1052,21 +1052,56 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
>>     return req;
>>  }
>>
>> +/**
>> + * Round a region to cluster boundaries
>> + */
>> +static void round_to_clusters(BlockDriverState *bs,
>> +                              int64_t sector_num, int nb_sectors,
>> +                              int64_t *cluster_sector_num,
>> +                              int *cluster_nb_sectors)
>> +{
>> +    BlockDriverInfo bdi;
>> +
>> +    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
>> +        *cluster_sector_num = sector_num;
>> +        *cluster_nb_sectors = nb_sectors;
>> +    } else {
>> +        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
>> +        *cluster_sector_num = (sector_num / c) * c;
>             I can understand the above formula, but the one below is
> very magic. :) and can not be understood by me.
>> +        *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;

I agree this is ugly.  Here is what is going on:

c = number of sectors per cluster
cluster_sector_num = sector number rounded *down* to cluster boundary
cluster_nb_sectors = number of sectors from cluster_sector_num to
rounded up sector_num+nb_sectors

So the magic expression is takes the original sector_num to
sector_num+nb_sectors region:

|---XXX|XXX---|

Where |-----| is a cluster and XXXX is the region from sector_num to
sector_num+nb_sectors, then the output should be:

|RRRRRR|RRRRRR|

Everything has been rounded to clusters.  So here is the expression broken down:

*cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
                        AAAAAAAAAAAAAA    XXXXXXXXXX   BBBBBBBBBBBBBB

|AAAXXX|XXXBBB|

A is actually equivalent to sector_num - cluster_sector_num.

X is the original unrounded region.

B is the rounding up to the next cluster bounary.

Another way of writing this:

*cluster_nb_sectors = ROUND_UP((sector_num - cluster_sector_num) +
nb_sectors, c);

I'll try to improve the code in the next revision.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 4/6] block: request overlap detection
  2011-11-07 14:37     ` Stefan Hajnoczi
@ 2011-11-08  6:34       ` Zhi Yong Wu
  2011-11-08  8:16         ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: Zhi Yong Wu @ 2011-11-08  6:34 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi,
	qemu-devel

On Mon, Nov 7, 2011 at 10:37 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Mon, Nov 7, 2011 at 11:49 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
>> <stefanha@linux.vnet.ibm.com> wrote:
>>> Detect overlapping requests and remember to align to cluster boundaries
>>> if the image format uses them.  This assumes that allocating I/O is
>>> performed in cluster granularity - which is true for qcow2, qed, etc.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>>> ---
>>>  block.c |   39 +++++++++++++++++++++++++++++++++++++--
>>>  1 files changed, 37 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block.c b/block.c
>>> index cc3202c..0c22741 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -1052,21 +1052,56 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
>>>     return req;
>>>  }
>>>
>>> +/**
>>> + * Round a region to cluster boundaries
>>> + */
>>> +static void round_to_clusters(BlockDriverState *bs,
>>> +                              int64_t sector_num, int nb_sectors,
>>> +                              int64_t *cluster_sector_num,
>>> +                              int *cluster_nb_sectors)
>>> +{
>>> +    BlockDriverInfo bdi;
>>> +
>>> +    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
>>> +        *cluster_sector_num = sector_num;
>>> +        *cluster_nb_sectors = nb_sectors;
>>> +    } else {
>>> +        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
>>> +        *cluster_sector_num = (sector_num / c) * c;
>>             I can understand the above formula, but the one below is
>> very magic. :) and can not be understood by me.
>>> +        *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
>
> I agree this is ugly.  Here is what is going on:
>
> c = number of sectors per cluster
> cluster_sector_num = sector number rounded *down* to cluster boundary
> cluster_nb_sectors = number of sectors from cluster_sector_num to
> rounded up sector_num+nb_sectors
>
> So the magic expression is takes the original sector_num to
> sector_num+nb_sectors region:
>
> |---XXX|XXX---|
>
> Where |-----| is a cluster and XXXX is the region from sector_num to
> sector_num+nb_sectors, then the output should be:
>
> |RRRRRR|RRRRRR|
>
> Everything has been rounded to clusters.  So here is the expression broken down:
>
> *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
>                        AAAAAAAAAAAAAA    XXXXXXXXXX   BBBBBBBBBBBBBB
>
> |AAAXXX|XXXBBB|
>
> A is actually equivalent to sector_num - cluster_sector_num.
>
> X is the original unrounded region.
>
> B is the rounding up to the next cluster bounary.
>
> Another way of writing this:
>
> *cluster_nb_sectors = ROUND_UP((sector_num - cluster_sector_num) +
> nb_sectors, c);
Above expression seems to not be correct;
It should be
*cluster_nb_sectors = ROUND_UP((sector_num - cluster_sector_num) +
nb_sectors, c) * c;

*cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;

#define ROUND_UP(x,y) (((x)+(y)-1)/(y))

I seems to understand your magic expression. thanks.

>
> I'll try to improve the code in the next revision.
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 4/6] block: request overlap detection
  2011-11-08  6:34       ` Zhi Yong Wu
@ 2011-11-08  8:16         ` Stefan Hajnoczi
  2011-11-08  9:49           ` Zhi Yong Wu
  0 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-11-08  8:16 UTC (permalink / raw)
  To: Zhi Yong Wu
  Cc: Kevin Wolf, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Paolo Bonzini

On Tue, Nov 08, 2011 at 02:34:22PM +0800, Zhi Yong Wu wrote:
> On Mon, Nov 7, 2011 at 10:37 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> > On Mon, Nov 7, 2011 at 11:49 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
> >> On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
> >> <stefanha@linux.vnet.ibm.com> wrote:
> >>> Detect overlapping requests and remember to align to cluster boundaries
> >>> if the image format uses them.  This assumes that allocating I/O is
> >>> performed in cluster granularity - which is true for qcow2, qed, etc.
> >>>
> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> >>> ---
> >>>  block.c |   39 +++++++++++++++++++++++++++++++++++++--
> >>>  1 files changed, 37 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/block.c b/block.c
> >>> index cc3202c..0c22741 100644
> >>> --- a/block.c
> >>> +++ b/block.c
> >>> @@ -1052,21 +1052,56 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
> >>>     return req;
> >>>  }
> >>>
> >>> +/**
> >>> + * Round a region to cluster boundaries
> >>> + */
> >>> +static void round_to_clusters(BlockDriverState *bs,
> >>> +                              int64_t sector_num, int nb_sectors,
> >>> +                              int64_t *cluster_sector_num,
> >>> +                              int *cluster_nb_sectors)
> >>> +{
> >>> +    BlockDriverInfo bdi;
> >>> +
> >>> +    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
> >>> +        *cluster_sector_num = sector_num;
> >>> +        *cluster_nb_sectors = nb_sectors;
> >>> +    } else {
> >>> +        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
> >>> +        *cluster_sector_num = (sector_num / c) * c;
> >>             I can understand the above formula, but the one below is
> >> very magic. :) and can not be understood by me.
> >>> +        *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
> >
> > I agree this is ugly.  Here is what is going on:
> >
> > c = number of sectors per cluster
> > cluster_sector_num = sector number rounded *down* to cluster boundary
> > cluster_nb_sectors = number of sectors from cluster_sector_num to
> > rounded up sector_num+nb_sectors
> >
> > So the magic expression is takes the original sector_num to
> > sector_num+nb_sectors region:
> >
> > |---XXX|XXX---|
> >
> > Where |-----| is a cluster and XXXX is the region from sector_num to
> > sector_num+nb_sectors, then the output should be:
> >
> > |RRRRRR|RRRRRR|
> >
> > Everything has been rounded to clusters.  So here is the expression broken down:
> >
> > *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
> >                        AAAAAAAAAAAAAA    XXXXXXXXXX   BBBBBBBBBBBBBB
> >
> > |AAAXXX|XXXBBB|
> >
> > A is actually equivalent to sector_num - cluster_sector_num.
> >
> > X is the original unrounded region.
> >
> > B is the rounding up to the next cluster bounary.
> >
> > Another way of writing this:
> >
> > *cluster_nb_sectors = ROUND_UP((sector_num - cluster_sector_num) +
> > nb_sectors, c);
> Above expression seems to not be correct;
> It should be
> *cluster_nb_sectors = ROUND_UP((sector_num - cluster_sector_num) +
> nb_sectors, c) * c;
> 
> *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
> 
> #define ROUND_UP(x,y) (((x)+(y)-1)/(y))

ALIGN_UP() may be a better macro name, for example:

ALIGN_UP(1024, 4096) = 4096

I see how you're defining ROUND_UP() and it is different.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 4/6] block: request overlap detection
  2011-11-08  8:16         ` Stefan Hajnoczi
@ 2011-11-08  9:49           ` Zhi Yong Wu
  0 siblings, 0 replies; 30+ messages in thread
From: Zhi Yong Wu @ 2011-11-08  9:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Kevin Wolf, Stefan Hajnoczi, Marcelo Tosatti, qemu-devel,
	Paolo Bonzini

On Tue, Nov 8, 2011 at 4:16 PM, Stefan Hajnoczi
<stefanha@linux.vnet.ibm.com> wrote:
> On Tue, Nov 08, 2011 at 02:34:22PM +0800, Zhi Yong Wu wrote:
>> On Mon, Nov 7, 2011 at 10:37 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
>> > On Mon, Nov 7, 2011 at 11:49 AM, Zhi Yong Wu <zwu.kernel@gmail.com> wrote:
>> >> On Mon, Oct 17, 2011 at 11:47 PM, Stefan Hajnoczi
>> >> <stefanha@linux.vnet.ibm.com> wrote:
>> >>> Detect overlapping requests and remember to align to cluster boundaries
>> >>> if the image format uses them.  This assumes that allocating I/O is
>> >>> performed in cluster granularity - which is true for qcow2, qed, etc.
>> >>>
>> >>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> >>> ---
>> >>>  block.c |   39 +++++++++++++++++++++++++++++++++++++--
>> >>>  1 files changed, 37 insertions(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/block.c b/block.c
>> >>> index cc3202c..0c22741 100644
>> >>> --- a/block.c
>> >>> +++ b/block.c
>> >>> @@ -1052,21 +1052,56 @@ static BdrvTrackedRequest *tracked_request_add(BlockDriverState *bs,
>> >>>     return req;
>> >>>  }
>> >>>
>> >>> +/**
>> >>> + * Round a region to cluster boundaries
>> >>> + */
>> >>> +static void round_to_clusters(BlockDriverState *bs,
>> >>> +                              int64_t sector_num, int nb_sectors,
>> >>> +                              int64_t *cluster_sector_num,
>> >>> +                              int *cluster_nb_sectors)
>> >>> +{
>> >>> +    BlockDriverInfo bdi;
>> >>> +
>> >>> +    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
>> >>> +        *cluster_sector_num = sector_num;
>> >>> +        *cluster_nb_sectors = nb_sectors;
>> >>> +    } else {
>> >>> +        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
>> >>> +        *cluster_sector_num = (sector_num / c) * c;
>> >>             I can understand the above formula, but the one below is
>> >> very magic. :) and can not be understood by me.
>> >>> +        *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
>> >
>> > I agree this is ugly.  Here is what is going on:
>> >
>> > c = number of sectors per cluster
>> > cluster_sector_num = sector number rounded *down* to cluster boundary
>> > cluster_nb_sectors = number of sectors from cluster_sector_num to
>> > rounded up sector_num+nb_sectors
>> >
>> > So the magic expression is takes the original sector_num to
>> > sector_num+nb_sectors region:
>> >
>> > |---XXX|XXX---|
>> >
>> > Where |-----| is a cluster and XXXX is the region from sector_num to
>> > sector_num+nb_sectors, then the output should be:
>> >
>> > |RRRRRR|RRRRRR|
>> >
>> > Everything has been rounded to clusters.  So here is the expression broken down:
>> >
>> > *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
>> >                        AAAAAAAAAAAAAA    XXXXXXXXXX   BBBBBBBBBBBBBB
>> >
>> > |AAAXXX|XXXBBB|
>> >
>> > A is actually equivalent to sector_num - cluster_sector_num.
>> >
>> > X is the original unrounded region.
>> >
>> > B is the rounding up to the next cluster bounary.
>> >
>> > Another way of writing this:
>> >
>> > *cluster_nb_sectors = ROUND_UP((sector_num - cluster_sector_num) +
>> > nb_sectors, c);
>> Above expression seems to not be correct;
>> It should be
>> *cluster_nb_sectors = ROUND_UP((sector_num - cluster_sector_num) +
>> nb_sectors, c) * c;
>>
>> *cluster_nb_sectors = ((sector_num % c) + nb_sectors + c - 1) / c * c;
>>
>> #define ROUND_UP(x,y) (((x)+(y)-1)/(y))
>
> ALIGN_UP() may be a better macro name, for example:
>
> ALIGN_UP(1024, 4096) = 4096
OK. Hope to see this in your next revision.
>
> I see how you're defining ROUND_UP() and it is different.
>
> Stefan
>



-- 
Regards,

Zhi Yong Wu

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 5/6] block: core copy-on-read logic
  2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
                   ` (3 preceding siblings ...)
  2011-10-17 15:47 ` [Qemu-devel] [RFC 4/6] block: request overlap detection Stefan Hajnoczi
@ 2011-10-17 15:47 ` Stefan Hajnoczi
  2011-10-18 14:00   ` Marcelo Tosatti
                     ` (2 more replies)
  2011-10-17 15:47 ` [Qemu-devel] [RFC 6/6] block: add -drive copy-on-read=on|off Stefan Hajnoczi
  2011-11-01 14:28 ` [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
  6 siblings, 3 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 block.c      |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 trace-events |    1 +
 2 files changed, 70 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 0c22741..2aec6b4 100644
--- a/block.c
+++ b/block.c
@@ -1409,6 +1409,55 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
     return 0;
 }
 
+static int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
+{
+    void *bounce_buffer;
+    struct iovec iov;
+    QEMUIOVector bounce_qiov;
+    int64_t cluster_sector_num;
+    int cluster_nb_sectors;
+    size_t skip_bytes;
+    int ret;
+
+    /* Cover entire cluster so no additional backing file I/O is required when
+     * allocating cluster in the image file.
+     */
+    round_to_clusters(bs, sector_num, nb_sectors,
+                      &cluster_sector_num, &cluster_nb_sectors);
+
+    trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors,
+                                cluster_sector_num, cluster_nb_sectors);
+
+    iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
+    iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
+    qemu_iovec_init_external(&bounce_qiov, &iov, 1);
+
+    ret = bs->drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
+                                 &bounce_qiov);
+    if (ret < 0) {
+        goto err;
+    }
+
+    ret = bs->drv->bdrv_co_writev(bs, cluster_sector_num, cluster_nb_sectors,
+                                  &bounce_qiov);
+    if (ret < 0) {
+        /* It might be okay to ignore write errors for guest requests.  If this
+         * is a deliberate copy-on-read then we don't want to ignore the error.
+         * Simply report it in all cases.
+         */
+        goto err;
+    }
+
+    skip_bytes = (sector_num - cluster_sector_num) * BDRV_SECTOR_SIZE;
+    qemu_iovec_from_buffer(qiov, bounce_buffer + skip_bytes,
+                           nb_sectors * BDRV_SECTOR_SIZE);
+
+err:
+    qemu_vfree(bounce_buffer);
+    return ret;
+}
+
 /*
  * Handle a read request in coroutine context
  */
@@ -1431,7 +1480,27 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     }
 
     req = tracked_request_add(bs, sector_num, nb_sectors, false);
+
+    if (bs->copy_on_read) {
+        int pnum;
+
+        /* TODO it is not safe to call bdrv_is_allocated() in coroutine context
+         * because it's a synchronous interface.  We probably want a
+         * bdrv_co_is_allocated(). */
+        ret = bdrv_is_allocated(bs, sector_num, nb_sectors, &pnum);
+        if (ret < 0) {
+            goto out;
+        }
+
+        if (!ret || pnum != nb_sectors) {
+            ret = bdrv_co_copy_on_readv(bs, sector_num, nb_sectors, qiov);
+            goto out;
+        }
+    }
+
     ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+
+out:
     tracked_request_remove(req);
     return ret;
 }
diff --git a/trace-events b/trace-events
index 63d8c8e..7e52ed6 100644
--- a/trace-events
+++ b/trace-events
@@ -68,6 +68,7 @@ bdrv_lock_medium(void *bs, bool locked) "bs %p locked %d"
 bdrv_co_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_io_em(void *bs, int64_t sector_num, int nb_sectors, int is_write, void *acb) "bs %p sector_num %"PRId64" nb_sectors %d is_write %d acb %p"
+bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t cluster_sector_num, int cluster_nb_sectors) "bs %p sector_num %"PRId64" nb_sectors %d cluster_sector_num %"PRId64" cluster_nb_sectors %d"
 
 # hw/virtio-blk.c
 virtio_blk_req_complete(void *req, int status) "req %p status %d"
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 5/6] block: core copy-on-read logic
  2011-10-17 15:47 ` [Qemu-devel] [RFC 5/6] block: core copy-on-read logic Stefan Hajnoczi
@ 2011-10-18 14:00   ` Marcelo Tosatti
  2011-10-20 17:40     ` Stefan Hajnoczi
  2011-10-18 14:03   ` Marcelo Tosatti
  2011-11-03 14:30   ` Kevin Wolf
  2 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2011-10-18 14:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Mon, Oct 17, 2011 at 04:47:31PM +0100, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c      |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  trace-events |    1 +
>  2 files changed, 70 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0c22741..2aec6b4 100644
> --- a/block.c
> +++ b/block.c
> @@ -1409,6 +1409,55 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>      return 0;
>  }
>  
> +static int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> +{
> +    void *bounce_buffer;
> +    struct iovec iov;
> +    QEMUIOVector bounce_qiov;
> +    int64_t cluster_sector_num;
> +    int cluster_nb_sectors;
> +    size_t skip_bytes;
> +    int ret;
> +
> +    /* Cover entire cluster so no additional backing file I/O is required when
> +     * allocating cluster in the image file.
> +     */
> +    round_to_clusters(bs, sector_num, nb_sectors,
> +                      &cluster_sector_num, &cluster_nb_sectors);
> +
> +    trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors,
> +                                cluster_sector_num, cluster_nb_sectors);
> +
> +    iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
> +    iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
> +    qemu_iovec_init_external(&bounce_qiov, &iov, 1);
> +
> +    ret = bs->drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
> +                                 &bounce_qiov);
> +    if (ret < 0) {
> +        goto err;
> +    }
> +
> +    ret = bs->drv->bdrv_co_writev(bs, cluster_sector_num, cluster_nb_sectors,
> +                                  &bounce_qiov);
> +    if (ret < 0) {
> +        /* It might be okay to ignore write errors for guest requests.  If this
> +         * is a deliberate copy-on-read then we don't want to ignore the error.
> +         * Simply report it in all cases.
> +         */
> +        goto err;
> +    }
> +
> +    skip_bytes = (sector_num - cluster_sector_num) * BDRV_SECTOR_SIZE;
> +    qemu_iovec_from_buffer(qiov, bounce_buffer + skip_bytes,
> +                           nb_sectors * BDRV_SECTOR_SIZE);
> +
> +err:
> +    qemu_vfree(bounce_buffer);
> +    return ret;
> +}
> +
>  /*
>   * Handle a read request in coroutine context
>   */
> @@ -1431,7 +1480,27 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>      }
>  
>      req = tracked_request_add(bs, sector_num, nb_sectors, false);

The tracked request should include cluster round info?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 5/6] block: core copy-on-read logic
  2011-10-18 14:00   ` Marcelo Tosatti
@ 2011-10-20 17:40     ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-20 17:40 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, Paolo Bonzini, Stefan Hajnoczi, qemu-devel

On Tue, Oct 18, 2011 at 7:00 AM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Mon, Oct 17, 2011 at 04:47:31PM +0100, Stefan Hajnoczi wrote:
>> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
>> ---
>>  block.c      |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  trace-events |    1 +
>>  2 files changed, 70 insertions(+), 0 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 0c22741..2aec6b4 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1409,6 +1409,55 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>>      return 0;
>>  }
>>
>> +static int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>> +        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
>> +{
>> +    void *bounce_buffer;
>> +    struct iovec iov;
>> +    QEMUIOVector bounce_qiov;
>> +    int64_t cluster_sector_num;
>> +    int cluster_nb_sectors;
>> +    size_t skip_bytes;
>> +    int ret;
>> +
>> +    /* Cover entire cluster so no additional backing file I/O is required when
>> +     * allocating cluster in the image file.
>> +     */
>> +    round_to_clusters(bs, sector_num, nb_sectors,
>> +                      &cluster_sector_num, &cluster_nb_sectors);
>> +
>> +    trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors,
>> +                                cluster_sector_num, cluster_nb_sectors);
>> +
>> +    iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
>> +    iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
>> +    qemu_iovec_init_external(&bounce_qiov, &iov, 1);
>> +
>> +    ret = bs->drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
>> +                                 &bounce_qiov);
>> +    if (ret < 0) {
>> +        goto err;
>> +    }
>> +
>> +    ret = bs->drv->bdrv_co_writev(bs, cluster_sector_num, cluster_nb_sectors,
>> +                                  &bounce_qiov);
>> +    if (ret < 0) {
>> +        /* It might be okay to ignore write errors for guest requests.  If this
>> +         * is a deliberate copy-on-read then we don't want to ignore the error.
>> +         * Simply report it in all cases.
>> +         */
>> +        goto err;
>> +    }
>> +
>> +    skip_bytes = (sector_num - cluster_sector_num) * BDRV_SECTOR_SIZE;
>> +    qemu_iovec_from_buffer(qiov, bounce_buffer + skip_bytes,
>> +                           nb_sectors * BDRV_SECTOR_SIZE);
>> +
>> +err:
>> +    qemu_vfree(bounce_buffer);
>> +    return ret;
>> +}
>> +
>>  /*
>>   * Handle a read request in coroutine context
>>   */
>> @@ -1431,7 +1480,27 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>>      }
>>
>>      req = tracked_request_add(bs, sector_num, nb_sectors, false);
>
> The tracked request should include cluster round info?

When checking A and B for overlap, only one of them needs to be
rounded in order for overlap detection to be correct.  Therefore we
can store the original request [start, length) in tracked_requests and
only round the new request.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 5/6] block: core copy-on-read logic
  2011-10-17 15:47 ` [Qemu-devel] [RFC 5/6] block: core copy-on-read logic Stefan Hajnoczi
  2011-10-18 14:00   ` Marcelo Tosatti
@ 2011-10-18 14:03   ` Marcelo Tosatti
  2011-11-03 14:30   ` Kevin Wolf
  2 siblings, 0 replies; 30+ messages in thread
From: Marcelo Tosatti @ 2011-10-18 14:03 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel

On Mon, Oct 17, 2011 at 04:47:31PM +0100, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c      |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  trace-events |    1 +
>  2 files changed, 70 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0c22741..2aec6b4 100644
> --- a/block.c
> +++ b/block.c
> @@ -1409,6 +1409,55 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>      return 0;
>  }
>  
> +static int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> +{
> +    void *bounce_buffer;
> +    struct iovec iov;
> +    QEMUIOVector bounce_qiov;
> +    int64_t cluster_sector_num;
> +    int cluster_nb_sectors;
> +    size_t skip_bytes;
> +    int ret;
> +
> +    /* Cover entire cluster so no additional backing file I/O is required when
> +     * allocating cluster in the image file.
> +     */
> +    round_to_clusters(bs, sector_num, nb_sectors,
> +                      &cluster_sector_num, &cluster_nb_sectors);
> +
> +    trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors,
> +                                cluster_sector_num, cluster_nb_sectors);
> +
> +    iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
> +    iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
> +    qemu_iovec_init_external(&bounce_qiov, &iov, 1);
> +
> +    ret = bs->drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
> +                                 &bounce_qiov);
> +    if (ret < 0) {
> +        goto err;
> +    }
> +
> +    ret = bs->drv->bdrv_co_writev(bs, cluster_sector_num, cluster_nb_sectors,
> +                                  &bounce_qiov);
> +    if (ret < 0) {
> +        /* It might be okay to ignore write errors for guest requests.  If this
> +         * is a deliberate copy-on-read then we don't want to ignore the error.
> +         * Simply report it in all cases.
> +         */
> +        goto err;
> +    }
> +
> +    skip_bytes = (sector_num - cluster_sector_num) * BDRV_SECTOR_SIZE;
> +    qemu_iovec_from_buffer(qiov, bounce_buffer + skip_bytes,
> +                           nb_sectors * BDRV_SECTOR_SIZE);
> +
> +err:
> +    qemu_vfree(bounce_buffer);
> +    return ret;
> +}
> +
>  /*
>   * Handle a read request in coroutine context
>   */
> @@ -1431,7 +1480,27 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>      }
>  
>      req = tracked_request_add(bs, sector_num, nb_sectors, false);
> +
> +    if (bs->copy_on_read) {
> +        int pnum;
> +
> +        /* TODO it is not safe to call bdrv_is_allocated() in coroutine context
> +         * because it's a synchronous interface.  We probably want a
> +         * bdrv_co_is_allocated(). */
> +        ret = bdrv_is_allocated(bs, sector_num, nb_sectors, &pnum);
> +        if (ret < 0) {
> +            goto out;
> +        }

This lacks shared base image support, BTW (as in copy should be
performed only if cluster not in destination chain). Could be added
later.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 5/6] block: core copy-on-read logic
  2011-10-17 15:47 ` [Qemu-devel] [RFC 5/6] block: core copy-on-read logic Stefan Hajnoczi
  2011-10-18 14:00   ` Marcelo Tosatti
  2011-10-18 14:03   ` Marcelo Tosatti
@ 2011-11-03 14:30   ` Kevin Wolf
  2 siblings, 0 replies; 30+ messages in thread
From: Kevin Wolf @ 2011-11-03 14:30 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  block.c      |   69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  trace-events |    1 +
>  2 files changed, 70 insertions(+), 0 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 0c22741..2aec6b4 100644
> --- a/block.c
> +++ b/block.c
> @@ -1409,6 +1409,55 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>      return 0;
>  }
>  
> +static int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, QEMUIOVector *qiov)
> +{
> +    void *bounce_buffer;

I think the bounce buffer deserves a comment. It may not be obvious for
everyone why it's needed.

> +    struct iovec iov;
> +    QEMUIOVector bounce_qiov;
> +    int64_t cluster_sector_num;
> +    int cluster_nb_sectors;
> +    size_t skip_bytes;
> +    int ret;
> +
> +    /* Cover entire cluster so no additional backing file I/O is required when
> +     * allocating cluster in the image file.
> +     */
> +    round_to_clusters(bs, sector_num, nb_sectors,
> +                      &cluster_sector_num, &cluster_nb_sectors);
> +
> +    trace_bdrv_co_copy_on_readv(bs, sector_num, nb_sectors,
> +                                cluster_sector_num, cluster_nb_sectors);
> +
> +    iov.iov_len = cluster_nb_sectors * BDRV_SECTOR_SIZE;
> +    iov.iov_base = bounce_buffer = qemu_blockalign(bs, iov.iov_len);
> +    qemu_iovec_init_external(&bounce_qiov, &iov, 1);
> +
> +    ret = bs->drv->bdrv_co_readv(bs, cluster_sector_num, cluster_nb_sectors,
> +                                 &bounce_qiov);
> +    if (ret < 0) {
> +        goto err;
> +    }
> +
> +    ret = bs->drv->bdrv_co_writev(bs, cluster_sector_num, cluster_nb_sectors,
> +                                  &bounce_qiov);
> +    if (ret < 0) {
> +        /* It might be okay to ignore write errors for guest requests.  If this
> +         * is a deliberate copy-on-read then we don't want to ignore the error.
> +         * Simply report it in all cases.
> +         */
> +        goto err;
> +    }
> +
> +    skip_bytes = (sector_num - cluster_sector_num) * BDRV_SECTOR_SIZE;
> +    qemu_iovec_from_buffer(qiov, bounce_buffer + skip_bytes,
> +                           nb_sectors * BDRV_SECTOR_SIZE);
> +
> +err:
> +    qemu_vfree(bounce_buffer);
> +    return ret;
> +}
> +
>  /*
>   * Handle a read request in coroutine context
>   */
> @@ -1431,7 +1480,27 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
>      }
>  
>      req = tracked_request_add(bs, sector_num, nb_sectors, false);
> +
> +    if (bs->copy_on_read) {
> +        int pnum;
> +
> +        /* TODO it is not safe to call bdrv_is_allocated() in coroutine context
> +         * because it's a synchronous interface.  We probably want a
> +         * bdrv_co_is_allocated(). */
> +        ret = bdrv_is_allocated(bs, sector_num, nb_sectors, &pnum);

I think I already said it in a reply to another patch, but for the
record: We need bdrv_co_is_allocated() before this can be merged.

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [Qemu-devel] [RFC 6/6] block: add -drive copy-on-read=on|off
  2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
                   ` (4 preceding siblings ...)
  2011-10-17 15:47 ` [Qemu-devel] [RFC 5/6] block: core copy-on-read logic Stefan Hajnoczi
@ 2011-10-17 15:47 ` Stefan Hajnoczi
  2011-11-03 14:32   ` Kevin Wolf
  2011-11-01 14:28 ` [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
  6 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-10-17 15:47 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, Marcelo Tosatti, Stefan Hajnoczi

This patch adds the -drive copy-on-read=on|off command-line option:

  copy-on-read=on|off
  copy-on-read is "on" or "off" and enables whether to copy read backing
  file sectors into the image file.  Copy-on-read avoids accessing the
  same backing file sectors repeatedly and is useful when the backing
  file is over a slow network.  By default copy-on-read is off.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
---
 blockdev.c      |    6 ++++++
 hmp-commands.hx |    5 +++--
 qemu-config.c   |    4 ++++
 qemu-options.hx |    9 ++++++++-
 4 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 0827bf7..1dd0f23 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
     const char *devaddr;
     DriveInfo *dinfo;
     int snapshot = 0;
+    int copy_on_read;
     int ret;
 
     translation = BIOS_ATA_TRANSLATION_AUTO;
@@ -252,6 +253,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
     snapshot = qemu_opt_get_bool(opts, "snapshot", 0);
     ro = qemu_opt_get_bool(opts, "readonly", 0);
+    copy_on_read = qemu_opt_get_bool(opts, "copy-on-read", 0);
 
     file = qemu_opt_get(opts, "file");
     serial = qemu_opt_get(opts, "serial");
@@ -502,6 +504,10 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
         bdrv_flags |= (BDRV_O_SNAPSHOT|BDRV_O_CACHE_WB|BDRV_O_NO_FLUSH);
     }
 
+    if (copy_on_read) {
+        bdrv_flags |= BDRV_O_COPY_ON_READ;
+    }
+
     if (media == MEDIA_CDROM) {
         /* CDROM is fine for any interface, don't check.  */
         ro = 1;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index ab08d58..05a1498 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -860,9 +860,10 @@ ETEXI
         .args_type  = "pci_addr:s,opts:s",
         .params     = "[[<domain>:]<bus>:]<slot>\n"
                       "[file=file][,if=type][,bus=n]\n"
-                      "[,unit=m][,media=d][index=i]\n"
+                      "[,unit=m][,media=d][,index=i]\n"
                       "[,cyls=c,heads=h,secs=s[,trans=t]]\n"
-                      "[snapshot=on|off][,cache=on|off]",
+                      "[,snapshot=on|off][,cache=on|off]\n"
+                      "[,readonly=on|off][,copy-on-read=on|off]",
         .help       = "add drive to PCI storage controller",
         .mhandler.cmd = drive_hot_add,
     },
diff --git a/qemu-config.c b/qemu-config.c
index 7a7854f..fbe4f6a 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -85,6 +85,10 @@ static QemuOptsList qemu_drive_opts = {
             .name = "readonly",
             .type = QEMU_OPT_BOOL,
             .help = "open drive file as read-only",
+        },{
+            .name = "copy-on-read",
+            .type = QEMU_OPT_BOOL,
+            .help = "copy read data from backing file into image file",
         },
         { /* end of list */ }
     },
diff --git a/qemu-options.hx b/qemu-options.hx
index dfbabd0..c0bf856 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -135,7 +135,7 @@ DEF("drive", HAS_ARG, QEMU_OPTION_drive,
     "       [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n"
     "       [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n"
     "       [,serial=s][,addr=A][,id=name][,aio=threads|native]\n"
-    "       [,readonly=on|off]\n"
+    "       [,readonly=on|off][,copy-on-read=on|off]\n"
     "                use 'file' as a drive image\n", QEMU_ARCH_ALL)
 STEXI
 @item -drive @var{option}[,@var{option}[,@var{option}[,...]]]
@@ -183,6 +183,9 @@ host disk is full; report the error to the guest otherwise).
 The default setting is @option{werror=enospc} and @option{rerror=report}.
 @item readonly
 Open drive @option{file} as read-only. Guest write attempts will fail.
+@item copy-on-read=@var{copy-on-read}
+@var{copy-on-read} is "on" or "off" and enables whether to copy read backing
+file sectors into the image file.
 @end table
 
 By default, writethrough caching is used for all block device.  This means that
@@ -214,6 +217,10 @@ like your host losing power, the disk storage getting disconnected accidently,
 etc. you're image will most probably be rendered unusable.   When using
 the @option{-snapshot} option, unsafe caching is always used.
 
+Copy-on-read avoids accessing the same backing file sectors repeatedly and is
+useful when the backing file is over a slow network.  By default copy-on-read
+is off.
+
 Instead of @option{-cdrom} you can use:
 @example
 qemu -drive file=file,index=2,media=cdrom
-- 
1.7.6.3

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/6] block: add -drive copy-on-read=on|off
  2011-10-17 15:47 ` [Qemu-devel] [RFC 6/6] block: add -drive copy-on-read=on|off Stefan Hajnoczi
@ 2011-11-03 14:32   ` Kevin Wolf
  2011-11-03 16:21     ` Stefan Hajnoczi
  0 siblings, 1 reply; 30+ messages in thread
From: Kevin Wolf @ 2011-11-03 14:32 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
> This patch adds the -drive copy-on-read=on|off command-line option:
> 
>   copy-on-read=on|off
>   copy-on-read is "on" or "off" and enables whether to copy read backing
>   file sectors into the image file.  Copy-on-read avoids accessing the
>   same backing file sectors repeatedly and is useful when the backing
>   file is over a slow network.  By default copy-on-read is off.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  blockdev.c      |    6 ++++++
>  hmp-commands.hx |    5 +++--
>  qemu-config.c   |    4 ++++
>  qemu-options.hx |    9 ++++++++-
>  4 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/blockdev.c b/blockdev.c
> index 0827bf7..1dd0f23 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
>      const char *devaddr;
>      DriveInfo *dinfo;
>      int snapshot = 0;
> +    int copy_on_read;

Another s/int/bool/ :-)

Kevin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 6/6] block: add -drive copy-on-read=on|off
  2011-11-03 14:32   ` Kevin Wolf
@ 2011-11-03 16:21     ` Stefan Hajnoczi
  0 siblings, 0 replies; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-11-03 16:21 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, Marcelo Tosatti, qemu-devel

On Thu, Nov 03, 2011 at 03:32:17PM +0100, Kevin Wolf wrote:
> Am 17.10.2011 17:47, schrieb Stefan Hajnoczi:
> > This patch adds the -drive copy-on-read=on|off command-line option:
> > 
> >   copy-on-read=on|off
> >   copy-on-read is "on" or "off" and enables whether to copy read backing
> >   file sectors into the image file.  Copy-on-read avoids accessing the
> >   same backing file sectors repeatedly and is useful when the backing
> >   file is over a slow network.  By default copy-on-read is off.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> > ---
> >  blockdev.c      |    6 ++++++
> >  hmp-commands.hx |    5 +++--
> >  qemu-config.c   |    4 ++++
> >  qemu-options.hx |    9 ++++++++-
> >  4 files changed, 21 insertions(+), 3 deletions(-)
> > 
> > diff --git a/blockdev.c b/blockdev.c
> > index 0827bf7..1dd0f23 100644
> > --- a/blockdev.c
> > +++ b/blockdev.c
> > @@ -236,6 +236,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
> >      const char *devaddr;
> >      DriveInfo *dinfo;
> >      int snapshot = 0;
> > +    int copy_on_read;
> 
> Another s/int/bool/ :-)

In general I use bool but in this case I hesitated since the other
variables are int I tried to stick with the style.  I'm happy to change
it if you prefer bool.

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 0/6] block: generic copy-on-read
  2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
                   ` (5 preceding siblings ...)
  2011-10-17 15:47 ` [Qemu-devel] [RFC 6/6] block: add -drive copy-on-read=on|off Stefan Hajnoczi
@ 2011-11-01 14:28 ` Stefan Hajnoczi
  2011-11-01 15:07   ` Marcelo Tosatti
  6 siblings, 1 reply; 30+ messages in thread
From: Stefan Hajnoczi @ 2011-11-01 14:28 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

Hi Marcelo,
Thanks for your comments on the copy-on-read RFC patches.  I am going
to send a rebased series out for review/merge.  Did you have any other
thoughts - I hope I've addressed your questions?

Stefan

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [Qemu-devel] [RFC 0/6] block: generic copy-on-read
  2011-11-01 14:28 ` [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
@ 2011-11-01 15:07   ` Marcelo Tosatti
  0 siblings, 0 replies; 30+ messages in thread
From: Marcelo Tosatti @ 2011-11-01 15:07 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

On Tue, Nov 01, 2011 at 02:28:22PM +0000, Stefan Hajnoczi wrote:
> Hi Marcelo,
> Thanks for your comments on the copy-on-read RFC patches.  I am going
> to send a rebased series out for review/merge.  Did you have any other
> thoughts - I hope I've addressed your questions?
> 
> Stefan

I'm reviewing the "generic image streaming" series. Should be finished
in a couple of days.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2011-11-08  9:49 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-10-17 15:47 [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
2011-10-17 15:47 ` [Qemu-devel] [RFC 1/6] block: add request tracking Stefan Hajnoczi
2011-11-02 16:30   ` Kevin Wolf
2011-11-03  7:57     ` Stefan Hajnoczi
2011-11-07 11:00   ` Zhi Yong Wu
2011-11-07 11:41     ` Stefan Hajnoczi
2011-11-08  6:13       ` Zhi Yong Wu
2011-10-17 15:47 ` [Qemu-devel] [RFC 2/6] block: add bdrv_set_copy_on_read() Stefan Hajnoczi
2011-11-02 16:36   ` Kevin Wolf
2011-11-03  8:01     ` Stefan Hajnoczi
2011-10-17 15:47 ` [Qemu-devel] [RFC 3/6] block: wait for overlapping requests Stefan Hajnoczi
2011-10-18 13:48   ` Marcelo Tosatti
2011-10-20 17:34     ` Stefan Hajnoczi
2011-11-03 14:17   ` Kevin Wolf
2011-10-17 15:47 ` [Qemu-devel] [RFC 4/6] block: request overlap detection Stefan Hajnoczi
2011-11-07 11:49   ` Zhi Yong Wu
2011-11-07 14:37     ` Stefan Hajnoczi
2011-11-08  6:34       ` Zhi Yong Wu
2011-11-08  8:16         ` Stefan Hajnoczi
2011-11-08  9:49           ` Zhi Yong Wu
2011-10-17 15:47 ` [Qemu-devel] [RFC 5/6] block: core copy-on-read logic Stefan Hajnoczi
2011-10-18 14:00   ` Marcelo Tosatti
2011-10-20 17:40     ` Stefan Hajnoczi
2011-10-18 14:03   ` Marcelo Tosatti
2011-11-03 14:30   ` Kevin Wolf
2011-10-17 15:47 ` [Qemu-devel] [RFC 6/6] block: add -drive copy-on-read=on|off Stefan Hajnoczi
2011-11-03 14:32   ` Kevin Wolf
2011-11-03 16:21     ` Stefan Hajnoczi
2011-11-01 14:28 ` [Qemu-devel] [RFC 0/6] block: generic copy-on-read Stefan Hajnoczi
2011-11-01 15:07   ` Marcelo Tosatti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).