qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH] block-raw: Allow pread beyond the end of growable images
@ 2009-06-26 17:51 Kevin Wolf
  2009-06-30 18:09 ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2009-06-26 17:51 UTC
  To: qemu-devel; +Cc: Kevin Wolf

When using O_DIRECT, qcow2 snapshots didn't work any more for me. In the
process of creating the snapshot, qcow2 tries to pwrite some new information
(e.g. new L1 table) which will often end up being after the old end of the
image file. pwrite then tries to align the request and reads the old contents
of the file; read returns 0 because there is nothing to read after the end of
the file, and pwrite is stuck in an endless loop.

This patch allows pread beyond the end of an image file. Whenever the
given offset is after the end of the image file, the read succeeds and fills
the buffer with zeros.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/raw-posix.c |   11 +++++++++++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index fa1a394..985bf69 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -117,6 +117,7 @@ typedef struct BDRVRawState {
 static int posix_aio_init(void);
 
 static int fd_open(BlockDriverState *bs);
+static int64_t raw_getlength(BlockDriverState *bs);
 
 #if defined(__FreeBSD__)
 static int cdrom_reopen(BlockDriverState *bs);
@@ -231,6 +232,16 @@ static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
     if (ret == count)
         goto label__raw_read__success;
 
+    /* Allow reads beyond the end (needed for pwrite) */
+    if ((ret == 0) && bs->growable) {
+        int64_t size = raw_getlength(bs);
+        if (offset >= size) {
+            memset(buf, 0, count);
+            ret = count;
+            goto label__raw_read__success;
+        }
+    }
+
     DEBUG_BLOCK_PRINT("raw_pread(%d:%s, %" PRId64 ", %p, %d) [%" PRId64
                       "] read failed %d : %d = %s\n",
                       s->fd, bs->filename, offset, buf, count,
-- 
1.6.0.6
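
The behaviour described in the commit message can be reproduced outside QEMU. The following is only a minimal sketch, assuming a regular file and plain POSIX calls: pread_zero_past_eof() and demo_read_past_eof() are hypothetical names, and fstat() stands in for raw_getlength(). It is not the QEMU code, it only mirrors the condition used in the patch (a read that returns 0 at an offset at or beyond the end of the file is reported as a full read of zeros).

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not QEMU code: like pread(), but a request that
 * starts at or after end of file succeeds and yields zeros. */
ssize_t pread_zero_past_eof(int fd, void *buf, size_t count, off_t offset)
{
    struct stat st;
    ssize_t ret;

    assert(buf != NULL);

    ret = pread(fd, buf, count, offset);
    if (ret != 0 || count == 0) {
        return ret;             /* data, short read or error: pass through */
    }

    /* ret == 0 means end of file; fstat() stands in for raw_getlength(). */
    if (fstat(fd, &st) < 0) {
        return -1;
    }
    if (offset >= st.st_size) {
        memset(buf, 0, count);
        return (ssize_t)count;
    }
    return 0;
}

/* Demo on a 4-byte temporary file; returns 0 on success, -1 on failure. */
int demo_read_past_eof(void)
{
    unsigned char buf[8];
    FILE *f = tmpfile();
    int fd, ok;

    if (f == NULL) {
        return -1;
    }
    fd = fileno(f);
    ok = (write(fd, "abcd", 4) == 4);

    memset(buf, 0xff, sizeof(buf));
    /* Plain pread() at EOF returns 0, so a caller that retries until it
     * has all `count` bytes never makes progress. */
    ok = ok && (pread(fd, buf, sizeof(buf), 4) == 0);
    /* The helper reports a complete, zero-filled read instead. */
    ok = ok && (pread_zero_past_eof(fd, buf, sizeof(buf), 4)
                == (ssize_t)sizeof(buf));
    ok = ok && buf[0] == 0 && buf[7] == 0;

    fclose(f);
    return ok ? 0 : -1;
}
```

Like the patch, the sketch only handles a read that returns nothing at all; a request that straddles the end of the file still comes back as a short read.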

* Re: [Qemu-devel] [PATCH] block-raw: Allow pread beyond the end of growable images
  2009-06-26 17:51 [Qemu-devel] [PATCH] block-raw: Allow pread beyond the end of growable images Kevin Wolf
@ 2009-06-30 18:09 ` Christoph Hellwig
  2009-07-01  7:37   ` Kevin Wolf
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2009-06-30 18:09 UTC
  To: Kevin Wolf; +Cc: qemu-devel

On Fri, Jun 26, 2009 at 07:51:24PM +0200, Kevin Wolf wrote:
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index fa1a394..985bf69 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -117,6 +117,7 @@ typedef struct BDRVRawState {
>  static int posix_aio_init(void);
>  
>  static int fd_open(BlockDriverState *bs);
> +static int64_t raw_getlength(BlockDriverState *bs);
>  
>  #if defined(__FreeBSD__)
>  static int cdrom_reopen(BlockDriverState *bs);
> @@ -231,6 +232,16 @@ static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
>      if (ret == count)
>          goto label__raw_read__success;
>  
> +    /* Allow reads beyond the end (needed for pwrite) */
> +    if ((ret == 0) && bs->growable) {
> +        int64_t size = raw_getlength(bs);
> +        if (offset >= size) {
> +            memset(buf, 0, count);
> +            ret = count;
> +            goto label__raw_read__success;
> +        }
> +    }

I really don't like doing this inside the low-level read handler.  If
this is indeed only needed for pwrite we might be better off doing
the right thing in the place that needs it, bdrv_pwrite.
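
For illustration, handling this on the write side could look roughly like the sketch below. It is written under stated assumptions and is not the actual bdrv_pwrite: it uses a plain file descriptor, a fixed 512-byte sector, and the hypothetical names read_sector_zero_fill(), rmw_pwrite() and demo_rmw_pwrite(). The read-modify-write step treats any part of a sector that lies beyond the end of the file as zeros, so the write path never waits for data that does not exist.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SECTOR_SIZE 512

/* Read one whole sector; bytes beyond end of file come back as zeros.
 * Assumes a regular file, where a short read only happens at EOF. */
int read_sector_zero_fill(int fd, off_t sector, unsigned char *sec)
{
    ssize_t n = pread(fd, sec, SECTOR_SIZE, sector * SECTOR_SIZE);

    if (n < 0) {
        return -1;
    }
    memset(sec + n, 0, SECTOR_SIZE - (size_t)n);
    return 0;
}

/* Hypothetical unaligned write that only ever issues whole-sector
 * writes, merging new data into the old sector contents. */
ssize_t rmw_pwrite(int fd, const void *data, size_t len, off_t offset)
{
    const unsigned char *src = data;
    size_t done = 0;

    assert(data != NULL && offset >= 0);

    while (done < len) {
        unsigned char sec[SECTOR_SIZE];
        off_t pos = offset + (off_t)done;
        off_t sector = pos / SECTOR_SIZE;
        size_t in_sec = (size_t)(pos % SECTOR_SIZE);
        size_t chunk = SECTOR_SIZE - in_sec;

        if (chunk > len - done) {
            chunk = len - done;
        }
        /* A full-sector overwrite needs no read of the old contents. */
        if (chunk != SECTOR_SIZE &&
            read_sector_zero_fill(fd, sector, sec) < 0) {
            return -1;
        }
        memcpy(sec + in_sec, src + done, chunk);
        if (pwrite(fd, sec, SECTOR_SIZE, sector * SECTOR_SIZE)
            != SECTOR_SIZE) {
            return -1;
        }
        done += chunk;
    }
    return (ssize_t)len;
}

/* Demo: write 2 bytes far beyond the end of a 4-byte file.
 * Returns 0 on success, -1 on failure. */
int demo_rmw_pwrite(void)
{
    unsigned char buf[4];
    FILE *f = tmpfile();
    int fd, ok;

    if (f == NULL) {
        return -1;
    }
    fd = fileno(f);
    ok = (write(fd, "abcd", 4) == 4);
    ok = ok && (rmw_pwrite(fd, "XY", 2, 1000) == 2);
    ok = ok && (pread(fd, buf, 4, 998) == 4);
    ok = ok && buf[0] == 0 && buf[1] == 0 && buf[2] == 'X' && buf[3] == 'Y';
    ok = ok && (pread(fd, buf, 4, 0) == 4) && memcmp(buf, "abcd", 4) == 0;

    fclose(f);
    return ok ? 0 : -1;
}
```

The zero-fill decision lives entirely in read_sector_zero_fill(), so the plain read path keeps its ordinary end-of-file behaviour.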

* Re: [Qemu-devel] [PATCH] block-raw: Allow pread beyond the end of growable images
  2009-06-30 18:09 ` Christoph Hellwig
@ 2009-07-01  7:37   ` Kevin Wolf
  2009-07-01  8:56     ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2009-07-01  7:37 UTC
  To: Christoph Hellwig; +Cc: qemu-devel

Christoph Hellwig wrote:
> On Fri, Jun 26, 2009 at 07:51:24PM +0200, Kevin Wolf wrote:
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index fa1a394..985bf69 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -117,6 +117,7 @@ typedef struct BDRVRawState {
>>  static int posix_aio_init(void);
>>  
>>  static int fd_open(BlockDriverState *bs);
>> +static int64_t raw_getlength(BlockDriverState *bs);
>>  
>>  #if defined(__FreeBSD__)
>>  static int cdrom_reopen(BlockDriverState *bs);
>> @@ -231,6 +232,16 @@ static int raw_pread_aligned(BlockDriverState *bs, int64_t offset,
>>      if (ret == count)
>>          goto label__raw_read__success;
>>  
>> +    /* Allow reads beyond the end (needed for pwrite) */
>> +    if ((ret == 0) && bs->growable) {
>> +        int64_t size = raw_getlength(bs);
>> +        if (offset >= size) {
>> +            memset(buf, 0, count);
>> +            ret = count;
>> +            goto label__raw_read__success;
>> +        }
>> +    }
> 
> I really don't like doing this inside the low-level read handler.  If
> this is indeed only needed for pwrite we might be better off doing
> the right thing in the place that needs it, bdrv_pwrite.

If you feel like posting a better patch, go ahead. I'm feeling that it's
going to be a whole lot uglier in bdrv_pwrite, but that might prove
wrong. And after all it is a problem with raw and not with the generic
block code - qcow2 for example was already reading zeros.

For the time being, this patch fixes a bug and should stay.

Kevin

* Re: [Qemu-devel] [PATCH] block-raw: Allow pread beyond the end of growable images
  2009-07-01  7:37   ` Kevin Wolf
@ 2009-07-01  8:56     ` Christoph Hellwig
  2009-07-01  9:08       ` Kevin Wolf
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2009-07-01  8:56 UTC
  To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel

On Wed, Jul 01, 2009 at 09:37:28AM +0200, Kevin Wolf wrote:
> If you feel like posting a better patch, go ahead. I'm feeling that it's
> going to be a whole lot uglier in bdrv_pwrite, but that might prove
> wrong. And after all it is a problem with raw and not with the generic
> block code - qcow2 for example was already reading zeros.
> 
> For the time being, this patch fixes a bug and should stay.

So the only use case is doing a savevm on a qcow2 device with a
raw-posix backing dev?  What happens with an nbd backing dev or a raw
backing dev on win32?  Or (not sure we'll actually support that) another
image format backing dev?

* Re: [Qemu-devel] [PATCH] block-raw: Allow pread beyond the end of growable images
  2009-07-01  8:56     ` Christoph Hellwig
@ 2009-07-01  9:08       ` Kevin Wolf
  2009-07-01 11:37         ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2009-07-01  9:08 UTC
  To: Christoph Hellwig; +Cc: qemu-devel

Christoph Hellwig wrote:
> On Wed, Jul 01, 2009 at 09:37:28AM +0200, Kevin Wolf wrote:
>> If you feel like posting a better patch, go ahead. I'm feeling that it's
>> going to be a whole lot uglier in bdrv_pwrite, but that might prove
>> wrong. And after all it is a problem with raw and not with the generic
>> block code - qcow2 for example was already reading zeros.
>>
>> For the time being, this patch fixes a bug and should stay.
> 
> So the only use case is doing a savevm on a qcow2 device with a
> raw-posix backing dev?  What happens with an nbd backing dev or a raw
> backing dev on win32?  Or (not sure we'll actually support that) another
> image format backing dev?

raw-win32 is actually a good question, I haven't tested that one. I
think nbd is used as a protocol rather than the format. qcow1/2 should
work, don't know for other formats like VMDK.

What is biting us here is that nobody has ever specified what the block
driver functions are supposed to do. They exist because they are in the
struct, their parameters have names that give a rough idea about their
meaning and that's it. Who cares about special cases?

That said, while implementing the fix in bdrv_pwrite is going to be
ugly, we could do it in bdrv_read. Maybe this is the best approach.
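
As a rough sketch of what a fix in the generic read path might look like, the toy model below keeps the driver read strict and does the zero-filling one layer above it, so every driver would get the same behaviour. ToyDev, toy_driver_read() and toy_generic_read() are hypothetical names for illustration only, not QEMU's BlockDriverState or bdrv_read, and the "image" is just an in-memory byte array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a block device; NOT QEMU's BlockDriverState. */
typedef struct ToyDev {
    const uint8_t *data;   /* image contents */
    int64_t length;        /* image size in bytes */
    int growable;          /* non-zero: reads past the end yield zeros */
} ToyDev;

/* Driver-level read: strict, fails for any out-of-range request. */
int toy_driver_read(const ToyDev *dev, int64_t offset,
                    uint8_t *buf, int64_t count)
{
    if (offset < 0 || count < 0 || offset > dev->length ||
        count > dev->length - offset) {
        return -1;
    }
    memcpy(buf, dev->data + offset, (size_t)count);
    return 0;
}

/* Generic layer: for growable devices, read what exists and zero-fill
 * the rest, including requests that straddle the end of the image. */
int toy_generic_read(const ToyDev *dev, int64_t offset,
                     uint8_t *buf, int64_t count)
{
    int64_t avail;

    assert(dev != NULL && buf != NULL);

    if (offset < 0 || count < 0) {
        return -1;
    }
    if (!dev->growable) {
        return toy_driver_read(dev, offset, buf, count);
    }

    /* Number of requested bytes that actually exist in the image. */
    avail = dev->length > offset ? dev->length - offset : 0;
    if (avail > count) {
        avail = count;
    }
    if (avail > 0 && toy_driver_read(dev, offset, buf, avail) < 0) {
        return -1;
    }
    memset(buf + avail, 0, (size_t)(count - avail));
    return 0;
}
```

Unlike a check that only triggers when the read returns nothing, this variant also covers a request that begins inside the image and ends beyond it.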

Kevin

* Re: [Qemu-devel] [PATCH] block-raw: Allow pread beyond the end of growable images
  2009-07-01  9:08       ` Kevin Wolf
@ 2009-07-01 11:37         ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2009-07-01 11:37 UTC
  To: Kevin Wolf; +Cc: Christoph Hellwig, qemu-devel

On Wed, Jul 01, 2009 at 11:08:07AM +0200, Kevin Wolf wrote:
> raw-win32 is actually a good question, I haven't tested that one. I
> think nbd is used as a protocol rather than the format. qcow1/2 should
> work, don't know for other formats like VMDK.
> 
> What is biting us here is that nobody has ever specified what the block
> driver functions are supposed to do. They exist because they are in the
> struct, their parameters have names that give a rough idea about their
> meaning and that's it. Who cares about special cases?
> 
> That said, while implementing the fix in bdrv_pwrite is going to be
> ugly, we could do it in bdrv_read. Maybe this is the best approach.

I have looked a bit at this area and it's even uglier than I expected.

First problem is our use of the growable flag - it's overloaded for two
different purposes:  First we use it to allow growing files if they are
protocols and opened through bdrv_file_open.  Which already seems broken
for host devices, but I'll need to test it.

Second, inside qcow2 it allows reading/writing past the end of any image
internally if bdrv_pwrite/bdrv_pread are called via
qcow_put_buffer/qcow_get_buffer.  I think we'd be much more future-proof
if we made these flags to aio_read/aio_write to allow writing past the
image size.
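
To make the per-request idea concrete, here is a minimal sketch of a bounds check driven by a request flag instead of a per-device growable field. REQ_ALLOW_PAST_EOF and check_request() are hypothetical names chosen for this example; nothing like them exists in the tree under these names, and a real interface would have to carry the flag through aio_read/aio_write.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-request flags (illustration only). */
enum {
    REQ_ALLOW_PAST_EOF = 1 << 0,   /* request may extend past image size */
};

/* Returns 0 if the request is acceptable, -1 otherwise. */
int check_request(int64_t image_size, int64_t offset, int64_t count,
                  unsigned flags)
{
    assert(image_size >= 0);

    /* Reject negative values and offset + count overflow in every case. */
    if (offset < 0 || count < 0 || offset > INT64_MAX - count) {
        return -1;
    }
    /* Only callers that explicitly ask for it may go past the end. */
    if (flags & REQ_ALLOW_PAST_EOF) {
        return 0;
    }
    return (offset + count <= image_size) ? 0 : -1;
}
```

With this shape, an internal caller such as the VM state code would pass the flag on the requests that need it, while guest I/O on the same device stays bounded by the image size.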

Also none of this currently is easily testable as we don't expose
the growable flag through qemu-io.

And last but not least, I really hate how we tie up writing the VM metadata
into the block/image code.  I think we should also allow writing it
into an external file with no ties to the image, to allow offline
migration or savevm even on a plain file or, more importantly, an LVM volume.
