From: Pierre Riteau <Pierre.Riteau@irisa.fr>
To: Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp>
Cc: Kevin Wolf <kwolf@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB
Date: Fri, 21 Jan 2011 15:23:03 +0100
Message-ID: <73163FA4-194C-483B-A20B-3AF6843C6BC7@irisa.fr>
In-Reply-To: <AANLkTiknxuDBfQiN4Ct-9qS5s2onGS8YY2wtdaOv7ohr@mail.gmail.com>

On 21 Jan 2011, at 15:21, Yoshiaki Tamura wrote:

> 2011/1/21 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>> On 21 Jan 2011, at 14:59, Yoshiaki Tamura wrote:
>> 
>>> 2011/1/21 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>> On 21 Jan 2011, at 13:36, Yoshiaki Tamura wrote:
>>>> 
>>>>> 2011/1/21 Kevin Wolf <kwolf@redhat.com>:
>>>>>> On 21.01.2011 13:15, Yoshiaki Tamura wrote:
>>>>>>> 2011/1/21 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>>>>>> On 20 Jan 2011, at 17:18, Yoshiaki Tamura <tamura.yoshiaki@lab.ntt.co.jp> wrote:
>>>>>>>> 
>>>>>>>>> 2011/1/20 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>>>>>>>> On 20 Jan 2011, at 03:06, Yoshiaki Tamura wrote:
>>>>>>>>>> 
>>>>>>>>>>> 2011/1/19 Pierre Riteau <Pierre.Riteau@irisa.fr>:
>>>>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>>>>>>>>>>>> value of bdrv_write and aborts migration when it fails. However, if the
>>>>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>>>>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>>>>>>>>>>>> 
>>>>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last block.
>>>>>>>>>>>> ---
>>>>>>>>>>>>  block-migration.c |   16 +++++++++++++++-
>>>>>>>>>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>>>>>>> 
>>>>>>>>>>>> diff --git a/block-migration.c b/block-migration.c
>>>>>>>>>>>> index 1475325..eeb9c62 100644
>>>>>>>>>>>> --- a/block-migration.c
>>>>>>>>>>>> +++ b/block-migration.c
>>>>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>     int64_t addr;
>>>>>>>>>>>>     BlockDriverState *bs;
>>>>>>>>>>>>     uint8_t *buf;
>>>>>>>>>>>> +    int64_t total_sectors;
>>>>>>>>>>>> +    int nr_sectors;
>>>>>>>>>>>> 
>>>>>>>>>>>>     do {
>>>>>>>>>>>>         addr = qemu_get_be64(f);
>>>>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>>                 return -EINVAL;
>>>>>>>>>>>>             }
>>>>>>>>>>>> 
>>>>>>>>>>>> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>>>>>>>>>>> +            if (total_sectors <= 0) {
>>>>>>>>>>>> +                fprintf(stderr, "Error getting length of block device %s\n", device_name);
>>>>>>>>>>>> +                return -EINVAL;
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +
>>>>>>>>>>>> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>>>>>>>>>>> +                nr_sectors = total_sectors - addr;
>>>>>>>>>>>> +            } else {
>>>>>>>>>>>> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +
>>>>>>>>>>>>             buf = qemu_malloc(BLOCK_SIZE);
>>>>>>>>>>>> 
>>>>>>>>>>>>             qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>>>>>>>>>> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>>>>>>>>>>>> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>>>>>>>>>>>> 
>>>>>>>>>>>>             qemu_free(buf);
>>>>>>>>>>>>             if (ret < 0) {
>>>>>>>>>>>> --
>>>>>>>>>>>> 1.7.3.5
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>> 
>>>>>>>>>>> I don't think the fix above is correct.  If you have a file which
>>>>>>>>>>> isn't aligned with BLOCK_SIZE, you won't get an error with the
>>>>>>>>>>> patch.  However, the receiver doesn't know how many sectors the
>>>>>>>>>>> sender wants written, so the guest may fail after migration
>>>>>>>>>>> because some data may not have been written.  IIUC, although
>>>>>>>>>>> changing the bytestream should be avoided as much as possible, we
>>>>>>>>>>> should save/load total_sectors to check that an appropriately
>>>>>>>>>>> sized file is allocated on the receiver side.
>>>>>>>>>> 
>>>>>>>>>> Isn't the guest supposed to be started using a file with the correct size?
>>>>>>>>> 
>>>>>>>>> I personally don't like that; it asks too much of the user.
>>>>>>>>> Can't we expand the image on the fly?  We can just abort if expanding
>>>>>>>>> failed anyway.
>>>>>>>> 
>>>>>>>> At first I thought your expansion idea was best, but now I think there are valid scenarios where it fails.
>>>>>>>> 
>>>>>>>> Imagine both sides are not using a file but a disk partition as storage. If the partition size is not rounded to 1 MB, the last write will fail with the current code, and there is no way we can expand the partition.
>>>>>>>> 
>>>>>>> 
>>>>>>> Right.  But in the case of a partition, doesn't the check in the patch
>>>>>>> below return an error?  Does bdrv_getlength return the size correctly?
>>>>>> 
>>>>>> I'm pretty sure that it does. We would have problems in other places if
>>>>>> it didn't (e.g. we're checking if I/O requests are within the disk size).
>>>>> 
>>>>> Sorry for the noise.  I just learned it's returning the value of lseek
>>>>> in case of raw-posix.
>>>> 
>>>> 
>>>> And it does an ioctl call on platforms other than Linux.
>>> 
>>> Thanks.  Just a quick question regarding total_sectors.
>>> BlockDriverState seems to contain total_sectors.  Can we avoid
>>> calling bdrv_getlength() if bs->total_sectors is already there?
>> 
>> From a comment in bdrv_getlength():
>> 
>> Fixed size devices use the total_sectors value for speed instead of
>> issuing a length query (like lseek) on each call.  Also, legacy block
>> drivers don't provide a bdrv_getlength function and must use
>> total_sectors.
>> 
>> So using bdrv_getlength will protect against devices being resized during migration, but as far as I can see, the sender side doesn't support it: the value of total_sectors is cached for the whole block migration.
> 
> Even if the sender supports it, as long as total_sectors isn't
> sent to the receiver, can we follow the resize on the receiver?


I was referring to the complex and probably unrealistic scenario where a user allocates a file of the correct size on the receiving side, starts block migration, and grows the disk on both the sender and receiver sides during migration.
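
For reference, the clamping that the patch applies to the last chunk can be sketched in isolation as below. This is only an illustration, not code from block-migration.c: chunk_sectors() is a hypothetical helper, and the constants assume 512-byte sectors and a 1 MB chunk. As in the patch, the receiver would still read a full BLOCK_SIZE buffer from the stream and only shorten the write.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration: 512-byte sectors, 1 MB chunks. */
#define SECTOR_BITS        9
#define CHUNK_BYTES        (1 << 20)
#define SECTORS_PER_CHUNK  (CHUNK_BYTES >> SECTOR_BITS)   /* 2048 */

/*
 * Hypothetical helper: number of sectors of the chunk starting at
 * sector `addr` that fit on a device of `total_sectors` sectors.
 * Returns -1 if `addr` lies outside the device.
 */
int chunk_sectors(int64_t total_sectors, int64_t addr)
{
    if (addr < 0 || addr >= total_sectors) {
        return -1;
    }
    if (total_sectors - addr < SECTORS_PER_CHUNK) {
        /* Last, partial chunk: write only what remains. */
        return (int)(total_sectors - addr);
    }
    return SECTORS_PER_CHUNK;
}
```

With a 1.5 MB device (3072 sectors), the chunk at sector 0 is written in full and the chunk at sector 2048 is clamped to the remaining 1024 sectors. The out-of-range check is an addition of this sketch; the patch itself does not perform it.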

-- 
Pierre Riteau -- PhD student, Myriads team, IRISA, Rennes, France
http://perso.univ-rennes1.fr/pierre.riteau/


Thread overview: 22+ messages
2011-01-19 14:59 [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB Pierre Riteau
2011-01-20  2:06 ` Yoshiaki Tamura
2011-01-20  6:49   ` Pierre Riteau
2011-01-20 16:18     ` Yoshiaki Tamura
2011-01-21  8:08       ` Pierre Riteau
2011-01-21  9:11         ` Kevin Wolf
2011-01-21 12:26           ` Yoshiaki Tamura
2011-01-21 12:15         ` Yoshiaki Tamura
2011-01-21 12:31           ` Kevin Wolf
2011-01-21 12:36             ` Yoshiaki Tamura
2011-01-21 12:40               ` Pierre Riteau
2011-01-21 13:59                 ` Yoshiaki Tamura
2011-01-21 14:09                   ` Kevin Wolf
2011-01-21 14:18                     ` Yoshiaki Tamura
2011-01-21 14:14                   ` Pierre Riteau
2011-01-21 14:21                     ` Yoshiaki Tamura
2011-01-21 14:23                       ` Pierre Riteau [this message]
2011-01-21 14:30                         ` Yoshiaki Tamura
2011-01-21 14:48                           ` Pierre Riteau
2011-01-21  9:16 ` Kevin Wolf
2011-01-21 11:38   ` Pierre Riteau
2011-01-21 11:45     ` Kevin Wolf
