From: Yoshiaki Tamura
Sender: tamura.yoshiaki@gmail.com
Date: Fri, 21 Jan 2011 23:18:17 +0900
Subject: Re: [Qemu-devel] [PATCH] Fix block migration when the device size is not a multiple of 1 MB
To: Kevin Wolf
Cc: "qemu-devel@nongnu.org", Pierre Riteau
In-Reply-To: <4D399393.7030506@redhat.com>
References: <1295449188-17877-1-git-send-email-Pierre.Riteau@irisa.fr> <04350B7C-9933-4A70-8FA9-B5B409D1E10A@irisa.fr> <43211019-BF0D-405A-99B7-54C9B3BBE58E@irisa.fr> <4D397C8E.7080703@redhat.com> <292A277F-FDB6-4842-9133-8CAC22F08453@irisa.fr> <4D399393.7030506@redhat.com>

2011/1/21 Kevin Wolf:
> On 21.01.2011 14:59, Yoshiaki Tamura wrote:
>> 2011/1/21 Pierre Riteau:
>>> On 21 Jan 2011, at 13:36, Yoshiaki Tamura wrote:
>>>
>>>> 2011/1/21 Kevin Wolf:
>>>>> On 21.01.2011 13:15, Yoshiaki Tamura wrote:
>>>>>> 2011/1/21 Pierre Riteau:
>>>>>>> On 20 Jan 2011, at 17:18, Yoshiaki Tamura wrote:
>>>>>>>
>>>>>>>> 2011/1/20 Pierre Riteau:
>>>>>>>>> On 20 Jan 2011, at 03:06, Yoshiaki Tamura wrote:
>>>>>>>>>
>>>>>>>>>> 2011/1/19 Pierre Riteau:
>>>>>>>>>>> b02bea3a85cc939f09aa674a3f1e4f36d418c007 added a check on the return
>>>>>>>>>>> value of bdrv_write and aborts migration when it fails. However, if the
>>>>>>>>>>> size of the block device to migrate is not a multiple of BLOCK_SIZE
>>>>>>>>>>> (currently 1 MB), the last bdrv_write will fail with -EIO.
>>>>>>>>>>>
>>>>>>>>>>> Fixed by calling bdrv_write with the correct size of the last block.
>>>>>>>>>>> ---
>>>>>>>>>>>  block-migration.c |   16 +++++++++++++++-
>>>>>>>>>>>  1 files changed, 15 insertions(+), 1 deletions(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/block-migration.c b/block-migration.c
>>>>>>>>>>> index 1475325..eeb9c62 100644
>>>>>>>>>>> --- a/block-migration.c
>>>>>>>>>>> +++ b/block-migration.c
>>>>>>>>>>> @@ -635,6 +635,8 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>      int64_t addr;
>>>>>>>>>>>      BlockDriverState *bs;
>>>>>>>>>>>      uint8_t *buf;
>>>>>>>>>>> +    int64_t total_sectors;
>>>>>>>>>>> +    int nr_sectors;
>>>>>>>>>>>
>>>>>>>>>>>      do {
>>>>>>>>>>>          addr = qemu_get_be64(f);
>>>>>>>>>>> @@ -656,10 +658,22 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
>>>>>>>>>>>                  return -EINVAL;
>>>>>>>>>>>              }
>>>>>>>>>>>
>>>>>>>>>>> +            total_sectors = bdrv_getlength(bs) >> BDRV_SECTOR_BITS;
>>>>>>>>>>> +            if (total_sectors <= 0) {
>>>>>>>>>>> +                fprintf(stderr, "Error getting length of block device %s\n", device_name);
>>>>>>>>>>> +                return -EINVAL;
>>>>>>>>>>> +            }
>>>>>>>>>>> +
>>>>>>>>>>> +            if (total_sectors - addr < BDRV_SECTORS_PER_DIRTY_CHUNK) {
>>>>>>>>>>> +                nr_sectors = total_sectors - addr;
>>>>>>>>>>> +            } else {
>>>>>>>>>>> +                nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
>>>>>>>>>>> +            }
>>>>>>>>>>> +
>>>>>>>>>>>              buf = qemu_malloc(BLOCK_SIZE);
>>>>>>>>>>>
>>>>>>>>>>>              qemu_get_buffer(f, buf, BLOCK_SIZE);
>>>>>>>>>>> -            ret = bdrv_write(bs, addr, buf, BDRV_SECTORS_PER_DIRTY_CHUNK);
>>>>>>>>>>> +            ret = bdrv_write(bs, addr, buf, nr_sectors);
>>>>>>>>>>>
>>>>>>>>>>>              qemu_free(buf);
>>>>>>>>>>>              if (ret < 0) {
>>>>>>>>>>> --
>>>>>>>>>>> 1.7.3.5
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Pierre,
>>>>>>>>>>
>>>>>>>>>> I don't think the fix above is correct. If you have a file which
>>>>>>>>>> isn't aligned with BLOCK_SIZE, you won't get an error with the
>>>>>>>>>> patch. However, the receiver doesn't know how many sectors the
>>>>>>>>>> sender wants to be written, so the guest may fail after
>>>>>>>>>> migration because some data may not be written. IIUC, although
>>>>>>>>>> changing the bytestream should be prevented as much as possible, we
>>>>>>>>>> should save/load total_sectors to check that an appropriate file is
>>>>>>>>>> allocated on the receiver side.
>>>>>>>>>
>>>>>>>>> Isn't the guest supposed to be started using a file with the correct size?
>>>>>>>>
>>>>>>>> I personally don't like that; it demands too much of the user.
>>>>>>>> Can't we expand the image on the fly? We can just abort if expanding
>>>>>>>> fails anyway.
>>>>>>>
>>>>>>> At first I thought your expansion idea was best, but now I think there are valid scenarios where it fails.
>>>>>>>
>>>>>>> Imagine both sides are not using a file but a disk partition as storage. If the partition size is not rounded to 1 MB, the last write will fail with the current code, and there is no way we can expand the partition.
>>>>>>>
>>>>>>
>>>>>> Right. But in the case of a partition, doesn't the check in the patch below
>>>>>> return an error? Does bdrv_getlength return the size correctly?
>>>>>
>>>>> I'm pretty sure that it does. We would have problems in other places if
>>>>> it didn't (e.g. we're checking if I/O requests are within the disk size).
>>>>
>>>> Sorry for the noise. I just learned it's returning the value of lseek
>>>> in the case of raw-posix.
>>>
>>>
>>> And it does an ioctl call on platforms other than Linux.
>>
>> Thanks. Just a quick question regarding total_sectors.
>> BlockDriverState seems to contain total_sectors. Can we avoid
>> calling bdrv_getlength() if bs->total_sectors is already there?
>
> I'd need to check the details, but I think it may not be correct with
> growable files.

Does the growable flag mean total_sectors is growable? Since
block migration requires users to preallocate a file of sufficient
size, it doesn't seem to be a problem, IIUC.

Yoshi

>
> Kevin
>
>
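
For readers following the thread, the clamping that the quoted patch performs before bdrv_write can be sketched in isolation. This is only a minimal sketch, not the QEMU code: chunk_nr_sectors, SECTOR_BITS and SECTORS_PER_CHUNK are hypothetical names, the 512-byte sector size is an assumption, and the 1 MB chunk simply mirrors the BLOCK_SIZE mentioned in the commit message. Unlike the patch, the sketch also rejects a start address at or beyond the end of the device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants for illustration only. */
#define SECTOR_BITS        9                        /* assumed 512-byte sectors */
#define CHUNK_BYTES        (1 << 20)                /* 1 MB chunk, as in the thread */
#define SECTORS_PER_CHUNK  (CHUNK_BYTES >> SECTOR_BITS)

/*
 * Return how many sectors to write for the chunk starting at sector
 * `addr` on a device of `device_bytes` bytes. The last chunk is
 * shortened when the device size is not a multiple of the chunk size.
 * Returns -1 if the length is invalid or `addr` is out of range.
 */
int chunk_nr_sectors(int64_t device_bytes, int64_t addr)
{
    int64_t total_sectors = device_bytes >> SECTOR_BITS;
    int64_t remaining;

    if (total_sectors <= 0 || addr < 0 || addr >= total_sectors) {
        return -1;
    }

    remaining = total_sectors - addr;
    assert(remaining > 0);

    /* Clamp the final, partial chunk to what is left on the device. */
    if (remaining < SECTORS_PER_CHUNK) {
        return (int)remaining;
    }
    return SECTORS_PER_CHUNK;
}
```

With a 1.5 MB device (3072 sectors under the assumptions above), the first chunk gets the full 2048 sectors and the second only the remaining 1024. The sketch says nothing about whether the sender and receiver agree on the device size, which is the concern raised earlier in the thread.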