From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kevin Wolf <kwolf@redhat.com>
Subject: Re: qemu and qemu.git -> Migration + disk stress
 introduces qcow2 corruptions
Date: Mon, 14 Nov 2011 12:21:53 +0100
Message-ID: <4EC0F9D1.3060505@redhat.com>
References: <4EBAACAF.4080407@codemonkey.ws> <4EBAB236.2060409@redhat.com>
	<4EBAB9FA.3070601@codemonkey.ws> <4EBB919B.7040605@redhat.com>
	<4EBC1792.3030004@codemonkey.ws>
	<4EBC4260.1090405@codemonkey.ws> <4EBCF5DA.1000605@redhat.com>
	<4EBE499E.4030100@redhat.com> <20111114101610.GA32392@redhat.com>
	<20111114102421.GE16454@redhat.com>
	<20111114110802.GB32392@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Lucas Meneghel Rodrigues <lmr@redhat.com>,
	KVM mailing list <kvm@vger.kernel.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"libvir-list@redhat.com" <libvir-list@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>, QEMU devel <qemu-devel@nongnu.org>,
	Juan Jose Quintela Carreira <quintela@redhat.com>,
	Avi Kivity <avi@redhat.com>
To: "Daniel P. Berrange" <berrange@redhat.com>
Return-path: <qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org>
In-Reply-To: <20111114110802.GB32392@redhat.com>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
List-Id: kvm.vger.kernel.org

On 14.11.2011 12:08, Daniel P. Berrange wrote:
> On Mon, Nov 14, 2011 at 12:24:22PM +0200, Michael S. Tsirkin wrote:
>> On Mon, Nov 14, 2011 at 10:16:10AM +0000, Daniel P. Berrange wrote:
>>> On Sat, Nov 12, 2011 at 12:25:34PM +0200, Avi Kivity wrote:
>>>> On 11/11/2011 12:15 PM, Kevin Wolf wrote:
>>>>> On 10.11.2011 22:30, Anthony Liguori wrote:
>>>>>> Live migration with qcow2 or any other image format is just not going to work 
>>>>>> right now even with proper clustered storage.  I think doing a block level flush 
>>>>>> cache interface and letting block devices decide how to do it is the best approach.
>>>>>
>>>>> I would really prefer reusing the existing open/close code. It means
>>>>> less (duplicated) code, is existing code that is well tested and doesn't
>>>>> make migration much of a special case.
>>>>>
>>>>> If you want to avoid reopening the file on the OS level, we can reopen
>>>>> only the topmost layer (i.e. the format, but not the protocol) for now
>>>>> and in 1.1 we can use bdrv_reopen().
>>>>>
>>>>
>>>> Intuitively I dislike _reopen style interfaces.  If the second open
>>>> yields different results from the first, does it invalidate any
>>>> computations in between?
>>>>
>>>> What's wrong with just delaying the open?
>>>
>>> If you delay the 'open' until the mgmt app issues 'cont', then you lose
>>> the ability to rollback to the source host upon open failure for most
>>> deployed versions of libvirt. We only fairly recently switched to a five
>>> stage migration handshake to cope with rollback when 'cont' fails.
>>>
>>> Daniel
>>
>> I guess reopen can fail as well, so this seems to me to be an important
>> fix but not a blocker.
> 
> If the initial open succeeds, then it is far more likely that a later
> re-open will succeed too, because you have already eliminated the possibility
> of configuration mistakes, and will have caught most storage runtime errors
> too. So there is a very significant difference in reliability between doing
> an 'open at startup + reopen at cont' vs just 'open at cont'
> 
> Based on the bug reports I see, we want to be very good at detecting and
> gracefully handling open errors because they are pretty frequent.

Do you have some more details on the kind of errors? Missing files,
permissions, something like this? Or rather something related to the
actual content of an image file?

I'm asking because for avoiding the former, things like access() could
be enough, whereas for the latter we'd have to do a full open.
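
As a rough sketch of the cheap check: the helper below is hypothetical
(image_preflight_check is not an existing QEMU function) and only shows
what an access()-based test can and cannot tell us. It catches a missing
file or wrong permissions, but it says nothing about the image content,
it checks against the real rather than the effective user ID, and the
result can be stale by the time the file is actually opened.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, not QEMU code: cheap pre-flight check for an
 * image path. Returns 0 if the file exists and the calling process may
 * read and write it, or a negative errno value otherwise.
 *
 * This does not parse the image, so a corrupted or unsupported qcow2
 * header would still pass; detecting that needs a full open.
 */
static int image_preflight_check(const char *path)
{
    if (access(path, R_OK | W_OK) != 0) {
        return -errno;   /* e.g. -ENOENT (missing), -EACCES (permissions) */
    }
    return 0;
}
```

A caller on the destination host could run this at startup and refuse the
incoming migration on a negative return value, while still deferring the
real open until 'cont'.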

Kevin
