From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1KLB0i-0003sA-CR
	for qemu-devel@nongnu.org; Tue, 22 Jul 2008 02:06:08 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1KLB0h-0003ry-NS
	for qemu-devel@nongnu.org; Tue, 22 Jul 2008 02:06:08 -0400
Received: from [199.232.76.173] (port=44611 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1KLB0h-0003rv-Jo
	for qemu-devel@nongnu.org; Tue, 22 Jul 2008 02:06:07 -0400
Received: from il.qumranet.com ([212.179.150.194]:27913)
	by monty-python.gnu.org with esmtp (Exim 4.60) (envelope-from )
	id 1KLB0h-0004R3-21
	for qemu-devel@nongnu.org; Tue, 22 Jul 2008 02:06:07 -0400
Message-ID: <488578CA.4000402@qumranet.com>
Date: Tue, 22 Jul 2008 09:06:02 +0300
From: Avi Kivity
MIME-Version: 1.0
Subject: Re: [Qemu-devel] qcow2 - safe on kill? safe on power fail?
References: <47CF0E0C.9030807@quinthar.com>
	<47CF16C5.6040102@codemonkey.ws>
	<20080721181031.GA31773@shareable.org>
	<4884E6F1.5020205@codemonkey.ws>
	<20080721212604.GA2823@shareable.org>
	<48850A5A.3070106@codemonkey.ws>
In-Reply-To: <48850A5A.3070106@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
To: qemu-devel@nongnu.org

Anthony Liguori wrote:
> Jamie Lokier wrote:
>>> If the sector hasn't been previously allocated, then a new sector in
>>> the file needs to be allocated. This is going to change metadata
>>> within the QCOW2 file and this is where it is possible to corrupt a
>>> disk image. The operation of allocating a new disk sector is
>>> completely synchronous so no other code runs until this completes.
>>> Once the disk sector is allocated, you're safe again[1].
>>>
>>
>> My main concern is corruption of the QCOW2 sector allocation map, and
>> subsequently QEMU/KVM breaking or going wildly haywire with that file.
>>
>> With a normal filesystem, sure, there are lots of ways to get
>> corruption when certain events happen. But you don't lose the _whole_
>> filesystem.
>>
>
> Sure you can. If you don't have a battery backed disk cache and are
> using write-back (which is usually the default), you can definitely
> get corruption of the journal. Likewise, under the right scenarios,
> you will get journal corruption with the default mount options of ext3
> because it doesn't use barriers.
>

What about SCSI or SATA NCQ? On these, barriers don't impact
performance greatly.

> This is very hard to see happen in practice though because these
> windows are very small--just like with QEMU.
>

The exposure window with qemu is not small. It's as large as the page
cache of the host.

>
>
>>> you are running QEMU with cache=off to disable host write caching.
>>
>> Doesn't that use O_DIRECT? O_DIRECT writes don't use barriers, and
>> fsync() does not deterministically issue a disk barrier if there's no
>> metadata change, so O_DIRECT writes are _less_ safe with disks which
>> have write-cache enabled than using normal writes.
>>
>
> It depends on the filesystem. ext3 never issues any barriers by
> default :-)
>
> I would think a good filesystem would issue a barrier after an
> O_DIRECT write.
>

Using a disk controller that supports queueing means that you can (in
theory at least) leave writeback turned on and yet have the disk not lie
to you about completions.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
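
To illustrate the page-cache exposure window discussed above, here is a
minimal sketch (not QEMU's actual code; the function name
write_and_flush and its parameters are made up for illustration). A
plain write only lands in the host page cache; the fsync() call asks the
kernel to push it to the device. As noted in the thread, whether that
also flushes the disk's own volatile write cache depends on the
filesystem and its barrier settings, so this narrows the window rather
than closing it.

```python
import os


def write_and_flush(path, offset, data):
    """Write `data` at `offset`, then fsync so it leaves the host page cache.

    Hypothetical helper for illustration only. fsync() does not by itself
    guarantee that the disk's volatile write cache has been flushed.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        written = 0
        # pwrite may write fewer bytes than requested, so loop until done.
        while written < len(data):
            written += os.pwrite(fd, data[written:], offset + written)
        # Until this returns, the data may exist only in the page cache.
        os.fsync(fd)
    finally:
        os.close(fd)
    return written
```

Without the fsync(), everything written since the last flush is at risk
on a host crash or power failure, which is why the window scales with
the size of the host page cache.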