From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=37522 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1Ou3yS-0006Y4-Tl
	for qemu-devel@nongnu.org; Fri, 10 Sep 2010 09:49:06 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from ) id 1Ou3yR-0007Eb-1l
	for qemu-devel@nongnu.org; Fri, 10 Sep 2010 09:49:04 -0400
Received: from verein.lst.de ([213.95.11.210]:51946)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from ) id 1Ou3yQ-0007EL-Oj
	for qemu-devel@nongnu.org; Fri, 10 Sep 2010 09:49:03 -0400
Date: Fri, 10 Sep 2010 15:48:59 +0200
From: Christoph Hellwig
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Message-ID: <20100910134859.GB28831@lst.de>
References: <4C86BC6B.5010809@codemonkey.ws> <4C874812.9090807@redhat.com>
	<4C87860A.3060904@codemonkey.ws> <4C888287.8020209@redhat.com>
	<4C88D7CC.5000806@codemonkey.ws> <4C8A1311.8070903@redhat.com>
	<4C8A15C4.40201@redhat.com> <4C8A19CA.3040000@redhat.com>
	<4C8A3106.8050501@codemonkey.ws>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4C8A3106.8050501@codemonkey.ws>
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: Kevin Wolf , Stefan Hajnoczi , qemu-devel@nongnu.org, Avi Kivity , Stefan Hajnoczi

On Fri, Sep 10, 2010 at 08:22:14AM -0500, Anthony Liguori wrote:
> fsck will always be fast on qed because the metadata is small. For a
> 1PB image, there's 128MB worth of L2s if it's fully allocated (keeping
> in mind, that once you're fully allocated, you'll never fsck again). If
> you've got 1PB worth of storage, I'm fairly sure you're going to be able
> to do 128MB of reads in a short period of time. Even if it's a few
> seconds, it only occurs on power failure so it's pretty reasonable.

I don't think it is.
Even if the metadata is small, it can still be spread all over the disks, and seek latencies might kill you. I think if we want to make qed future-proof, it needs to provide transactional integrity for metadata updates, just like a journaling filesystem. Given the small amount of metadata and the fewer kinds of it, this will still be a lot simpler than a full filesystem, of course.
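
For reference, here is a rough sketch of how the L2 figure quoted above depends on the cluster size. It assumes one fixed-size table entry per cluster and 8 bytes per entry; neither number is stated in this thread, and the function name `l2_metadata_bytes` is made up for illustration.

```python
# Back-of-the-envelope estimate of L2 table size for a fully allocated image.
# Assumptions (not taken from this message): one L2 entry per data cluster,
# 8 bytes per entry. Table headers and L1 overhead are ignored.

KIB, MIB, GIB, PIB = 2**10, 2**20, 2**30, 2**50


def l2_metadata_bytes(image_bytes, cluster_bytes, entry_bytes=8):
    """Bytes of L2 entries needed to map every cluster of the image."""
    clusters = -(-image_bytes // cluster_bytes)  # ceiling division
    return clusters * entry_bytes


if __name__ == "__main__":
    for cluster in (64 * KIB, 64 * MIB):
        size = l2_metadata_bytes(PIB, cluster)
        print(f"cluster={cluster // KIB} KiB -> L2 metadata={size / MIB:.0f} MiB")
```

Under these assumptions, a 1PB image needs 128MB of L2 entries only with 64MB clusters; with 64KB clusters the same image needs about 128GB. Either way the entries can be scattered across the image, so the number of seeks matters as much as the number of bytes.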