From: Anthony Liguori
Date: Fri, 10 Sep 2010 08:22:14 -0500
To: Avi Kivity
Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
Message-ID: <4C8A3106.8050501@codemonkey.ws>
In-Reply-To: <4C8A19CA.3040000@redhat.com>

On 09/10/2010 06:43 AM, Avi Kivity wrote:
> On 09/10/2010 02:33 PM, Stefan Hajnoczi wrote:
>>
>>> btw, despite not being properly designed, qcow2 is able to support TRIM.
>>> qed isn't able to, except by leaking clusters on shutdown.
>>> TRIM support is required unless you're okay with the image growing
>>> until it is no longer sparse (the lack of TRIM support in guests makes
>>> sparse image formats somewhat of a joke, but nobody seems to notice).
>> Anthony has started writing up notes on trim for qed:
>> http://wiki.qemu.org/Features/QED/Trim
>>
>
> Looks like it depends on fsck, which is not a good idea for large images.

fsck will always be fast on qed because the metadata is small. For a
1PB image, there's 128MB worth of L2s if it's fully allocated (keep in
mind that once you're fully allocated, you'll never fsck again). If
you've got 1PB worth of storage, I'm fairly sure you're going to be able
to do 128MB of reads in a short period of time. Even if it takes a few
seconds, it only happens after a power failure, so it's a reasonable
cost.

>> I need to look at the actual ATA and SCSI specs for how this will
>> work. The issue I am concerned with is sub-cluster trim operations.
>> If the trim region is smaller than a cluster, neither qed nor qcow2
>> really has a way to handle it. Perhaps we could punch a hole in the
>> file, given a userspace interface to do this, but that isn't ideal
>> because we're losing sparseness again.
>
> To deal with a sub-cluster TRIM, look at the surrounding sectors. If
> they're zero, free the cluster. If not, write zeros or use
> sys_punch() on the range specified by TRIM.

Better yet, if you can't trim a full cluster, just write out zeros and
have a separate background process that punches out zero clusters.
That approach is a bit more generic and will help compact images
independently of guest trims.

Regards,

Anthony Liguori
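The fsck estimate in this thread rests on the ratio of L2 metadata to data. A minimal C sketch of that arithmetic, assuming 8-byte L2 entries and taking the cluster size and table length (in clusters) as parameters; the values used below (64 KB clusters, four-cluster tables) are assumptions for illustration, not figures stated in the message, and `l2_metadata_bytes()` is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

#define L2_ENTRY_SIZE 8ULL /* assumed: one 64-bit cluster offset per entry */

/*
 * Bytes of L2 tables needed to map a fully allocated image.
 * cluster_size and table_clusters are assumed parameters, not values
 * taken from the thread.
 */
uint64_t l2_metadata_bytes(uint64_t image_size, uint64_t cluster_size,
                           uint64_t table_clusters)
{
    assert(cluster_size >= L2_ENTRY_SIZE && cluster_size % L2_ENTRY_SIZE == 0);
    assert(table_clusters > 0);

    uint64_t table_bytes = table_clusters * cluster_size;
    uint64_t entries_per_table = table_bytes / L2_ENTRY_SIZE;

    /* Round up: a short last cluster still needs an entry, and a
     * partly filled L2 table is still allocated whole. */
    uint64_t clusters = (image_size + cluster_size - 1) / cluster_size;
    uint64_t tables = (clusters + entries_per_table - 1) / entries_per_table;
    return tables * table_bytes;
}
```

With 64 KB clusters and four-cluster tables, each L2 table holds 32768 entries and maps 2 GB, so L2 metadata is 1/8192 of the fully allocated size: `l2_metadata_bytes(1ULL << 40, 64 * 1024, 4)` gives 128 MiB for a 1 TiB image.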
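Stefan's sub-cluster concern comes down to splitting a TRIM request at cluster boundaries: whole clusters inside the range can be trimmed outright, and only the unaligned head and tail need the zero-writing that Avi and Anthony describe. A rough C sketch; `struct trim_split` and `split_trim()` are hypothetical names, not QEMU code, and byte offsets are assumed (real ATA/SCSI requests are expressed in sectors):

```c
#include <assert.h>
#include <stdint.h>

/* A TRIM request split at cluster boundaries (hypothetical type). */
struct trim_split {
    uint64_t head_off, head_len; /* unaligned leading part: zero-fill */
    uint64_t full_off, full_len; /* whole clusters: can be freed */
    uint64_t tail_off, tail_len; /* unaligned trailing part: zero-fill */
};

struct trim_split split_trim(uint64_t off, uint64_t len, uint64_t cluster_size)
{
    assert(cluster_size > 0);
    struct trim_split s = {0};
    uint64_t end = off + len;

    /* First boundary at or after off; last boundary at or before end. */
    uint64_t first = (off + cluster_size - 1) / cluster_size * cluster_size;
    uint64_t last = end / cluster_size * cluster_size;

    if (first >= last) {
        /* No whole cluster is covered: the entire request is sub-cluster. */
        s.head_off = off;
        s.head_len = len;
        return s;
    }
    s.head_off = off;
    s.head_len = first - off;
    s.full_off = first;
    s.full_len = last - first;
    s.tail_off = last;
    s.tail_len = end - last;
    return s;
}
```

For example, `split_trim(1000, 10000, 4096)` yields a 3096-byte head, one whole cluster at 4096, and a 2808-byte tail at 8192; a request that covers no whole cluster comes back entirely in the head.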
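Avi's suggestion for a sub-cluster TRIM is a check on the rest of the cluster: if every byte outside the trimmed range is already zero, the whole cluster can be freed; otherwise only the trimmed range is zeroed (or punched). A minimal C sketch, assuming the caller has already read the cluster into memory; `classify_sub_cluster_trim()` and the enum are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum sub_trim_action {
    SUB_TRIM_FREE_CLUSTER, /* rest of the cluster is zero: free it */
    SUB_TRIM_ZERO_RANGE    /* other data remains: zero only the range */
};

static bool all_zero(const uint8_t *p, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return false;
    return true;
}

/*
 * cluster: current contents of the cluster holding the request.
 * off, len: trimmed byte range relative to the start of the cluster.
 * Bytes inside the trimmed range are ignored, since the guest has
 * discarded them.
 */
enum sub_trim_action classify_sub_cluster_trim(const uint8_t *cluster,
                                               size_t cluster_size,
                                               size_t off, size_t len)
{
    assert(off <= cluster_size && len <= cluster_size - off);

    bool before_zero = all_zero(cluster, off);
    bool after_zero = all_zero(cluster + off + len, cluster_size - (off + len));
    return (before_zero && after_zero) ? SUB_TRIM_FREE_CLUSTER
                                       : SUB_TRIM_ZERO_RANGE;
}
```

The cost of this approach is a read of the surrounding data on every sub-cluster TRIM, which is one reason to prefer deferring the work to a background pass.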