Date: Sun, 19 Oct 2008 20:10:27 +0200
From: Jens Axboe
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Message-ID: <20081019181026.GU19428@kernel.dk>
In-Reply-To: <48FAF751.8010806@redhat.com>
To: Avi Kivity
Cc: Chris Wright, Mark McLoughlin, kvm-devel, Laurent Vivier, qemu-devel@nongnu.org, Ryan Harper

On Sun, Oct 19 2008, Avi Kivity wrote:
> Jens Axboe wrote:
> >On Sun, Oct 12 2008, Avi Kivity wrote:
> >
> >>>If you have a normal laptop, your disk has a cache. That cache does
> >>>not have a battery backup. Under normal operations, the cache is
> >>>acting in write-back mode and when you do a write, the disk will
> >>>report the write as completed even though it is not actually on disk.
> >>>If you really care about the data being on disk, you have to either
> >>>use a disk with a battery backed cache (much more expensive) or enable
> >>>write-through caching (will significantly reduce performance).
> >>>
> >>>
> >>I think that with SATA NCQ, this is no longer true. The drive will
> >>report the write complete when it is on disk, and utilize multiple
> >>outstanding requests to get coalescing and reordering. Not sure about
> >>
> >
> >It is still very true. Go buy any consumer drive on the market and check
> >the write cache settings - hint, it's definitely shipped with write back
> >caching. So while the drive may have NCQ and Linux will use it, the
> >write cache is still using write back unless you explicitly change it.
> >
> >
>
> Sounds like a bug. Shouldn't Linux disable the write cache unless the
> user explicitly enables it, if NCQ is available? NCQ should provide
> acceptable throughput even without the write cache.

How can it be a bug? Changing the cache policy of a drive would be a
policy decision in the kernel, and that is never the right thing to do.
There's no such thing as 'acceptable throughput': manufacturers and
customers usually just want the go-faster stripes, and data consistency
comes second. Additionally, write-back caching is perfectly safe if it
is used with a barrier-enabled file system in Linux.

Also note that most users will not have deep queuing for most things.
To get good random write performance with write-through caching and
NCQ, you need to be able to fill the drive queue most of the time.
Most desktop workloads don't come close to that, so the user will
definitely see it as slower.

-- 
Jens Axboe
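The queue-depth argument can be illustrated with a toy model. This sketch is hypothetical and not from the thread: it assumes a drive that always services whichever queued write is closest to the current head position (a shortest-seek-first policy standing in for real NCQ firmware scheduling), and it ignores rotational latency, so only the relative numbers mean anything. The function name `avg_seek_distance` and its parameters are illustrative.

```python
import random


def avg_seek_distance(queue_depth: int, n_requests: int = 10_000,
                      n_blocks: int = 1_000_000, seed: int = 0) -> float:
    """Toy model (hypothetical): mean head travel, in blocks, per random write.

    The host keeps up to `queue_depth` writes outstanding; the simulated drive
    always services the pending write nearest the current head position
    (shortest-seek-first). Rotational latency is ignored.
    """
    if queue_depth < 1:
        raise ValueError("queue_depth must be at least 1")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    pending: list[int] = []
    head = 0
    issued = 0
    total_travel = 0
    for _ in range(n_requests):
        # Refill the queue: a shallow workload only ever has a few writes in flight.
        while len(pending) < queue_depth and issued < n_requests:
            pending.append(rng.randrange(n_blocks))
            issued += 1
        # The drive may reorder freely among queued writes, so pick the closest one.
        nearest = min(pending, key=lambda lba: abs(lba - head))
        pending.remove(nearest)
        total_travel += abs(nearest - head)
        head = nearest
    return total_travel / n_requests


if __name__ == "__main__":
    for depth in (1, 2, 4, 8, 32):
        print(f"queue depth {depth:2d}: {avg_seek_distance(depth):10.0f} blocks/write")
```

With a queue depth of 1 the drive has nothing to reorder, so under this model each write travels about a third of the block range on average; the reordering benefit only appears once several writes are outstanding, which is the situation the reply argues typical desktop workloads rarely reach.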