Date: Sun, 19 Oct 2008 20:36:42 +0200
From: Jens Axboe
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
To: Avi Kivity
Cc: Chris Wright, Mark McLoughlin, kvm-devel, Laurent Vivier,
    qemu-devel@nongnu.org, Ryan Harper

On Sun, Oct 19 2008, Avi Kivity wrote:
> Jens Axboe wrote:
> >
> >> Sounds like a bug. Shouldn't Linux disable the write cache unless the
> >> user explicitly enables it, if NCQ is available? NCQ should provide
> >> acceptable throughput even without the write cache.
> >
> > How can it be a bug?
>
> If it puts my data at risk, it's a bug. I can understand it for IDE,
> but not for SATA with NCQ.

Then YOU turn it off. Other people would consider the lousy performance
to be the bigger problem. See policy :-)

> > Changing the cache policy of a drive would be a
> > policy decision in the kernel,
>
> If you don't want this in the kernel, then the system as a whole should
> default to being safe. Though in this case I think it is worthwhile to
> do this in the kernel.

Doesn't matter how you turn this, it's still a policy decision. Leave it
to the user. It's not exactly a new turn of events, commodity drives
have shipped with write caching on forever. What if the drive has a
battery backing? What if the user has a UPS?

> > that is never the right thing to do.
> > There's no such thing as 'acceptable throughput',
>
> I meant that performance is not completely destroyed. How can you even

How do you know it's not destroyed? Depending on your workload, it may
very well be dropping your throughput by orders of magnitude.

> compare data safety to some percent of performance?

I'm not, what I'm saying is that different people will have different
opinions on what is most important. Do note that the window of
corruption is really small and requires power loss to trigger. So for
most desktop users, the tradeoff is actually sane.
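And for anyone who doesn't like that tradeoff, flipping it is a
one-liner with hdparm -W 0 /dev/sdX. As a rough, untested sketch of
what that does underneath (the device path is just an example, error
handling kept minimal), it's an ATA SETFEATURES drive command:

    /*
     * Sketch only: disable the drive write cache, roughly what
     * "hdparm -W 0 /dev/sdX" falls back to doing.
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/hdreg.h>

    int main(int argc, char **argv)
    {
            /* args[0] = ATA command, args[2] = feature register;
             * feature 0x82 disables write caching, 0x02 re-enables it */
            unsigned char args[4] = { WIN_SETFEATURES, 0, 0x82, 0 };
            int fd;

            if (argc < 2) {
                    fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
                    return 1;
            }
            fd = open(argv[1], O_RDONLY);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (ioctl(fd, HDIO_DRIVE_CMD, args) < 0) {
                    perror("HDIO_DRIVE_CMD");
                    return 1;
            }
            close(fd);
            return 0;
    }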
> > manufacturers and
> > customers usually just want the go faster stripes and data consistency
> > is second.
>
> What is the performance impact of disabling the write cache, given
> enough queue depth?

Depends on the drive. On commodity drives, manufacturers don't really
optimize much for write through caching, since it's not really what
anybody uses. So you'd have to benchmark it to see.

> > Additionally, write back caching is perfectly safe, if used
> > with a barrier enabled file system in Linux.
>
> Not all Linux filesystems are barrier enabled, AFAIK. Further, barriers
> don't help with O_DIRECT (right?).

O_DIRECT should just use FUA writes, they are safe with write back
caching. I'm actually testing such a change just to gauge the
performance impact.

> I shouldn't need a disk array to run a database.

You are free to turn off write back caching!

> > Also note that most users will not have deep queuing for most things.
> > To get good random write performance with write through caching and
> > NCQ, you naturally need to be able to fill the drive queue most of the
> > time. Most desktop workloads don't come close to that, so the user
> > will definitely see it as slower.
>
> Most desktop workloads use writeback cache, so write performance is not
> critical.

Ehm, how do you reach that conclusion based on that statement?

> However I'd hate to see my data destroyed by a power failure, and
> today's large caches can hold a bunch of data.

Then you use barriers or turn write back caching off, simple as that.
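For an app doing its own O_DIRECT I/O, "using barriers" today boils
down to something like the below (a minimal sketch, hypothetical file
name): the fdatasync() is what triggers the cache flush on a barrier
enabled file system, and it's the part that the FUA change mentioned
above would make unnecessary for the data itself.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
            void *buf;
            int fd;

            /* O_DIRECT needs sector aligned memory and I/O sizes */
            if (posix_memalign(&buf, 4096, 4096))
                    return 1;
            memset(buf, 0, 4096);

            fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (pwrite(fd, buf, 4096, 0) != 4096) {
                    perror("pwrite");
                    return 1;
            }
            /* force the data out of the drive write cache */
            if (fdatasync(fd) < 0) {
                    perror("fdatasync");
                    return 1;
            }
            close(fd);
            free(buf);
            return 0;
    }

--
Jens Axboe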