Date: Sun, 19 Oct 2008 20:36:42 +0200
From: Jens Axboe
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
To: Avi Kivity
Cc: Chris Wright, Mark McLoughlin, kvm-devel, Laurent Vivier,
    qemu-devel@nongnu.org, Ryan Harper

On Sun, Oct 19 2008, Avi Kivity wrote:
> Jens Axboe wrote:
> >
> >> Sounds like a bug. Shouldn't Linux disable the write cache unless the
> >> user explicitly enables it, if NCQ is available? NCQ should provide
> >> acceptable throughput even without the write cache.
> >
> > How can it be a bug?
>
> If it puts my data at risk, it's a bug. I can understand it for IDE,
> but not for SATA with NCQ.

Then YOU turn it off. Other people would consider the lousy performance
to be the bigger problem. See policy :-)

> > Changing the cache policy of a drive would be a
> > policy decision in the kernel,
>
> If you don't want this in the kernel, then the system as a whole should
> default to being safe. Though in this case I think it is worthwhile to
> do this in the kernel.

Doesn't matter how you turn this, it's still a policy decision. Leave it
to the user. It's not exactly a new turn of events, commodity drives
have shipped with write caching on forever. What if the drive has a
battery backing? What if the user has a UPS?

> > that is never the right thing to do.
> > There's no such thing as 'acceptable throughput',
>
> I meant that performance is not completely destroyed. How can you even

How do you know it's not destroyed? Depending on your workload, it may
very well be dropping your throughput by orders of magnitude.

> compare data safety to some percent of performance?

I'm not, what I'm saying is that different people will have different
opinions on what is most important. Do note that the window of
corruption is really small and requires power loss to trigger. So for
most desktop users, the tradeoff is actually sane.
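And for anyone who doesn't like that tradeoff, flipping it is a
one-liner with hdparm -W 0 /dev/sdX. As a rough, untested sketch of
what that does underneath (the device path is just an example, error
handling kept minimal), it's an ATA SETFEATURES drive command:

    /*
     * Sketch only: disable the drive write cache, roughly what
     * "hdparm -W 0 /dev/sdX" falls back to doing.
     */
    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/hdreg.h>

    int main(int argc, char **argv)
    {
            /* args[0] = ATA command, args[2] = feature register;
             * feature 0x82 disables write caching, 0x02 re-enables it */
            unsigned char args[4] = { WIN_SETFEATURES, 0, 0x82, 0 };
            int fd;

            if (argc < 2) {
                    fprintf(stderr, "usage: %s /dev/sdX\n", argv[0]);
                    return 1;
            }
            fd = open(argv[1], O_RDONLY);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (ioctl(fd, HDIO_DRIVE_CMD, args) < 0) {
                    perror("HDIO_DRIVE_CMD");
                    return 1;
            }
            close(fd);
            return 0;
    }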
> > manufacturers and
> > customers usually just want the go faster stripes and data consistency
> > is second.
>
> What is the performance impact of disabling the write cache, given
> enough queue depth?

Depends on the drive. On commodity drives, manufacturers don't really
optimize much for write through caching, since it's not really what
anybody uses. So you'd have to benchmark it to see.

> > Additionally, write back caching is perfectly safe, if used
> > with a barrier enabled file system in Linux.
>
> Not all Linux filesystems are barrier enabled, AFAIK. Further, barriers
> don't help with O_DIRECT (right?).

O_DIRECT should just use FUA writes, they are safe with write back
caching. I'm actually testing such a change just to gauge the
performance impact.

> I shouldn't need a disk array to run a database.

You are free to turn off write back caching!

> > Also note that most users will not have deep queuing for most things.
> > To get good random write performance with write through caching and
> > NCQ, you naturally need to be able to fill the drive queue most of the
> > time. Most desktop workloads don't come close to that, so the user
> > will definitely see it as slower.
>
> Most desktop workloads use writeback cache, so write performance is not
> critical.

Ehm, how do you reach that conclusion based on that statement?

> However I'd hate to see my data destroyed by a power failure, and
> today's large caches can hold a bunch of data.

Then you use barriers or turn write back caching off, simple as that.
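For an app doing its own O_DIRECT I/O, "using barriers" today boils
down to something like the below (a minimal sketch, hypothetical file
name): the fdatasync() is what triggers the cache flush on a barrier
enabled file system, and it's the part that the FUA change mentioned
above would make unnecessary for the data itself.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
            void *buf;
            int fd;

            /* O_DIRECT needs sector aligned memory and I/O sizes */
            if (posix_memalign(&buf, 4096, 4096))
                    return 1;
            memset(buf, 0, 4096);

            fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT, 0644);
            if (fd < 0) {
                    perror("open");
                    return 1;
            }
            if (pwrite(fd, buf, 4096, 0) != 4096) {
                    perror("pwrite");
                    return 1;
            }
            /* force the data out of the drive write cache */
            if (fdatasync(fd) < 0) {
                    perror("fdatasync");
                    return 1;
            }
            close(fd);
            free(buf);
            return 0;
    }

--
Jens Axboe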