qemu-devel.nongnu.org archive mirror
From: Avi Kivity <avi@redhat.com>
To: Jens Axboe <qemu@kernel.dk>
Cc: Chris Wright <chrisw@redhat.com>,
	Mark McLoughlin <markmc@redhat.com>,
	kvm-devel <kvm-devel@lists.sourceforge.net>,
	Laurent Vivier <Laurent.Vivier@bull.net>,
	qemu-devel@nongnu.org, Ryan Harper <ryanh@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Date: Sun, 19 Oct 2008 21:11:23 +0200
Message-ID: <48FB865B.60906@redhat.com>
In-Reply-To: <20081019183642.GV19428@kernel.dk>

Jens Axboe wrote:
> On Sun, Oct 19 2008, Avi Kivity wrote:
>
>> Jens Axboe wrote:
>>>> Sounds like a bug.  Shouldn't Linux disable the write cache unless the 
>>>> user explicitly enables it, if NCQ is available?  NCQ should provide 
>>>> acceptable throughput even without the write cache.
>>>>     
>>>>         
>>> How can it be a bug? 
>>>       
>> If it puts my data at risk, it's a bug.  I can understand it for IDE,
>> but not for SATA with NCQ.
>>     
>
> Then YOU turn it off. Other people would consider the lousy performance
> to be the bigger problem. See policy :-)
>
>   

If I get lousy performance, I can turn on the write cache and ignore the
risk of data loss.  If I lose my data, I can't turn off the write cache
and get my data back.

(it seems I can't turn off the write cache even without losing my data:

[avi@firebolt ~]$ sudo sdparm --set=WCE=0 /dev/sd[ab]
    /dev/sda: ATA       WDC WD3200YS-01P  21.0
change_mode_page: failed setting page: Caching (SBC)
    /dev/sdb: ATA       WDC WD3200YS-01P  21.0
change_mode_page: failed setting page: Caching (SBC)
)
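
A minimal sketch, not from the thread, of another way to check what the kernel believes about a drive's cache. It assumes a recent Linux kernel that exposes the `queue/write_cache` attribute under `/sys/block` (kernels from the time of this message did not have it), and the helper name `write_cache_mode` is hypothetical. The value is the kernel's view and may not match the drive's actual setting.

```python
import os


def write_cache_mode(device, sysfs_root="/sys/block"):
    """Return the kernel's view of the drive cache, e.g. 'write back'.

    Returns None when the attribute is missing (older kernel or
    unknown device). `sysfs_root` is a parameter only so the helper
    can be exercised against a fake directory tree.
    """
    path = os.path.join(sysfs_root, device, "queue", "write_cache")
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


if __name__ == "__main__":
    for dev in ("sda", "sdb"):
        print(dev, write_cache_mode(dev))
```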

>>> Changing the cache policy of a drive would be a
>>> policy decision in the kernel, 
>>>       
>> If you don't want this in the kernel, then the system as a whole should
>> default to being safe.  Though in this case I think it is worthwhile to
>> do this in the kernel.
>>     
>
> Doesn't matter how you turn this, it's still a policy decision. Leave it
> to the user. It's not exactly a new turn of events, commodity drives
> have shipped with write caching on forever. What if the drive has a
> battery backing? 

If the drive has a battery backup, I'd argue it should report it as a
write-through cache.  I'm not a drive manufacturer though.

> What if the user has an UPS?
>
>   

They should enable the write-back cache if they trust the UPS.  Or maybe
the system should do that automatically if it's aware of the UPS.

"Policy" doesn't mean you shouldn't choose good defaults.

>>> that is never the right thing to do.
>>> There's no such thing as 'acceptable throughput',
>>>       
>> I meant that performance is not completely destroyed.  How can you even
>>     
>
> How do you know it's not destroyed? Depending on your workload, it may
> very well be dropping your throughput by orders of magnitude.
>
>   

I guess this is the crux.  According to my understanding, you shouldn't
see such a horrible drop, unless the application does synchronous writes
explicitly, in which case it is probably worried about data safety.
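
One rough way to see the distinction drawn above is to time ordinary buffered writes against writes on a descriptor opened with O_SYNC. The sketch below is an illustration under assumptions, not a benchmark from the thread: the helper names, block size, and block count are made up, and the numbers depend heavily on the filesystem, the drive, and whether its write cache is enabled.

```python
import os
import tempfile
import time


def timed_writes(path, flags, count=64, block=4096):
    """Write `count` blocks of `block` bytes; return elapsed seconds."""
    payload = b"\0" * block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | flags, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, payload)
        return time.perf_counter() - start
    finally:
        os.close(fd)


def compare(directory, count=64, block=4096):
    """Return (buffered_seconds, synchronous_seconds) for the same workload."""
    buffered = timed_writes(os.path.join(directory, "buffered.dat"),
                            0, count, block)
    # O_SYNC makes each write() wait until the kernel reports the data stable.
    synchronous = timed_writes(os.path.join(directory, "sync.dat"),
                               os.O_SYNC, count, block)
    return buffered, synchronous


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        b, s = compare(tmp)
        print(f"buffered: {b:.6f}s  O_SYNC: {s:.6f}s")
```

The buffered case only copies into the page cache, so any gap between the two figures is the cost that only explicitly synchronous applications pay.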

>> compare data safety to some percent of performance?
>>     
>
> I'm not, what I'm saying is that different people will have different
> opinions on what is most important. Do note that the window of
> corruption is really small and requires powerloss to trigger. So for
> most desktop users, the tradeoff is actually sane.
>
>   

I agree that the window is very small, and that by eliminating software
failures we get rid of the major source of data loss.  What I don't know
is what the performance tradeoff looks like (and I can't measure since
my drives won't let me turn off the cache for some reason).

>>> Additionally, write back caching is perfectly safe, if used
>>> with a barrier enabled file system in Linux.
>>>   
>>>       
>> Not all Linux filesystems are barrier enabled, AFAIK.  Further, barriers
>> don't help with O_DIRECT (right?).
>>     
>
> O_DIRECT should just use FUA writes, these are safe with write back
> caching. I'm actually testing such a change just to gauge the
> performance impact.
>   

You mean, this is not in mainline yet?

So, with this, plus barrier support for metadata and O_SYNC writes, the
write-back cache should be safe?

Some googling shows that Windows XP introduced FUA for O_DIRECT and
metadata writes as well.

>   
>> I shouldn't need a disk array to run a database.
>>     
>
> You are free to turn off write back caching!
>
>   

What about the users who aren't on qemu-devel?

However, with your FUA change, they should be safe.

>>
>> Most desktop workloads use writeback cache, so write performance is not
>> critical.
>>     
>
> Ehm, how do you reach that conclusion based on that statement?
>
>   

Any write latency is buffered by the kernel.  Write speed is main memory
speed.  Disk speed only bubbles up when memory is tight.

>> However I'd hate to see my data destroyed by a power failure, and
>> today's large caches can hold a bunch of data.
>>     
>
> Then you use barriers or turn write back caching off, simple as that.
>   

I will (if I figure out how) but there may be one or two users who
haven't read the SCSI spec yet.

Or more correctly, I am revising my opinion of the write back cache
since even when it is enabled, using it is completely optional.  Instead
of disabling the write back cache we should use FUA and barriers, and
since you are working on FUA, it looks like this will be resolved soon
without performance/correctness compromises.
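
From an application's side, the counterpart of this conclusion is to leave the write cache alone and ask for durability explicitly. A minimal sketch, with the hypothetical helper name `durable_write`: it writes a buffer and then calls fsync. Whether that fsync actually reaches the platter depends on the filesystem issuing a cache flush or barrier, which is exactly the open question in this thread, so treat it as an illustration rather than a guarantee.

```python
import os


def durable_write(path, data):
    """Write `data` to `path`, then ask the kernel to make it stable."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Blocks until the kernel reports the file's data as written out.
        os.fsync(fd)
    finally:
        os.close(fd)
```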

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
