From: Avi Kivity <avi@redhat.com>
To: Jens Axboe <qemu@kernel.dk>
Cc: Chris Wright <chrisw@redhat.com>,
Mark McLoughlin <markmc@redhat.com>,
kvm-devel <kvm-devel@lists.sourceforge.net>,
Laurent Vivier <Laurent.Vivier@bull.net>,
qemu-devel@nongnu.org, Ryan Harper <ryanh@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Date: Sun, 19 Oct 2008 22:16:43 +0200
Message-ID: <48FB95AB.6090402@redhat.com>
In-Reply-To: <20081019193024.GX19428@kernel.dk>
Jens Axboe wrote:
>> (it seems I can't turn off the write cache even without losing my data:
>>
> Use hdparm, it's an ATA drive even if Linux currently uses the scsi
> layer for it. Or use sysfs, there's a "cache_type" attribute in the scsi
> disk sysfs directory.
>
Ok. It's moot anyway.
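
For reference, the sysfs attribute mentioned above can be inspected with a few lines of Python. This is only a sketch: it assumes a Linux host that exposes SCSI disks under /sys/class/scsi_disk, and the helper names (is_write_back, read_cache_types) are made up for illustration. It only reads the setting and does not change it.

```python
from pathlib import Path

# Assumed sysfs location of the SCSI disk class directory on Linux.
SCSI_DISK_ROOT = Path("/sys/class/scsi_disk")


def is_write_back(cache_type: str) -> bool:
    """Return True if a cache_type string reports a write-back cache."""
    return cache_type.strip().startswith("write back")


def read_cache_types(root: Path = SCSI_DISK_ROOT) -> dict:
    """Map each SCSI disk address (e.g. '0:0:0:0') to its cache_type string."""
    result = {}
    if not root.is_dir():
        return result  # not Linux, or no SCSI disks visible
    for entry in sorted(root.iterdir()):
        try:
            result[entry.name] = (entry / "cache_type").read_text().strip()
        except OSError:
            continue  # attribute missing or unreadable for this entry
    return result


if __name__ == "__main__":
    for addr, value in read_cache_types().items():
        state = "write-back" if is_write_back(value) else "not write-back"
        print(f"{addr}: {value} ({state})")
```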
>> "Policy" doesn't mean you shouldn't choose good defaults.
>>
>
> Changing the hardware settings for this kind of behaviour IS most
> certainly policy.
>
Leaving bad hardware settings is also policy. But in light of FUA, the
SCSI write cache is not a bad thing, so we should definitely leave it on.
>> I guess this is the crux. According to my understanding, you shouldn't
>> see such a horrible drop, unless the application does synchronous writes
>> explicitly, in which case it is probably worried about data safety.
>>
>
> Then you need to adjust your understanding, because you definitely will
> see a big drop in performance.
>
>
Can you explain why? This is interesting.
>>> O_DIRECT should just use FUA writes, they are safe with write back
>>> caching. I'm actually testing such a change just to gauge the
>>> performance impact.
>>>
>>>
>> You mean, this is not in mainline yet?
>>
>
> It isn't.
>
What is the time frame for this? 2.6.29?
>> Some googling shows that Windows XP introduced FUA for O_DIRECT and
>> metadata writes as well.
>>
>
> There's a lot of other background information to understand to gauge the
> impact of using eg FUA for O_DIRECT in Linux as well. MS basically wrote
> the FUA for ATA proposal, and the original usage pattern (as far as I
> remember) was indeed meta data. Hence it also imposes a priority boost
> in most (all?) drive firmwares, since it's deemed important. So just
> using FUA vs non-FUA is likely to impact performance of other workloads
> in fairly unknown ways. FUA on non-queuing drives will also likely suck
> for performance, since you're basically going to be blowing a drive rev
> for each IO. And that hurts.
>
Let's assume queueing drives, since these are fairly common these days.
So qemu issuing O_DIRECT which turns into FUA writes is safe but
suboptimal. Has there been talk about exposing the difference between
FUA writes and cached writes to userspace? What about barriers?

With a rich enough userspace interface, qemu can communicate the
intentions of the guest and not force the kernel to make a
performance/correctness tradeoff.
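
To make the distinction concrete, here is a rough sketch of the two intentions as userspace can express them with standard POSIX calls: "this write must be durable when it completes" (O_DSYNC) versus "these writes may be cached, but must be durable by this point" (fdatasync). O_DSYNC and fdatasync are not discussed in this thread and are used here only as an illustration; the function names are hypothetical. Whether the kernel implements either path with FUA, with a cache flush, or with neither depends on the kernel, filesystem and device, and is not shown.

```python
import os


def _write_all(fd: int, data: bytes) -> None:
    """Write all of data to fd, handling short writes."""
    view = memoryview(data)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def write_durable_each(path: str, chunks: list) -> None:
    """Every write returns only once its data is reported durable."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_DSYNC
    fd = os.open(path, flags, 0o600)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
    finally:
        os.close(fd)


def write_then_flush(path: str, chunks: list) -> None:
    """Writes may sit in caches; a single fdatasync() marks the durable point."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    try:
        for chunk in chunks:
            _write_all(fd, chunk)
        os.fdatasync(fd)  # all data written so far must reach stable storage
    finally:
        os.close(fd)
```

The first form pays the durability cost on every write; the second lets the caller batch writes and pay it once, which is closer to what a guest issuing ordinary writes followed by a flush is asking for.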
>>
>> What about the users who aren't on qemu-devel?
>>
>
> It may be news to you, but it has been debated on lkml in the past as
> well. Not even that long ago, and I'd be surprised if lwn didn't run
> some article on it as well.
Let's postulate the existence of a user that doesn't read lkml or even lwn.
> But I agree it's important information, but
> realize that until just recently most people didn't really consider it a
> likely scenario in practice...
>
> I wrote and committed the original barrier implementation in Linux in
> 2001, and just this year XFS made it a default mount option. After the
> recent debacle on this on lkml, ext4 made it the default as well.
>
> So let me turn it around a bit - if this issue really did hit lots of
> people out there in real life, don't you think there would have been
> more noise about this and we would have made this the default years ago?
> So while we both agree it's a risk, it's not a huuuge risk...
>
I agree, not a huge risk. I guess compared to the rest of the suckiness
involved (took a long while just to get journalling), this is really a
minor issue. It's interesting though that Windows supported this in
2001, seven years ago, so at least they considered it important.

I guess I'm sensitive to this because in my filesystemy past QA would
jerk out data and power cables while running various tests and act
surprised whenever data was lost. So I'm allergic to data loss.

With qemu (at least when used with a hypervisor) we have to be extra
safe since we have no idea what workload is running and how critical
data safety is. Well, we have hints (whether FUA is set or not) when
using SCSI, but right now we don't have a way of communicating these
hints to the kernel.

One important takeaway is to find out whether virtio-blk supports FUA,
and if not, add it.
>> However, with your FUA change, they should be safe.
>>
>
> Yes, that would make O_DIRECT safe always. Except when it falls back to
> buffered IO, woops...
>
>
Woops.
>> Any write latency is buffered by the kernel. Write speed is main memory
>> speed. Disk speed only bubbles up when memory is tight.
>>
>
> That's a nice theory, in practice that is completely wrong. You end up
> waiting on writes for LOTS of other reasons!
>
>
Journal commits? Can you elaborate?

In the filesystem I worked on, one would never wait on a write to disk
unless memory was full. Even synchronous writes were serviced
immediately, since the system had a battery-backed replicated cache. I
guess the situation with Linux filesystems is different.
--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.