qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC] Use O_DSYNC by default and update documentation to explain IO integrity in QEMU
@ 2008-10-09 17:02 Anthony Liguori
  2008-10-10  9:27 ` Avi Kivity
  0 siblings, 1 reply; 2+ messages in thread
From: Anthony Liguori @ 2008-10-09 17:02 UTC (permalink / raw)
  To: qemu-devel@nongnu.org

[-- Attachment #1: Type: text/plain, Size: 76 bytes --]

We'll have some benchmarks later this afternoon.

Regards,

Anthony Liguori

[-- Attachment #2: o_sync.patch --]
[-- Type: text/x-patch, Size: 4057 bytes --]

diff --git a/block-raw-posix.c b/block-raw-posix.c
index 83a358c..e58f191 100644
--- a/block-raw-posix.c
+++ b/block-raw-posix.c
@@ -120,7 +120,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
 
     s->lseek_err_cnt = 0;
 
-    open_flags = O_BINARY;
+    open_flags = O_BINARY | O_DSYNC;
     if ((flags & BDRV_O_ACCESS) == O_RDWR) {
         open_flags |= O_RDWR;
     } else {
@@ -996,7 +996,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
             IOObjectRelease( mediaIterator );
     }
 #endif
-    open_flags = O_BINARY;
+    open_flags = O_BINARY | O_DSYNC;
     if ((flags & BDRV_O_ACCESS) == O_RDWR) {
         open_flags |= O_RDWR;
     } else {
diff --git a/qemu-doc.texi b/qemu-doc.texi
index adf270b..2e859ff 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -267,13 +267,56 @@ These options have the same definition as they have in @option{-hdachs}.
 @item snapshot=@var{snapshot}
 @var{snapshot} is "on" or "off" and allows to enable snapshot for given drive (see @option{-snapshot}).
 @item cache=@var{cache}
-@var{cache} is "on" or "off" and allows to disable host cache to access data.
+@var{cache} is "on" or "off" and controls the use of the host page cache.
 @item format=@var{format}
 Specify which disk @var{format} will be used rather than detecting
 the format.  Can be used to specifiy format=raw to avoid interpreting
 an untrusted format header.
 @end table
 
+By default, QEMU accesses all disk data through the host's page cache.
+This allows the host to perform read-ahead and to avoid duplicating IO
+requests unnecessarily increasing disk performance.  You may notice that
+certain benchmarks in the guest perform better than they do in the host
+(for read) because of this.  This is primarily because the benchmark is
+unaware of the extra level of caching that is occurring when running in
+a virtual environment.
+
+The cache=off option can be used to disable the use of the host's page
+cache.  Disabling the use of the host's page cache will likely reduce
+performance since the host is unable to perform read-ahead and unable
+to avoid duplicating IO requests.  At this time, QEMU will copy data
+internally so the cost of copying data into the host's page cache is
+unlikely to be statistically significant.
+
+The use of cache=off may make a benchmark appear to have results that
+are closer to the results in the host.  This does not imply that data
+integrity is not preserved when using cache=on, it is simply an artifact
+of the fact that the benchmark is not aware that it is in a virtual machine.
+It also does not imply that cache=off should be used for general workloads.
+
+In the future, QEMU will be able to avoid copying data internally and
+under certain workloads, disabling the use of the host's page cache may
+increase performance provided that the guest is actively working to avoid
+bringing data into the CPU's cache.  This can only be achieved when using
+things like sendfile() in the guest or other forms of direct-io.  An example
+of a workload that may benefit from avoiding the host's page cache is a
+static web server that is serving entirely unique data and has a relatively
+large amount of memory relative to the host.  This documentation will be
+updated when this change is made.  For now, cache=off is mostly useful for
+development purposes and for benchmarks that are not virtualization aware.
+
+Write requests are only reported completed to the guest when they have
+been reported completed by the disk regardless of whether the host's
+page cache is used for access so the use of the host's page cache is
+orthogonal to data integrity.
+
+If the host's disk drive has write-back caching enabled and the disk does
+not have a battery-backed cache, then data loss can occur regardless of
+whether write-back caching is disabled in the guest.
+
+If in doubt, do not change the default value (which is cache=on).
+
 Instead of @option{-cdrom} you can use:
 @example
 qemu -drive file=file,index=2,media=cdrom
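
For readers who want to see what the flag change amounts to at the system-call level, here is a minimal sketch in C. It is not code from the patch. The helper names `sketch_open_flags` and `write_with_odsync` are hypothetical, QEMU's `O_BINARY` is omitted because it is a QEMU-side macro, and the mapping of cache=off to `O_DIRECT` is an assumption that the thread itself does not state. The point is only that with `O_DSYNC` set, `write()` does not return until the data has been handed to the storage device, which is the completion behaviour the documentation text describes.

```c
#define _GNU_SOURCE             /* exposes O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef O_DSYNC
#define O_DSYNC O_SYNC          /* fallback for platforms lacking O_DSYNC */
#endif
#ifndef O_DIRECT
#define O_DIRECT 0              /* becomes a no-op where O_DIRECT is unavailable */
#endif

/*
 * Hypothetical helper: build open(2) flags the way the patch suggests.
 * O_DSYNC is always set; O_DIRECT is added for cache=off (an assumption
 * made for illustration, not something the patch shows).
 */
int sketch_open_flags(int read_write, int cache_on)
{
    int flags = O_DSYNC;

    flags |= read_write ? O_RDWR : O_RDONLY;
    if (!cache_on)
        flags |= O_DIRECT;
    return flags;
}

/*
 * Hypothetical helper: write len bytes to path through an O_DSYNC
 * descriptor. Returns 0 on success and -1 on any failure; a short
 * write is treated as a failure to keep the sketch small.
 */
int write_with_odsync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    ssize_t n;
    int rc;

    if (fd < 0)
        return -1;
    n = write(fd, buf, len);    /* returns only after the data is synced */
    rc = (n >= 0 && (size_t)n == len) ? 0 : -1;
    if (close(fd) < 0)
        rc = -1;
    return rc;
}
```

Note that `O_DSYNC` only guarantees that the kernel has pushed the data to the device. As the documentation text in the patch says, a drive with a volatile write-back cache can still lose that data on power failure.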

* Re: [Qemu-devel] [RFC] Use O_DSYNC by default and update documentation to explain IO integrity in QEMU
  2008-10-09 17:02 [Qemu-devel] [RFC] Use O_DSYNC by default and update documentation to explain IO integrity in QEMU Anthony Liguori
@ 2008-10-10  9:27 ` Avi Kivity
  0 siblings, 0 replies; 2+ messages in thread
From: Avi Kivity @ 2008-10-10  9:27 UTC (permalink / raw)
  To: qemu-devel

Anthony Liguori wrote:
>  
> +By default, QEMU accesses all disk data through the host's page cache.
> +This allows the host to perform read-ahead and to avoid duplicating IO
>   

The guest does read-ahead as well, so this is not a good thing.  
Further, if using qcow2, host read ahead will read random data, and if 
using raw devices, host read ahead is equivalent to physical read ahead, 
whereas most OSes prefer logical (file based) read ahead.

> +requests unnecessarily increasing disk performance.  You may notice that
> +certain benchmarks in the guest perform better than they do in the host
> +(for read) because of this.  This is primarily because the benchmark is
> +unaware of the extra level of caching that is occurring when running in
> +a virtual environment.
>   

This is because the guest is effectively running with a much larger 
cache.  If you'd given the guest this memory, it would perform even better.

IMO the only case where having the cache in the host is preferable is 
when the guest cannot utilize the memory effectively, such as when the 
guest reboots, or with older versions of Windows x86.

> +
> +The cache=off option can be used to disable the use of the host's page
> +cache.  Disabling the use of the host's page cache will likely reduce
> +performance since the host is unable to perform read-ahead and unable
> +to avoid duplicating IO requests.  At this time, QEMU will copy data
> +internally so the cost of copying data into the host's page cache is
> +unlikely to be statistically significant.
>   

It's a mistake to optimize for the current state of affairs which we are 
planning to change soon.

> +In the future, QEMU will be able to avoid copying data internally and
> +under certain workloads, disabling the use of the host's page cache may
> +increase performance provided that the guest is actively working to avoid
> +bringing data into the CPU's cache.  This can only be achieved when using
> +things like sendfile() in the guest or other forms of direct-io.

Guest read ahead, write back, and pageout are also examples of transfers 
that are very likely to benefit from not touching the cache.  Other 
examples are the guest reading a metadata block and only accessing one 
inode.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
