From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
 id 1Knytt-0000AI-9p
 for qemu-devel@nongnu.org; Thu, 09 Oct 2008 13:02:09 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
 id 1Knytr-00009i-Vp
 for qemu-devel@nongnu.org; Thu, 09 Oct 2008 13:02:08 -0400
Received: from [199.232.76.173] (port=45535 helo=monty-python.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.43)
 id 1Knytr-00009d-Ot
 for qemu-devel@nongnu.org; Thu, 09 Oct 2008 13:02:07 -0400
Received: from mail-gx0-f19.google.com ([209.85.217.19]:43875)
 by monty-python.gnu.org with esmtp (Exim 4.60)
 (envelope-from )
 id 1Knytr-0006nS-7C
 for qemu-devel@nongnu.org; Thu, 09 Oct 2008 13:02:07 -0400
Received: by gxk12 with SMTP id 12so256593gxk.10
 for ; Thu, 09 Oct 2008 10:02:06 -0700 (PDT)
Message-ID: <48EE390A.6070601@codemonkey.ws>
Date: Thu, 09 Oct 2008 12:02:02 -0500
From: Anthony Liguori
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------040204060409050702040008"
Subject: [Qemu-devel] [RFC] Use O_DSYNC by default and update documentation to explain IO integrity in QEMU
Reply-To: qemu-devel@nongnu.org
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: "qemu-devel@nongnu.org"

This is a multi-part message in MIME format.
--------------040204060409050702040008
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

We'll have some benchmarks later this afternoon.

Regards,

Anthony Liguori

--------------040204060409050702040008
Content-Type: text/x-patch;
 name="o_sync.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="o_sync.patch"

diff --git a/block-raw-posix.c b/block-raw-posix.c
index 83a358c..e58f191 100644
--- a/block-raw-posix.c
+++ b/block-raw-posix.c
@@ -120,7 +120,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
 
     s->lseek_err_cnt = 0;
 
-    open_flags = O_BINARY;
+    open_flags = O_BINARY | O_DSYNC;
     if ((flags & BDRV_O_ACCESS) == O_RDWR) {
         open_flags |= O_RDWR;
     } else {
@@ -996,7 +996,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags)
         IOObjectRelease( mediaIterator );
     }
 #endif
-    open_flags = O_BINARY;
+    open_flags = O_BINARY | O_DSYNC;
     if ((flags & BDRV_O_ACCESS) == O_RDWR) {
         open_flags |= O_RDWR;
     } else {
diff --git a/qemu-doc.texi b/qemu-doc.texi
index adf270b..2e859ff 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -267,13 +267,56 @@ These options have the same definition as they have in @option{-hdachs}.
 @item snapshot=@var{snapshot}
 @var{snapshot} is "on" or "off" and allows to enable snapshot for given drive (see @option{-snapshot}).
 @item cache=@var{cache}
-@var{cache} is "on" or "off" and allows to disable host cache to access data.
+@var{cache} is "on" or "off" and controls the use of the host page cache.
 @item format=@var{format}
 Specify which disk @var{format} will be used rather than detecting
 the format.  Can be used to specifiy format=raw to avoid interpreting
 an untrusted format header.
 @end table
 
+By default, QEMU accesses all disk data through the host's page cache.
+This allows the host to perform read-ahead and to avoid duplicating IO
+requests unnecessarily, increasing disk performance.  You may notice that
+certain benchmarks in the guest perform better than they do in the host
+(for read) because of this.  This is primarily because the benchmark is
+unaware of the extra level of caching that is occurring when running in
+a virtual environment.
+
+The cache=off option can be used to disable the use of the host's page
+cache.  Disabling the use of the host's page cache will likely reduce
+performance since the host is unable to perform read-ahead and unable
+to avoid duplicating IO requests.  At this time, QEMU will copy data
+internally so the cost of copying data into the host's page cache is
+unlikely to be statistically significant.
+
+The use of cache=off may make a benchmark appear to have results that
+are closer to the results in the host.  This does not imply that data
+integrity is not preserved when using cache=on, it is simply an artifact
+of the fact that the benchmark is not aware that it is in a virtual machine.
+It also does not imply that cache=off should be used for general workloads.
+
+In the future, QEMU will be able to avoid copying data internally and
+under certain workloads, disabling the use of the host's page cache may
+increase performance provided that the guest is actively working to avoid
+bringing data into the CPU's cache.  This can only be achieved when using
+things like sendfile() in the guest or other forms of direct-io.  An example
+of a workload that may benefit from avoiding the host's page cache is a
+static web server that is serving entirely unique data and has a relatively
+large amount of memory relative to the host.  This documentation will be
+updated when this change is made.  For now, cache=off is mostly useful for
+development purposes and for benchmarks that are not virtualization aware.
+
+Write requests are only reported completed to the guest when they have
+been reported completed by the disk regardless of whether the host's
+page cache is used for access so the use of the host's page cache is
+orthogonal to data integrity.
+
+If the host's disk drive has write-back caching enabled and the disk does
+not have a battery-backed cache, then data loss can occur regardless of
+whether write-back caching is disabled in the guest.
+
+If in doubt, do not change the default value (which is cache=on).
+
 Instead of @option{-cdrom} you can use:
 @example
 qemu -drive file=file,index=2,media=cdrom

--------------040204060409050702040008--