Date: Wed, 21 May 2008 19:47:54 +0200
From: Andrea Arcangeli
Subject: Re: [Qemu-devel] Re: [PATCH][v2] Align file accesses with cache=off (O_DIRECT)
Message-ID: <20080521174754.GG22488@duo.random>
In-Reply-To: <48345949.4050903@qumranet.com>
To: Avi Kivity
Cc: Blue Swirl, Laurent Vivier, qemu-devel@nongnu.org, Paul Brook

On Wed, May 21, 2008 at 08:18:01PM +0300, Avi Kivity wrote:
> Yes, that's the reason. Here zerocopy is not the motivation; instead, we
> have host-cached pages that are used directly in the guest. So we get both
> reduced memory footprint, and host caching. O_DIRECT reduces the memory
> footprint but kills host caching.

Sure. So MAP_SHARED+remap_file_pages should work just fine to achieve
zerocopy I/O.

> The scenario is desktop/laptop use. For server use O_DIRECT is clearly
> preferred due to much reduced overhead.

Well, in some ways there's more overhead with O_DIRECT, because O_DIRECT
has to call get_user_pages and walk pagetables in software before every
I/O operation. MAP_SHARED walks them in hardware and can take advantage
of the CPU TLB too.

The primary problem with MAP_SHARED isn't the overhead of the operation
itself, which in fact will be lower with MAP_SHARED once the cache is
allocated, but the write throttling and garbage collection of the host
caches. If you have a 250G guest image, MAP_SHARED will allocate as much
as 250G of cache, and a cp /dev/zero /dev/hdb in the guest will mark 100%
of guest RAM dirty. The mkclean methods and write throttling for
MAP_SHARED introduced in reasonably recent kernels can avoid filling 100%
of host RAM with dirty pages, but it still requires write throttling,
it'll pollute the host caches, and it can result in large RAM allocations
in the host having to block before the RAM is available, the same way as
buffered writes in the host (the current KVM default). I think O_DIRECT is
the best solution, and MAP_SHARED could become a secondary option just for
certain guest workloads with light I/O where fairness isn't even a
variable worth considering.
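To make the alignment requirement the patch is about concrete, here's a
minimal userland sketch (not the actual qemu code; the image name and
sizes are made up): with O_DIRECT the buffer, the file offset and the
length all have to be aligned to the block size, and the read bypasses
the host cache entirely.

    /* O_DIRECT read sketch: buffer, offset and length must all be aligned
       (512 bytes is the common logical block size, 4096 is safer). */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t align = 4096, len = 64 * 1024;
        void *buf;
        int fd = open("disk.img", O_RDONLY | O_DIRECT); /* hypothetical image */

        if (fd < 0 || posix_memalign(&buf, align, len)) {
            perror("setup");
            return 1;
        }
        ssize_t n = pread(fd, buf, len, 0);             /* offset 0 is aligned */
        if (n < 0)
            perror("pread");
        else
            printf("read %zd bytes bypassing the host cache\n", n);
        free(buf);
        close(fd);
        return 0;
    }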
The cost of garbage collecting the mapped caches on the host isn't
trivial either, and I don't mean just because the nonlinear rmap logic
has to scan all pagetables; that's a minor cost compared to shrinking the
host caches before try_to_unmap is ever invoked, etc.

Leaving the host caches purely for host usage is surely fairer: one guest
doing heavy I/O will never thrash the host caches and leave all other
guests and host tasks hanging. If they hang for a few msec with O_DIRECT,
it'll be because they're waiting for I/O and the elevator put them in the
queue to wait for the disk to become ready. But it won't be because of
write throttling during writes, or because the alloc_pages shrink paths
are calling ->writepage on the dirty pages.

The other significant advantage of O_DIRECT is that you won't have to
call msync to provide journaling.

I think O_DIRECT will work best for all usages, and it looks like a
higher priority to have than MAP_SHARED. MAP_SHARED will surely result in
better benchmarks for certain workloads though: imagine
'dd if=/dev/hda of=/dev/zero iflag=direct bs=1M count=100' run in the
guest; starting from the second run it'll read from the host cache and do
zero I/O with MAP_SHARED ;). If it were me, I'd prefer O_DIRECT by
default.

For full disclosure, you may also want to read this, though I strongly
disagree with those statements: http://kerneltrap.org/node/7563
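For comparison, the MAP_SHARED approach I'm arguing against would look
roughly like this (again only a sketch, with a made-up image name and
size): the store only dirties host pagecache, and an explicit msync is
what you'd need to get the ordering/journaling guarantees that the
O_DIRECT path gives you without extra syscalls.

    /* MAP_SHARED write sketch: the memset dirties host pagecache with no
       I/O, and msync() is needed before the data is known to be on disk.
       Assumes the (hypothetical) image is at least 64k long. */
    #include <fcntl.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 64 * 1024;
        int fd = open("disk.img", O_RDWR);              /* hypothetical image */
        if (fd < 0)
            return 1;
        void *map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map == MAP_FAILED)
            return 1;
        memset(map, 0, len);          /* dirties host cache, no I/O yet       */
        msync(map, len, MS_SYNC);     /* force writeback, e.g. when the guest
                                         issues a barrier/flush               */
        munmap(map, len);
        close(fd);
        return 0;
    }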