Message-ID: <53146596.90505@kamp.de>
Date: Mon, 03 Mar 2014 12:20:54 +0100
From: Peter Lieven
References: <530DBE6C.5030502@kamp.de>
 <20140226154154.GB20820@stefanha-thinkpad.muc.redhat.com>
 <530E0FF0.20501@kamp.de>
 <20140227085711.GC21749@stefanha-thinkpad.redhat.com>
 <53109E99.3020102@kamp.de>
 <20140303103843.GB4850@dhcp-200-207.str.redhat.com>
In-Reply-To: <20140303103843.GB4850@dhcp-200-207.str.redhat.com>
Subject: Re: [Qemu-devel] qemu-img convert cache mode for source
To: Kevin Wolf
Cc: Stefan Hajnoczi, "qemu-devel@nongnu.org", Stefan Hajnoczi, Paolo Bonzini

On 03.03.2014 11:38, Kevin Wolf wrote:
> On 28.02.2014 at 15:35, Peter Lieven wrote:
>> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
>>> On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
>>>> On 26.02.2014 16:41, Stefan Hajnoczi wrote:
>>>>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>>>>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>>>>>> files of a qemu-img convert process if the source is a host_device?
>>>>>>
>>>>>> Currently the backup of a host device is polluting the page cache.
>>>>> Points to consider:
>>>>>
>>>>> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>>>>>    the file. A fallback is necessary.
>>>>>
>>>>> 2. O_DIRECT has no readahead so performance could actually decrease.
>>>>>    The question is, how important is readahead versus polluting page
>>>>>    cache?
>>>>>
>>>>> 3. For raw files it would make sense to tell the kernel that access is
>>>>>    sequential and data will be used only once. Then we can get the best
>>>>>    of both worlds (avoid polluting page cache but still get readahead).
>>>>>    This is done using posix_fadvise(2).
>>>>>
>>>>>    The problem is what to do for image formats. An image file can be
>>>>>    very fragmented so the readahead might not be a win. Does this mean
>>>>>    that for image formats we should tell the kernel access will be
>>>>>    random?
>>>>>
>>>>> Furthermore, maybe it's best to do readahead inside QEMU so that even
>>>>> network protocols (nbd, iscsi, etc) can get good performance. They
>>>>> act like O_DIRECT is always on.
>>>> Your comments are regarding qemu-img convert, right?
>>>> How would you implement this? A new open flag, because
>>>> the fadvise would have to go inside the protocol driver.
>>>>
>>>> I would start with host_devices first and see how it performs there.
>>>>
>>>> For qemu-img convert I would issue a FADV_DONTNEED after
>>>> a write for the bytes that have been written
>>>> (I have tested this with Linux and it seems to work quite well).
>>>>
>>>> Question is, what is the right parameter for reads? Also FADV_DONTNEED?
>>> I think so but this should be justified with benchmark results.
>> I ran some benchmarks and found that a FADV_DONTNEED issued after
>> a read does not hurt performance. But it avoids buffers
>> increasing while I read from a host_device or raw file.
> Okay, sounds reasonable.
>
>> As for writing, it only works if I issue an fdatasync after each write, but
>> this should be equivalent to O_DIRECT. So I would keep the patch
>> to support qemu-img convert sources if they are host_device or file.
> Doing an fdatasync() is not an option (and not equivalent to O_DIRECT
> at all).

Of course, fdatasync is not an option. I just wanted to point out that one
should use O_DIRECT if the page cache should be disabled for the output file.

>
>> Here is a proposal for a patch:
>>
>> diff --git a/block.c b/block.c
>> index 2fd5482..2445433 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
>>              qemu_aio_wait();
>>          }
>>      }
>> +
>> +#ifdef POSIX_FADV_DONTNEED
>> +    if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL &&
>> +        bs->drv->bdrv_fadvise && !is_write) {
>> +        bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED);
>> +    }
>> +#endif
>> +
> This #ifdef should be in the raw-posix driver. Please try to keep the
> qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
> friends as an implementation detail of block drivers.

I had the same idea, but as far as I can see the callback for the completed
request is handled in block.c. raw-posix uses the aio interface and not
coroutines.

>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 780f48b..a4dcc3c 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -105,6 +105,9 @@ typedef enum {
>>  #define BDRV_O_PROTOCOL    0x8000 /* if no block driver is explicitly given:
>>                                       select an appropriate protocol driver,
>>                                       ignoring the format layer */
>> +#define BDRV_O_SEQUENTIAL 0x10000 /* open device for sequential read/write */
>> +
>> +
>>
>>  #define BDRV_O_CACHE_MASK  (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
> Why two additional newlines?

Typo. This patch was basically an RFC :-)

>
> BDRV_O_SEQUENTIAL works for me as the external interface.
>
> Kevin

Peter
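
For reference, the read-side behaviour discussed in this thread (read
sequentially, then issue FADV_DONTNEED for the range just consumed) can be
sketched as a standalone C fragment. This is only an illustration under
assumptions, not the proposed patch: read_once_sequential() is a hypothetical
helper that does not exist in QEMU, it uses plain read(2) rather than the
block layer, and it does not retry on EINTR. The #ifdef guards mirror the one
in the patch so the code still builds where posix_fadvise(2) hints are
unavailable.

```c
#define _POSIX_C_SOURCE 200112L  /* for posix_fadvise() */
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of QEMU): read a file front to back and
 * ask the kernel to drop each chunk from the page cache once consumed.
 * Returns the number of bytes read, or -1 on error.
 */
static long long read_once_sequential(const char *path, char *buf,
                                      size_t buf_size)
{
    assert(buf != NULL && buf_size > 0);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }

#ifdef POSIX_FADV_SEQUENTIAL
    /* Hint only: sequential access over the whole file (len 0 = to EOF). */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#endif

    off_t offset = 0;
    for (;;) {
        ssize_t n = read(fd, buf, buf_size);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0) {
            break;  /* end of file */
        }
        /* A real converter would write buf[0..n) to the target here. */
#ifdef POSIX_FADV_DONTNEED
        /* Advisory: the kernel may drop these pages; failure is harmless. */
        (void)posix_fadvise(fd, offset, n, POSIX_FADV_DONTNEED);
#endif
        offset += n;
    }

    close(fd);
    return (long long)offset;
}
```

Since posix_fadvise() is advisory, whether the cache actually stays small
depends on the kernel and file system; watching the cached figure in free(1)
while reading a large raw file is one way to check it on a given host.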