From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36849) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WS5T4-0003Y0-4Q for qemu-devel@nongnu.org; Mon, 24 Mar 2014 10:03:16 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WS5Sy-0005Om-98 for qemu-devel@nongnu.org; Mon, 24 Mar 2014 10:03:10 -0400 Received: from mx-v6.kamp.de ([2a02:248:0:51::16]:51959 helo=mx01.kamp.de) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WS5Sx-0005OP-Rh for qemu-devel@nongnu.org; Mon, 24 Mar 2014 10:03:04 -0400 Message-ID: <53303B0F.2010105@kamp.de> Date: Mon, 24 Mar 2014 15:02:55 +0100 From: Peter Lieven MIME-Version: 1.0 References: <1395402552-22366-1-git-send-email-pl@kamp.de> <20140324091839.GB32359@T430.nay.redhat.com> In-Reply-To: <20140324091839.GB32359@T430.nay.redhat.com> Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCHv2] block: introduce BDRV_O_SEQUENTIAL List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Fam Zheng Cc: kwolf@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, shadowsor@gmail.com, pbonzini@redhat.com Am 24.03.2014 10:18, schrieb Fam Zheng: > On Fri, 03/21 12:49, Peter Lieven wrote: >> this patch introduces a new flag to indicate that we are going to sequentially >> read from a file and do not plan to reread/reuse the data after it has been read. >> >> The current use of this flag is to open the source(s) of a qemu-img convert >> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized >> to advise to the kernel that we are going to read sequentially from the >> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate >> that there is no advantage keeping the blocks in the buffers. >> >> Consider the following test case that was created to confirm the behaviour of >> the new flag: >> >> A 10G logical volume was created and filled with random data. >> Then the logical volume was exported via qemu-img convert to an iscsi target. >> Before the export was started all caches of the linux kernel where dropped. >> >> Old behavior: >> - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close >> to the end of the conversion. After qemu-img terminated all the buffers were >> freed by the kernel. >> >> New behavior with the -N switch: >> - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close >> to the end with some small peaks up to 30 MB durine the conversion. > s/durine/during/ > > The patch looks OK, and I have no objection with this flag. But I'm still > curious about the use case: Host page cache growing is not the real problem, > I'm not fully persudaded by commit message because I still don't know _what_ > useful cache would be dropped (if you don't empty the kernel cache before > starting). I don't think all 9.67 GB buffer will be filled by data from this > volume, so the question is how to measure the real, effective performance > impact? I ran an idle machine and indeed all the 9.67GB are buffered from the qemu-img process. The problem is that the growing buffers eventually disposses other pages from the cache. As for sharing if you have a drive of a vServer on a lvm logical volume and take a snapshot and you fadvise data from the snapshot I think that shared pages between the logical volume and its snapshot are dropped. However, this all depends on how it is handled internally. Maybe Markus has more evidence. I personally would always disable the cache entirely for my vServers harddrives. In general I personally am totally happy with having a switch. Just in case there are some side effects we don't see at this point. > >> Signed-off-by: Peter Lieven >> --- >> v1->v2: - added test example to commit msg >> - added -N knob to qemu-img >> >> block/raw-posix.c | 14 ++++++++++++++ >> include/block/block.h | 1 + >> qemu-img-cmds.hx | 4 ++-- >> qemu-img.c | 16 +++++++++++++--- >> qemu-img.texi | 9 ++++++++- >> 5 files changed, 38 insertions(+), 6 deletions(-) >> >> diff --git a/block/raw-posix.c b/block/raw-posix.c >> index 1688e16..08f7209 100644 >> --- a/block/raw-posix.c >> +++ b/block/raw-posix.c >> @@ -444,6 +444,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options, >> } >> #endif >> >> +#ifdef POSIX_FADV_SEQUENTIAL >> + if (bs->open_flags & BDRV_O_SEQUENTIAL && >> + !(bs->open_flags & BDRV_O_NOCACHE)) { >> + posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL); >> + } >> +#endif >> + >> ret = 0; >> fail: >> qemu_opts_del(opts); >> @@ -913,6 +920,13 @@ static int aio_worker(void *arg) >> ret = aiocb->aio_nbytes; >> } >> if (ret == aiocb->aio_nbytes) { >> +#ifdef POSIX_FADV_DONTNEED >> + if (aiocb->bs->open_flags & BDRV_O_SEQUENTIAL && >> + !(aiocb->bs->open_flags & BDRV_O_NOCACHE)) { >> + posix_fadvise(aiocb->aio_fildes, aiocb->aio_offset, >> + aiocb->aio_nbytes, POSIX_FADV_DONTNEED); >> + } >> +#endif > I'm not familiar with posix_fadvise, can we do this on the whole file in once > in raw_open_common like POSIX_FADV_SEQUENTIAL? We could do it, but the usage I have seen it call it on the pages you actually want to have dropped. At least this seems to work good. Peter