From: Peter Lieven <pl@kamp.de>
To: Fam Zheng <famz@redhat.com>
Cc: kwolf@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com,
shadowsor@gmail.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCHv2] block: introduce BDRV_O_SEQUENTIAL
Date: Mon, 24 Mar 2014 15:02:55 +0100 [thread overview]
Message-ID: <53303B0F.2010105@kamp.de> (raw)
In-Reply-To: <20140324091839.GB32359@T430.nay.redhat.com>
Am 24.03.2014 10:18, schrieb Fam Zheng:
> On Fri, 03/21 12:49, Peter Lieven wrote:
>> this patch introduces a new flag to indicate that we are going to sequentially
>> read from a file and do not plan to reread/reuse the data after it has been read.
>>
>> The current use of this flag is to open the source(s) of a qemu-img convert
>> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
>> to advise to the kernel that we are going to read sequentially from the
>> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
>> that there is no advantage keeping the blocks in the buffers.
>>
>> Consider the following test case that was created to confirm the behaviour of
>> the new flag:
>>
>> A 10G logical volume was created and filled with random data.
>> Then the logical volume was exported via qemu-img convert to an iscsi target.
>> Before the export was started all caches of the linux kernel where dropped.
>>
>> Old behavior:
>> - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
>> to the end of the conversion. After qemu-img terminated all the buffers were
>> freed by the kernel.
>>
>> New behavior with the -N switch:
>> - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
>> to the end with some small peaks up to 30 MB durine the conversion.
> s/durine/during/
>
> The patch looks OK, and I have no objection with this flag. But I'm still
> curious about the use case: Host page cache growing is not the real problem,
> I'm not fully persudaded by commit message because I still don't know _what_
> useful cache would be dropped (if you don't empty the kernel cache before
> starting). I don't think all 9.67 GB buffer will be filled by data from this
> volume, so the question is how to measure the real, effective performance
> impact?
I ran an idle machine and indeed all the 9.67GB are buffered from the
qemu-img process. The problem is that the growing buffers eventually
disposses other pages from the cache. As for sharing if you have a
drive of a vServer on a lvm logical volume and take a snapshot and
you fadvise data from the snapshot I think that shared pages between
the logical volume and its snapshot are dropped. However, this all depends
on how it is handled internally. Maybe Markus has more evidence. I personally
would always disable the cache entirely for my vServers harddrives.
In general I personally am totally happy with having a switch. Just in case
there are some side effects we don't see at this point.
>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> v1->v2: - added test example to commit msg
>> - added -N knob to qemu-img
>>
>> block/raw-posix.c | 14 ++++++++++++++
>> include/block/block.h | 1 +
>> qemu-img-cmds.hx | 4 ++--
>> qemu-img.c | 16 +++++++++++++---
>> qemu-img.texi | 9 ++++++++-
>> 5 files changed, 38 insertions(+), 6 deletions(-)
>>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 1688e16..08f7209 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -444,6 +444,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>> }
>> #endif
>>
>> +#ifdef POSIX_FADV_SEQUENTIAL
>> + if (bs->open_flags & BDRV_O_SEQUENTIAL &&
>> + !(bs->open_flags & BDRV_O_NOCACHE)) {
>> + posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
>> + }
>> +#endif
>> +
>> ret = 0;
>> fail:
>> qemu_opts_del(opts);
>> @@ -913,6 +920,13 @@ static int aio_worker(void *arg)
>> ret = aiocb->aio_nbytes;
>> }
>> if (ret == aiocb->aio_nbytes) {
>> +#ifdef POSIX_FADV_DONTNEED
>> + if (aiocb->bs->open_flags & BDRV_O_SEQUENTIAL &&
>> + !(aiocb->bs->open_flags & BDRV_O_NOCACHE)) {
>> + posix_fadvise(aiocb->aio_fildes, aiocb->aio_offset,
>> + aiocb->aio_nbytes, POSIX_FADV_DONTNEED);
>> + }
>> +#endif
> I'm not familiar with posix_fadvise, can we do this on the whole file in once
> in raw_open_common like POSIX_FADV_SEQUENTIAL?
We could do it, but the usage I have seen it call it on the pages you
actually want to have dropped. At least this seems to work good.
Peter
prev parent reply other threads:[~2014-03-24 14:03 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-03-21 11:49 [Qemu-devel] [PATCHv2] block: introduce BDRV_O_SEQUENTIAL Peter Lieven
2014-03-21 12:06 ` Paolo Bonzini
2014-03-21 12:42 ` Peter Lieven
2014-03-28 10:02 ` Peter Lieven
2014-03-24 9:18 ` Fam Zheng
2014-03-24 14:02 ` Peter Lieven [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=53303B0F.2010105@kamp.de \
--to=pl@kamp.de \
--cc=famz@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=shadowsor@gmail.com \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.