From: Peter Lieven <pl@kamp.de>
To: Kevin Wolf <kwolf@redhat.com>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] qemu-img convert cache mode for source
Date: Mon, 03 Mar 2014 12:20:54 +0100
Message-ID: <53146596.90505@kamp.de>
In-Reply-To: <20140303103843.GB4850@dhcp-200-207.str.redhat.com>
On 03.03.2014 11:38, Kevin Wolf wrote:
> Am 28.02.2014 um 15:35 hat Peter Lieven geschrieben:
>> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
>>> On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
>>>> On 26.02.2014 16:41, Stefan Hajnoczi wrote:
>>>>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>>>>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>>>>>> files of a qemu-img convert process if the source is a host_device?
>>>>>>
>>>>>> Currently the backup of a host device is polluting the page cache.
>>>>> Points to consider:
>>>>>
>>>>> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>>>>> the file. A fallback is necessary.
>>>>>
>>>>> 2. O_DIRECT has no readahead so performance could actually decrease.
>>>>> The question is, how important is readahead versus polluting page
>>>>> cache?
>>>>>
>>>>> 3. For raw files it would make sense to tell the kernel that access is
>>>>> sequential and data will be used only once. Then we can get the best
>>>>> of both worlds (avoid polluting page cache but still get readahead).
>>>>> This is done using posix_fadvise(2).
>>>>>
>>>>> The problem is what to do for image formats. An image file can be
>>>>> very fragmented so the readahead might not be a win. Does this mean
>>>>> that for image formats we should tell the kernel access will be
>>>>> random?
>>>>>
>>>>> Furthermore, maybe it's best to do readahead inside QEMU so that even
>>>>> network protocols (nbd, iscsi, etc) can get good performance. They
>>>>> act like O_DIRECT is always on.
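A minimal sketch of points 1 and 3, assuming Linux/glibc; open_source() is a hypothetical helper, not QEMU code. It tries O_DIRECT first and, if open(2) fails with EINVAL (as described above for tmpfs), reopens the file buffered and hints sequential access with posix_fadvise(POSIX_FADV_SEQUENTIAL) so the kernel's readahead still helps.

```c
#define _GNU_SOURCE /* exposes O_DIRECT and posix_fadvise on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h> /* close() for callers */

/* Open a convert source read-only, preferring O_DIRECT.
 * Sets *direct to 1 if O_DIRECT was accepted, 0 if we fell back.
 * Returns the file descriptor, or -1 with errno set. */
static int open_source(const char *path, int *direct)
{
    assert(path != NULL && direct != NULL);
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        *direct = 1; /* caller must now use aligned buffers */
        return fd;
    }
    if (errno != EINVAL) {
        return -1; /* genuine failure such as ENOENT or EACCES */
    }
    /* The filesystem rejected O_DIRECT: go through the page cache. */
    fd = open(path, O_RDONLY);
    if (fd < 0) {
        return -1;
    }
    *direct = 0;
    /* Advisory only; posix_fadvise returns an error number, ignored here. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
    return fd;
}
```

A descriptor opened with O_DIRECT also needs aligned buffers, offsets and lengths for every read, so the caller has to check *direct before choosing how to allocate its buffer.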
>>>> Your comments are regarding qemu-img convert, right?
>>>> How would you implement this? With a new open flag, because
>>>> the fadvise would have to go inside the protocol driver?
>>>>
>>>> I would start with host_devices and see how it performs there.
>>>>
>>>> For qemu-img convert I would issue a FADV_DONTNEED after
>>>> a write for the bytes that have been written
>>>> (I have tested this on Linux and it seems to work quite well).
>>>>
>>>> The question is, what is the right parameter for reads? Also FADV_DONTNEED?
>>> I think so but this should be justified with benchmark results.
>> I ran some benchmarks and found that a FADV_DONTNEED issued after
>> a read does not hurt performance, but it keeps the buffer cache from
>> growing while I read from a host_device or raw file.
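The read-side pattern described here, issuing FADV_DONTNEED for each range right after it has been read, might look like this minimal sketch. It assumes Linux/glibc, and pread_dontneed() and read_all_dontneed() are hypothetical names. The hint is advisory, so its return value is ignored.

```c
#define _GNU_SOURCE /* exposes posix_fadvise on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* pread() a range, then advise the kernel that the cached pages for
 * exactly the bytes just read will not be needed again. */
static ssize_t pread_dontneed(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t n = pread(fd, buf, len, offset);
    if (n > 0) {
        /* Advisory only; posix_fadvise returns an error number, ignored here. */
        (void)posix_fadvise(fd, offset, (off_t)n, POSIX_FADV_DONTNEED);
    }
    return n;
}

/* Read a whole source sequentially in chunk-sized pieces, dropping each
 * piece from the page cache once consumed. Returns bytes read or -1. */
static long long read_all_dontneed(int fd, char *buf, size_t chunk)
{
    assert(buf != NULL && chunk > 0);
    off_t off = 0;
    for (;;) {
        ssize_t n = pread_dontneed(fd, buf, chunk, off);
        if (n < 0) {
            return -1;
        }
        if (n == 0) {
            return (long long)off; /* end of file */
        }
        /* A converter would write buf[0..n) to the target here. */
        off += n;
    }
}
```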
> Okay, sounds reasonable.
>
>> As for writes, it only works if I issue an fdatasync after each write, but
>> that should be equivalent to O_DIRECT. So I would limit the patch
>> to qemu-img convert sources that are a host_device or file.
> Doing an fdatasync() is not an option (and not equivalent to O_DIRECT
> at all).
Of course, fdatasync is not an option. I just wanted to point out that one
should use O_DIRECT if the page cache is to be bypassed for the output
file.
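A minimal sketch of writing an output file with O_DIRECT, assuming Linux/glibc and a 4096-byte alignment (the real requirement is the device's logical block size; write_file_direct() and DIRECT_ALIGN are hypothetical). O_DIRECT requires the buffer address, length and file offset to be aligned, so the data is copied into a posix_memalign() buffer padded with zeros, and the file is truncated back to its real length afterwards. As in point 1 of Stefan's list, it falls back to buffered I/O when open(2) rejects O_DIRECT with EINVAL.

```c
#define _GNU_SOURCE /* exposes O_DIRECT on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment; the real value is the device's logical block size. */
#define DIRECT_ALIGN 4096

/* Write len bytes to path, bypassing the page cache where O_DIRECT is
 * supported. Returns 0 on success, -1 on error. */
static int write_file_direct(const char *path, const void *data, size_t len)
{
    assert(path != NULL && (data != NULL || len == 0));
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL) {
        /* e.g. tmpfs rejects O_DIRECT: fall back to buffered I/O */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    }
    if (fd < 0) {
        return -1;
    }

    int ret = -1;
    /* Round the length up to a whole number of aligned blocks. */
    size_t padded = (len + DIRECT_ALIGN - 1) / DIRECT_ALIGN * DIRECT_ALIGN;
    void *buf = NULL;
    if (len == 0) {
        ret = 0; /* O_TRUNC already left an empty file */
    } else if (posix_memalign(&buf, DIRECT_ALIGN, padded) == 0) {
        memset(buf, 0, padded); /* zero padding after the data */
        memcpy(buf, data, len);
        if (write(fd, buf, padded) == (ssize_t)padded &&
            ftruncate(fd, (off_t)len) == 0) { /* drop the padding */
            ret = 0;
        }
        free(buf);
    }
    if (close(fd) != 0) {
        ret = -1;
    }
    return ret;
}
```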
>
>> Here is a proposal for a patch:
>>
>> diff --git a/block.c b/block.c
>> index 2fd5482..2445433 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2626,6 +2626,14 @@ static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
>> qemu_aio_wait();
>> }
>> }
>> +
>> +#ifdef POSIX_FADV_DONTNEED
>> + if (!rwco.ret && bs->open_flags & BDRV_O_SEQUENTIAL &&
>> + bs->drv->bdrv_fadvise && !is_write) {
>> + bs->drv->bdrv_fadvise(bs, offset, qiov->size, POSIX_FADV_DONTNEED);
>> + }
>> +#endif
>> +
> This #ifdef should be in the raw-posix driver. Please try to keep the
> qemu interface backend agnostic and leave POSIX_FADV_DONTNEED and
> friends as an implementation detail of block drivers.
I had the same idea, but as far as I can see, the completion callback for
the request is handled in block.c. raw-posix uses the AIO interface and
not coroutines.
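Keeping the #ifdef inside the driver, as Kevin suggests, could be sketched as below, assuming Linux/glibc. HypRawState and hyp_raw_read_done() are hypothetical stand-ins, not QEMU's actual raw-posix structures or AIO callbacks; only the BDRV_O_SEQUENTIAL value is taken from the RFC patch quoted in this message. The driver's own read-completion path checks the flag and issues the fadvise, so the generic block layer only ever sees BDRV_O_SEQUENTIAL.

```c
#define _GNU_SOURCE /* exposes posix_fadvise on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Flag value proposed in the RFC patch. */
#define BDRV_O_SEQUENTIAL 0x10000

/* Hypothetical stand-in for a raw-posix driver's per-image state. */
typedef struct HypRawState {
    int fd;
    int open_flags;
} HypRawState;

/* Hypothetical hook the driver would run when a read request completes.
 * Returns 1 if cache-drop advice was issued and accepted, 0 otherwise. */
static int hyp_raw_read_done(HypRawState *s, off_t offset, off_t bytes, int ret)
{
    assert(s != NULL);
    if (ret != 0 || !(s->open_flags & BDRV_O_SEQUENTIAL)) {
        return 0; /* failed request, or caller did not ask for the hint */
    }
#ifdef POSIX_FADV_DONTNEED
    /* posix_fadvise returns an error number rather than setting errno. */
    return posix_fadvise(s->fd, offset, bytes, POSIX_FADV_DONTNEED) == 0;
#else
    (void)offset;
    (void)bytes;
    return 0; /* platform lacks the hint: silently do nothing */
#endif
}
```

With this split, a platform without POSIX_FADV_DONTNEED simply compiles the hint away inside the driver, and nothing outside it needs to know.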
>
>> diff --git a/include/block/block.h b/include/block/block.h
>> index 780f48b..a4dcc3c 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -105,6 +105,9 @@ typedef enum {
>> #define BDRV_O_PROTOCOL 0x8000 /* if no block driver is explicitly given:
>> select an appropriate protocol driver,
>> ignoring the format layer */
>> +#define BDRV_O_SEQUENTIAL 0x10000 /* open device for sequential read/write */
>> +
>> +
>>
>> #define BDRV_O_CACHE_MASK (BDRV_O_NOCACHE | BDRV_O_CACHE_WB | BDRV_O_NO_FLUSH)
> Why two additional newlines?
Typo. This patch was basically an RFC :-)
>
> BDRV_O_SEQUENTIAL works for me as the external interface.
>
> Kevin
Peter
Thread overview: 29+ messages
2014-02-26 10:14 [Qemu-devel] qemu-img convert cache mode for source Peter Lieven
2014-02-26 15:41 ` Stefan Hajnoczi
2014-02-26 15:54 ` Eric Blake
2014-02-26 16:01 ` Peter Lieven
2014-02-27 8:57 ` Stefan Hajnoczi
2014-02-28 14:35 ` Peter Lieven
2014-03-03 10:38 ` Kevin Wolf
2014-03-03 11:20 ` Peter Lieven [this message]
2014-03-03 12:59 ` Paolo Bonzini
2014-03-03 13:07 ` Peter Lieven
2014-03-03 12:03 ` Stefan Hajnoczi
2014-03-03 12:20 ` Peter Lieven
2014-03-04 9:24 ` Stefan Hajnoczi
2014-03-05 14:44 ` Peter Lieven
2014-03-05 15:20 ` Marcus
2014-03-05 15:53 ` Peter Lieven
2014-03-05 17:38 ` Marcus
2014-03-05 18:09 ` Peter Lieven
2014-03-06 10:41 ` Stefan Hajnoczi
2014-03-06 18:58 ` Peter Lieven
2014-03-06 10:29 ` Stefan Hajnoczi
2014-03-06 11:29 ` Paolo Bonzini
2014-03-06 14:19 ` Liguori, Anthony
2014-03-06 18:07 ` Peter Lieven
2014-03-07 8:03 ` Peter Lieven
2014-02-27 1:10 ` Fam Zheng
2014-02-27 11:07 ` Kevin Wolf
2014-02-27 16:12 ` Peter Lieven
2014-03-03 10:40 ` Kevin Wolf