qemu-devel.nongnu.org archive mirror
From: Peter Lieven <pl@kamp.de>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] qemu-img convert cache mode for source
Date: Mon, 03 Mar 2014 13:20:21 +0100
Message-ID: <53147385.2090906@kamp.de>
In-Reply-To: <20140303120349.GA21055@stefanha-thinkpad.redhat.com>

On 03.03.2014 13:03, Stefan Hajnoczi wrote:
> On Fri, Feb 28, 2014 at 03:35:05PM +0100, Peter Lieven wrote:
>> On 27.02.2014 09:57, Stefan Hajnoczi wrote:
>>> On Wed, Feb 26, 2014 at 05:01:52PM +0100, Peter Lieven wrote:
>>>> On 26.02.2014 16:41, Stefan Hajnoczi wrote:
>>>>> On Wed, Feb 26, 2014 at 11:14:04AM +0100, Peter Lieven wrote:
>>>>>> I was wondering if it would be a good idea to set the O_DIRECT mode for the source
>>>>>> files of a qemu-img convert process if the source is a host_device?
>>>>>>
>>>>>> Currently the backup of a host device is polluting the page cache.
>>>>> Points to consider:
>>>>>
>>>>> 1. O_DIRECT does not work on Linux tmpfs, you get EINVAL when opening
>>>>>     the file.  A fallback is necessary.
>>>>>
>>>>> 2. O_DIRECT has no readahead so performance could actually decrease.
>>>>>     The question is, how important is readahead versus polluting page
>>>>>     cache?
>>>>>
>>>>> 3. For raw files it would make sense to tell the kernel that access is
>>>>>     sequential and data will be used only once.  Then we can get the best
>>>>>     of both worlds (avoid polluting page cache but still get readahead).
>>>>>     This is done using posix_fadvise(2).
>>>>>
>>>>>     The problem is what to do for image formats.  An image file can be
>>>>>     very fragmented so the readahead might not be a win.  Does this mean
>>>>>     that for image formats we should tell the kernel access will be
>>>>>     random?
>>>>>
>>>>>     Furthermore, maybe it's best to do readahead inside QEMU so that even
>>>>>     network protocols (nbd, iscsi, etc) can get good performance.  They
>>>>>     act like O_DIRECT is always on.
>>>> Your comments are regarding qemu-img convert, right?
>>>> How would you implement this? With a new open flag, because
>>>> the fadvise would have to go inside the protocol driver?
>>>>
>>>> I would start with host_devices first and see how it performs there.
>>>>
>>>> For qemu-img convert I would issue a FADV_DONTNEED after
>>>> a write for the bytes that have been written
>>>> (I have tested this with Linux and it seems to work quite well).
>>>>
>>>> Question is, what is the right parameter for reads? Also FADV_DONTNEED?
>>> I think so but this should be justified with benchmark results.
>> I ran some benchmarks and found that a FADV_DONTNEED issued after
>> a read does not hurt performance. But it keeps buffers from
>> growing while I read from a host_device or raw file.
> It was mentioned in this thread that a sequential read shouldn't promote the
> pages anyway - they should be dropped by the kernel if there is memory
> pressure.
Yes, but this costs CPU time in spikes and the page cache is polluted
with data that is definitely not needed.
>
> So what is the actual performance problem you are trying to solve and
> what benchmark output are you getting when you compare with
> FADV_DONTNEED against without FADV_DONTNEED?
I found the performance to be identical. For the problem, please see below.
>
> I think there's a danger that the discussion will go around in circles.
> Please post the performance results that kicked off this whole effort
> and let's focus on the data.  That way it's much easier to evaluate what
> changes to QEMU are a win and which are not necessary.
I found that under memory pressure the growing buffers
lead to vserver memory being swapped out. This caused trouble
especially in overcommit scenarios (where all memory is backed by
swap).
>
>> As for writing, it only works if I issue an fdatasync after each write, but
>> this should be equivalent to O_DIRECT. So I would keep the patch
>> to support qemu-img convert sources if they are host_device or file.
> fdatasync(2) is much more heavy-weight than writing out pages because
> it sends a disk write cache flush command and waits for it to complete.
As mentioned before, for the write path the FADV_DONTNEED stuff
doesn't work without it.
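
For illustration, here is a minimal sketch of the two paths discussed
in this mail, assuming a Linux/POSIX host. The helper names
read_and_drop() and write_sync_and_drop() are hypothetical and this is
not the actual QEMU patch. The read helper drops the range it has just
read. The write helper calls fdatasync(2) first because, as reported
above, FADV_DONTNEED had no effect on freshly written data otherwise.

```c
#define _XOPEN_SOURCE 600   /* pread, pwrite, fdatasync, posix_fadvise */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes at offset, then hint that
 * the range just read will not be needed again. Returns bytes read,
 * 0 at EOF, or -1 on error. */
static ssize_t read_and_drop(int fd, void *buf, size_t len, off_t offset)
{
    ssize_t n;

    do {
        n = pread(fd, buf, len, offset);
    } while (n < 0 && errno == EINTR);

    if (n > 0) {
        /* Purely advisory: a failure here is not treated as an error. */
        (void)posix_fadvise(fd, offset, (off_t)n, POSIX_FADV_DONTNEED);
    }
    return n;
}

/* Hypothetical helper: write len bytes at offset, flush them with
 * fdatasync(2) so the pages are clean, then hint that they can be
 * dropped. Returns len on success or -1 on error. */
static ssize_t write_sync_and_drop(int fd, const void *buf, size_t len,
                                   off_t offset)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = pwrite(fd, (const char *)buf + done, len - done,
                           offset + (off_t)done);
        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -1;
        }
        done += (size_t)n;
    }

    /* Heavy-weight: this waits for a disk cache flush, as noted above. */
    if (fdatasync(fd) < 0) {
        return -1;
    }
    (void)posix_fadvise(fd, offset, (off_t)len, POSIX_FADV_DONTNEED);
    return (ssize_t)done;
}
```

A copy loop would call read_and_drop() on the source and, only if the
cost of the flush is acceptable, write_sync_and_drop() on the target.
Whether either hint is a net win would still need benchmark data.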

Peter

Thread overview: 29+ messages
2014-02-26 10:14 [Qemu-devel] qemu-img convert cache mode for source Peter Lieven
2014-02-26 15:41 ` Stefan Hajnoczi
2014-02-26 15:54   ` Eric Blake
2014-02-26 16:01   ` Peter Lieven
2014-02-27  8:57     ` Stefan Hajnoczi
2014-02-28 14:35       ` Peter Lieven
2014-03-03 10:38         ` Kevin Wolf
2014-03-03 11:20           ` Peter Lieven
2014-03-03 12:59             ` Paolo Bonzini
2014-03-03 13:07               ` Peter Lieven
2014-03-03 12:03         ` Stefan Hajnoczi
2014-03-03 12:20           ` Peter Lieven [this message]
2014-03-04  9:24             ` Stefan Hajnoczi
2014-03-05 14:44               ` Peter Lieven
2014-03-05 15:20                 ` Marcus
2014-03-05 15:53                   ` Peter Lieven
2014-03-05 17:38                     ` Marcus
2014-03-05 18:09                       ` Peter Lieven
2014-03-06 10:41                         ` Stefan Hajnoczi
2014-03-06 18:58                           ` Peter Lieven
2014-03-06 10:29                 ` Stefan Hajnoczi
2014-03-06 11:29                   ` Paolo Bonzini
2014-03-06 14:19                     ` Liguori, Anthony
2014-03-06 18:07                       ` Peter Lieven
2014-03-07  8:03                       ` Peter Lieven
2014-02-27  1:10   ` Fam Zheng
2014-02-27 11:07     ` Kevin Wolf
2014-02-27 16:12       ` Peter Lieven
2014-03-03 10:40         ` Kevin Wolf
