From: Peter Lieven <pl@kamp.de>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: kwolf@redhat.com, famz@redhat.com, qemu-devel@nongnu.org,
	stefanha@redhat.com, shadowsor@gmail.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
Date: Wed, 04 Jun 2014 17:31:48 +0200
Message-ID: <538F3BE4.9020602@kamp.de>
In-Reply-To: <20140604151228.GL11073@stefanha-thinkpad.redhat.com>

On 04.06.2014 17:12, Stefan Hajnoczi wrote:
> On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
>> This patch introduces a new flag to indicate that we are going to sequentially
>> read from a file and do not plan to reread/reuse the data after it has been read.
>>
>> The current use of this flag is to open the source(s) of a qemu-img convert
>> process. If a protocol from block/raw-posix.c is used, posix_fadvise is used
>> to advise the kernel that we are going to read sequentially from the
>> file, and a POSIX_FADV_DONTNEED advice is issued after each write to indicate
>> that there is no advantage in keeping the blocks in the buffers.
>>
>> Consider the following test case that was created to confirm the behaviour of
>> the new flag:
>>
>> A 10G logical volume was created and filled with random data.
>> Then the logical volume was exported via qemu-img convert to an iscsi target.
>> Before the export was started, all caches of the Linux kernel were dropped.
>>
>> Old behavior:
>>  - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
>>    to the end of the conversion. After qemu-img terminated all the buffers were
>>    freed by the kernel.
>>
>> New behavior with the -N switch:
>>  - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
>>    to the end with some small peaks up to 30 MB during the conversion.
> FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
>
> I'm skeptical of the effort to avoid buffer cache usage using
> FADVISE_DONTNEED.  The performance results tell me that less buffer
> cache was used but that number doesn't have a direct effect on
> application performance.
>
> Let's check GNU coreutils:
>
>   $ cd coreutils
>   $ git grep FADVISE_DONTNEED
>   gl/lib/fadvise.h:  FADVISE_DONTNEED =   POSIX_FADV_DONTNEED,
>   gl/lib/fadvise.h:  FADVISE_DONTNEED,
>   $
>
> GNU cp(1) does not care about minimizing impact on buffer cache using
> FADVISE_DONTNEED.  It just sets FADVISE_SEQUENTIAL on the source file
> and calls read() (plus uses FIEMAP to check extents for sparseness).
>
> I want to avoid adding code just for the heck of it.  We need a deeper
> understanding:
>
> Please drop FADVISE_DONTNEED and compare again to see if it changes the
> benchmark.
>
> By the way, did you perform several runs to check the variance of the
> running time?  I don't know if the 2-second difference was noise or
> due to FADVISE_SEQUENTIAL or FADVISE_DONTNEED or both.

There was no effect on the runtime as far as I remember. I ran
some tests, but not enough runs to filter out the noise.

I created this patch because we saw that it helps under memory pressure.
Maybe it's too specific to add to mainline QEMU, but I wanted to
avoid having too many individual changes that we need to maintain.
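
For readers following the thread, the pattern under discussion can be
sketched outside of QEMU roughly as follows. This is only an illustration,
not the code from the patch: the function name copy_sequential and the
64 KiB chunk size are made up for the example, and it assumes the source
is read through the page cache (no O_DIRECT). Both fadvise calls are
hints, so their return values are ignored.

```c
#define _XOPEN_SOURCE 600   /* expose posix_fadvise() in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything from src_fd (starting at offset 0)
 * to dst_fd. Returns the number of bytes copied, or -1 on error.
 */
static long long copy_sequential(int src_fd, int dst_fd)
{
    enum { CHUNK = 64 * 1024 };   /* arbitrary chunk size for this sketch */
    off_t done = 0;
    char *buf;

    assert(src_fd >= 0 && dst_fd >= 0);
    buf = malloc(CHUNK);
    if (buf == NULL) {
        return -1;
    }

#ifdef POSIX_FADV_SEQUENTIAL
    /* offset 0, len 0 covers the whole file; purely advisory */
    (void)posix_fadvise(src_fd, 0, 0, POSIX_FADV_SEQUENTIAL);
#endif

    for (;;) {
        ssize_t n = read(src_fd, buf, CHUNK);
        ssize_t off = 0;

        if (n < 0) {
            if (errno == EINTR) {
                continue;
            }
            free(buf);
            return -1;
        }
        if (n == 0) {
            break;                /* end of file */
        }

        /* write() may be short, so loop until the chunk is written */
        while (off < n) {
            ssize_t w = write(dst_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR) {
                    continue;
                }
                free(buf);
                return -1;
            }
            off += w;
        }

#ifdef POSIX_FADV_DONTNEED
        /* after each write: the source range just copied is not reused */
        (void)posix_fadvise(src_fd, done, n, POSIX_FADV_DONTNEED);
#endif
        done += n;
    }

    free(buf);
    return (long long)done;
}
```

Dropping the POSIX_FADV_DONTNEED call from this sketch gives the variant
Stefan asked to benchmark; the copy itself behaves the same either way,
only the hint to the kernel changes.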


>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 6586a0c..9768cc4 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>      }
>>  #endif
>>  
>> +#ifdef POSIX_FADV_SEQUENTIAL
>> +    if (bs->open_flags & BDRV_O_SEQUENTIAL &&
>> +        !(bs->open_flags & BDRV_O_NOCACHE)) {
>> +        posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
>> +    }
>> +#endif
> This is only true if the image format is raw.  If the image format on
> top of this raw-posix BDS is non-raw then the read pattern may not be
> sequential.

You are right, but will the other formats set BDRV_O_SEQUENTIAL?

>
> Perhaps the extra I/O in that case doesn't matter but conceptually it's
> wrong to think that a raw-posix file will have a sequential access
> pattern just because bdrv_read() is called sequentially.

Peter
