qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
       [not found] <1401486037-25609-1-git-send-email-pl@kamp.de>
@ 2014-06-04 15:12 ` Stefan Hajnoczi
  2014-06-04 15:31   ` Peter Lieven
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-06-04 15:12 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, famz, qemu-devel, stefanha, shadowsor, pbonzini

On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> this patch introduces a new flag to indicate that we are going to sequentially
> read from a file and do not plan to reread/reuse the data after it has been read.
> 
> The current use of this flag is to open the source(s) of a qemu-img convert
> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
> to advise to the kernel that we are going to read sequentially from the
> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
> that there is no advantage keeping the blocks in the buffers.
> 
> Consider the following test case that was created to confirm the behaviour of
> the new flag:
> 
> A 10G logical volume was created and filled with random data.
> Then the logical volume was exported via qemu-img convert to an iscsi target.
> Before the export was started all caches of the linux kernel where dropped.
> 
> Old behavior:
>  - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
>    to the end of the conversion. After qemu-img terminated all the buffers were
>    freed by the kernel.
> 
> New behavior with the -N switch:
>  - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
>    to the end with some small peaks up to 30 MB during the conversion.

FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.

I'm skeptical of the effort to avoid buffer cache usage using
FADVISE_DONTNEED.  The performance results tell me that less buffer
cache was used but that number doesn't have a direct effect on
application performance.

Let's check GNU coreutils:

  $ cd coreutils
  $ git grep FADVISE_DONTNEED
  gl/lib/fadvise.h:  FADVISE_DONTNEED =   POSIX_FADV_DONTNEED,
  gl/lib/fadvise.h:  FADVISE_DONTNEED,
  $

GNU cp(1) does not care about minimizing impact on buffer cache using
FADVISE_DONTNEED.  It just sets FADVISE_SEQUENTIAL on the source file
and calls read() (plus uses FIEMAP to check extents for sparseness).

I want to avoid adding code just for the heck of it.  We need a deeper
understanding:

Please drop FADVISE_DONTNEED and compare again to see if it changes the
benchmark.

By the way, did you perform several runs to check the variance of the
running time?  I don't know if the 2 seconds difference were noise or
because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.

> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 6586a0c..9768cc4 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>      }
>  #endif
>  
> +#ifdef POSIX_FADV_SEQUENTIAL
> +    if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> +        !(bs->open_flags & BDRV_O_NOCACHE)) {
> +        posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> +    }
> +#endif

This is only true if the image format is raw.  If the image format on
top of this raw-posix BDS is non-raw then the read pattern may not be
sequential.

Perhaps the extra I/O in that case doesn't matter but conceptually it's
wrong to think that a raw-posix file will have a sequential access
pattern just because bdrv_read() is called sequentially.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
  2014-06-04 15:12 ` [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL Stefan Hajnoczi
@ 2014-06-04 15:31   ` Peter Lieven
  2014-06-05  7:53     ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Lieven @ 2014-06-04 15:31 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, famz, qemu-devel, stefanha, shadowsor, pbonzini

Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
>> this patch introduces a new flag to indicate that we are going to sequentially
>> read from a file and do not plan to reread/reuse the data after it has been read.
>>
>> The current use of this flag is to open the source(s) of a qemu-img convert
>> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
>> to advise to the kernel that we are going to read sequentially from the
>> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
>> that there is no advantage keeping the blocks in the buffers.
>>
>> Consider the following test case that was created to confirm the behaviour of
>> the new flag:
>>
>> A 10G logical volume was created and filled with random data.
>> Then the logical volume was exported via qemu-img convert to an iscsi target.
>> Before the export was started all caches of the linux kernel where dropped.
>>
>> Old behavior:
>>  - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
>>    to the end of the conversion. After qemu-img terminated all the buffers were
>>    freed by the kernel.
>>
>> New behavior with the -N switch:
>>  - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
>>    to the end with some small peaks up to 30 MB during the conversion.
> FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
>
> I'm skeptical of the effort to avoid buffer cache usage using
> FADVISE_DONTNEED.  The performance results tell me that less buffer
> cache was used but that number doesn't have a direct effect on
> application performance.
>
> Let's check GNU coreutils:
>
>   $ cd coreutils
>   $ git grep FADVISE_DONTNEED
>   gl/lib/fadvise.h:  FADVISE_DONTNEED =   POSIX_FADV_DONTNEED,
>   gl/lib/fadvise.h:  FADVISE_DONTNEED,
>   $
>
> GNU cp(1) does not care about minimizing impact on buffer cache using
> FADVISE_DONTNEED.  It just sets FADVISE_SEQUENTIAL on the source file
> and calls read() (plus uses FIEMAP to check extents for sparseness).
>
> I want to avoid adding code just for the heck of it.  We need a deeper
> understanding:
>
> Please drop FADVISE_DONTNEED and compare again to see if it changes the
> benchmark.
>
> By the way, did you perform several runs to check the variance of the
> running time?  I don't know if the 2 seconds difference were noise or
> because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.

There was no effect on the runtime as far as I remember. I ran
some tests, but not a number large enough to filter out the noise.

I created this one because we saw it helps under memory pressure.
Maybe its too specific to add it into mainline qemu, but I wanted to
avoid to have too much individual changes we need to maintain.


>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 6586a0c..9768cc4 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>      }
>>  #endif
>>  
>> +#ifdef POSIX_FADV_SEQUENTIAL
>> +    if (bs->open_flags & BDRV_O_SEQUENTIAL &&
>> +        !(bs->open_flags & BDRV_O_NOCACHE)) {
>> +        posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
>> +    }
>> +#endif
> This is only true if the image format is raw.  If the image format on
> top of this raw-posix BDS is non-raw then the read pattern may not be
> sequential.

You are right, but will the other formats set BDRV_O_SEQUENTIAL?

>
> Perhaps the extra I/O in that case doesn't matter but conceptually it's
> wrong to think that a raw-posix file will have a sequential access
> pattern just because bdrv_read() is called sequentially.

Peter

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
  2014-06-04 15:31   ` Peter Lieven
@ 2014-06-05  7:53     ` Stefan Hajnoczi
  2014-06-05  8:09       ` Peter Lieven
  0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-06-05  7:53 UTC (permalink / raw)
  To: Peter Lieven
  Cc: kwolf, famz, Stefan Hajnoczi, qemu-devel, shadowsor, pbonzini

On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
> Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> > On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> >> this patch introduces a new flag to indicate that we are going to sequentially
> >> read from a file and do not plan to reread/reuse the data after it has been read.
> >>
> >> The current use of this flag is to open the source(s) of a qemu-img convert
> >> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
> >> to advise to the kernel that we are going to read sequentially from the
> >> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
> >> that there is no advantage keeping the blocks in the buffers.
> >>
> >> Consider the following test case that was created to confirm the behaviour of
> >> the new flag:
> >>
> >> A 10G logical volume was created and filled with random data.
> >> Then the logical volume was exported via qemu-img convert to an iscsi target.
> >> Before the export was started all caches of the linux kernel where dropped.
> >>
> >> Old behavior:
> >>  - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
> >>    to the end of the conversion. After qemu-img terminated all the buffers were
> >>    freed by the kernel.
> >>
> >> New behavior with the -N switch:
> >>  - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
> >>    to the end with some small peaks up to 30 MB during the conversion.
> > FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
> >
> > I'm skeptical of the effort to avoid buffer cache usage using
> > FADVISE_DONTNEED.  The performance results tell me that less buffer
> > cache was used but that number doesn't have a direct effect on
> > application performance.
> >
> > Let's check GNU coreutils:
> >
> >   $ cd coreutils
> >   $ git grep FADVISE_DONTNEED
> >   gl/lib/fadvise.h:  FADVISE_DONTNEED =   POSIX_FADV_DONTNEED,
> >   gl/lib/fadvise.h:  FADVISE_DONTNEED,
> >   $
> >
> > GNU cp(1) does not care about minimizing impact on buffer cache using
> > FADVISE_DONTNEED.  It just sets FADVISE_SEQUENTIAL on the source file
> > and calls read() (plus uses FIEMAP to check extents for sparseness).
> >
> > I want to avoid adding code just for the heck of it.  We need a deeper
> > understanding:
> >
> > Please drop FADVISE_DONTNEED and compare again to see if it changes the
> > benchmark.
> >
> > By the way, did you perform several runs to check the variance of the
> > running time?  I don't know if the 2 seconds difference were noise or
> > because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
> 
> There was no effect on the runtime as far as I remember. I ran
> some tests, but not a number large enough to filter out the noise.
> 
> I created this one because we saw it helps under memory pressure.
> Maybe its too specific to add it into mainline qemu, but I wanted to
> avoid to have too much individual changes we need to maintain.

I'm open to merging it if the improvement can be quantified.  Right now
this might be a workaround for Linux memory management heuristics or it
might not have any effect, I don't know.

> >
> >> diff --git a/block/raw-posix.c b/block/raw-posix.c
> >> index 6586a0c..9768cc4 100644
> >> --- a/block/raw-posix.c
> >> +++ b/block/raw-posix.c
> >> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> >>      }
> >>  #endif
> >>  
> >> +#ifdef POSIX_FADV_SEQUENTIAL
> >> +    if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> >> +        !(bs->open_flags & BDRV_O_NOCACHE)) {
> >> +        posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> >> +    }
> >> +#endif
> > This is only true if the image format is raw.  If the image format on
> > top of this raw-posix BDS is non-raw then the read pattern may not be
> > sequential.
> 
> You are right, but will the other formats set BDRV_O_SEQUENTIAL?

If the user specifies qemu-img convert -N then it will be set for any
image format.

Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
raw_bsd.c format propagate it to bs->file while other formats do not.
Then the user doesn't have to specify a command-line option and we don't
set it for non-raw image formats.

Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
  2014-06-05  7:53     ` Stefan Hajnoczi
@ 2014-06-05  8:09       ` Peter Lieven
  2014-06-05  8:13         ` Kevin Wolf
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Lieven @ 2014-06-05  8:09 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, famz, Stefan Hajnoczi, qemu-devel, shadowsor, pbonzini

On 05.06.2014 09:53, Stefan Hajnoczi wrote:
> On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
>> Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
>>> On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
>>>> this patch introduces a new flag to indicate that we are going to sequentially
>>>> read from a file and do not plan to reread/reuse the data after it has been read.
>>>>
>>>> The current use of this flag is to open the source(s) of a qemu-img convert
>>>> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
>>>> to advise to the kernel that we are going to read sequentially from the
>>>> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
>>>> that there is no advantage keeping the blocks in the buffers.
>>>>
>>>> Consider the following test case that was created to confirm the behaviour of
>>>> the new flag:
>>>>
>>>> A 10G logical volume was created and filled with random data.
>>>> Then the logical volume was exported via qemu-img convert to an iscsi target.
>>>> Before the export was started all caches of the linux kernel where dropped.
>>>>
>>>> Old behavior:
>>>>   - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
>>>>     to the end of the conversion. After qemu-img terminated all the buffers were
>>>>     freed by the kernel.
>>>>
>>>> New behavior with the -N switch:
>>>>   - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
>>>>     to the end with some small peaks up to 30 MB during the conversion.
>>> FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
>>>
>>> I'm skeptical of the effort to avoid buffer cache usage using
>>> FADVISE_DONTNEED.  The performance results tell me that less buffer
>>> cache was used but that number doesn't have a direct effect on
>>> application performance.
>>>
>>> Let's check GNU coreutils:
>>>
>>>    $ cd coreutils
>>>    $ git grep FADVISE_DONTNEED
>>>    gl/lib/fadvise.h:  FADVISE_DONTNEED =   POSIX_FADV_DONTNEED,
>>>    gl/lib/fadvise.h:  FADVISE_DONTNEED,
>>>    $
>>>
>>> GNU cp(1) does not care about minimizing impact on buffer cache using
>>> FADVISE_DONTNEED.  It just sets FADVISE_SEQUENTIAL on the source file
>>> and calls read() (plus uses FIEMAP to check extents for sparseness).
>>>
>>> I want to avoid adding code just for the heck of it.  We need a deeper
>>> understanding:
>>>
>>> Please drop FADVISE_DONTNEED and compare again to see if it changes the
>>> benchmark.
>>>
>>> By the way, did you perform several runs to check the variance of the
>>> running time?  I don't know if the 2 seconds difference were noise or
>>> because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
>> There was no effect on the runtime as far as I remember. I ran
>> some tests, but not a number large enough to filter out the noise.
>>
>> I created this one because we saw it helps under memory pressure.
>> Maybe its too specific to add it into mainline qemu, but I wanted to
>> avoid to have too much individual changes we need to maintain.
> I'm open to merging it if the improvement can be quantified.  Right now
> this might be a workaround for Linux memory management heuristics or it
> might not have any effect, I don't know.

I understand that you are critical about it. I can just say it solved
the problem with the specific setup, kernel version etc.

I found that FADVISE_DONTNEED solves problems also in other applications.
Offtopic: i have an raspberry pi running as tvheadend and observed desync
of the DVBS2 signal at some times. Since I FADV_DONTNEED all written
frames away it runs smothly. I this case the feeing of the page cache was
CPU intensive for the small device and caused the desync.

>
>>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>> index 6586a0c..9768cc4 100644
>>>> --- a/block/raw-posix.c
>>>> +++ b/block/raw-posix.c
>>>> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>>>       }
>>>>   #endif
>>>>   
>>>> +#ifdef POSIX_FADV_SEQUENTIAL
>>>> +    if (bs->open_flags & BDRV_O_SEQUENTIAL &&
>>>> +        !(bs->open_flags & BDRV_O_NOCACHE)) {
>>>> +        posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
>>>> +    }
>>>> +#endif
>>> This is only true if the image format is raw.  If the image format on
>>> top of this raw-posix BDS is non-raw then the read pattern may not be
>>> sequential.
>> You are right, but will the other formats set BDRV_O_SEQUENTIAL?
> If the user specifies qemu-img convert -N then it will be set for any
> image format.

Of course, but when e.g. qcow2 opens its underlying file, then BDRV_O_SEQUENTIAL
is not passed on, or is it?

>
> Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
> raw_bsd.c format propagate it to bs->file while other formats do not.
> Then the user doesn't have to specify a command-line option and we don't
> set it for non-raw image formats.

This would be an option.

Peter

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
  2014-06-05  8:09       ` Peter Lieven
@ 2014-06-05  8:13         ` Kevin Wolf
  2014-06-05 13:54           ` Stefan Hajnoczi
  0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2014-06-05  8:13 UTC (permalink / raw)
  To: Peter Lieven
  Cc: famz, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, shadowsor,
	pbonzini

Am 05.06.2014 um 10:09 hat Peter Lieven geschrieben:
> On 05.06.2014 09:53, Stefan Hajnoczi wrote:
> >On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
> >>Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> >>>On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> >>>>this patch introduces a new flag to indicate that we are going to sequentially
> >>>>read from a file and do not plan to reread/reuse the data after it has been read.
> >>>>
> >>>>The current use of this flag is to open the source(s) of a qemu-img convert
> >>>>process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
> >>>>to advise to the kernel that we are going to read sequentially from the
> >>>>file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
> >>>>that there is no advantage keeping the blocks in the buffers.
> >>>>
> >>>>Consider the following test case that was created to confirm the behaviour of
> >>>>the new flag:
> >>>>
> >>>>A 10G logical volume was created and filled with random data.
> >>>>Then the logical volume was exported via qemu-img convert to an iscsi target.
> >>>>Before the export was started all caches of the linux kernel where dropped.
> >>>>
> >>>>Old behavior:
> >>>>  - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
> >>>>    to the end of the conversion. After qemu-img terminated all the buffers were
> >>>>    freed by the kernel.
> >>>>
> >>>>New behavior with the -N switch:
> >>>>  - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
> >>>>    to the end with some small peaks up to 30 MB during the conversion.
> >>>FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
> >>>
> >>>I'm skeptical of the effort to avoid buffer cache usage using
> >>>FADVISE_DONTNEED.  The performance results tell me that less buffer
> >>>cache was used but that number doesn't have a direct effect on
> >>>application performance.
> >>>
> >>>Let's check GNU coreutils:
> >>>
> >>>   $ cd coreutils
> >>>   $ git grep FADVISE_DONTNEED
> >>>   gl/lib/fadvise.h:  FADVISE_DONTNEED =   POSIX_FADV_DONTNEED,
> >>>   gl/lib/fadvise.h:  FADVISE_DONTNEED,
> >>>   $
> >>>
> >>>GNU cp(1) does not care about minimizing impact on buffer cache using
> >>>FADVISE_DONTNEED.  It just sets FADVISE_SEQUENTIAL on the source file
> >>>and calls read() (plus uses FIEMAP to check extents for sparseness).
> >>>
> >>>I want to avoid adding code just for the heck of it.  We need a deeper
> >>>understanding:
> >>>
> >>>Please drop FADVISE_DONTNEED and compare again to see if it changes the
> >>>benchmark.
> >>>
> >>>By the way, did you perform several runs to check the variance of the
> >>>running time?  I don't know if the 2 seconds difference were noise or
> >>>because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
> >>There was no effect on the runtime as far as I remember. I ran
> >>some tests, but not a number large enough to filter out the noise.
> >>
> >>I created this one because we saw it helps under memory pressure.
> >>Maybe its too specific to add it into mainline qemu, but I wanted to
> >>avoid to have too much individual changes we need to maintain.
> >I'm open to merging it if the improvement can be quantified.  Right now
> >this might be a workaround for Linux memory management heuristics or it
> >might not have any effect, I don't know.
> 
> I understand that you are critical about it. I can just say it solved
> the problem with the specific setup, kernel version etc.
> 
> I found that FADVISE_DONTNEED solves problems also in other applications.
> Offtopic: i have an raspberry pi running as tvheadend and observed desync
> of the DVBS2 signal at some times. Since I FADV_DONTNEED all written
> frames away it runs smothly. I this case the feeing of the page cache was
> CPU intensive for the small device and caused the desync.
> 
> >
> >>>>diff --git a/block/raw-posix.c b/block/raw-posix.c
> >>>>index 6586a0c..9768cc4 100644
> >>>>--- a/block/raw-posix.c
> >>>>+++ b/block/raw-posix.c
> >>>>@@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> >>>>      }
> >>>>  #endif
> >>>>+#ifdef POSIX_FADV_SEQUENTIAL
> >>>>+    if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> >>>>+        !(bs->open_flags & BDRV_O_NOCACHE)) {
> >>>>+        posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> >>>>+    }
> >>>>+#endif
> >>>This is only true if the image format is raw.  If the image format on
> >>>top of this raw-posix BDS is non-raw then the read pattern may not be
> >>>sequential.
> >>You are right, but will the other formats set BDRV_O_SEQUENTIAL?
> >If the user specifies qemu-img convert -N then it will be set for any
> >image format.
> 
> Of course, but when e.g. qcow2 opens its underlying file, then BDRV_O_SEQUENTIAL
> is not passed on, or is it?

It isn't qcow2 but block.c that opens bs->file, and unless you
explicitly filter out a flag, bs->file inherits it. (If it didn't do
that, your patch would have no effect for raw either.)

> >Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
> >raw_bsd.c format propagate it to bs->file while other formats do not.
> >Then the user doesn't have to specify a command-line option and we don't
> >set it for non-raw image formats.
> 
> This would be an option.

I agree, though it's not quite clear how raw_bsd would do that. Would
that involve a bdrv_reopen() for bs->file?

Kevin

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
  2014-06-05  8:13         ` Kevin Wolf
@ 2014-06-05 13:54           ` Stefan Hajnoczi
  0 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-06-05 13:54 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: famz, Peter Lieven, qemu-devel, Stefan Hajnoczi, shadowsor,
	pbonzini

On Thu, Jun 05, 2014 at 10:13:04AM +0200, Kevin Wolf wrote:
> Am 05.06.2014 um 10:09 hat Peter Lieven geschrieben:
> > On 05.06.2014 09:53, Stefan Hajnoczi wrote:
> > >On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
> > >>Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> > >>>On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> > >>>>diff --git a/block/raw-posix.c b/block/raw-posix.c
> > >>>>index 6586a0c..9768cc4 100644
> > >>>>--- a/block/raw-posix.c
> > >>>>+++ b/block/raw-posix.c
> > >>>>@@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> > >>>>      }
> > >>>>  #endif
> > >>>>+#ifdef POSIX_FADV_SEQUENTIAL
> > >>>>+    if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> > >>>>+        !(bs->open_flags & BDRV_O_NOCACHE)) {
> > >>>>+        posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> > >>>>+    }
> > >>>>+#endif
> > >>>This is only true if the image format is raw.  If the image format on
> > >>>top of this raw-posix BDS is non-raw then the read pattern may not be
> > >>>sequential.
> > >>You are right, but will the other formats set BDRV_O_SEQUENTIAL?
> > >If the user specifies qemu-img convert -N then it will be set for any
> > >image format.
> > 
> > Of course, but when e.g. qcow2 opens its underlying file, then BDRV_O_SEQUENTIAL
> > is not passed on, or is it?
> 
> It isn't qcow2 but block.c that opens bs->file, and unless you
> explicitly filter out a flag, bs->file inherits it. (If it didn't do
> that, your patch would have no effect for raw either.)

Yes, exactly.  When a raw image file is opened there are actually two
BlockDriverStates:

  raw_bsd ("drive0")
    file: raw-posix (anonymous)

Since your patch affected the buffer cache counter, we know that the
flag was propagated down to raw-posix (by block.c as Kevin explained).

The qcow2 case looks like this:

  qcow2 ("drive0")
    file: raw-posix (anonymous)

> > >Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
> > >raw_bsd.c format propagate it to bs->file while other formats do not.
> > >Then the user doesn't have to specify a command-line option and we don't
> > >set it for non-raw image formats.
> > 
> > This would be an option.
> 
> I agree, though it's not quite clear how raw_bsd would do that. Would
> that involve a bdrv_reopen() for bs->file?

One way is to add a BlockDriver bitmask field for options that get
propagated to its children.  Only raw_bsd will include
BDRV_O_SEQUENTIAL.

Stefan

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-06-05 13:54 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <1401486037-25609-1-git-send-email-pl@kamp.de>
2014-06-04 15:12 ` [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL Stefan Hajnoczi
2014-06-04 15:31   ` Peter Lieven
2014-06-05  7:53     ` Stefan Hajnoczi
2014-06-05  8:09       ` Peter Lieven
2014-06-05  8:13         ` Kevin Wolf
2014-06-05 13:54           ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).