* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
[not found] <1401486037-25609-1-git-send-email-pl@kamp.de>
@ 2014-06-04 15:12 ` Stefan Hajnoczi
2014-06-04 15:31 ` Peter Lieven
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-06-04 15:12 UTC (permalink / raw)
To: Peter Lieven; +Cc: kwolf, famz, qemu-devel, stefanha, shadowsor, pbonzini
On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> this patch introduces a new flag to indicate that we are going to sequentially
> read from a file and do not plan to reread/reuse the data after it has been read.
>
> The current use of this flag is to open the source(s) of a qemu-img convert
> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
> to advise to the kernel that we are going to read sequentially from the
> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
> that there is no advantage keeping the blocks in the buffers.
>
> Consider the following test case that was created to confirm the behaviour of
> the new flag:
>
> A 10G logical volume was created and filled with random data.
> Then the logical volume was exported via qemu-img convert to an iscsi target.
> Before the export was started all caches of the linux kernel where dropped.
>
> Old behavior:
> - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
> to the end of the conversion. After qemu-img terminated all the buffers were
> freed by the kernel.
>
> New behavior with the -N switch:
> - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
> to the end with some small peaks up to 30 MB during the conversion.
FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
I'm skeptical of the effort to avoid buffer cache usage using
FADVISE_DONTNEED. The performance results tell me that less buffer
cache was used but that number doesn't have a direct effect on
application performance.
Let's check GNU coreutils:
$ cd coreutils
$ git grep FADVISE_DONTNEED
gl/lib/fadvise.h: FADVISE_DONTNEED = POSIX_FADV_DONTNEED,
gl/lib/fadvise.h: FADVISE_DONTNEED,
$
GNU cp(1) does not care about minimizing impact on buffer cache using
FADVISE_DONTNEED. It just sets FADVISE_SEQUENTIAL on the source file
and calls read() (plus uses FIEMAP to check extents for sparseness).
I want to avoid adding code just for the heck of it. We need a deeper
understanding:
Please drop FADVISE_DONTNEED and compare again to see if it changes the
benchmark.
By the way, did you perform several runs to check the variance of the
running time? I don't know if the 2 seconds difference were noise or
because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index 6586a0c..9768cc4 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> }
> #endif
>
> +#ifdef POSIX_FADV_SEQUENTIAL
> + if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> + !(bs->open_flags & BDRV_O_NOCACHE)) {
> + posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> + }
> +#endif
This is only true if the image format is raw. If the image format on
top of this raw-posix BDS is non-raw then the read pattern may not be
sequential.
Perhaps the extra I/O in that case doesn't matter but conceptually it's
wrong to think that a raw-posix file will have a sequential access
pattern just because bdrv_read() is called sequentially.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
2014-06-04 15:12 ` [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL Stefan Hajnoczi
@ 2014-06-04 15:31 ` Peter Lieven
2014-06-05 7:53 ` Stefan Hajnoczi
0 siblings, 1 reply; 6+ messages in thread
From: Peter Lieven @ 2014-06-04 15:31 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, famz, qemu-devel, stefanha, shadowsor, pbonzini
Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
>> this patch introduces a new flag to indicate that we are going to sequentially
>> read from a file and do not plan to reread/reuse the data after it has been read.
>>
>> The current use of this flag is to open the source(s) of a qemu-img convert
>> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
>> to advise to the kernel that we are going to read sequentially from the
>> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
>> that there is no advantage keeping the blocks in the buffers.
>>
>> Consider the following test case that was created to confirm the behaviour of
>> the new flag:
>>
>> A 10G logical volume was created and filled with random data.
>> Then the logical volume was exported via qemu-img convert to an iscsi target.
>> Before the export was started all caches of the linux kernel where dropped.
>>
>> Old behavior:
>> - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
>> to the end of the conversion. After qemu-img terminated all the buffers were
>> freed by the kernel.
>>
>> New behavior with the -N switch:
>> - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
>> to the end with some small peaks up to 30 MB during the conversion.
> FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
>
> I'm skeptical of the effort to avoid buffer cache usage using
> FADVISE_DONTNEED. The performance results tell me that less buffer
> cache was used but that number doesn't have a direct effect on
> application performance.
>
> Let's check GNU coreutils:
>
> $ cd coreutils
> $ git grep FADVISE_DONTNEED
> gl/lib/fadvise.h: FADVISE_DONTNEED = POSIX_FADV_DONTNEED,
> gl/lib/fadvise.h: FADVISE_DONTNEED,
> $
>
> GNU cp(1) does not care about minimizing impact on buffer cache using
> FADVISE_DONTNEED. It just sets FADVISE_SEQUENTIAL on the source file
> and calls read() (plus uses FIEMAP to check extents for sparseness).
>
> I want to avoid adding code just for the heck of it. We need a deeper
> understanding:
>
> Please drop FADVISE_DONTNEED and compare again to see if it changes the
> benchmark.
>
> By the way, did you perform several runs to check the variance of the
> running time? I don't know if the 2 seconds difference were noise or
> because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
There was no effect on the runtime as far as I remember. I ran
some tests, but not a number large enough to filter out the noise.
I created this one because we saw it helps under memory pressure.
Maybe its too specific to add it into mainline qemu, but I wanted to
avoid to have too much individual changes we need to maintain.
>
>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>> index 6586a0c..9768cc4 100644
>> --- a/block/raw-posix.c
>> +++ b/block/raw-posix.c
>> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>> }
>> #endif
>>
>> +#ifdef POSIX_FADV_SEQUENTIAL
>> + if (bs->open_flags & BDRV_O_SEQUENTIAL &&
>> + !(bs->open_flags & BDRV_O_NOCACHE)) {
>> + posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
>> + }
>> +#endif
> This is only true if the image format is raw. If the image format on
> top of this raw-posix BDS is non-raw then the read pattern may not be
> sequential.
You are right, but will the other formats set BDRV_O_SEQUENTIAL?
>
> Perhaps the extra I/O in that case doesn't matter but conceptually it's
> wrong to think that a raw-posix file will have a sequential access
> pattern just because bdrv_read() is called sequentially.
Peter
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
2014-06-04 15:31 ` Peter Lieven
@ 2014-06-05 7:53 ` Stefan Hajnoczi
2014-06-05 8:09 ` Peter Lieven
0 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-06-05 7:53 UTC (permalink / raw)
To: Peter Lieven
Cc: kwolf, famz, Stefan Hajnoczi, qemu-devel, shadowsor, pbonzini
On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
> Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> > On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> >> this patch introduces a new flag to indicate that we are going to sequentially
> >> read from a file and do not plan to reread/reuse the data after it has been read.
> >>
> >> The current use of this flag is to open the source(s) of a qemu-img convert
> >> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
> >> to advise to the kernel that we are going to read sequentially from the
> >> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
> >> that there is no advantage keeping the blocks in the buffers.
> >>
> >> Consider the following test case that was created to confirm the behaviour of
> >> the new flag:
> >>
> >> A 10G logical volume was created and filled with random data.
> >> Then the logical volume was exported via qemu-img convert to an iscsi target.
> >> Before the export was started all caches of the linux kernel where dropped.
> >>
> >> Old behavior:
> >> - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
> >> to the end of the conversion. After qemu-img terminated all the buffers were
> >> freed by the kernel.
> >>
> >> New behavior with the -N switch:
> >> - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
> >> to the end with some small peaks up to 30 MB during the conversion.
> > FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
> >
> > I'm skeptical of the effort to avoid buffer cache usage using
> > FADVISE_DONTNEED. The performance results tell me that less buffer
> > cache was used but that number doesn't have a direct effect on
> > application performance.
> >
> > Let's check GNU coreutils:
> >
> > $ cd coreutils
> > $ git grep FADVISE_DONTNEED
> > gl/lib/fadvise.h: FADVISE_DONTNEED = POSIX_FADV_DONTNEED,
> > gl/lib/fadvise.h: FADVISE_DONTNEED,
> > $
> >
> > GNU cp(1) does not care about minimizing impact on buffer cache using
> > FADVISE_DONTNEED. It just sets FADVISE_SEQUENTIAL on the source file
> > and calls read() (plus uses FIEMAP to check extents for sparseness).
> >
> > I want to avoid adding code just for the heck of it. We need a deeper
> > understanding:
> >
> > Please drop FADVISE_DONTNEED and compare again to see if it changes the
> > benchmark.
> >
> > By the way, did you perform several runs to check the variance of the
> > running time? I don't know if the 2 seconds difference were noise or
> > because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
>
> There was no effect on the runtime as far as I remember. I ran
> some tests, but not a number large enough to filter out the noise.
>
> I created this one because we saw it helps under memory pressure.
> Maybe its too specific to add it into mainline qemu, but I wanted to
> avoid to have too much individual changes we need to maintain.
I'm open to merging it if the improvement can be quantified. Right now
this might be a workaround for Linux memory management heuristics or it
might not have any effect, I don't know.
> >
> >> diff --git a/block/raw-posix.c b/block/raw-posix.c
> >> index 6586a0c..9768cc4 100644
> >> --- a/block/raw-posix.c
> >> +++ b/block/raw-posix.c
> >> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> >> }
> >> #endif
> >>
> >> +#ifdef POSIX_FADV_SEQUENTIAL
> >> + if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> >> + !(bs->open_flags & BDRV_O_NOCACHE)) {
> >> + posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> >> + }
> >> +#endif
> > This is only true if the image format is raw. If the image format on
> > top of this raw-posix BDS is non-raw then the read pattern may not be
> > sequential.
>
> You are right, but will the other formats set BDRV_O_SEQUENTIAL?
If the user specifies qemu-img convert -N then it will be set for any
image format.
Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
raw_bsd.c format propagate it to bs->file while other formats do not.
Then the user doesn't have to specify a command-line option and we don't
set it for non-raw image formats.
Stefan
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
2014-06-05 7:53 ` Stefan Hajnoczi
@ 2014-06-05 8:09 ` Peter Lieven
2014-06-05 8:13 ` Kevin Wolf
0 siblings, 1 reply; 6+ messages in thread
From: Peter Lieven @ 2014-06-05 8:09 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: kwolf, famz, Stefan Hajnoczi, qemu-devel, shadowsor, pbonzini
On 05.06.2014 09:53, Stefan Hajnoczi wrote:
> On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
>> Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
>>> On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
>>>> this patch introduces a new flag to indicate that we are going to sequentially
>>>> read from a file and do not plan to reread/reuse the data after it has been read.
>>>>
>>>> The current use of this flag is to open the source(s) of a qemu-img convert
>>>> process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
>>>> to advise to the kernel that we are going to read sequentially from the
>>>> file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
>>>> that there is no advantage keeping the blocks in the buffers.
>>>>
>>>> Consider the following test case that was created to confirm the behaviour of
>>>> the new flag:
>>>>
>>>> A 10G logical volume was created and filled with random data.
>>>> Then the logical volume was exported via qemu-img convert to an iscsi target.
>>>> Before the export was started all caches of the linux kernel where dropped.
>>>>
>>>> Old behavior:
>>>> - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
>>>> to the end of the conversion. After qemu-img terminated all the buffers were
>>>> freed by the kernel.
>>>>
>>>> New behavior with the -N switch:
>>>> - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
>>>> to the end with some small peaks up to 30 MB during the conversion.
>>> FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
>>>
>>> I'm skeptical of the effort to avoid buffer cache usage using
>>> FADVISE_DONTNEED. The performance results tell me that less buffer
>>> cache was used but that number doesn't have a direct effect on
>>> application performance.
>>>
>>> Let's check GNU coreutils:
>>>
>>> $ cd coreutils
>>> $ git grep FADVISE_DONTNEED
>>> gl/lib/fadvise.h: FADVISE_DONTNEED = POSIX_FADV_DONTNEED,
>>> gl/lib/fadvise.h: FADVISE_DONTNEED,
>>> $
>>>
>>> GNU cp(1) does not care about minimizing impact on buffer cache using
>>> FADVISE_DONTNEED. It just sets FADVISE_SEQUENTIAL on the source file
>>> and calls read() (plus uses FIEMAP to check extents for sparseness).
>>>
>>> I want to avoid adding code just for the heck of it. We need a deeper
>>> understanding:
>>>
>>> Please drop FADVISE_DONTNEED and compare again to see if it changes the
>>> benchmark.
>>>
>>> By the way, did you perform several runs to check the variance of the
>>> running time? I don't know if the 2 seconds difference were noise or
>>> because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
>> There was no effect on the runtime as far as I remember. I ran
>> some tests, but not a number large enough to filter out the noise.
>>
>> I created this one because we saw it helps under memory pressure.
>> Maybe its too specific to add it into mainline qemu, but I wanted to
>> avoid to have too much individual changes we need to maintain.
> I'm open to merging it if the improvement can be quantified. Right now
> this might be a workaround for Linux memory management heuristics or it
> might not have any effect, I don't know.
I understand that you are critical about it. I can just say it solved
the problem with the specific setup, kernel version etc.
I found that FADVISE_DONTNEED solves problems also in other applications.
Offtopic: i have an raspberry pi running as tvheadend and observed desync
of the DVBS2 signal at some times. Since I FADV_DONTNEED all written
frames away it runs smothly. I this case the feeing of the page cache was
CPU intensive for the small device and caused the desync.
>
>>>> diff --git a/block/raw-posix.c b/block/raw-posix.c
>>>> index 6586a0c..9768cc4 100644
>>>> --- a/block/raw-posix.c
>>>> +++ b/block/raw-posix.c
>>>> @@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
>>>> }
>>>> #endif
>>>>
>>>> +#ifdef POSIX_FADV_SEQUENTIAL
>>>> + if (bs->open_flags & BDRV_O_SEQUENTIAL &&
>>>> + !(bs->open_flags & BDRV_O_NOCACHE)) {
>>>> + posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
>>>> + }
>>>> +#endif
>>> This is only true if the image format is raw. If the image format on
>>> top of this raw-posix BDS is non-raw then the read pattern may not be
>>> sequential.
>> You are right, but will the other formats set BDRV_O_SEQUENTIAL?
> If the user specifies qemu-img convert -N then it will be set for any
> image format.
Of course, but when e.g. qcow2 opens its underlying file, then BDRV_O_SEQUENTIAL
is not passed on, or is it?
>
> Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
> raw_bsd.c format propagate it to bs->file while other formats do not.
> Then the user doesn't have to specify a command-line option and we don't
> set it for non-raw image formats.
This would be an option.
Peter
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
2014-06-05 8:09 ` Peter Lieven
@ 2014-06-05 8:13 ` Kevin Wolf
2014-06-05 13:54 ` Stefan Hajnoczi
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2014-06-05 8:13 UTC (permalink / raw)
To: Peter Lieven
Cc: famz, Stefan Hajnoczi, qemu-devel, Stefan Hajnoczi, shadowsor,
pbonzini
Am 05.06.2014 um 10:09 hat Peter Lieven geschrieben:
> On 05.06.2014 09:53, Stefan Hajnoczi wrote:
> >On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
> >>Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> >>>On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> >>>>this patch introduces a new flag to indicate that we are going to sequentially
> >>>>read from a file and do not plan to reread/reuse the data after it has been read.
> >>>>
> >>>>The current use of this flag is to open the source(s) of a qemu-img convert
> >>>>process. If a protocol from block/raw-posix.c is used posix_fadvise is utilized
> >>>>to advise to the kernel that we are going to read sequentially from the
> >>>>file and a POSIX_FADV_DONTNEED advise is issued after each write to indicate
> >>>>that there is no advantage keeping the blocks in the buffers.
> >>>>
> >>>>Consider the following test case that was created to confirm the behaviour of
> >>>>the new flag:
> >>>>
> >>>>A 10G logical volume was created and filled with random data.
> >>>>Then the logical volume was exported via qemu-img convert to an iscsi target.
> >>>>Before the export was started all caches of the linux kernel where dropped.
> >>>>
> >>>>Old behavior:
> >>>> - The convert process took 3m45s and the buffer cache grew up to 9.67 GB close
> >>>> to the end of the conversion. After qemu-img terminated all the buffers were
> >>>> freed by the kernel.
> >>>>
> >>>>New behavior with the -N switch:
> >>>> - The convert process took 3m43s and the buffer cache grew up to 15.48 MB close
> >>>> to the end with some small peaks up to 30 MB during the conversion.
> >>>FADVISE_SEQUENTIAL can be good since it doubles read-ahead on Linux.
> >>>
> >>>I'm skeptical of the effort to avoid buffer cache usage using
> >>>FADVISE_DONTNEED. The performance results tell me that less buffer
> >>>cache was used but that number doesn't have a direct effect on
> >>>application performance.
> >>>
> >>>Let's check GNU coreutils:
> >>>
> >>> $ cd coreutils
> >>> $ git grep FADVISE_DONTNEED
> >>> gl/lib/fadvise.h: FADVISE_DONTNEED = POSIX_FADV_DONTNEED,
> >>> gl/lib/fadvise.h: FADVISE_DONTNEED,
> >>> $
> >>>
> >>>GNU cp(1) does not care about minimizing impact on buffer cache using
> >>>FADVISE_DONTNEED. It just sets FADVISE_SEQUENTIAL on the source file
> >>>and calls read() (plus uses FIEMAP to check extents for sparseness).
> >>>
> >>>I want to avoid adding code just for the heck of it. We need a deeper
> >>>understanding:
> >>>
> >>>Please drop FADVISE_DONTNEED and compare again to see if it changes the
> >>>benchmark.
> >>>
> >>>By the way, did you perform several runs to check the variance of the
> >>>running time? I don't know if the 2 seconds difference were noise or
> >>>because FADVISE_SEQUENTIAL or because FADVISE_DONTNEED or because both.
> >>There was no effect on the runtime as far as I remember. I ran
> >>some tests, but not a number large enough to filter out the noise.
> >>
> >>I created this one because we saw it helps under memory pressure.
> >>Maybe its too specific to add it into mainline qemu, but I wanted to
> >>avoid to have too much individual changes we need to maintain.
> >I'm open to merging it if the improvement can be quantified. Right now
> >this might be a workaround for Linux memory management heuristics or it
> >might not have any effect, I don't know.
>
> I understand that you are critical about it. I can just say it solved
> the problem with the specific setup, kernel version etc.
>
> I found that FADVISE_DONTNEED solves problems also in other applications.
> Offtopic: i have an raspberry pi running as tvheadend and observed desync
> of the DVBS2 signal at some times. Since I FADV_DONTNEED all written
> frames away it runs smothly. I this case the feeing of the page cache was
> CPU intensive for the small device and caused the desync.
>
> >
> >>>>diff --git a/block/raw-posix.c b/block/raw-posix.c
> >>>>index 6586a0c..9768cc4 100644
> >>>>--- a/block/raw-posix.c
> >>>>+++ b/block/raw-posix.c
> >>>>@@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> >>>> }
> >>>> #endif
> >>>>+#ifdef POSIX_FADV_SEQUENTIAL
> >>>>+ if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> >>>>+ !(bs->open_flags & BDRV_O_NOCACHE)) {
> >>>>+ posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> >>>>+ }
> >>>>+#endif
> >>>This is only true if the image format is raw. If the image format on
> >>>top of this raw-posix BDS is non-raw then the read pattern may not be
> >>>sequential.
> >>You are right, but will the other formats set BDRV_O_SEQUENTIAL?
> >If the user specifies qemu-img convert -N then it will be set for any
> >image format.
>
> Of course, but when e.g. qcow2 opens its underlying file, then BDRV_O_SEQUENTIAL
> is not passed on, or is it?
It isn't qcow2 but block.c that opens bs->file, and unless you
explicitly filter out a flag, bs->file inherits it. (If it didn't do
that, your patch would have no effect for raw either.)
> >Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
> >raw_bsd.c format propagate it to bs->file while other formats do not.
> >Then the user doesn't have to specify a command-line option and we don't
> >set it for non-raw image formats.
>
> This would be an option.
I agree, though it's not quite clear how raw_bsd would do that. Would
that involve a bdrv_reopen() for bs->file?
Kevin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL
2014-06-05 8:13 ` Kevin Wolf
@ 2014-06-05 13:54 ` Stefan Hajnoczi
0 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2014-06-05 13:54 UTC (permalink / raw)
To: Kevin Wolf
Cc: famz, Peter Lieven, qemu-devel, Stefan Hajnoczi, shadowsor,
pbonzini
On Thu, Jun 05, 2014 at 10:13:04AM +0200, Kevin Wolf wrote:
> Am 05.06.2014 um 10:09 hat Peter Lieven geschrieben:
> > On 05.06.2014 09:53, Stefan Hajnoczi wrote:
> > >On Wed, Jun 04, 2014 at 05:31:48PM +0200, Peter Lieven wrote:
> > >>Am 04.06.2014 17:12, schrieb Stefan Hajnoczi:
> > >>>On Fri, May 30, 2014 at 11:40:37PM +0200, Peter Lieven wrote:
> > >>>>diff --git a/block/raw-posix.c b/block/raw-posix.c
> > >>>>index 6586a0c..9768cc4 100644
> > >>>>--- a/block/raw-posix.c
> > >>>>+++ b/block/raw-posix.c
> > >>>>@@ -447,6 +447,13 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
> > >>>> }
> > >>>> #endif
> > >>>>+#ifdef POSIX_FADV_SEQUENTIAL
> > >>>>+ if (bs->open_flags & BDRV_O_SEQUENTIAL &&
> > >>>>+ !(bs->open_flags & BDRV_O_NOCACHE)) {
> > >>>>+ posix_fadvise(s->fd, 0, 0, POSIX_FADV_SEQUENTIAL);
> > >>>>+ }
> > >>>>+#endif
> > >>>This is only true if the image format is raw. If the image format on
> > >>>top of this raw-posix BDS is non-raw then the read pattern may not be
> > >>>sequential.
> > >>You are right, but will the other formats set BDRV_O_SEQUENTIAL?
> > >If the user specifies qemu-img convert -N then it will be set for any
> > >image format.
> >
> > Of course, but when e.g. qcow2 opens its underlying file, then BDRV_O_SEQUENTIAL
> > is not passed on, or is it?
>
> It isn't qcow2 but block.c that opens bs->file, and unless you
> explicitly filter out a flag, bs->file inherits it. (If it didn't do
> that, your patch would have no effect for raw either.)
Yes, exactly. When a raw image file is opened there are actually two
BlockDriverStates:
raw_bsd ("drive0")
file: raw-posix (anonymous)
Since your patch affected the buffer cache counter, we know that the
flag was propagated down to raw-posix (by block.c as Kevin explained).
The qcow2 case looks like this:
qcow2 ("drive0")
file: raw-posix (anonymous)
> > >Maybe qemu-img convert can always set BDRV_O_SEQUENTIAL and the have the
> > >raw_bsd.c format propagate it to bs->file while other formats do not.
> > >Then the user doesn't have to specify a command-line option and we don't
> > >set it for non-raw image formats.
> >
> > This would be an option.
>
> I agree, though it's not quite clear how raw_bsd would do that. Would
> that involve a bdrv_reopen() for bs->file?
One way is to add a BlockDriver bitmask field for options that get
propagated to its children. Only raw_bsd will include
BDRV_O_SEQUENTIAL.
Stefan
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2014-06-05 13:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <1401486037-25609-1-git-send-email-pl@kamp.de>
2014-06-04 15:12 ` [Qemu-devel] [PATCHv3 RESEND] block: introduce BDRV_O_SEQUENTIAL Stefan Hajnoczi
2014-06-04 15:31 ` Peter Lieven
2014-06-05 7:53 ` Stefan Hajnoczi
2014-06-05 8:09 ` Peter Lieven
2014-06-05 8:13 ` Kevin Wolf
2014-06-05 13:54 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).