* [Qemu-devel] [PATCH] qemu-img: align is_allocated_sectors to 4k
@ 2018-06-07 12:46 Peter Lieven
2018-06-11 13:30 ` Max Reitz
0 siblings, 1 reply; 6+ messages in thread
From: Peter Lieven @ 2018-06-07 12:46 UTC (permalink / raw)
To: qemu-devel, qemu-block; +Cc: kwolf, mreitz, Peter Lieven
We currently don't enforce that the sparse segments we detect during convert are
aligned. This leads to unnecessary and costly read-modify-write (RMW) cycles, either
internally in QEMU or in the background on the storage device, as nearly all
modern filesystems and hardware use 4k alignment internally.
Since min_sparse defaults to 4k, it makes perfect sense to ensure
that these sparse holes in the file are placed at 4k boundaries.
Converting an example image [1] to a raw device with a 4k sector size causes
about 4600 additional 4k read requests for a total of about 15000
write requests. With this patch, the 4600 additional read requests are eliminated.
[1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk
Signed-off-by: Peter Lieven <pl@kamp.de>
---
qemu-img.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/qemu-img.c b/qemu-img.c
index 75f1610..68eefba 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1096,24 +1096,33 @@ static int64_t find_nonzero(const uint8_t *buf, int64_t n)
  *
  * 'pnum' is set to the number of sectors (including and immediately following
  * the first one) that are known to be in the same allocated/unallocated state.
+ * The function will try to align 'pnum' to 8 sectors (4k) to avoid unnecessary
+ * RMW cycles on modern hardware.
  */
 static int is_allocated_sectors(const uint8_t *buf, int n, int *pnum)
 {
     bool is_zero;
-    int i;
+    int i, alignment = 1;

     if (n <= 0) {
         *pnum = 0;
         return 0;
     }
-    is_zero = buffer_is_zero(buf, 512);
-    for(i = 1; i < n; i++) {
-        buf += 512;
-        if (is_zero != buffer_is_zero(buf, 512)) {
+
+    if (!(n & 7)) {
+        /* the buffer size is divisible by 4k */
+        alignment = 8;
+        n /= 8;
+    }
+
+    is_zero = buffer_is_zero(buf, BDRV_SECTOR_SIZE * alignment);
+    for (i = 1; i < n; i++) {
+        buf += BDRV_SECTOR_SIZE * alignment;
+        if (is_zero != buffer_is_zero(buf, BDRV_SECTOR_SIZE * alignment)) {
             break;
         }
     }
-    *pnum = i;
+    *pnum = i * alignment;
     return !is_zero;
 }
--
2.7.4
* Re: [Qemu-devel] [PATCH] qemu-img: align is_allocated_sectors to 4k
2018-06-07 12:46 [Qemu-devel] [PATCH] qemu-img: align is_allocated_sectors to 4k Peter Lieven
@ 2018-06-11 13:30 ` Max Reitz
2018-06-11 13:59 ` Peter Lieven
0 siblings, 1 reply; 6+ messages in thread
From: Max Reitz @ 2018-06-11 13:30 UTC (permalink / raw)
To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf
On 2018-06-07 14:46, Peter Lieven wrote:
> We currently don't enforce that the sparse segments we detect during convert are
> aligned. This leads to unnecessary and costly read-modify-write (RMW) cycles, either
> internally in QEMU or in the background on the storage device, as nearly all
> modern filesystems and hardware use 4k alignment internally.
>
> Since min_sparse defaults to 4k, it makes perfect sense to ensure
> that these sparse holes in the file are placed at 4k boundaries.
>
> Converting an example image [1] to a raw device with a 4k sector size causes
> about 4600 additional 4k read requests for a total of about 15000
> write requests. With this patch, the 4600 additional read requests are eliminated.
>
> [1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
> qemu-img.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
I like the idea, but it doesn't seem guaranteed that
is_allocated_sectors() is called on aligned offsets, so this alignment
work may still leave things unaligned.
Furthermore, we should probably not blindly assume 4k but instead use
some block limit of the target, like pwrite_zeroes_alignment, or
pdiscard_alignment, depending on the case. (Or probably still
min_sparse, if that's less.)
Since is_allocated_sectors_min() (the only caller of
is_allocated_sectors()) is called from just a single place, taking those
factors into account should be possible.
Max
* Re: [Qemu-devel] [PATCH] qemu-img: align is_allocated_sectors to 4k
2018-06-11 13:30 ` Max Reitz
@ 2018-06-11 13:59 ` Peter Lieven
2018-06-11 14:04 ` Max Reitz
0 siblings, 1 reply; 6+ messages in thread
From: Peter Lieven @ 2018-06-11 13:59 UTC (permalink / raw)
To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf
On 11.06.2018 at 15:30, Max Reitz wrote:
> On 2018-06-07 14:46, Peter Lieven wrote:
>> We currently don't enforce that the sparse segments we detect during convert are
>> aligned. This leads to unnecessary and costly read-modify-write (RMW) cycles, either
>> internally in QEMU or in the background on the storage device, as nearly all
>> modern filesystems and hardware use 4k alignment internally.
>>
>> Since min_sparse defaults to 4k, it makes perfect sense to ensure
>> that these sparse holes in the file are placed at 4k boundaries.
>>
>> Converting an example image [1] to a raw device with a 4k sector size causes
>> about 4600 additional 4k read requests for a total of about 15000
>> write requests. With this patch, the 4600 additional read requests are eliminated.
>>
>> [1] https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> qemu-img.c | 21 +++++++++++++++------
>> 1 file changed, 15 insertions(+), 6 deletions(-)
> I like the idea, but it doesn't seem guaranteed that
> is_allocated_sectors() is called on aligned offsets, so this alignment
> work may still leave things unaligned.
I can't imagine why this should happen. As long as the alignment divides the buffer size, we either
write or skip aligned bytes. Maybe get_block_status returns an unaligned number of sectors?
>
> Furthermore, we should probably not blindly assume 4k but instead use
> some block limit of the target, like pwrite_zeroes_alignment, or
> pdiscard_alignment, depending on the case. (Or probably still
> min_sparse, if that's less.)
>
> Since is_allocated_sectors_min() (the only caller of
> is_allocated_sectors()) is called from just a single place, taking those
> factors into account should be possible.
I also thought of this, but for instance for raw-posix I always get a request_alignment of 1.
But maybe the alignments you proposed produce a better result. I will check that.
Thanks,
Peter
* Re: [Qemu-devel] [PATCH] qemu-img: align is_allocated_sectors to 4k
2018-06-11 13:59 ` Peter Lieven
@ 2018-06-11 14:04 ` Max Reitz
2018-06-11 14:07 ` Peter Lieven
2018-06-25 20:29 ` Peter Lieven
0 siblings, 2 replies; 6+ messages in thread
From: Max Reitz @ 2018-06-11 14:04 UTC (permalink / raw)
To: Peter Lieven, qemu-devel, qemu-block; +Cc: kwolf
On 2018-06-11 15:59, Peter Lieven wrote:
> On 11.06.2018 at 15:30, Max Reitz wrote:
>> On 2018-06-07 14:46, Peter Lieven wrote:
>>> We currently don't enforce that the sparse segments we detect during
>>> convert are aligned. This leads to unnecessary and costly
>>> read-modify-write (RMW) cycles, either internally in QEMU or in the
>>> background on the storage device, as nearly all modern filesystems and
>>> hardware use 4k alignment internally.
>>>
>>> Since min_sparse defaults to 4k, it makes perfect sense to ensure
>>> that these sparse holes in the file are placed at 4k boundaries.
>>>
>>> Converting an example image [1] to a raw device with a 4k sector size
>>> causes about 4600 additional 4k read requests for a total of about
>>> 15000 write requests. With this patch, the 4600 additional read
>>> requests are eliminated.
>>>
>>> [1]
>>> https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk
>>>
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> ---
>>> qemu-img.c | 21 +++++++++++++++------
>>> 1 file changed, 15 insertions(+), 6 deletions(-)
>> I like the idea, but it doesn't seem guaranteed that
>> is_allocated_sectors() is called on aligned offsets, so this alignment
>> work may still leave things unaligned.
>
> I can't imagine why this should happen. As long as the alignment divides
> the buffer size, we either write or skip aligned bytes. Maybe
> get_block_status returns an unaligned number of sectors?
Yes, because the source medium does not need to be the same as the
destination (so the source may have e.g. 512-byte clusters).
>> Furthermore, we should probably not blindly assume 4k but instead use
>> some block limit of the target, like pwrite_zeroes_alignment, or
>> pdiscard_alignment, depending on the case. (Or probably still
>> min_sparse, if that's less.)
>>
>> Since is_allocated_sectors_min() (the only caller of
>> is_allocated_sectors()) is called from just a single place, taking those
>> factors into account should be possible.
>
> I also thought of this, but for instance for raw-posix I always get a
> request_alignment of 1.
Yes, because request_alignment is a hard requirement. With caching, you
can send requests with any alignment, so it's 1.
pwrite_zeroes_alignment and pdiscard_alignment are described as "Optimal
alignment", so those should contain the values we/you want. If they are
0, then you should probably fall back to opt_transfer instead of
request_alignment.
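The fallback order described here could be sketched roughly as follows. This is only an illustration under stated assumptions: `struct limits_sketch` is a hypothetical stand-in that models just the four limits named in this thread (all in bytes), and `pick_sparse_alignment()` and its `for_discard` flag are made-up names, not existing qemu-img code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the target's block limits; only the fields
 * named in this thread are modelled, all values in bytes. */
struct limits_sketch {
    uint32_t request_alignment;       /* hard requirement, at least 1 */
    uint32_t pwrite_zeroes_alignment; /* optimal alignment, 0 if unset */
    uint32_t pdiscard_alignment;      /* optimal alignment, 0 if unset */
    uint32_t opt_transfer;            /* optimal transfer size, 0 if unset */
};

/*
 * Pick an alignment for sparse detection: prefer the optimal alignment
 * for the operation in question, then opt_transfer, and only then the
 * hard request_alignment.
 */
static uint32_t pick_sparse_alignment(const struct limits_sketch *bl,
                                      int for_discard)
{
    uint32_t optimal = for_discard ? bl->pdiscard_alignment
                                   : bl->pwrite_zeroes_alignment;

    assert(bl->request_alignment >= 1);
    if (optimal) {
        return optimal;
    }
    if (bl->opt_transfer) {
        return bl->opt_transfer;
    }
    return bl->request_alignment;
}
```

Capping the result at min_sparse, as suggested above, would be one more comparison on top of this.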
Max
> But maybe the alignments you proposed produce a better result. I will
> check that.
* Re: [Qemu-devel] [PATCH] qemu-img: align is_allocated_sectors to 4k
2018-06-11 14:04 ` Max Reitz
@ 2018-06-11 14:07 ` Peter Lieven
2018-06-25 20:29 ` Peter Lieven
1 sibling, 0 replies; 6+ messages in thread
From: Peter Lieven @ 2018-06-11 14:07 UTC (permalink / raw)
To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf
On 11.06.2018 at 16:04, Max Reitz wrote:
> On 2018-06-11 15:59, Peter Lieven wrote:
>> On 11.06.2018 at 15:30, Max Reitz wrote:
>>> On 2018-06-07 14:46, Peter Lieven wrote:
>>>> We currently don't enforce that the sparse segments we detect during
>>>> convert are aligned. This leads to unnecessary and costly
>>>> read-modify-write (RMW) cycles, either internally in QEMU or in the
>>>> background on the storage device, as nearly all modern filesystems and
>>>> hardware use 4k alignment internally.
>>>>
>>>> Since min_sparse defaults to 4k, it makes perfect sense to ensure
>>>> that these sparse holes in the file are placed at 4k boundaries.
>>>>
>>>> Converting an example image [1] to a raw device with a 4k sector size
>>>> causes about 4600 additional 4k read requests for a total of about
>>>> 15000 write requests. With this patch, the 4600 additional read
>>>> requests are eliminated.
>>>>
>>>> [1]
>>>> https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk
>>>>
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>> qemu-img.c | 21 +++++++++++++++------
>>>> 1 file changed, 15 insertions(+), 6 deletions(-)
>>> I like the idea, but it doesn't seem guaranteed that
>>> is_allocated_sectors() is called on aligned offsets, so this alignment
>>> work may still leave things unaligned.
>> I can't imagine why this should happen. As long as the alignment divides
>> the buffer size, we either write or skip aligned bytes. Maybe
>> get_block_status returns an unaligned number of sectors?
> Yes, because the source medium does not need to be the same as the
> destination (so the source may have e.g. 512-byte clusters).
Okay, I will try to figure out how to cope with it. So the function needs
to get the offset and the alignment to make the right "decision".
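One way such an offset-aware check might look is sketched below. It is not the code from this patch or from a later revision: the function name, the `sector_num` and `alignment` parameters (both in sectors), and the plain byte loop standing in for buffer_is_zero() are all assumptions made for illustration. The idea is that the first chunk only runs up to the next aligned boundary of the absolute offset, and every later chunk is a full aligned block, so the reported run ends on a boundary unless it reaches the end of the buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512 /* bytes per sector, as in the patch */

/* Return true if all 'nb_sectors' sectors starting at 'buf' are zero. */
static bool sectors_are_zero(const uint8_t *buf, int nb_sectors)
{
    for (int64_t i = 0; i < (int64_t)nb_sectors * SECTOR_SIZE; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Hypothetical offset-aware variant. 'sector_num' is the absolute sector
 * of buf[0] and 'alignment' is given in sectors. A chunk counts as zero
 * only if every sector in it is zero.
 */
static int is_allocated_sectors_aligned(const uint8_t *buf, int n,
                                        int64_t sector_num, int alignment,
                                        int *pnum)
{
    bool is_zero;
    int i, len;

    assert(alignment > 0 && sector_num >= 0);
    if (n <= 0) {
        *pnum = 0;
        return 0;
    }

    /* The first chunk ends at the next aligned boundary (or at n). */
    len = alignment - (int)(sector_num % alignment);
    if (len > n) {
        len = n;
    }
    is_zero = sectors_are_zero(buf, len);

    /* All further chunks are aligned blocks, except possibly the last. */
    for (i = len; i < n; i += len) {
        len = (n - i < alignment) ? n - i : alignment;
        if (is_zero != sectors_are_zero(buf + (int64_t)i * SECTOR_SIZE, len)) {
            break;
        }
    }
    *pnum = i;
    return !is_zero;
}
```

With a 16-sector buffer that starts at absolute sector 3 and holds data only in its first sector, an alignment of 8 makes the first run 5 sectors long, so the following run starts at sector 8.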
>
>>> Furthermore, we should probably not blindly assume 4k but instead use
>>> some block limit of the target, like pwrite_zeroes_alignment, or
>>> pdiscard_alignment, depending on the case. (Or probably still
>>> min_sparse, if that's less.)
>>>
>>> Since is_allocated_sectors_min() (the only caller of
>>> is_allocated_sectors()) is called from just a single place, taking those
>>> factors into account should be possible.
>> I also thought of this, but for instance for raw-posix I always get a
>> request_alignment of 1.
> Yes, because request_alignment is a hard requirement. With caching, you
> can send requests with any alignment, so it's 1.
>
> pwrite_zeroes_alignment and pdiscard_alignment are described as "Optimal
> alignment", so those should contain the values we/you want. If they are
> 0, then you should probably fall back to opt_transfer instead of
> request_alignment.
I will check that for the targets that I can test and send a V2.
Thanks for your feedback,
Peter
* Re: [Qemu-devel] [PATCH] qemu-img: align is_allocated_sectors to 4k
2018-06-11 14:04 ` Max Reitz
2018-06-11 14:07 ` Peter Lieven
@ 2018-06-25 20:29 ` Peter Lieven
1 sibling, 0 replies; 6+ messages in thread
From: Peter Lieven @ 2018-06-25 20:29 UTC (permalink / raw)
To: Max Reitz, qemu-devel, qemu-block; +Cc: kwolf
On 11.06.2018 at 16:04, Max Reitz wrote:
> On 2018-06-11 15:59, Peter Lieven wrote:
>> On 11.06.2018 at 15:30, Max Reitz wrote:
>>> On 2018-06-07 14:46, Peter Lieven wrote:
>>>> We currently don't enforce that the sparse segments we detect during
>>>> convert are aligned. This leads to unnecessary and costly
>>>> read-modify-write (RMW) cycles, either internally in QEMU or in the
>>>> background on the storage device, as nearly all modern filesystems and
>>>> hardware use 4k alignment internally.
>>>>
>>>> Since min_sparse defaults to 4k, it makes perfect sense to ensure
>>>> that these sparse holes in the file are placed at 4k boundaries.
>>>>
>>>> Converting an example image [1] to a raw device with a 4k sector size
>>>> causes about 4600 additional 4k read requests for a total of about
>>>> 15000 write requests. With this patch, the 4600 additional read
>>>> requests are eliminated.
>>>>
>>>> [1]
>>>> https://cloud-images.ubuntu.com/releases/16.04/release/ubuntu-16.04-server-cloudimg-amd64-disk1.vmdk
>>>>
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>> qemu-img.c | 21 +++++++++++++++------
>>>> 1 file changed, 15 insertions(+), 6 deletions(-)
>>> I like the idea, but it doesn't seem guaranteed that
>>> is_allocated_sectors() is called on aligned offsets, so this alignment
>>> work may still leave things unaligned.
>> I can't imagine why this should happen. As long as the alignment divides
>> the buffer size, we either write or skip aligned bytes. Maybe
>> get_block_status returns an unaligned number of sectors?
> Yes, because the source medium does not need to be the same as the
> destination (so the source may have e.g. 512-byte clusters).
>
>>> Furthermore, we should probably not blindly assume 4k but instead use
>>> some block limit of the target, like pwrite_zeroes_alignment, or
>>> pdiscard_alignment, depending on the case. (Or probably still
>>> min_sparse, if that's less.)
>>>
>>> Since is_allocated_sectors_min() (the only caller of
>>> is_allocated_sectors()) is called from just a single place, taking those
>>> factors into account should be possible.
>> I also thought of this, but for instance for raw-posix I always get a
>> request_alignment of 1.
> Yes, because request_alignment is a hard requirement. With caching, you
> can send requests with any alignment, so it's 1.
>
> pwrite_zeroes_alignment and pdiscard_alignment are described as "Optimal
> alignment", so those should contain the values we/you want. If they are
> 0, then you should probably fall back to opt_transfer instead of
> request_alignment.
I am still trying to figure out what the best solution is. If I take the optimal values into
account, I might end up transferring more data than necessary just to create an optimal
request. I just want to avoid unnecessary RMW cycles. And even if modern byte interfaces
advertise a request_alignment of 1, someone has to do the RMW cycle: either the OS or the
hard drive itself.
I am thinking about something like
alignment = MAX(request_alignment, opt_transfer, min_sparse)
as a starting point.
I found that opt_transfer seems to be 0 for everything I could test.
So maybe even reduce the alignment to MAX(request_alignment, min_sparse).
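Written out as C, the starting point above might look like the sketch below. The helper names are hypothetical, and it assumes all three inputs are expressed in the same unit (bytes here). Because an unset opt_transfer is 0, the three-way maximum already collapses to MAX(request_alignment, min_sparse) in that case.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* Hypothetical helper: largest of the three candidate alignments. */
static uint32_t proposed_alignment(uint32_t request_alignment,
                                   uint32_t opt_transfer,
                                   uint32_t min_sparse)
{
    assert(request_alignment >= 1);
    return max_u32(max_u32(request_alignment, opt_transfer), min_sparse);
}
```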
Peter