* Re: [Qemu-devel] [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-17 20:58 ` Eric Blake
@ 2016-02-18 4:46 ` Denis V. Lunev
2016-02-18 8:30 ` Denis V. Lunev
2016-02-18 9:18 ` Roman Kagan
2016-02-18 12:14 ` [Qemu-devel] " Daniel P. Berrange
2 siblings, 1 reply; 22+ messages in thread
From: Denis V. Lunev @ 2016-02-18 4:46 UTC (permalink / raw)
To: Eric Blake; +Cc: nbd-general, qemu-devel
On 02/17/2016 11:58 PM, Eric Blake wrote:
> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>> This patch proposes a new command to reduce the amount of data passed
>> through the wire when it is known that the data is all zeroes. This
>> functionality is generally useful for mirroring or backup operations.
>>
>> Currently available NBD_CMD_TRIM command can not be used as the
>> specification explicitely says that "a client MUST NOT make any
> s/explicitely/explicitly/
>
>> assumptions about the contents of the export affected by this
>> [NBD_CMD_TRIM] command, until overwriting it again with `NBD_CMD_WRITE`"
>>
>> Particular use case could be the following:
>>
>> QEMU project uses own implementation of NBD server to transfer data
>> in between different instances of QEMU. Typically we tranfer VM virtual
> s/tranfer/transfer/
>
>> disks over this channel. VM virtual disks are sparse and thus the
>> efficiency of backup and mirroring operations could be improved a lot.
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> ---
>> doc/proto.md | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/doc/proto.md b/doc/proto.md
>> index 43065b7..c94751a 100644
>> --- a/doc/proto.md
>> +++ b/doc/proto.md
>> @@ -241,6 +241,8 @@ immediately after the global flags field in oldstyle negotiation:
>> schedule I/O accesses as for a rotational medium
>> - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
>> `NBD_CMD_TRIM` commands
>> +- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the server
>> + supports `NBD_CMD_WRITE_ZEROES` commands
>>
>> ##### Client flags
>>
>> @@ -446,6 +448,11 @@ The following request types exist:
>> about the contents of the export affected by this command, until
>> overwriting it again with `NBD_CMD_WRITE`.
>>
>> +* `NBD_CMD_WRITE_ZEROES` (6)
>> +
>> + A request to write zeroes. The command is functional equivalent of
>> + the NBD_WRITE_COMMAND but without payload sent through the channel.
> This lets us push holes during writes.
From my point of view, this allows the client to apply its own policy. For a
QCOW2 output target, the client could skip the block. For a RAW file, it could
decide whether to use UNMAP and produce a sparse file, or to use fallocate.
> Do we have the converse
> operation, that is, an easy way to query if a block of data will read as
> all zeroes, and therefore the client can bypass reading that portion of
> the disk (in other words, an equivalent to lseek(SEEK_HOLE/SEEK_DATA))?
>
exactly!
static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
...
    /* <------- query block state */
    ret = bdrv_get_block_status_above(source, NULL, sector_num,
                                      nb_sectors, &pnum, &file);
    if (ret < 0 || pnum < nb_sectors ||
        (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
                       mirror_read_complete, op);
    } else if (ret & BDRV_BLOCK_ZERO) {
        /* <------ skip read op if allowed */
        bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
                              mirror_write_complete, op);
    } else {
        assert(!(ret & BDRV_BLOCK_DATA));
        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
                         mirror_write_complete, op);
    }
    return delay_ns;
Actually, earlier today I tried adding a .bdrv_co_write_zeroes callback
to NBD, and it works as expected. The problem is that the callback
cannot be implemented with NBD_CMD_TRIM and still conform to the
NBD spec. In QEMU -> QEMU communication, though, it just works.
http://lists.nongnu.org/archive/html/qemu-devel/2016-02/msg03810.html
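
To make the proposal concrete, here is a minimal sketch of how a client
might encode the new request. It assumes the classic 28-byte big-endian
NBD request header (magic, type, handle, offset, length) and the command
number 6 from the patch. The helper names (put_be32, put_be64,
nbd_encode_request) are hypothetical and are not taken from QEMU or nbd.

```c
#include <stddef.h>
#include <stdint.h>

#define NBD_REQUEST_MAGIC    0x25609513u
#define NBD_CMD_WRITE        1u
#define NBD_CMD_WRITE_ZEROES 6u   /* value proposed in the patch */
#define NBD_REQUEST_SIZE     28u  /* assumed header size, in bytes */

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Store a 64-bit value in network (big-endian) byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    put_be32(p, (uint32_t)(v >> 32));
    put_be32(p + 4, (uint32_t)v);
}

/*
 * Fill in a request header and return the number of payload bytes the
 * client must send after it. Only NBD_CMD_WRITE carries a payload; the
 * proposed NBD_CMD_WRITE_ZEROES sends the header alone.
 */
static uint32_t nbd_encode_request(uint8_t hdr[NBD_REQUEST_SIZE],
                                   uint32_t type, uint64_t handle,
                                   uint64_t offset, uint32_t length)
{
    put_be32(hdr, NBD_REQUEST_MAGIC);
    put_be32(hdr + 4, type);
    put_be64(hdr + 8, handle);
    put_be64(hdr + 16, offset);
    put_be32(hdr + 24, length);
    return type == NBD_CMD_WRITE ? length : 0;
}
```

Under these assumptions, zeroing a 64 KiB range costs 28 bytes on the
wire instead of 28 bytes plus 64 KiB of zeroes.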
Den
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [Qemu-devel] [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 4:46 ` Denis V. Lunev
@ 2016-02-18 8:30 ` Denis V. Lunev
0 siblings, 0 replies; 22+ messages in thread
From: Denis V. Lunev @ 2016-02-18 8:30 UTC (permalink / raw)
To: Eric Blake; +Cc: nbd-general, qemu-devel
On 02/18/2016 07:46 AM, Denis V. Lunev wrote:
> On 02/17/2016 11:58 PM, Eric Blake wrote:
>> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>>> This patch proposes a new command to reduce the amount of data passed
>>> through the wire when it is known that the data is all zeroes. This
>>> functionality is generally useful for mirroring or backup operations.
>>>
>>> Currently available NBD_CMD_TRIM command can not be used as the
>>> specification explicitely says that "a client MUST NOT make any
>> s/explicitely/explicitly/
>>
>>> assumptions about the contents of the export affected by this
>>> [NBD_CMD_TRIM] command, until overwriting it again with
>>> `NBD_CMD_WRITE`"
>>>
>>> Particular use case could be the following:
>>>
>>> QEMU project uses own implementation of NBD server to transfer data
>>> in between different instances of QEMU. Typically we tranfer VM virtual
>> s/tranfer/transfer/
>>
>>> disks over this channel. VM virtual disks are sparse and thus the
>>> efficiency of backup and mirroring operations could be improved a lot.
>>>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> ---
>>> doc/proto.md | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/doc/proto.md b/doc/proto.md
>>> index 43065b7..c94751a 100644
>>> --- a/doc/proto.md
>>> +++ b/doc/proto.md
>>> @@ -241,6 +241,8 @@ immediately after the global flags field in
>>> oldstyle negotiation:
>>> schedule I/O accesses as for a rotational medium
>>> - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server
>>> supports
>>> `NBD_CMD_TRIM` commands
>>> +- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the
>>> server
>>> + supports `NBD_CMD_WRITE_ZEROES` commands
>>> ##### Client flags
>>> @@ -446,6 +448,11 @@ The following request types exist:
>>> about the contents of the export affected by this command, until
>>> overwriting it again with `NBD_CMD_WRITE`.
>>> +* `NBD_CMD_WRITE_ZEROES` (6)
>>> +
>>> + A request to write zeroes. The command is functional equivalent of
>>> + the NBD_WRITE_COMMAND but without payload sent through the
>>> channel.
>> This lets us push holes during writes.
> from my point this allows client to apply his policy. For QCOW2 output
> target the
s/client/server/
Sorry, I mistyped.
* Re: [Qemu-devel] [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-17 20:58 ` Eric Blake
2016-02-18 4:46 ` Denis V. Lunev
@ 2016-02-18 9:18 ` Roman Kagan
2016-02-18 10:36 ` Denis V. Lunev
2016-02-18 16:35 ` Eric Blake
2016-02-18 12:14 ` [Qemu-devel] " Daniel P. Berrange
2 siblings, 2 replies; 22+ messages in thread
From: Roman Kagan @ 2016-02-18 9:18 UTC (permalink / raw)
To: Eric Blake; +Cc: nbd-general, Denis V. Lunev, qemu-devel
On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
> > @@ -446,6 +448,11 @@ The following request types exist:
> > about the contents of the export affected by this command, until
> > overwriting it again with `NBD_CMD_WRITE`.
> >
> > +* `NBD_CMD_WRITE_ZEROES` (6)
> > +
> > + A request to write zeroes. The command is functional equivalent of
> > + the NBD_WRITE_COMMAND but without payload sent through the channel.
>
> This lets us push holes during writes. Do we have the converse
> operation, that is, an easy way to query if a block of data will read as
> all zeroes, and therefore the client can bypass reading that portion of
> the disk (in other words, an equivalent to lseek(SEEK_HOLE/SEEK_DATA))?
The spec doesn't have anything like that.
OTOH, unlike the write case, where you have all the information and just
choose whether to send normal write or zero write, the extra round-trip
of a separate SEEK_HOLE/SEEK_DATA request may lead to actually degrading
the overall throughput.
Rather it may be a better idea to add something like sparse read where
the server would, instead of sending the full length of data in the
response payload, send a smarter variable-length package with a
scatter-gather list or a bitmap of used blocks in the beginning, and let
the client decode it and fill the gaps with zeros.
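
A minimal sketch of the client side of such a sparse read, under a purely
hypothetical reply layout: a list of (offset, length) extents followed by
the data bytes of those extents packed back to back. Neither the struct
nor the function name comes from the NBD spec.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for one run of data present in the reply. */
struct sparse_extent {
    uint32_t offset;  /* start within the requested range, in bytes */
    uint32_t length;  /* number of data bytes carried for this run */
};

/*
 * Rebuild the full read buffer from a sparse reply: zero the buffer,
 * then copy each extent's bytes, assumed to be packed in order in `data`.
 * Returns 0 on success, or -1 if an extent falls outside the request or
 * the packed payload is shorter than the extents claim.
 */
static int sparse_read_expand(uint8_t *out, size_t out_len,
                              const struct sparse_extent *ext, size_t n_ext,
                              const uint8_t *data, size_t data_len)
{
    size_t used = 0;

    memset(out, 0, out_len);  /* gaps between extents read as zeroes */
    for (size_t i = 0; i < n_ext; i++) {
        size_t off = ext[i].offset;
        size_t len = ext[i].length;

        if (off > out_len || len > out_len - off || len > data_len - used) {
            return -1;
        }
        memcpy(out + off, data + used, len);
        used += len;
    }
    return 0;
}
```

The point of the sketch is that the hole information arrives in the same
reply as the data, so no extra round-trip is needed.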
Roman.
* Re: [Qemu-devel] [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 9:18 ` Roman Kagan
@ 2016-02-18 10:36 ` Denis V. Lunev
2016-02-18 16:35 ` Eric Blake
1 sibling, 0 replies; 22+ messages in thread
From: Denis V. Lunev @ 2016-02-18 10:36 UTC (permalink / raw)
To: Roman Kagan, Eric Blake, nbd-general, qemu-devel
On 02/18/2016 12:18 PM, Roman Kagan wrote:
> On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
>> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>>> @@ -446,6 +448,11 @@ The following request types exist:
>>> about the contents of the export affected by this command, until
>>> overwriting it again with `NBD_CMD_WRITE`.
>>>
>>> +* `NBD_CMD_WRITE_ZEROES` (6)
>>> +
>>> + A request to write zeroes. The command is functional equivalent of
>>> + the NBD_WRITE_COMMAND but without payload sent through the channel.
>> This lets us push holes during writes. Do we have the converse
>> operation, that is, an easy way to query if a block of data will read as
>> all zeroes, and therefore the client can bypass reading that portion of
>> the disk (in other words, an equivalent to lseek(SEEK_HOLE/SEEK_DATA))?
> The spec doesn't have anything like that.
>
> OTOH, unlike the write case, where you have all the information and just
> choose whether to send normal write or zero write, the extra round-trip
> of a separate SEEK_HOLE/SEEK_DATA request may lead to actually degrading
> the overall throughput.
>
> Rather it may be a better idea to add something like sparse read where
> the server would, instead of sending the full length of data in the
> response payload, send a smarter variable-length package with a
> scatter-gather list or a bitmap of used blocks in the beginning, and let
> the client decode it and fill the gaps with zeros.
>
> Roman.
Ah, I see.
This approach is more difficult, but it would also be viable for reading
backup dirty bitmaps. However, it will make the protocol more complex and
will require more effort at the specification stage.
I'd rather start with the current change, which is simple enough
and a step in the right direction, and after that continue
with READ2 or whatever the command ends up being.
Den
* Re: [Qemu-devel] [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 9:18 ` Roman Kagan
2016-02-18 10:36 ` Denis V. Lunev
@ 2016-02-18 16:35 ` Eric Blake
2016-02-18 17:23 ` [Qemu-devel] SUMMARY: " Denis V. Lunev
1 sibling, 1 reply; 22+ messages in thread
From: Eric Blake @ 2016-02-18 16:35 UTC (permalink / raw)
To: Roman Kagan, Denis V. Lunev, nbd-general, qemu-devel
On 02/18/2016 02:18 AM, Roman Kagan wrote:
> On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
>> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>>> @@ -446,6 +448,11 @@ The following request types exist:
>>> about the contents of the export affected by this command, until
>>> overwriting it again with `NBD_CMD_WRITE`.
>>>
>>> +* `NBD_CMD_WRITE_ZEROES` (6)
>>> +
>>> + A request to write zeroes. The command is functional equivalent of
>>> + the NBD_WRITE_COMMAND but without payload sent through the channel.
>>
>> This lets us push holes during writes. Do we have the converse
>> operation, that is, an easy way to query if a block of data will read as
>> all zeroes, and therefore the client can bypass reading that portion of
>> the disk (in other words, an equivalent to lseek(SEEK_HOLE/SEEK_DATA))?
>
> The spec doesn't have anything like that.
>
> OTOH, unlike the write case, where you have all the information and just
> choose whether to send normal write or zero write, the extra round-trip
> of a separate SEEK_HOLE/SEEK_DATA request may lead to actually degrading
> the overall throughput.
>
> Rather it may be a better idea to add something like sparse read where
> the server would, instead of sending the full length of data in the
> response payload, send a smarter variable-length package with a
> scatter-gather list or a bitmap of used blocks in the beginning, and let
> the client decode it and fill the gaps with zeros.
Sure, that would work too, and sounds nicer. Either way, the point is
that we should strongly consider improving the NBD protocol to allow
more efficient handling of sparse files, in both the push and in the
pull direction. Qemu already has a desire to use both directions of
improvements, but there are more programs, both clients and servers,
outside of qemu, that could benefit from such protocol improvements.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* [Qemu-devel] SUMMARY: Re: [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 16:35 ` Eric Blake
@ 2016-02-18 17:23 ` Denis V. Lunev
2016-02-18 17:55 ` Eric Blake
` (2 more replies)
0 siblings, 3 replies; 22+ messages in thread
From: Denis V. Lunev @ 2016-02-18 17:23 UTC (permalink / raw)
To: Eric Blake, Roman Kagan, nbd-general, qemu-devel, Stefan Hajnoczi,
Daniel P. Berrange, Vladimir Sementsov-Ogievskiy
On 02/18/2016 07:35 PM, Eric Blake wrote:
> On 02/18/2016 02:18 AM, Roman Kagan wrote:
>> On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
>>> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>>>> @@ -446,6 +448,11 @@ The following request types exist:
>>>> about the contents of the export affected by this command, until
>>>> overwriting it again with `NBD_CMD_WRITE`.
>>>>
>>>> +* `NBD_CMD_WRITE_ZEROES` (6)
>>>> +
>>>> + A request to write zeroes. The command is functional equivalent of
>>>> + the NBD_WRITE_COMMAND but without payload sent through the channel.
>>> This lets us push holes during writes. Do we have the converse
>>> operation, that is, an easy way to query if a block of data will read as
>>> all zeroes, and therefore the client can bypass reading that portion of
>>> the disk (in other words, an equivalent to lseek(SEEK_HOLE/SEEK_DATA))?
>> The spec doesn't have anything like that.
>>
>> OTOH, unlike the write case, where you have all the information and just
>> choose whether to send normal write or zero write, the extra round-trip
>> of a separate SEEK_HOLE/SEEK_DATA request may lead to actually degrading
>> the overall throughput.
>>
>> Rather it may be a better idea to add something like sparse read where
>> the server would, instead of sending the full length of data in the
>> response payload, send a smarter variable-length package with a
>> scatter-gather list or a bitmap of used blocks in the beginning, and let
>> the client decode it and fill the gaps with zeros.
> Sure, that would work too, and sounds nicer. Either way, the point is
> that we should strongly consider improving the NBD protocol to allow
> more efficient handling of sparse files, in both the push and in the
> pull direction. Qemu already has a desire to use both directions of
> improvements, but there are more programs, both clients and servers,
> outside of qemu, that could benefit from such protocol improvements.
>
OK
Here is a short summary of the features that seem necessary from the QEMU
point of view:
- the ability to avoid sending zeroes during a write operation. The
  proposal is in the thread-starter letter
- the ability to request block status (allocated/not allocated) from the
  server. This seems useful for preserving the "sparseness" of the
  transferred data
- the ability to skip zeroes during a read operation, i.e. something like
  a READ2 command which returns a vector of chunks as a reply
All 3 features seem usable for generic NBD use cases and not only for QEMU.
If there are no objections, I'll sum this up and come back with a
specification draft.
Den
P.S. I have added all the parties who participated in the conversation in
different threads on the QEMU side.
* Re: [Qemu-devel] SUMMARY: Re: [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 17:23 ` [Qemu-devel] SUMMARY: " Denis V. Lunev
@ 2016-02-18 17:55 ` Eric Blake
2016-02-18 19:29 ` [Qemu-devel] [Nbd] " Alex Bligh
2016-02-19 7:12 ` [Qemu-devel] " Denis V. Lunev
2 siblings, 0 replies; 22+ messages in thread
From: Eric Blake @ 2016-02-18 17:55 UTC (permalink / raw)
To: Denis V. Lunev, Roman Kagan, nbd-general, qemu-devel,
Stefan Hajnoczi, Daniel P. Berrange, Vladimir Sementsov-Ogievskiy,
Fam Zheng
On 02/18/2016 10:23 AM, Denis V. Lunev wrote:
>
> Here is a short summary of features which seems necessary from QEMU
> point of
> view:
> - ability to avoid sending zeroes during write operation. The proposal
> comes in
> the thread-starter letter
> - ability to request block status (allocate/not allocated) from server.
> This seems
> interesting to preserve "sparseness" of the transferring data
> - ability to skip zeroes during read operation, i.e. something like
> READ2 command
> which will return vector of chunks as a reply
>
> All 3 features seem usable for generic NBD use-cases and not only for QEMU.
All three features must be negotiated as part of the connection handshake.
And we want to ensure sane fallbacks:

Client - if the server does not support the features, we fall back to
writing explicit zeroes (and give up on sparseness), and to assuming the
entire image is non-sparse (can't query or read sparseness).

Server - if the client requests write 0, optimize where the underlying
storage allows it, but we can always fall back to explicitly writing 0s
and merely treating the protocol as a compression of what is sent over
the wire. If the client requests block status, but the underlying storage
doesn't provide it, we can always fall back to claiming the entire image
is allocated. If the client requests READ2 but the underlying storage has
no way to detect holes, we can always fall back to sending a single vector
covering the entire read request (no compression).
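
Two of these fallbacks can be sketched in a few lines. This is only an
illustration: the flag bit and command number are the ones proposed in the
patch, while struct block_status and both function names are hypothetical.

```c
#include <stdint.h>

#define NBD_FLAG_SEND_WRITE_ZEROES (1u << 6)  /* bit 6, as proposed */
#define NBD_CMD_WRITE              1u
#define NBD_CMD_WRITE_ZEROES       6u         /* value proposed in the patch */

/*
 * Client fallback: for a range known to be all zeroes, use the new
 * command only if the server advertised it during negotiation;
 * otherwise send an ordinary write with explicit zeroes.
 */
static uint32_t zero_write_command(uint16_t export_flags)
{
    return (export_flags & NBD_FLAG_SEND_WRITE_ZEROES)
               ? NBD_CMD_WRITE_ZEROES
               : NBD_CMD_WRITE;
}

/* Hypothetical block status answer for one contiguous range. */
struct block_status {
    uint64_t offset;
    uint64_t length;
    int allocated;  /* 1 = data present, 0 = hole */
};

/*
 * Server fallback: when the underlying storage cannot report holes,
 * claim that the whole requested range is allocated.
 */
static struct block_status block_status_fallback(uint64_t offset,
                                                 uint64_t length)
{
    struct block_status st = { offset, length, 1 };
    return st;
}
```

Either way the peer sees correct data; only the sparseness optimization
is lost.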
>
> If there are no objections I'll sum this up and come with a
> specification draft.
Good luck! I'm sure you'll get good reviews.
>
> Den
>
> P.S. I have added here all parties which have participated in
> conversation in
> different threads on QEMU side.
>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [Nbd] SUMMARY: Re: [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 17:23 ` [Qemu-devel] SUMMARY: " Denis V. Lunev
2016-02-18 17:55 ` Eric Blake
@ 2016-02-18 19:29 ` Alex Bligh
2016-02-19 7:12 ` [Qemu-devel] " Denis V. Lunev
2 siblings, 0 replies; 22+ messages in thread
From: Alex Bligh @ 2016-02-18 19:29 UTC (permalink / raw)
To: Denis V. Lunev
Cc: nbd-general@lists.sourceforge.net, Vladimir Sementsov-Ogievskiy,
Alex Bligh, qemu-devel@nongnu.org, Roman Kagan,
Stefan stefanha@redhat.com, Fam famz@redhat.com
On 18 Feb 2016, at 17:23, Denis V. Lunev <den@openvz.org> wrote:
> - ability to skip zeroes during read operation, i.e. something like
> READ2 command
> which will return vector of chunks as a reply
...
> If there are no objections I'll sum this up and come with a
> specification draft.
If you are designing READ2 to allow a vector-based reply, please also
consider allowing an error to be part of that vector. The protocol
currently has an issue where it is non-obvious how to return
an error encountered midway through a long read where some
data has already been sent to the client (this is from memory,
so it applies unless it has already been fixed).
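
One way to picture this is a reply vector whose entries can be data,
holes, or an error. The layout below is purely hypothetical (no such
chunk format exists in the spec under discussion); it only shows how a
client could tell how much of a long read was valid before the failure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical kinds of entry in a vector-based read reply. */
enum chunk_type { CHUNK_DATA, CHUNK_HOLE, CHUNK_ERROR };

struct read_chunk {
    enum chunk_type type;
    uint64_t offset;  /* offset within the export */
    uint32_t length;  /* bytes covered; 0 for CHUNK_ERROR */
    uint32_t error;   /* errno-style code, used only by CHUNK_ERROR */
};

/*
 * Walk the reply vector. Returns 0 if no error chunk is present,
 * otherwise the code of the first error. *good_bytes receives the number
 * of bytes covered by the chunks that precede the first error.
 */
static uint32_t scan_reply(const struct read_chunk *c, size_t n,
                           uint64_t *good_bytes)
{
    *good_bytes = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].type == CHUNK_ERROR) {
            return c[i].error;
        }
        *good_bytes += c[i].length;
    }
    return 0;
}
```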
--
Alex Bligh
* Re: [Qemu-devel] SUMMARY: Re: [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 17:23 ` [Qemu-devel] SUMMARY: " Denis V. Lunev
2016-02-18 17:55 ` Eric Blake
2016-02-18 19:29 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-02-19 7:12 ` Denis V. Lunev
2016-02-19 8:56 ` Vladimir Sementsov-Ogievskiy
2016-02-19 9:11 ` Daniel P. Berrange
2 siblings, 2 replies; 22+ messages in thread
From: Denis V. Lunev @ 2016-02-19 7:12 UTC (permalink / raw)
To: Eric Blake, Roman Kagan, nbd-general, qemu-devel, Stefan Hajnoczi,
Daniel P. Berrange, Vladimir Sementsov-Ogievskiy
On 02/18/2016 08:23 PM, Denis V. Lunev wrote:
> On 02/18/2016 07:35 PM, Eric Blake wrote:
>> On 02/18/2016 02:18 AM, Roman Kagan wrote:
>>> On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
>>>> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>>>>> @@ -446,6 +448,11 @@ The following request types exist:
>>>>> about the contents of the export affected by this command,
>>>>> until
>>>>> overwriting it again with `NBD_CMD_WRITE`.
>>>>> +* `NBD_CMD_WRITE_ZEROES` (6)
>>>>> +
>>>>> + A request to write zeroes. The command is functional
>>>>> equivalent of
>>>>> + the NBD_WRITE_COMMAND but without payload sent through the
>>>>> channel.
>>>> This lets us push holes during writes. Do we have the converse
>>>> operation, that is, an easy way to query if a block of data will
>>>> read as
>>>> all zeroes, and therefore the client can bypass reading that
>>>> portion of
>>>> the disk (in other words, an equivalent to
>>>> lseek(SEEK_HOLE/SEEK_DATA))?
>>> The spec doesn't have anything like that.
>>>
>>> OTOH, unlike the write case, where you have all the information and
>>> just
>>> choose whether to send normal write or zero write, the extra round-trip
>>> of a separate SEEK_HOLE/SEEK_DATA request may lead to actually
>>> degrading
>>> the overall throughput.
>>>
>>> Rather it may be a better idea to add something like sparse read where
>>> the server would, instead of sending the full length of data in the
>>> response payload, send a smarter variable-length package with a
>>> scatter-gather list or a bitmap of used blocks in the beginning, and
>>> let
>>> the client decode it and fill the gaps with zeros.
>> Sure, that would work too, and sounds nicer. Either way, the point is
>> that we should strongly consider improving the NBD protocol to allow
>> more efficient handling of sparse files, in both the push and in the
>> pull direction. Qemu already has a desire to use both directions of
>> improvements, but there are more programs, both clients and servers,
>> outside of qemu, that could benefit from such protocol improvements.
>>
> OK
>
> Here is a short summary of features which seems necessary from QEMU
> point of
> view:
> - ability to avoid sending zeroes during write operation. The proposal
> comes in
> the thread-starter letter
> - ability to request block status (allocate/not allocated) from
> server. This seems
> interesting to preserve "sparseness" of the transferring data
> - ability to skip zeroes during read operation, i.e. something like
> READ2 command
> which will return vector of chunks as a reply
>
> All 3 features seem usable for generic NBD use-cases and not only for
> QEMU.
>
> If there are no objections I'll sum this up and come with a
> specification draft.
>
> Den
>
> P.S. I have added here all parties which have participated in
> conversation in
> different threads on QEMU side.
An interesting point came up in a verbal discussion with one of my friends.
Protocol-level compression could eliminate the need to
think about zeroes in the channel, from either the read or the write
point of view, and would also reduce the amount of data to
transfer.
Den
* Re: [Qemu-devel] SUMMARY: Re: [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-19 7:12 ` [Qemu-devel] " Denis V. Lunev
@ 2016-02-19 8:56 ` Vladimir Sementsov-Ogievskiy
2016-02-19 9:11 ` Daniel P. Berrange
1 sibling, 0 replies; 22+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2016-02-19 8:56 UTC (permalink / raw)
To: Denis V. Lunev, Eric Blake, Roman Kagan, nbd-general, qemu-devel,
Stefan Hajnoczi, Daniel P. Berrange, Fam Zheng
On 19.02.2016 10:12, Denis V. Lunev wrote:
> On 02/18/2016 08:23 PM, Denis V. Lunev wrote:
>> On 02/18/2016 07:35 PM, Eric Blake wrote:
>>> On 02/18/2016 02:18 AM, Roman Kagan wrote:
>>>> On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
>>>>> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>>>>>> @@ -446,6 +448,11 @@ The following request types exist:
>>>>>> about the contents of the export affected by this command,
>>>>>> until
>>>>>> overwriting it again with `NBD_CMD_WRITE`.
>>>>>> +* `NBD_CMD_WRITE_ZEROES` (6)
>>>>>> +
>>>>>> + A request to write zeroes. The command is functional
>>>>>> equivalent of
>>>>>> + the NBD_WRITE_COMMAND but without payload sent through the
>>>>>> channel.
>>>>> This lets us push holes during writes. Do we have the converse
>>>>> operation, that is, an easy way to query if a block of data will
>>>>> read as
>>>>> all zeroes, and therefore the client can bypass reading that
>>>>> portion of
>>>>> the disk (in other words, an equivalent to
>>>>> lseek(SEEK_HOLE/SEEK_DATA))?
>>>> The spec doesn't have anything like that.
>>>>
>>>> OTOH, unlike the write case, where you have all the information and
>>>> just
>>>> choose whether to send normal write or zero write, the extra
>>>> round-trip
>>>> of a separate SEEK_HOLE/SEEK_DATA request may lead to actually
>>>> degrading
>>>> the overall throughput.
>>>>
>>>> Rather it may be a better idea to add something like sparse read where
>>>> the server would, instead of sending the full length of data in the
>>>> response payload, send a smarter variable-length package with a
>>>> scatter-gather list or a bitmap of used blocks in the beginning,
>>>> and let
>>>> the client decode it and fill the gaps with zeros.
>>> Sure, that would work too, and sounds nicer. Either way, the point is
>>> that we should strongly consider improving the NBD protocol to allow
>>> more efficient handling of sparse files, in both the push and in the
>>> pull direction. Qemu already has a desire to use both directions of
>>> improvements, but there are more programs, both clients and servers,
>>> outside of qemu, that could benefit from such protocol improvements.
>>>
>> OK
>>
>> Here is a short summary of features which seems necessary from QEMU
>> point of
>> view:
>> - ability to avoid sending zeroes during write operation. The
>> proposal comes in
>> the thread-starter letter
>> - ability to request block status (allocate/not allocated) from
>> server. This seems
>> interesting to preserve "sparseness" of the transferring data
>> - ability to skip zeroes during read operation, i.e. something like
>> READ2 command
>> which will return vector of chunks as a reply
>>
>> All 3 features seem usable for generic NBD use-cases and not only for
>> QEMU.
>>
>> If there are no objections I'll sum this up and come with a
>> specification draft.
>>
>> Den
>>
>> P.S. I have added here all parties which have participated in
>> conversation in
>> different threads on QEMU side.
>
> interesting point from a verbal discussion with one of my friends.
> Protocol level compression could eliminate the necessity to
> think about zeroes in channel either from read or from write
> point of views and will also reduce the amount of data to
> transfer.
>
> Den
Compression is worse than separate commands, because after decompression
we will have to write or somehow test these zeroes again.
--
Best regards,
Vladimir
* Re: [Qemu-devel] SUMMARY: Re: [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-19 7:12 ` [Qemu-devel] " Denis V. Lunev
2016-02-19 8:56 ` Vladimir Sementsov-Ogievskiy
@ 2016-02-19 9:11 ` Daniel P. Berrange
1 sibling, 0 replies; 22+ messages in thread
From: Daniel P. Berrange @ 2016-02-19 9:11 UTC (permalink / raw)
To: Denis V. Lunev
Cc: nbd-general, Vladimir Sementsov-Ogievskiy, Fam Zheng, qemu-devel,
Roman Kagan, Stefan Hajnoczi
On Fri, Feb 19, 2016 at 10:12:09AM +0300, Denis V. Lunev wrote:
> On 02/18/2016 08:23 PM, Denis V. Lunev wrote:
> >On 02/18/2016 07:35 PM, Eric Blake wrote:
> >>On 02/18/2016 02:18 AM, Roman Kagan wrote:
> >>>On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
> >>>>On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
> >>>>>@@ -446,6 +448,11 @@ The following request types exist:
> >>>>> about the contents of the export affected by this command,
> >>>>>until
> >>>>> overwriting it again with `NBD_CMD_WRITE`.
> >>>>> +* `NBD_CMD_WRITE_ZEROES` (6)
> >>>>>+
> >>>>>+ A request to write zeroes. The command is functional
> >>>>>equivalent of
> >>>>>+ the NBD_WRITE_COMMAND but without payload sent through the
> >>>>>channel.
> >>>>This lets us push holes during writes. Do we have the converse
> >>>>operation, that is, an easy way to query if a block of data will
> >>>>read as
> >>>>all zeroes, and therefore the client can bypass reading that portion
> >>>>of
> >>>>the disk (in other words, an equivalent to
> >>>>lseek(SEEK_HOLE/SEEK_DATA))?
> >>>The spec doesn't have anything like that.
> >>>
> >>>OTOH, unlike the write case, where you have all the information and
> >>>just
> >>>choose whether to send normal write or zero write, the extra round-trip
> >>>of a separate SEEK_HOLE/SEEK_DATA request may lead to actually
> >>>degrading
> >>>the overall throughput.
> >>>
> >>>Rather it may be a better idea to add something like sparse read where
> >>>the server would, instead of sending the full length of data in the
> >>>response payload, send a smarter variable-length package with a
> >>>scatter-gather list or a bitmap of used blocks in the beginning, and
> >>>let
> >>>the client decode it and fill the gaps with zeros.
> >>Sure, that would work too, and sounds nicer. Either way, the point is
> >>that we should strongly consider improving the NBD protocol to allow
> >>more efficient handling of sparse files, in both the push and in the
> >>pull direction. Qemu already has a desire to use both directions of
> >>improvements, but there are more programs, both clients and servers,
> >>outside of qemu, that could benefit from such protocol improvements.
> >>
> >OK
> >
> >Here is a short summary of features which seems necessary from QEMU point
> >of
> >view:
> >- ability to avoid sending zeroes during write operation. The proposal
> >comes in
> > the thread-starter letter
> >- ability to request block status (allocate/not allocated) from server.
> >This seems
> > interesting to preserve "sparseness" of the transferring data
> >- ability to skip zeroes during read operation, i.e. something like READ2
> >command
> > which will return vector of chunks as a reply
> >
> >All 3 features seem usable for generic NBD use-cases and not only for
> >QEMU.
> >
> >If there are no objections I'll sum this up and come with a specification
> >draft.
> >
> >Den
> >
> >P.S. I have added here all parties which have participated in conversation
> >in
> > different threads on QEMU side.
>
> interesting point from a verbal discussion with one of my friends.
> Protocol level compression could eliminate the necessity to
> think about zeroes in channel either from read or from write
> point of views and will also reduce the amount of data to
> transfer.
With compression you throw away the information about sparseness, which
you really want to have when writing out the file on the other end. It
forces you to do memcmp detection of zero regions after decompression,
which is CPU-intensive.
Compression is fine as a concept, but it is not a replacement for
handling sparseness directly in the protocol.
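A rough illustration of that cost, as a minimal Python sketch rather than
code from any NBD implementation: after inflating a compressed stream the
receiver only has bytes, so it has to compare every block against zeroes
to rediscover the holes. The 4096-byte block size and the helper name
zero_runs are assumptions made for this example.

```python
import zlib

BLOCK = 4096  # assumed scan granularity; nothing in the thread fixes this


def zero_runs(buf, block=BLOCK):
    """Return [(offset, length), ...] runs of all-zero blocks in buf."""
    zero_block = bytes(block)
    runs = []
    for off in range(0, len(buf), block):
        chunk = buf[off:off + block]
        # Every block is compared in full: this is the CPU cost in question.
        if chunk == zero_block[:len(chunk)]:
            if runs and runs[-1][0] + runs[-1][1] == off:
                # Extend the previous run when this block is adjacent to it.
                runs[-1] = (runs[-1][0], runs[-1][1] + len(chunk))
            else:
                runs.append((off, len(chunk)))
    return runs


# A sparse-looking image: one data block, two zero blocks, one data block.
image = b"A" * BLOCK + bytes(2 * BLOCK) + b"B" * BLOCK
wire = zlib.compress(image)        # compact on the wire
received = zlib.decompress(wire)   # but the receiver gets plain bytes back
holes = zero_runs(received)        # holes must be rediscovered by scanning
```

The compressed stream is small, but the receiver still touches every byte
of the inflated image before it can decide what to leave unallocated.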
Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
* Re: [Qemu-devel] [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-17 20:58 ` Eric Blake
2016-02-18 4:46 ` Denis V. Lunev
2016-02-18 9:18 ` Roman Kagan
@ 2016-02-18 12:14 ` Daniel P. Berrange
2016-02-18 14:05 ` Denis V. Lunev
2 siblings, 1 reply; 22+ messages in thread
From: Daniel P. Berrange @ 2016-02-18 12:14 UTC (permalink / raw)
To: Eric Blake; +Cc: nbd-general, Denis V. Lunev, qemu-devel
On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
> > This patch proposes a new command to reduce the amount of data passed
> > through the wire when it is known that the data is all zeroes. This
> > functionality is generally useful for mirroring or backup operations.
> >
> > Currently available NBD_CMD_TRIM command can not be used as the
> > specification explicitely says that "a client MUST NOT make any
>
> s/explicitely/explicitly/
>
> > assumptions about the contents of the export affected by this
> > [NBD_CMD_TRIM] command, until overwriting it again with `NBD_CMD_WRITE`"
> >
> > Particular use case could be the following:
> >
> > QEMU project uses own implementation of NBD server to transfer data
> > in between different instances of QEMU. Typically we tranfer VM virtual
>
> s/tranfer/transfer/
>
> > disks over this channel. VM virtual disks are sparse and thus the
> > efficiency of backup and mirroring operations could be improved a lot.
> >
> > Signed-off-by: Denis V. Lunev <den@openvz.org>
> > ---
> > doc/proto.md | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> > diff --git a/doc/proto.md b/doc/proto.md
> > index 43065b7..c94751a 100644
> > --- a/doc/proto.md
> > +++ b/doc/proto.md
> > @@ -241,6 +241,8 @@ immediately after the global flags field in oldstyle negotiation:
> > schedule I/O accesses as for a rotational medium
> > - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
> > `NBD_CMD_TRIM` commands
> > +- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the server
> > + supports `NBD_CMD_WRITE_ZEROES` commands
> >
> > ##### Client flags
> >
> > @@ -446,6 +448,11 @@ The following request types exist:
> > about the contents of the export affected by this command, until
> > overwriting it again with `NBD_CMD_WRITE`.
> >
> > +* `NBD_CMD_WRITE_ZEROES` (6)
> > +
> > + A request to write zeroes. The command is functional equivalent of
> > + the NBD_WRITE_COMMAND but without payload sent through the channel.
>
> This lets us push holes during writes. Do we have the converse
> operation, that is, an easy way to query if a block of data will read as
> all zeroes, and therefore the client can bypass reading that portion of
> the disk (in other words, an equivalent to lseek(SEEK_HOLE/SEEK_DATA))?
Stefan has suggested that we add a command to the NBD spec that
implements the SCSI Get LBA Status command. This lets clients
query the allocation bitmap for the device, which would serve
this purpose.
https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg03582.html
In that thread he talks about it being a way to serve up the dirty
bitmap for the live backup scenario, but in regular usage it obviously
provides the normal allocation bitmap.
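For comparison, the file-level equivalent Eric mentions,
lseek(SEEK_HOLE/SEEK_DATA), can be sketched as follows. This is only an
illustration of walking the allocation map of a local file in Python; the
helper name data_extents is made up here, and what counts as a hole
depends on the filesystem (one without hole support reports the whole
file as data).

```python
import errno
import os
import tempfile


def data_extents(fd):
    """Return [(offset, length), ...] for regions the filesystem reports as data."""
    size = os.fstat(fd).st_size
    extents = []
    pos = 0
    while pos < size:
        try:
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data at or after pos: trailing hole
                break
            raise
        end = os.lseek(fd, start, os.SEEK_HOLE)  # end of file counts as a hole
        if end <= start:  # defensive guard against a non-advancing result
            break
        extents.append((start, end - start))
        pos = end
    return extents


# Demo: 4 KiB of data, a gap up to the 1 MiB mark, then 4 KiB more data.
with tempfile.TemporaryFile() as f:
    f.write(b"x" * 4096)
    f.seek(1024 * 1024)
    f.write(b"y" * 4096)
    f.flush()
    size = os.fstat(f.fileno()).st_size
    extents = data_extents(f.fileno())
```

A status command in the protocol would give a remote client the same kind
of extent list without having to read the data.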
Regards,
Daniel
* Re: [Qemu-devel] [RFC 1/1] nbd (specification): add NBD_CMD_WRITE_ZEROES command
2016-02-18 12:14 ` [Qemu-devel] " Daniel P. Berrange
@ 2016-02-18 14:05 ` Denis V. Lunev
0 siblings, 0 replies; 22+ messages in thread
From: Denis V. Lunev @ 2016-02-18 14:05 UTC (permalink / raw)
To: Daniel P. Berrange, Eric Blake; +Cc: nbd-general, qemu-devel
On 02/18/2016 03:14 PM, Daniel P. Berrange wrote:
> On Wed, Feb 17, 2016 at 01:58:47PM -0700, Eric Blake wrote:
>> On 02/17/2016 11:10 AM, Denis V. Lunev wrote:
>>> This patch proposes a new command to reduce the amount of data passed
>>> through the wire when it is known that the data is all zeroes. This
>>> functionality is generally useful for mirroring or backup operations.
>>>
>>> Currently available NBD_CMD_TRIM command can not be used as the
>>> specification explicitely says that "a client MUST NOT make any
>> s/explicitely/explicitly/
>>
>>> assumptions about the contents of the export affected by this
>>> [NBD_CMD_TRIM] command, until overwriting it again with `NBD_CMD_WRITE`"
>>>
>>> Particular use case could be the following:
>>>
>>> QEMU project uses own implementation of NBD server to transfer data
>>> in between different instances of QEMU. Typically we tranfer VM virtual
>> s/tranfer/transfer/
>>
>>> disks over this channel. VM virtual disks are sparse and thus the
>>> efficiency of backup and mirroring operations could be improved a lot.
>>>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> ---
>>> doc/proto.md | 7 +++++++
>>> 1 file changed, 7 insertions(+)
>>>
>>> diff --git a/doc/proto.md b/doc/proto.md
>>> index 43065b7..c94751a 100644
>>> --- a/doc/proto.md
>>> +++ b/doc/proto.md
>>> @@ -241,6 +241,8 @@ immediately after the global flags field in oldstyle negotiation:
>>> schedule I/O accesses as for a rotational medium
>>> - bit 5, `NBD_FLAG_SEND_TRIM`; should be set to 1 if the server supports
>>> `NBD_CMD_TRIM` commands
>>> +- bit 6, `NBD_FLAG_SEND_WRITE_ZEROES`; should be set to 1 if the server
>>> + supports `NBD_CMD_WRITE_ZEROES` commands
>>>
>>> ##### Client flags
>>>
>>> @@ -446,6 +448,11 @@ The following request types exist:
>>> about the contents of the export affected by this command, until
>>> overwriting it again with `NBD_CMD_WRITE`.
>>>
>>> +* `NBD_CMD_WRITE_ZEROES` (6)
>>> +
>>> + A request to write zeroes. The command is functional equivalent of
>>> + the NBD_WRITE_COMMAND but without payload sent through the channel.
>> This lets us push holes during writes. Do we have the converse
>> operation, that is, an easy way to query if a block of data will read as
>> all zeroes, and therefore the client can bypass reading that portion of
>> the disk (in other words, an equivalent to lseek(SEEK_HOLE/SEEK_DATA))?
> Stefan has suggested that we add a command to the NBD spec that
> implements the SCSI Get LBA Status command. This lets clients
> query the allocation bitmap for the device, which would serve
> this purpose.
>
> https://lists.gnu.org/archive/html/qemu-devel/2016-02/msg03582.html
>
> In that thread he talks about it being a way to serve up the dirty
> bitmap for live backup scenario, but in regular usage it obviously
> provides the normal allocation bitmap
>
>
> Regards,
> Daniel
But in this case we should allow querying the information for
more than one block at once. We would also have to agree
between the client and the server on the granularity of the
request, or specify the granularity along with the range in
the call.
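To illustrate the second option, here is a minimal sketch of a ranged
status query that carries the granularity in the call. Nothing here is an
existing NBD command: the function name, the one-bit-per-block reply and
the LSB-first bit order are assumptions chosen only for the example.

```python
def status_bitmap(is_allocated, offset, length, granularity):
    """Answer a ranged status query with one bit per granularity-sized block."""
    if granularity <= 0 or offset % granularity or length % granularity:
        raise ValueError("range must be aligned to the granularity")
    nblocks = length // granularity
    bitmap = bytearray((nblocks + 7) // 8)
    for i in range(nblocks):
        if is_allocated(offset + i * granularity, granularity):
            bitmap[i // 8] |= 1 << (i % 8)  # LSB-first, an arbitrary choice
    return bytes(bitmap)


CLUSTER = 64 * 1024   # assumed allocation unit of the backing image
allocated = {0, 3}    # made-up set of allocated cluster indexes


def is_allocated(offset, length):
    """True if any cluster overlapping [offset, offset + length) is allocated."""
    first = offset // CLUSTER
    last = (offset + length - 1) // CLUSTER
    return any(c in allocated for c in range(first, last + 1))


# The same 256 KiB range queried at two granularities.
fine = status_bitmap(is_allocated, 0, 4 * CLUSTER, CLUSTER)
coarse = status_bitmap(is_allocated, 0, 4 * CLUSTER, 2 * CLUSTER)
```

A coarser granularity shrinks the reply, but a block is reported as
allocated if any part of it is, so sparseness below that granularity is
lost. That is why the two sides need to agree on it.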
Den