qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Eric Blake <eblake@redhat.com>
To: Markus Pargmann <mpa@pengutronix.de>, nbd-general@lists.sourceforge.net
Cc: den@openvz.org, qemu-devel@nongnu.org, pborzenkov@virtuozzo.com
Subject: Re: [Qemu-devel] [Nbd] [PATCH v2] doc: Allow NBD_CMD_FLAG_NO_HOLE during NBD_CMD_WRITE
Date: Tue, 5 Apr 2016 10:43:14 -0600	[thread overview]
Message-ID: <5703EB22.5010507@redhat.com> (raw)
In-Reply-To: <4138293.p7OeGaqt34@adelgunde>

[-- Attachment #1: Type: text/plain, Size: 3599 bytes --]

On 04/05/2016 03:38 AM, Markus Pargmann wrote:
> Hi,
> 
> On Monday 04 April 2016 16:15:43 Eric Blake wrote:
>> qemu already has an existing server implementation option that will
>> explicitly search the payload of NBD_CMD_WRITE for large blocks of
>> zeroes, and punch holes in the underlying file.  For old clients
>> that don't know how to use the new NBD_CMD_WRITE_ZEROES, this is a
>> workaround to keep the server's destination file approximately as
>> sparse as the client's source.  However, for new clients that know
>> how to explicitly request holes, it is unnecessary overhead; and
>> can lead to the server punching a hole and risking fragmentation or
>> future ENOSPC even when the client explicitly wanted to write
>> zeroes rather than a hole.  So it makes sense to let the new
>> NBD_CMD_FLAG_NO_HOLE work for WRITE as well as WRITE_ZEROES.
> 
> From the commit message it sounds like this is only for new clients
> supporting WRITE_ZEROES because for those we don't want to search
> through all the data of normal WRITEs. If you don't need to set this for
> each WRITE individually perhaps we could move it to the negotiation
> part?

Interesting idea.  So we'd add a new NBD_OPT_XXX that lets the server
know that "I plan on using WRITE_ZEROS and TRIM as the only places where
I want you to trim, so you can avoid scanning for zeroes in WRITE"; the
server replies with NBD_REP_ACK if it understands the client (in which
case the server _should_ be advertising NBD_FLAG_SEND_WRITE_ZEROES
and/or NBD_FLAG_SEND_TRIM), and with NBD_REP_ERR_UNSUP if it is too old
(the server may still advertise TRIM, but probably should not advertise
WRITE_ZEROES - we are still early enough that we could mandate that any
server that supports WRITE_ZEROES also supports the new NBD_OPT_XXX).
The client then knows that either the server will be efficient with
WRITE and the client uses WRITE_ZEROES and TRIM as desired, or that the
server is old and the client is stuck with WRITE anyways (and whether
the server trims or not is beyond the client's control).

Meanwhile, the new server can unconditionally advertise SEND_TRIM and
SEND_WRITE_ZEROES, whether or not a client uses NBD_OPT_XXX.  If it is
an old client connecting and no NBD_OPT_XXX is sent, chances are the
client is also too old to ever use WRITE_ZEROES, so the server should
feel free to apply its policy on whether to scan for zeroes in WRITE (in
qemu's case, the server policy is set via command line options,
precisely to cater to the scenario where we WANT the server to scan
zeroes to make up for the client being unable to pass sparse regions
efficiently vs. cases where the scanning is deemed too expensive); but
if the client DID send NBD_OPT_XXX, the server SHOULD NOT punch holes
during WRITE, and should not waste time scanning, no matter what the
command line policy permitted.

This also helps the case of clients divided between userspace and
kernel: the way I wrote the proposal, the kernel has to pay attention to
NBD_FLAG_SEND_WRITE_ZEROES, and if present, add NBD_CMD_FLAG_NO_HOLE to
every write.  But with your proposal of option negotiation (done in
userspace), the default of WRITE is now the most efficient on new
servers, and unchanged for old servers, so the kernel doesn't have to do
anything different.

Does the idea of a new NBD_OPT_ make enough sense to write that up
rather than mandating the use of NBD_CMD_FLAG_NO_HOLE with WRITE?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]

  reply	other threads:[~2016-04-05 16:43 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-04 14:15 [Qemu-devel] [PATCH] doc: Allow NBD_CMD_FLAG_NO_HOLE during NBD_CMD_WRITE Eric Blake
2016-04-04 14:47 ` Denis V. Lunev
2016-04-04 15:00   ` Eric Blake
2016-04-04 15:16 ` Alex Bligh
2016-04-04 22:15 ` [Qemu-devel] [PATCH v2] " Eric Blake
2016-04-05  9:38   ` [Qemu-devel] [Nbd] " Markus Pargmann
2016-04-05 16:43     ` Eric Blake [this message]
2016-04-05 20:45       ` Wouter Verhelst
2016-04-05 22:51 ` [Qemu-devel] [PATCH] " Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5703EB22.5010507@redhat.com \
    --to=eblake@redhat.com \
    --cc=den@openvz.org \
    --cc=mpa@pengutronix.de \
    --cc=nbd-general@lists.sourceforge.net \
    --cc=pborzenkov@virtuozzo.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).