From: Wouter Verhelst <w@uter.be>
To: Eric Blake <eblake@redhat.com>
Cc: Alex Bligh <alex@alex.org.uk>,
"nbd-general@lists.sourceforge.net"
<nbd-general@lists.sourceforge.net>,
qemu block <qemu-block@nongnu.org>,
"qemu-trivial@nongnu.org" <qemu-trivial@nongnu.org>,
"qemu-stable@nongnu.org" <qemu-stable@nongnu.org>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
Quentin Casasnovas <quentin.casasnovas@oracle.com>,
Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
Date: Wed, 11 May 2016 23:10:20 +0200
Message-ID: <20160511211020.GC5054@grep.be>
In-Reply-To: <5731FE53.6010602@redhat.com>
On Tue, May 10, 2016 at 09:29:23AM -0600, Eric Blake wrote:
> On 05/10/2016 09:08 AM, Alex Bligh wrote:
> > Eric,
> >
> >> Hmm. The current wording of the experimental block size additions does
> >> NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
> >> maximum NBD_CMD_WRITE:
> >> https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints
> >
> > Correct
> >
> >> Maybe we should revisit that in the spec, and/or advertise yet another
> >> block size (since the maximum size for a trim and/or write_zeroes
> >> request may indeed be different than the maximum size for a read/write).
> >
> > I think it's up to the server to either handle large requests, or
> > for the client to break these up.
>
> But the question at hand here is whether we should permit servers to
> advertise multiple maximum block sizes (one for read/write, another one
> for trim/write_zero, or even two [at least qemu tracks a separate
> maximum trim vs. write_zero sizing in its generic block layer]), or
> merely stick with the current wording that requires clients that honor
> maximum block size to obey the same maximum for ALL commands, regardless
> of amount of data sent over the wire.
>
> >
> > The core problem here is that the kernel (and, ahem, most servers) are
> > ignorant of the block size extension, and need to guess how to break
> > things up. In my view the client (kernel in this case) should
> > be breaking the trim requests up into whatever size it uses as the
> > maximum size write requests. But then it would have to know about block
> > sizes which are in (another) experimental extension.
>
> Correct - no one has yet patched the kernel to honor block sizes
> advertised through what is currently an experimental extension. (We
> have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> minimum block size, but I haven't audited whether the kernel actually
> guarantees that all client requests are sent aligned to the value passed
> that way - but we have nothing to set the maximum size, and are at the
> mercy of however the kernel currently decides to split large requests).

I don't actually think it does that at all, tbh. There is an
"integrityhuge" test in the reference server test suite which performs a
number of large requests (up to 50M), and which was created by a script
that just does direct read requests to /dev/nbdX.

It just so happens that most upper layers (filesystems etc) don't make
requests larger than about 32MiB, but that's not related.

> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling). My question above only applies to
> clients that use the experimental block size extensions.

Right.
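
The client-side splitting discussed above (breaking one large trim into
pieces no bigger than some maximum request size) could look roughly like
the sketch below. This is only an illustration under stated assumptions:
the names split_request, send_fn, struct tally and tally_piece are
hypothetical and are not taken from the kernel, qemu, or the reference
server, and how the client learns max_len (for example via the
experimental block size extension) is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: issue one request of `len` bytes at `offset`.
 * Returns 0 on success, non-zero on failure. */
typedef int (*send_fn)(uint64_t offset, uint32_t len, void *opaque);

/*
 * Split the range [offset, offset + length) into requests that are each
 * at most max_len bytes long, issuing them in ascending offset order.
 * Returns the number of requests issued, or -1 if max_len is zero or the
 * callback reports a failure.
 */
static int split_request(uint64_t offset, uint64_t length, uint32_t max_len,
                         send_fn send, void *opaque)
{
    int pieces = 0;

    assert(send != NULL);
    if (max_len == 0) {
        return -1;
    }
    while (length > 0) {
        /* The last piece may be shorter than max_len. */
        uint32_t chunk = length > max_len ? max_len : (uint32_t)length;

        if (send(offset, chunk, opaque) != 0) {
            return -1;
        }
        offset += chunk;
        length -= chunk;
        pieces++;
    }
    return pieces;
}

/* Hypothetical bookkeeping used only to demonstrate the splitter. */
struct tally {
    uint64_t total;   /* sum of all piece lengths */
    uint32_t largest; /* longest single piece seen */
    int count;        /* number of pieces seen */
};

/* Example callback: records each piece instead of sending it anywhere. */
static int tally_piece(uint64_t offset, uint32_t len, void *opaque)
{
    struct tally *t = opaque;

    (void)offset;
    t->total += len;
    if (len > t->largest) {
        t->largest = len;
    }
    t->count++;
    return 0;
}
```

With a 50 MiB request and a 32 MiB limit (figures borrowed from the
numbers mentioned in this message purely as an example), the callback
would be invoked twice: once for 32 MiB and once for the remaining
18 MiB. A client that does not know any maximum, such as the kernel
today, has nothing to pass as max_len, which is why servers still have
to cope with arbitrary sizes.
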
[...]
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12