* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:29 ` Eric Blake
0 siblings, 0 replies; 60+ messages in thread
From: Eric Blake @ 2016-05-10 15:29 UTC (permalink / raw)
To: Alex Bligh
Cc: Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 3461 bytes --]
On 05/10/2016 09:08 AM, Alex Bligh wrote:
> Eric,
>
>> Hmm. The current wording of the experimental block size additions does
>> NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
>> maximum NBD_CMD_WRITE:
>> https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints
>
> Correct
>
>> Maybe we should revisit that in the spec, and/or advertise yet another
>> block size (since the maximum size for a trim and/or write_zeroes
>> request may indeed be different than the maximum size for a read/write).
>
> I think it's up to the server to either handle large requests, or
> for the client to break these up.
But the question at hand here is whether we should permit servers to
advertise multiple maximum block sizes (one for read/write, another one
for trim/write_zero, or even two [at least qemu tracks a separate
maximum trim vs. write_zero sizing in its generic block layer]), or
merely stick with the current wording that requires clients that honor
maximum block size to obey the same maximum for ALL commands, regardless
of amount of data sent over the wire.
>
> The core problem here is that the kernel (and, ahem, most servers) are
> ignorant of the block size extension, and need to guess how to break
> things up. In my view the client (kernel in this case) should
> be breaking the trim requests up into whatever size it uses as the
> maximum size write requests. But then it would have to know about block
> sizes which are in (another) experimental extension.
Correct - no one has yet patched the kernel to honor block sizes
advertised through what is currently an experimental extension. (We
have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
minimum block size, but I haven't audited whether the kernel actually
guarantees that all client requests are sent aligned to the value passed
that way - but we have nothing to set the maximum size, and are at the
mercy of however the kernel currently decides to split large requests).
So the kernel is currently one of the clients that does NOT honor block
sizes, and as such, servers should be prepared for ANY size up to
UINT_MAX (other than DoS handling). My question above only applies to
clients that use the experimental block size extensions.
>
> I've nothing against you fixing it qemu's server (after all I don't
> think there is anything to /prohibit/ a server working on something
> larger than the maximum block size), and ...
>
>> But since the kernel is the one sending the large length request, and
>> since you are right that this is not a denial-of-service in the amount
>> of data being sent in a single NBD message,
>
> ... currently there isn't actually a maximum block size advertised (that
> being in another experimental addition), so this is just DoS prevention,
> which qemu is free to do how it wishes.
Okay, sounds like that part is uncontroversial, and it is indeed worth
improving qemu's behavior.
>
> What surprises me is that a release kernel is using experimental
> NBD extensions; there's no guarantee these won't change. Or does
> fstrim work some other way?
No extension in play. The kernel is obeying NBD_FLAG_SEND_TRIM, which
is in the normative standard, and unrelated to the INFO/BLOCK_SIZE
extensions.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:29 ` [Qemu-devel] [Nbd] " Eric Blake
@ 2016-05-10 15:38 ` Alex Bligh
-1 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:38 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 3352 bytes --]
Eric,
On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>>> Maybe we should revisit that in the spec, and/or advertise yet another
>>> block size (since the maximum size for a trim and/or write_zeroes
>>> request may indeed be different than the maximum size for a read/write).
>>
>> I think it's up to the server to either handle large requests, or
>> for the client to break these up.
>
> But the question at hand here is whether we should permit servers to
> advertise multiple maximum block sizes (one for read/write, another one
> for trim/write_zero, or even two [at least qemu tracks a separate
> maximum trim vs. write_zero sizing in its generic block layer]), or
> merely stick with the current wording that requires clients that honor
> maximum block size to obey the same maximum for ALL commands, regardless
> of amount of data sent over the wire.
In my view, we should not change this. Block sizes maxima are not there
to support DoS prevention (that's a separate phrase). They are there
to impose maximum block sizes. Adding a different maximum block size
for different commands is way too overengineered. There are after
all reasons (especially without structured replies) why you'd want
different maximum block sizes for writes and reads. If clients support
block sizes, they will necessarily have to have the infrastructure
to break requests up.
IE maximum block size should continue to mean maximum block size.
>>
>> The core problem here is that the kernel (and, ahem, most servers) are
>> ignorant of the block size extension, and need to guess how to break
>> things up. In my view the client (kernel in this case) should
>> be breaking the trim requests up into whatever size it uses as the
>> maximum size write requests. But then it would have to know about block
>> sizes which are in (another) experimental extension.
>
> Correct - no one has yet patched the kernel to honor block sizes
> advertised through what is currently an experimental extension.
Unsurprising at it's still experimental, and only settled down a couple
of weeks ago :-)
> (We
> have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> minimum block size,
Technically that is 'out of band transmission of block size
constraints' :-)
> but I haven't audited whether the kernel actually
> guarantees that all client requests are sent aligned to the value passed
> that way - but we have nothing to set the maximum size,
indeed
> and are at the
> mercy of however the kernel currently decides to split large requests).
I am surprised TRIM doesn't get broken up the same way READ and WRITE
do.
> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).
Or not to permit a connection.
> My question above only applies to
> clients that use the experimental block size extensions.
Indeed.
>> What surprises me is that a release kernel is using experimental
>> NBD extensions; there's no guarantee these won't change. Or does
>> fstrim work some other way?
>
> No extension in play. The kernel is obeying NBD_FLAG_SEND_TRIM, which
> is in the normative standard, and unrelated to the INFO/BLOCK_SIZE
> extensions.
My mistake. I was confusing 'WRITE_ZEROES' with 'TRIM'.
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:38 ` Alex Bligh
0 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:38 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 3352 bytes --]
Eric,
On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>>> Maybe we should revisit that in the spec, and/or advertise yet another
>>> block size (since the maximum size for a trim and/or write_zeroes
>>> request may indeed be different than the maximum size for a read/write).
>>
>> I think it's up to the server to either handle large requests, or
>> for the client to break these up.
>
> But the question at hand here is whether we should permit servers to
> advertise multiple maximum block sizes (one for read/write, another one
> for trim/write_zero, or even two [at least qemu tracks a separate
> maximum trim vs. write_zero sizing in its generic block layer]), or
> merely stick with the current wording that requires clients that honor
> maximum block size to obey the same maximum for ALL commands, regardless
> of amount of data sent over the wire.
In my view, we should not change this. Block sizes maxima are not there
to support DoS prevention (that's a separate phrase). They are there
to impose maximum block sizes. Adding a different maximum block size
for different commands is way too overengineered. There are after
all reasons (especially without structured replies) why you'd want
different maximum block sizes for writes and reads. If clients support
block sizes, they will necessarily have to have the infrastructure
to break requests up.
IE maximum block size should continue to mean maximum block size.
>>
>> The core problem here is that the kernel (and, ahem, most servers) are
>> ignorant of the block size extension, and need to guess how to break
>> things up. In my view the client (kernel in this case) should
>> be breaking the trim requests up into whatever size it uses as the
>> maximum size write requests. But then it would have to know about block
>> sizes which are in (another) experimental extension.
>
> Correct - no one has yet patched the kernel to honor block sizes
> advertised through what is currently an experimental extension.
Unsurprising at it's still experimental, and only settled down a couple
of weeks ago :-)
> (We
> have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> minimum block size,
Technically that is 'out of band transmission of block size
constraints' :-)
> but I haven't audited whether the kernel actually
> guarantees that all client requests are sent aligned to the value passed
> that way - but we have nothing to set the maximum size,
indeed
> and are at the
> mercy of however the kernel currently decides to split large requests).
I am surprised TRIM doesn't get broken up the same way READ and WRITE
do.
> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).
Or not to permit a connection.
> My question above only applies to
> clients that use the experimental block size extensions.
Indeed.
>> What surprises me is that a release kernel is using experimental
>> NBD extensions; there's no guarantee these won't change. Or does
>> fstrim work some other way?
>
> No extension in play. The kernel is obeying NBD_FLAG_SEND_TRIM, which
> is in the normative standard, and unrelated to the INFO/BLOCK_SIZE
> extensions.
My mistake. I was confusing 'WRITE_ZEROES' with 'TRIM'.
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:38 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-10 15:45 ` Quentin Casasnovas
-1 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 15:45 UTC (permalink / raw)
To: Alex Bligh
Cc: Eric Blake, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
> Eric,
>
> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> >>> Maybe we should revisit that in the spec, and/or advertise yet another
> >>> block size (since the maximum size for a trim and/or write_zeroes
> >>> request may indeed be different than the maximum size for a read/write).
> >>
> >> I think it's up to the server to either handle large requests, or
> >> for the client to break these up.
> >
> > But the question at hand here is whether we should permit servers to
> > advertise multiple maximum block sizes (one for read/write, another one
> > for trim/write_zero, or even two [at least qemu tracks a separate
> > maximum trim vs. write_zero sizing in its generic block layer]), or
> > merely stick with the current wording that requires clients that honor
> > maximum block size to obey the same maximum for ALL commands, regardless
> > of amount of data sent over the wire.
>
> In my view, we should not change this. Block sizes maxima are not there
> to support DoS prevention (that's a separate phrase). They are there
> to impose maximum block sizes. Adding a different maximum block size
> for different commands is way too overengineered. There are after
> all reasons (especially without structured replies) why you'd want
> different maximum block sizes for writes and reads. If clients support
> block sizes, they will necessarily have to have the infrastructure
> to break requests up.
>
> IE maximum block size should continue to mean maximum block size.
>
> >>
> >> The core problem here is that the kernel (and, ahem, most servers) are
> >> ignorant of the block size extension, and need to guess how to break
> >> things up. In my view the client (kernel in this case) should
> >> be breaking the trim requests up into whatever size it uses as the
> >> maximum size write requests. But then it would have to know about block
> >> sizes which are in (another) experimental extension.
> >
> > Correct - no one has yet patched the kernel to honor block sizes
> > advertised through what is currently an experimental extension.
>
> Unsurprising at it's still experimental, and only settled down a couple
> of weeks ago :-)
>
> > (We
> > have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> > minimum block size,
>
> Technically that is 'out of band transmission of block size
> constraints' :-)
>
> > but I haven't audited whether the kernel actually
> > guarantees that all client requests are sent aligned to the value passed
> > that way - but we have nothing to set the maximum size,
>
> indeed
>
> > and are at the
> > mercy of however the kernel currently decides to split large requests).
>
> I am surprised TRIM doesn't get broken up the same way READ and WRITE
> do.
>
I'm by no mean an expert in this, but why would the kernel break up those
TRIM commands? After all, breaking things up makes sense when the length
of the request is big, not that much when it only contains the request
header, which is the case for TRIM commands.
What am I missing?
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:45 ` Quentin Casasnovas
0 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 15:45 UTC (permalink / raw)
To: Alex Bligh
Cc: Eric Blake, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
> Eric,
>
> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> >>> Maybe we should revisit that in the spec, and/or advertise yet another
> >>> block size (since the maximum size for a trim and/or write_zeroes
> >>> request may indeed be different than the maximum size for a read/write).
> >>
> >> I think it's up to the server to either handle large requests, or
> >> for the client to break these up.
> >
> > But the question at hand here is whether we should permit servers to
> > advertise multiple maximum block sizes (one for read/write, another one
> > for trim/write_zero, or even two [at least qemu tracks a separate
> > maximum trim vs. write_zero sizing in its generic block layer]), or
> > merely stick with the current wording that requires clients that honor
> > maximum block size to obey the same maximum for ALL commands, regardless
> > of amount of data sent over the wire.
>
> In my view, we should not change this. Block sizes maxima are not there
> to support DoS prevention (that's a separate phrase). They are there
> to impose maximum block sizes. Adding a different maximum block size
> for different commands is way too overengineered. There are after
> all reasons (especially without structured replies) why you'd want
> different maximum block sizes for writes and reads. If clients support
> block sizes, they will necessarily have to have the infrastructure
> to break requests up.
>
> IE maximum block size should continue to mean maximum block size.
>
> >>
> >> The core problem here is that the kernel (and, ahem, most servers) are
> >> ignorant of the block size extension, and need to guess how to break
> >> things up. In my view the client (kernel in this case) should
> >> be breaking the trim requests up into whatever size it uses as the
> >> maximum size write requests. But then it would have to know about block
> >> sizes which are in (another) experimental extension.
> >
> > Correct - no one has yet patched the kernel to honor block sizes
> > advertised through what is currently an experimental extension.
>
> Unsurprising at it's still experimental, and only settled down a couple
> of weeks ago :-)
>
> > (We
> > have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> > minimum block size,
>
> Technically that is 'out of band transmission of block size
> constraints' :-)
>
> > but I haven't audited whether the kernel actually
> > guarantees that all client requests are sent aligned to the value passed
> > that way - but we have nothing to set the maximum size,
>
> indeed
>
> > and are at the
> > mercy of however the kernel currently decides to split large requests).
>
> I am surprised TRIM doesn't get broken up the same way READ and WRITE
> do.
>
I'm by no mean an expert in this, but why would the kernel break up those
TRIM commands? After all, breaking things up makes sense when the length
of the request is big, not that much when it only contains the request
header, which is the case for TRIM commands.
What am I missing?
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:45 ` [Qemu-devel] [Nbd] " Quentin Casasnovas
@ 2016-05-10 15:49 ` Alex Bligh
-1 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:49 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Alex Bligh, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On 10 May 2016, at 16:45, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
> I'm by no mean an expert in this, but why would the kernel break up those
> TRIM commands? After all, breaking things up makes sense when the length
> of the request is big, not that much when it only contains the request
> header, which is the case for TRIM commands.
1. You are assuming that the only reason for limiting the size of operations is
limiting the data transferred within one request. That is not necessarily
the case. There are good reasons (if only orthogonality) to have limits
in place even where no data is transferred.
2. As and when the blocksize extension is implemented in the kernel (it isn't
now), the protocol requires it.
3. The maximum length of an NBD trim operation is 2^32. The maximum length
of a trim operation is larger. Therefore the kernel needs to do at least
some breaking up.
--
Alex Bligh
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:49 ` Alex Bligh
0 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:49 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Alex Bligh, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On 10 May 2016, at 16:45, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
> I'm by no mean an expert in this, but why would the kernel break up those
> TRIM commands? After all, breaking things up makes sense when the length
> of the request is big, not that much when it only contains the request
> header, which is the case for TRIM commands.
1. You are assuming that the only reason for limiting the size of operations is
limiting the data transferred within one request. That is not necessarily
the case. There are good reasons (if only orthogonality) to have limits
in place even where no data is transferred.
2. As and when the blocksize extension is implemented in the kernel (it isn't
now), the protocol requires it.
3. The maximum length of an NBD trim operation is 2^32. The maximum length
of a trim operation is larger. Therefore the kernel needs to do at least
some breaking up.
--
Alex Bligh
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:49 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-10 16:04 ` Quentin Casasnovas
-1 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 16:04 UTC (permalink / raw)
To: Alex Bligh
Cc: Quentin Casasnovas, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 04:49:57PM +0100, Alex Bligh wrote:
>
> On 10 May 2016, at 16:45, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
>
> > I'm by no mean an expert in this, but why would the kernel break up those
> > TRIM commands? After all, breaking things up makes sense when the length
> > of the request is big, not that much when it only contains the request
> > header, which is the case for TRIM commands.
>
> 1. You are assuming that the only reason for limiting the size of operations is
> limiting the data transferred within one request. That is not necessarily
> the case. There are good reasons (if only orthogonality) to have limits
> in place even where no data is transferred.
>
> 2. As and when the blocksize extension is implemented in the kernel (it isn't
> now), the protocol requires it.
>
> 3. The maximum length of an NBD trim operation is 2^32. The maximum length
> of a trim operation is larger. Therefore the kernel needs to do at least
> some breaking up.
Alright, I assumed by 'length of an NBD request', the specification was
talking about the length of.. well, the request as opposed to whatever is
in the length field of the request header.
Is there a use case where you'd want to split up a single big TRIM request
in smaller ones (as in some hardware would not support it or something)?
Even then, it looks like this splitting up would be hardware dependant and
better implemented in block device drivers.
I'm just finding odd that something that fits inside the length field can't
be used. I do agree with your point number 3, obviously if the lenght
field type doesn't allow something bigger than a u32, then the kernel has
to do some breaking up in that case.
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 16:04 ` Quentin Casasnovas
0 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 16:04 UTC (permalink / raw)
To: Alex Bligh
Cc: Quentin Casasnovas, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 04:49:57PM +0100, Alex Bligh wrote:
>
> On 10 May 2016, at 16:45, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
>
> > I'm by no mean an expert in this, but why would the kernel break up those
> > TRIM commands? After all, breaking things up makes sense when the length
> > of the request is big, not that much when it only contains the request
> > header, which is the case for TRIM commands.
>
> 1. You are assuming that the only reason for limiting the size of operations is
> limiting the data transferred within one request. That is not necessarily
> the case. There are good reasons (if only orthogonality) to have limits
> in place even where no data is transferred.
>
> 2. As and when the blocksize extension is implemented in the kernel (it isn't
> now), the protocol requires it.
>
> 3. The maximum length of an NBD trim operation is 2^32. The maximum length
> of a trim operation is larger. Therefore the kernel needs to do at least
> some breaking up.
Alright, I assumed by 'length of an NBD request', the specification was
talking about the length of.. well, the request as opposed to whatever is
in the length field of the request header.
Is there a use case where you'd want to split up a single big TRIM request
in smaller ones (as in some hardware would not support it or something)?
Even then, it looks like this splitting up would be hardware dependant and
better implemented in block device drivers.
I'm just finding odd that something that fits inside the length field can't
be used. I do agree with your point number 3, obviously if the lenght
field type doesn't allow something bigger than a u32, then the kernel has
to do some breaking up in that case.
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 16:04 ` [Qemu-devel] [Nbd] " Quentin Casasnovas
@ 2016-05-10 16:23 ` Alex Bligh
-1 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 16:23 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Alex Bligh, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On 10 May 2016, at 17:04, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
> Alright, I assumed by 'length of an NBD request', the specification was
> talking about the length of.. well, the request as opposed to whatever is
> in the length field of the request header.
With NBD_CMD_TRIM the length in the header field is 32 bit and specifies
the length of data to trim, not the length of the data transferred (which
is none).
> Is there a use case where you'd want to split up a single big TRIM request
> in smaller ones (as in some hardware would not support it or something)?
> Even then, it looks like this splitting up would be hardware dependant and
> better implemented in block device drivers.
Part of the point of the block size extension is to push such limits to the
client.
I could make up use cases (e.g. that performing a multi-gigabyte trim in
a single threaded server will effectively block all other I/O), but the
main reason in my book is orthogonality, and the fact the client needs
to do some breaking up anyway.
> I'm just finding odd that something that fits inside the length field can't
> be used.
That's a different point. That's Qemu's 'Denial of Service Attack'
prevention, *not* maximum block sizes. It isn't dropping it because
of a maximum block size parameter. If it doesn't support the block size
extension which the version you're looking at does not, it's meant
to handle requests up to 2^32-1 long EXCEPT that it MAY error requests
so long as to cause a denial of service attack. As this doesn't fit
into that case (it's a TRIM), it shouldn't be erroring it on that grounds.
I agree Qemu should fix that.
(So in a sense Eric and I are arguing about something irrelevant to
your current problem, which is how this would work /with/ the block
size extensions, as Eric is in the process of implementing them).
> I do agree with your point number 3, obviously if the lenght
> field type doesn't allow something bigger than a u32, then the kernel has
> to do some breaking up in that case.
--
Alex Bligh
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 16:23 ` Alex Bligh
0 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 16:23 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Alex Bligh, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On 10 May 2016, at 17:04, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
> Alright, I assumed by 'length of an NBD request', the specification was
> talking about the length of.. well, the request as opposed to whatever is
> in the length field of the request header.
With NBD_CMD_TRIM the length in the header field is 32 bit and specifies
the length of data to trim, not the length of the data transferred (which
is none).
> Is there a use case where you'd want to split up a single big TRIM request
> in smaller ones (as in some hardware would not support it or something)?
> Even then, it looks like this splitting up would be hardware dependant and
> better implemented in block device drivers.
Part of the point of the block size extension is to push such limits to the
client.
I could make up use cases (e.g. that performing a multi-gigabyte trim in
a single threaded server will effectively block all other I/O), but the
main reason in my book is orthogonality, and the fact the client needs
to do some breaking up anyway.
> I'm just finding odd that something that fits inside the length field can't
> be used.
That's a different point. That's Qemu's 'Denial of Service Attack'
prevention, *not* maximum block sizes. It isn't dropping it because
of a maximum block size parameter. If it doesn't support the block size
extension which the version you're looking at does not, it's meant
to handle requests up to 2^32-1 long EXCEPT that it MAY error requests
so long as to cause a denial of service attack. As this doesn't fit
into that case (it's a TRIM), it shouldn't be erroring it on that grounds.
I agree Qemu should fix that.
(So in a sense Eric and I are arguing about something irrelevant to
your current problem, which is how this would work /with/ the block
size extensions, as Eric is in the process of implementing them).
> I do agree with your point number 3, obviously if the lenght
> field type doesn't allow something bigger than a u32, then the kernel has
> to do some breaking up in that case.
--
Alex Bligh
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 16:23 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-10 16:27 ` Quentin Casasnovas
-1 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 16:27 UTC (permalink / raw)
To: Alex Bligh
Cc: Quentin Casasnovas, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 05:23:21PM +0100, Alex Bligh wrote:
>
> On 10 May 2016, at 17:04, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
>
> > Alright, I assumed by 'length of an NBD request', the specification was
> > talking about the length of.. well, the request as opposed to whatever is
> > in the length field of the request header.
>
> With NBD_CMD_TRIM the length in the header field is 32 bit and specifies
> the length of data to trim, not the length of the data transferred (which
> is none).
>
> > Is there a use case where you'd want to split up a single big TRIM request
> > in smaller ones (as in some hardware would not support it or something)?
> > Even then, it looks like this splitting up would be hardware dependant and
> > better implemented in block device drivers.
>
> Part of the point of the block size extension is to push such limits to the
> client.
>
> I could make up use cases (e.g. that performing a multi-gigabyte trim in
> a single threaded server will effectively block all other I/O), but the
> main reason in my book is orthogonality, and the fact the client needs
> to do some breaking up anyway.
>
> > I'm just finding odd that something that fits inside the length field can't
> > be used.
>
> That's a different point. That's Qemu's 'Denial of Service Attack'
> prevention, *not* maximum block sizes. It isn't dropping it because
> of a maximum block size parameter. If it doesn't support the block size
> extension which the version you're looking at does not, it's meant
> to handle requests up to 2^32-1 long EXCEPT that it MAY error requests
> so long as to cause a denial of service attack. As this doesn't fit
> into that case (it's a TRIM), it shouldn't be erroring it on that grounds.
>
> I agree Qemu should fix that.
>
> (So in a sense Eric and I are arguing about something irrelevant to
> your current problem, which is how this would work /with/ the block
> size extensions, as Eric is in the process of implementing them).
>
Riight! OK understood, thanks for the explanation.
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 16:27 ` Quentin Casasnovas
0 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 16:27 UTC (permalink / raw)
To: Alex Bligh
Cc: Quentin Casasnovas, Eric Blake, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 05:23:21PM +0100, Alex Bligh wrote:
>
> On 10 May 2016, at 17:04, Quentin Casasnovas <quentin.casasnovas@oracle.com> wrote:
>
> > Alright, I assumed by 'length of an NBD request', the specification was
> > talking about the length of.. well, the request as opposed to whatever is
> > in the length field of the request header.
>
> With NBD_CMD_TRIM the length in the header field is 32 bit and specifies
> the length of data to trim, not the length of the data transferred (which
> is none).
>
> > Is there a use case where you'd want to split up a single big TRIM request
> > in smaller ones (as in some hardware would not support it or something)?
> > Even then, it looks like this splitting up would be hardware dependant and
> > better implemented in block device drivers.
>
> Part of the point of the block size extension is to push such limits to the
> client.
>
> I could make up use cases (e.g. that performing a multi-gigabyte trim in
> a single threaded server will effectively block all other I/O), but the
> main reason in my book is orthogonality, and the fact the client needs
> to do some breaking up anyway.
>
> > I'm just finding odd that something that fits inside the length field can't
> > be used.
>
> That's a different point. That's Qemu's 'Denial of Service Attack'
> prevention, *not* maximum block sizes. It isn't dropping it because
> of a maximum block size parameter. If it doesn't support the block size
> extension which the version you're looking at does not, it's meant
> to handle requests up to 2^32-1 long EXCEPT that it MAY error requests
> so long as to cause a denial of service attack. As this doesn't fit
> into that case (it's a TRIM), it shouldn't be erroring it on that grounds.
>
> I agree Qemu should fix that.
>
> (So in a sense Eric and I are arguing about something irrelevant to
> your current problem, which is how this would work /with/ the block
> size extensions, as Eric is in the process of implementing them).
>
Riight! OK understood, thanks for the explanation.
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 16:23 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-11 9:38 ` Paolo Bonzini
-1 siblings, 0 replies; 60+ messages in thread
From: Paolo Bonzini @ 2016-05-11 9:38 UTC (permalink / raw)
To: Alex Bligh, Quentin Casasnovas
Cc: Eric Blake, qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On 10/05/2016 18:23, Alex Bligh wrote:
> > Is there a use case where you'd want to split up a single big TRIM request
> > in smaller ones (as in some hardware would not support it or something)?
> > Even then, it looks like this splitting up would be hardware dependant and
> > better implemented in block device drivers.
>
> Part of the point of the block size extension is to push such limits to the
> client.
>
> I could make up use cases (e.g. that performing a multi-gigabyte trim in
> a single threaded server will effectively block all other I/O), but the
> main reason in my book is orthogonality, and the fact the client needs
> to do some breaking up anyway.
That's why SCSI for example has a limit to UNMAP and WRITE SAME requests
(actually it has three limits: number of ranges per unmap, which in NBD
and in SCSI WRITE SAME is 1; number of blocks per unmap descriptor;
number of blocks per WRITE SAME operation). These limits however are a
different one than the read/write limit, because you want to support
systems where TRIM is much faster than regular I/O (and possibly even
O(1) if trimming something that is already trimmed).
Paolo
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-11 9:38 ` Paolo Bonzini
0 siblings, 0 replies; 60+ messages in thread
From: Paolo Bonzini @ 2016-05-11 9:38 UTC (permalink / raw)
To: Alex Bligh, Quentin Casasnovas
Cc: Eric Blake, qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On 10/05/2016 18:23, Alex Bligh wrote:
> > Is there a use case where you'd want to split up a single big TRIM request
> > in smaller ones (as in some hardware would not support it or something)?
> > Even then, it looks like this splitting up would be hardware dependant and
> > better implemented in block device drivers.
>
> Part of the point of the block size extension is to push such limits to the
> client.
>
> I could make up use cases (e.g. that performing a multi-gigabyte trim in
> a single threaded server will effectively block all other I/O), but the
> main reason in my book is orthogonality, and the fact the client needs
> to do some breaking up anyway.
That's why SCSI for example has a limit to UNMAP and WRITE SAME requests
(actually it has three limits: number of ranges per unmap, which in NBD
and in SCSI WRITE SAME is 1; number of blocks per unmap descriptor;
number of blocks per WRITE SAME operation). These limits however are a
different one than the read/write limit, because you want to support
systems where TRIM is much faster than regular I/O (and possibly even
O(1) if trimming something that is already trimmed).
Paolo
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-11 9:38 ` [Qemu-devel] [Nbd] " Paolo Bonzini
@ 2016-05-11 14:08 ` Eric Blake
-1 siblings, 0 replies; 60+ messages in thread
From: Eric Blake @ 2016-05-11 14:08 UTC (permalink / raw)
To: Paolo Bonzini, Alex Bligh, Quentin Casasnovas
Cc: qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 1881 bytes --]
On 05/11/2016 03:38 AM, Paolo Bonzini wrote:
>
>
> On 10/05/2016 18:23, Alex Bligh wrote:
>>> Is there a use case where you'd want to split up a single big TRIM request
>>> in smaller ones (as in some hardware would not support it or something)?
>>> Even then, it looks like this splitting up would be hardware dependant and
>>> better implemented in block device drivers.
>>
>> Part of the point of the block size extension is to push such limits to the
>> client.
>>
>> I could make up use cases (e.g. that performing a multi-gigabyte trim in
>> a single threaded server will effectively block all other I/O), but the
>> main reason in my book is orthogonality, and the fact the client needs
>> to do some breaking up anyway.
>
> That's why SCSI for example has a limit to UNMAP and WRITE SAME requests
> (actually it has three limits: number of ranges per unmap, which in NBD
> and in SCSI WRITE SAME is 1; number of blocks per unmap descriptor;
> number of blocks per WRITE SAME operation). These limits however are a
> different one than the read/write limit, because you want to support
> systems where TRIM is much faster than regular I/O (and possibly even
> O(1) if trimming something that is already trimmed).
Then I think I will propose a doc patch to the extension-info branch
that adds new INFO items for NBD_INFO_TRIM_SIZE and
NBD_INFO_WRITE_ZERO_SIZE (if requested by client and replied by server,
then this can be larger than the normal maximum size in
NBD_INFO_BLOCK_SIZE; if either side doesn't request the info, then
assume any maximum in NBD_INFO_BLOCK_SIZE applies, otherwise UINT32_MAX;
and the two infos are separate items because NBD_FLAG_SEND_TRIM and
NBD_FLAG_SEND_WRITE_ZEROES are orthogonally optional).
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-11 14:08 ` Eric Blake
0 siblings, 0 replies; 60+ messages in thread
From: Eric Blake @ 2016-05-11 14:08 UTC (permalink / raw)
To: Paolo Bonzini, Alex Bligh, Quentin Casasnovas
Cc: qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 1881 bytes --]
On 05/11/2016 03:38 AM, Paolo Bonzini wrote:
>
>
> On 10/05/2016 18:23, Alex Bligh wrote:
>>> Is there a use case where you'd want to split up a single big TRIM request
>>> in smaller ones (as in some hardware would not support it or something)?
>>> Even then, it looks like this splitting up would be hardware dependant and
>>> better implemented in block device drivers.
>>
>> Part of the point of the block size extension is to push such limits to the
>> client.
>>
>> I could make up use cases (e.g. that performing a multi-gigabyte trim in
>> a single threaded server will effectively block all other I/O), but the
>> main reason in my book is orthogonality, and the fact the client needs
>> to do some breaking up anyway.
>
> That's why SCSI for example has a limit to UNMAP and WRITE SAME requests
> (actually it has three limits: number of ranges per unmap, which in NBD
> and in SCSI WRITE SAME is 1; number of blocks per unmap descriptor;
> number of blocks per WRITE SAME operation). These limits however are a
> different one than the read/write limit, because you want to support
> systems where TRIM is much faster than regular I/O (and possibly even
> O(1) if trimming something that is already trimmed).
Then I think I will propose a doc patch to the extension-info branch
that adds new INFO items for NBD_INFO_TRIM_SIZE and
NBD_INFO_WRITE_ZERO_SIZE (if requested by client and replied by server,
then this can be larger than the normal maximum size in
NBD_INFO_BLOCK_SIZE; if either side doesn't request the info, then
assume any maximum in NBD_INFO_BLOCK_SIZE applies, otherwise UINT32_MAX;
and the two infos are separate items because NBD_FLAG_SEND_TRIM and
NBD_FLAG_SEND_WRITE_ZEROES are orthogonally optional).
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-11 14:08 ` [Qemu-devel] [Nbd] " Eric Blake
@ 2016-05-11 14:55 ` Alex Bligh
-1 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-11 14:55 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Paolo Bonzini, Quentin Casasnovas,
qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 699 bytes --]
On 11 May 2016, at 15:08, Eric Blake <eblake@redhat.com> wrote:
> Then I think I will propose a doc patch to the extension-info branch
> that adds new INFO items for NBD_INFO_TRIM_SIZE and
> NBD_INFO_WRITE_ZERO_SIZE (if requested by client and replied by server,
> then this can be larger than the normal maximum size in
> NBD_INFO_BLOCK_SIZE; if either side doesn't request the info, then
> assume any maximum in NBD_INFO_BLOCK_SIZE applies, otherwise UINT32_MAX;
> and the two infos are separate items because NBD_FLAG_SEND_TRIM and
> NBD_FLAG_SEND_WRITE_ZEROES are orthogonally optional).
mmmm ...
at this rate you'd be better with a list of command / maximum size
tuples!
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-11 14:55 ` Alex Bligh
0 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-11 14:55 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Paolo Bonzini, Quentin Casasnovas,
qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 699 bytes --]
On 11 May 2016, at 15:08, Eric Blake <eblake@redhat.com> wrote:
> Then I think I will propose a doc patch to the extension-info branch
> that adds new INFO items for NBD_INFO_TRIM_SIZE and
> NBD_INFO_WRITE_ZERO_SIZE (if requested by client and replied by server,
> then this can be larger than the normal maximum size in
> NBD_INFO_BLOCK_SIZE; if either side doesn't request the info, then
> assume any maximum in NBD_INFO_BLOCK_SIZE applies, otherwise UINT32_MAX;
> and the two infos are separate items because NBD_FLAG_SEND_TRIM and
> NBD_FLAG_SEND_WRITE_ZEROES are orthogonally optional).
mmmm ...
at this rate you'd be better with a list of command / maximum size
tuples!
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-11 14:55 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-11 15:08 ` Paolo Bonzini
-1 siblings, 0 replies; 60+ messages in thread
From: Paolo Bonzini @ 2016-05-11 15:08 UTC (permalink / raw)
To: Alex Bligh, Eric Blake
Cc: Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, nbd-general@lists.sourceforge.net,
qemu-stable@nongnu.org, qemu block
[-- Attachment #1.1: Type: text/plain, Size: 858 bytes --]
On 11/05/2016 16:55, Alex Bligh wrote:
>> > Then I think I will propose a doc patch to the extension-info branch
>> > that adds new INFO items for NBD_INFO_TRIM_SIZE and
>> > NBD_INFO_WRITE_ZERO_SIZE (if requested by client and replied by server,
>> > then this can be larger than the normal maximum size in
>> > NBD_INFO_BLOCK_SIZE; if either side doesn't request the info, then
>> > assume any maximum in NBD_INFO_BLOCK_SIZE applies, otherwise UINT32_MAX;
>> > and the two infos are separate items because NBD_FLAG_SEND_TRIM and
>> > NBD_FLAG_SEND_WRITE_ZEROES are orthogonally optional).
> mmmm ...
>
> at this rate you'd be better with a list of command / maximum size
> tuples!
Not sure if serious or not, but anyway I think enumerating the items
individually should be enough. It's not like commands grow on trees. :)
Paolo
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-11 15:08 ` Paolo Bonzini
0 siblings, 0 replies; 60+ messages in thread
From: Paolo Bonzini @ 2016-05-11 15:08 UTC (permalink / raw)
To: Alex Bligh, Eric Blake
Cc: Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, nbd-general@lists.sourceforge.net,
qemu-stable@nongnu.org, qemu block
[-- Attachment #1: Type: text/plain, Size: 858 bytes --]
On 11/05/2016 16:55, Alex Bligh wrote:
>> > Then I think I will propose a doc patch to the extension-info branch
>> > that adds new INFO items for NBD_INFO_TRIM_SIZE and
>> > NBD_INFO_WRITE_ZERO_SIZE (if requested by client and replied by server,
>> > then this can be larger than the normal maximum size in
>> > NBD_INFO_BLOCK_SIZE; if either side doesn't request the info, then
>> > assume any maximum in NBD_INFO_BLOCK_SIZE applies, otherwise UINT32_MAX;
>> > and the two infos are separate items because NBD_FLAG_SEND_TRIM and
>> > NBD_FLAG_SEND_WRITE_ZEROES are orthogonally optional).
> mmmm ...
>
> at this rate you'd be better with a list of command / maximum size
> tuples!
Not sure if serious or not, but anyway I think enumerating the items
individually should be enough. It's not like commands grow on trees. :)
Paolo
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:38 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-10 17:55 ` Paolo Bonzini
-1 siblings, 0 replies; 60+ messages in thread
From: Paolo Bonzini @ 2016-05-10 17:55 UTC (permalink / raw)
To: Alex Bligh, Eric Blake
Cc: Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, nbd-general@lists.sourceforge.net,
qemu-stable@nongnu.org, qemu block
On 10/05/2016 17:38, Alex Bligh wrote:
> > and are at the
> > mercy of however the kernel currently decides to split large requests).
>
> I am surprised TRIM doesn't get broken up the same way READ and WRITE
> do.
The payload of TRIM has constant size, so it makes sense to do that.
The kernel then self-imposes a 2GB limit in blkdev_issue_discard.
On the other hand, READ and WRITE of size n can possibly require O(n)
memory.
Paolo
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 17:55 ` Paolo Bonzini
0 siblings, 0 replies; 60+ messages in thread
From: Paolo Bonzini @ 2016-05-10 17:55 UTC (permalink / raw)
To: Alex Bligh, Eric Blake
Cc: Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, nbd-general@lists.sourceforge.net,
qemu-stable@nongnu.org, qemu block
On 10/05/2016 17:38, Alex Bligh wrote:
> > and are at the
> > mercy of however the kernel currently decides to split large requests).
>
> I am surprised TRIM doesn't get broken up the same way READ and WRITE
> do.
The payload of TRIM has constant size, so it makes sense to do that.
The kernel then self-imposes a 2GB limit in blkdev_issue_discard.
On the other hand, READ and WRITE of size n can possibly require O(n)
memory.
Paolo
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:38 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-11 21:12 ` Wouter Verhelst
-1 siblings, 0 replies; 60+ messages in thread
From: Wouter Verhelst @ 2016-05-11 21:12 UTC (permalink / raw)
To: Alex Bligh
Cc: Eric Blake, nbd-general@lists.sourceforge.net, qemu block,
qemu-trivial@nongnu.org, qemu-devel@nongnu.org,
qemu-stable@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> > So the kernel is currently one of the clients that does NOT honor block
> > sizes, and as such, servers should be prepared for ANY size up to
> > UINT_MAX (other than DoS handling).
>
> Or not to permit a connection.
Right -- and this is why I was recommending against making this a MUST in the
first place.
[...]
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12
^ permalink raw reply [flat|nested] 60+ messages in thread* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-11 21:12 ` Wouter Verhelst
0 siblings, 0 replies; 60+ messages in thread
From: Wouter Verhelst @ 2016-05-11 21:12 UTC (permalink / raw)
To: Alex Bligh
Cc: Eric Blake, nbd-general@lists.sourceforge.net, qemu block,
qemu-trivial@nongnu.org, qemu-devel@nongnu.org,
qemu-stable@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> > So the kernel is currently one of the clients that does NOT honor block
> > sizes, and as such, servers should be prepared for ANY size up to
> > UINT_MAX (other than DoS handling).
>
> Or not to permit a connection.
Right -- and this is why I was recommending against making this a MUST in the
first place.
[...]
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12
^ permalink raw reply [flat|nested] 60+ messages in thread* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-11 21:12 ` [Qemu-devel] [Nbd] " Wouter Verhelst
@ 2016-05-12 15:33 ` Alex Bligh
-1 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-12 15:33 UTC (permalink / raw)
To: Wouter Verhelst
Cc: Alex Bligh, Eric Blake, nbd-general@lists.sourceforge.net,
qemu block, qemu-trivial@nongnu.org, qemu-devel@nongnu.org,
qemu-stable@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On 11 May 2016, at 22:12, Wouter Verhelst <w@uter.be> wrote:
> On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
>> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>>> So the kernel is currently one of the clients that does NOT honor block
>>> sizes, and as such, servers should be prepared for ANY size up to
>>> UINT_MAX (other than DoS handling).
>>
>> Or not to permit a connection.
>
> Right -- and this is why I was recommending against making this a MUST in the
> first place.
Indeed, and it currently is a 'MAY':
> except that if a server believes a client's behaviour constitutes
> a denial of service, it MAY initiate a hard disconnect.
--
Alex Bligh
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-12 15:33 ` Alex Bligh
0 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-12 15:33 UTC (permalink / raw)
To: Wouter Verhelst
Cc: Alex Bligh, Eric Blake, nbd-general@lists.sourceforge.net,
qemu block, qemu-trivial@nongnu.org, qemu-devel@nongnu.org,
qemu-stable@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On 11 May 2016, at 22:12, Wouter Verhelst <w@uter.be> wrote:
> On Tue, May 10, 2016 at 04:38:29PM +0100, Alex Bligh wrote:
>> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>>> So the kernel is currently one of the clients that does NOT honor block
>>> sizes, and as such, servers should be prepared for ANY size up to
>>> UINT_MAX (other than DoS handling).
>>
>> Or not to permit a connection.
>
> Right -- and this is why I was recommending against making this a MUST in the
> first place.
Indeed, and it currently is a 'MAY':
> except that if a server believes a client's behaviour constitutes
> a denial of service, it MAY initiate a hard disconnect.
--
Alex Bligh
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:29 ` [Qemu-devel] [Nbd] " Eric Blake
@ 2016-05-10 15:41 ` Alex Bligh
-1 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:41 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 668 bytes --]
On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).
Interesting followup question:
If the kernel does not fragment TRIM requests at all (in the
same way it fragments read and write requests), I suspect
something bad may happen with TRIM requests over 2^31
in size (particularly over 2^32 in size), as the length
field in nbd only has 32 bits.
Whether it supports block size constraints or not, it is
going to need to do *some* breaking up of requests.
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:41 ` Alex Bligh
0 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:41 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 668 bytes --]
On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling).
Interesting followup question:
If the kernel does not fragment TRIM requests at all (in the
same way it fragments read and write requests), I suspect
something bad may happen with TRIM requests over 2^31
in size (particularly over 2^32 in size), as the length
field in nbd only has 32 bits.
Whether it supports block size constraints or not, it is
going to need to do *some* breaking up of requests.
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:41 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-10 15:46 ` Eric Blake
-1 siblings, 0 replies; 60+ messages in thread
From: Eric Blake @ 2016-05-10 15:46 UTC (permalink / raw)
To: Alex Bligh
Cc: Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]
On 05/10/2016 09:41 AM, Alex Bligh wrote:
>
> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>
>> So the kernel is currently one of the clients that does NOT honor block
>> sizes, and as such, servers should be prepared for ANY size up to
>> UINT_MAX (other than DoS handling).
>
> Interesting followup question:
>
> If the kernel does not fragment TRIM requests at all (in the
> same way it fragments read and write requests), I suspect
> something bad may happen with TRIM requests over 2^31
> in size (particularly over 2^32 in size), as the length
> field in nbd only has 32 bits.
>
> Whether it supports block size constraints or not, it is
> going to need to do *some* breaking up of requests.
Does anyone have an easy way to cause the kernel to request a trim
operation that large on a > 4G export? I'm not familiar enough with
EXT4 operation to know what file system operations you can run to
ultimately indirectly create a file system trim operation that large.
But maybe there is something simpler - does the kernel let you use the
fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:46 ` Eric Blake
0 siblings, 0 replies; 60+ messages in thread
From: Eric Blake @ 2016-05-10 15:46 UTC (permalink / raw)
To: Alex Bligh
Cc: Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 1318 bytes --]
On 05/10/2016 09:41 AM, Alex Bligh wrote:
>
> On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>
>> So the kernel is currently one of the clients that does NOT honor block
>> sizes, and as such, servers should be prepared for ANY size up to
>> UINT_MAX (other than DoS handling).
>
> Interesting followup question:
>
> If the kernel does not fragment TRIM requests at all (in the
> same way it fragments read and write requests), I suspect
> something bad may happen with TRIM requests over 2^31
> in size (particularly over 2^32 in size), as the length
> field in nbd only has 32 bits.
>
> Whether it supports block size constraints or not, it is
> going to need to do *some* breaking up of requests.
Does anyone have an easy way to cause the kernel to request a trim
operation that large on a > 4G export? I'm not familiar enough with
EXT4 operation to know what file system operations you can run to
ultimately indirectly create a file system trim operation that large.
But maybe there is something simpler - does the kernel let you use the
fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:46 ` [Qemu-devel] [Nbd] " Eric Blake
@ 2016-05-10 15:52 ` Alex Bligh
-1 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:52 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 743 bytes --]
On 10 May 2016, at 16:46, Eric Blake <eblake@redhat.com> wrote:
> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export? I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
Not tried it, but fallocate(1) with -p ?
http://man7.org/linux/man-pages/man1/fallocate.1.html
As it takes length and offset in TB, PB, EB, ZB and YB, it
seems to be 64 bit aware :-)
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:52 ` Alex Bligh
0 siblings, 0 replies; 60+ messages in thread
From: Alex Bligh @ 2016-05-10 15:52 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
[-- Attachment #1: Type: text/plain, Size: 743 bytes --]
On 10 May 2016, at 16:46, Eric Blake <eblake@redhat.com> wrote:
> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export? I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
Not tried it, but fallocate(1) with -p ?
http://man7.org/linux/man-pages/man1/fallocate.1.html
As it takes length and offset in TB, PB, EB, ZB and YB, it
seems to be 64 bit aware :-)
--
Alex Bligh
[-- Attachment #2: Message signed with OpenPGP using GPGMail --]
[-- Type: application/pgp-signature, Size: 842 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:46 ` [Qemu-devel] [Nbd] " Eric Blake
@ 2016-05-10 15:54 ` Quentin Casasnovas
-1 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 15:54 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> On 05/10/2016 09:41 AM, Alex Bligh wrote:
> >
> > On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> >
> >> So the kernel is currently one of the clients that does NOT honor block
> >> sizes, and as such, servers should be prepared for ANY size up to
> >> UINT_MAX (other than DoS handling).
> >
> > Interesting followup question:
> >
> > If the kernel does not fragment TRIM requests at all (in the
> > same way it fragments read and write requests), I suspect
> > something bad may happen with TRIM requests over 2^31
> > in size (particularly over 2^32 in size), as the length
> > field in nbd only has 32 bits.
> >
> > Whether it supports block size constraints or not, it is
> > going to need to do *some* breaking up of requests.
>
> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export? I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
>
It was fairly reproducible here, we just used a random qcow2 image with
some Debian minimal system pre-installed, mounted that qcow2 image through
qemu-nbd then compiled a whole kernel inside it. Then you can make clean
and run fstrim on the mount point. I'm assuming you can go faster than
that by just writing a big file to the qcow2 image mounted without -o
discard, delete the big file, then remount with -o discard + run fstrim.
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 15:54 ` Quentin Casasnovas
0 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 15:54 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, Quentin Casasnovas, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> On 05/10/2016 09:41 AM, Alex Bligh wrote:
> >
> > On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> >
> >> So the kernel is currently one of the clients that does NOT honor block
> >> sizes, and as such, servers should be prepared for ANY size up to
> >> UINT_MAX (other than DoS handling).
> >
> > Interesting followup question:
> >
> > If the kernel does not fragment TRIM requests at all (in the
> > same way it fragments read and write requests), I suspect
> > something bad may happen with TRIM requests over 2^31
> > in size (particularly over 2^32 in size), as the length
> > field in nbd only has 32 bits.
> >
> > Whether it supports block size constraints or not, it is
> > going to need to do *some* breaking up of requests.
>
> Does anyone have an easy way to cause the kernel to request a trim
> operation that large on a > 4G export? I'm not familiar enough with
> EXT4 operation to know what file system operations you can run to
> ultimately indirectly create a file system trim operation that large.
> But maybe there is something simpler - does the kernel let you use the
> fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
>
It was fairly reproducible here, we just used a random qcow2 image with
some Debian minimal system pre-installed, mounted that qcow2 image through
qemu-nbd then compiled a whole kernel inside it. Then you can make clean
and run fstrim on the mount point. I'm assuming you can go faster than
that by just writing a big file to the qcow2 image mounted without -o
discard, delete the big file, then remount with -o discard + run fstrim.
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:54 ` [Qemu-devel] [Nbd] " Quentin Casasnovas
@ 2016-05-10 16:33 ` Quentin Casasnovas
-1 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 16:33 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Eric Blake, Alex Bligh, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 05:54:44PM +0200, Quentin Casasnovas wrote:
> On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> > On 05/10/2016 09:41 AM, Alex Bligh wrote:
> > >
> > > On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> > >
> > >> So the kernel is currently one of the clients that does NOT honor block
> > >> sizes, and as such, servers should be prepared for ANY size up to
> > >> UINT_MAX (other than DoS handling).
> > >
> > > Interesting followup question:
> > >
> > > If the kernel does not fragment TRIM requests at all (in the
> > > same way it fragments read and write requests), I suspect
> > > something bad may happen with TRIM requests over 2^31
> > > in size (particularly over 2^32 in size), as the length
> > > field in nbd only has 32 bits.
> > >
> > > Whether it supports block size constraints or not, it is
> > > going to need to do *some* breaking up of requests.
> >
> > Does anyone have an easy way to cause the kernel to request a trim
> > operation that large on a > 4G export? I'm not familiar enough with
> > EXT4 operation to know what file system operations you can run to
> > ultimately indirectly create a file system trim operation that large.
> > But maybe there is something simpler - does the kernel let you use the
> > fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> > FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
> >
>
> It was fairly reproducible here, we just used a random qcow2 image with
> some Debian minimal system pre-installed, mounted that qcow2 image through
> qemu-nbd then compiled a whole kernel inside it. Then you can make clean
> and run fstrim on the mount point. I'm assuming you can go faster than
> that by just writing a big file to the qcow2 image mounted without -o
> discard, delete the big file, then remount with -o discard + run fstrim.
>
Looks like there's an easier way:
$ qemu-img create -f qcow2 foo.qcow2 10G
$ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
$ mkfs.ext4 /dev/nbd0
mke2fs 1.42.13 (17-May-2015)
Discarding device blocks: failed - Input/output error
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: 25aeb51f-0dea-4c1d-8b65-61f6bcdf97e9
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Notice the "Discarding device blocks: failed - Input/output error" line, I
bet that it is mkfs.ext4 trying to trim all blocks prior to writing the
filesystem, but it gets an I/O error while doing so. I haven't verified it
is the same problem, but it it isn't, simply mount the resulting filesystem
and run fstrim on it:
$ mount -o discard /dev/nbd0 /tmp/foo
$ fstrim /tmp/foo
fstrim: /tmp/foo: FITRIM ioctl failed: Input/output error
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 16:33 ` Quentin Casasnovas
0 siblings, 0 replies; 60+ messages in thread
From: Quentin Casasnovas @ 2016-05-10 16:33 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Eric Blake, Alex Bligh, qemu-devel@nongnu.org,
qemu-trivial@nongnu.org, Paolo Bonzini,
nbd-general@lists.sourceforge.net, qemu-stable@nongnu.org,
qemu block
On Tue, May 10, 2016 at 05:54:44PM +0200, Quentin Casasnovas wrote:
> On Tue, May 10, 2016 at 09:46:36AM -0600, Eric Blake wrote:
> > On 05/10/2016 09:41 AM, Alex Bligh wrote:
> > >
> > > On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
> > >
> > >> So the kernel is currently one of the clients that does NOT honor block
> > >> sizes, and as such, servers should be prepared for ANY size up to
> > >> UINT_MAX (other than DoS handling).
> > >
> > > Interesting followup question:
> > >
> > > If the kernel does not fragment TRIM requests at all (in the
> > > same way it fragments read and write requests), I suspect
> > > something bad may happen with TRIM requests over 2^31
> > > in size (particularly over 2^32 in size), as the length
> > > field in nbd only has 32 bits.
> > >
> > > Whether it supports block size constraints or not, it is
> > > going to need to do *some* breaking up of requests.
> >
> > Does anyone have an easy way to cause the kernel to request a trim
> > operation that large on a > 4G export? I'm not familiar enough with
> > EXT4 operation to know what file system operations you can run to
> > ultimately indirectly create a file system trim operation that large.
> > But maybe there is something simpler - does the kernel let you use the
> > fallocate(2) syscall operation with FALLOC_FL_PUNCH_HOLE or
> > FALLOC_FL_ZERO_RANGE on an fd backed by an NBD device?
> >
>
> It was fairly reproducible here, we just used a random qcow2 image with
> some Debian minimal system pre-installed, mounted that qcow2 image through
> qemu-nbd then compiled a whole kernel inside it. Then you can make clean
> and run fstrim on the mount point. I'm assuming you can go faster than
> that by just writing a big file to the qcow2 image mounted without -o
> discard, delete the big file, then remount with -o discard + run fstrim.
>
Looks like there's an easier way:
$ qemu-img create -f qcow2 foo.qcow2 10G
$ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
$ mkfs.ext4 /dev/nbd0
mke2fs 1.42.13 (17-May-2015)
Discarding device blocks: failed - Input/output error
Creating filesystem with 2621440 4k blocks and 655360 inodes
Filesystem UUID: 25aeb51f-0dea-4c1d-8b65-61f6bcdf97e9
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Notice the "Discarding device blocks: failed - Input/output error" line, I
bet that it is mkfs.ext4 trying to trim all blocks prior to writing the
filesystem, but it gets an I/O error while doing so. I haven't verified it
is the same problem, but it it isn't, simply mount the resulting filesystem
and run fstrim on it:
$ mount -o discard /dev/nbd0 /tmp/foo
$ fstrim /tmp/foo
fstrim: /tmp/foo: FITRIM ioctl failed: Input/output error
Quentin
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 16:33 ` [Qemu-devel] [Nbd] " Quentin Casasnovas
@ 2016-05-10 20:24 ` Eric Blake
-1 siblings, 0 replies; 60+ messages in thread
From: Eric Blake @ 2016-05-10 20:24 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Alex Bligh, qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
Paolo Bonzini, nbd-general@lists.sourceforge.net,
qemu-stable@nongnu.org, qemu block
[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]
On 05/10/2016 10:33 AM, Quentin Casasnovas wrote:
> Looks like there's an easier way:
>
> $ qemu-img create -f qcow2 foo.qcow2 10G
> $ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
> $ mkfs.ext4 /dev/nbd0
> mke2fs 1.42.13 (17-May-2015)
> Discarding device blocks: failed - Input/output error
strace on mkfs.ext4 shows:
...
open("/dev/nbd1", O_RDWR|O_EXCL) = 3
stat("/dev/nbd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(43, 1), ...}) = 0
ioctl(3, BLKDISCARDZEROES, [0]) = 0
ioctl(3, BLKROGET, [0]) = 0
uname({sysname="Linux", nodename="red", ...}) = 0
ioctl(3, BLKDISCARD, [0, 1000000]) = 0
write(1, "Discarding device blocks: ", 26) = 26
write(1, " 4096/2621440", 15 4096/2621440) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
ioctl(3, BLKDISCARD, [1000000, 80000000]) = -1 EIO (Input/output error)
write(1, " ", 15 ) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
write(1, "failed - ", 9failed - ) = 9
so it does indeed look like the smaller request [0, 0x1000000] succeeded
while the larger [0x1000000, 0x80000000] failed with EIO. But I
instrumented my qemu-nbd server to output all of the incoming requests
it was receiving from the kernel, and I never saw a mention of
NBD_CMD_TRIM being received. Maybe it's dependent on what version of
the kernel and/or NBD kernel module you are running?
I also tried fallocate(1) on /dev/nbd1, as in:
# strace fallocate -o 0 -l 64k -p /dev/nbd1
but it looks like the kernel doesn't yet wire up fallocate(2) to forward
to the appropriate device ioctls and/or NBD_CMD_TRIM, where strace says:
open("/dev/nbd1", O_RDWR) = 3
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 65536) = -1
ENODEV (No such device)
The ENODEV is a rather unusual choice of error.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 20:24 ` Eric Blake
0 siblings, 0 replies; 60+ messages in thread
From: Eric Blake @ 2016-05-10 20:24 UTC (permalink / raw)
To: Quentin Casasnovas
Cc: Alex Bligh, qemu-devel@nongnu.org, qemu-trivial@nongnu.org,
Paolo Bonzini, nbd-general@lists.sourceforge.net,
qemu-stable@nongnu.org, qemu block
[-- Attachment #1: Type: text/plain, Size: 2005 bytes --]
On 05/10/2016 10:33 AM, Quentin Casasnovas wrote:
> Looks like there's an easier way:
>
> $ qemu-img create -f qcow2 foo.qcow2 10G
> $ qemu-nbd --discard=on -c /dev/nbd0 foo.qcow2
> $ mkfs.ext4 /dev/nbd0
> mke2fs 1.42.13 (17-May-2015)
> Discarding device blocks: failed - Input/output error
strace on mkfs.ext4 shows:
...
open("/dev/nbd1", O_RDWR|O_EXCL) = 3
stat("/dev/nbd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(43, 1), ...}) = 0
ioctl(3, BLKDISCARDZEROES, [0]) = 0
ioctl(3, BLKROGET, [0]) = 0
uname({sysname="Linux", nodename="red", ...}) = 0
ioctl(3, BLKDISCARD, [0, 1000000]) = 0
write(1, "Discarding device blocks: ", 26) = 26
write(1, " 4096/2621440", 15 4096/2621440) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
ioctl(3, BLKDISCARD, [1000000, 80000000]) = -1 EIO (Input/output error)
write(1, " ", 15 ) = 15
write(1, "\10\10\10\10\10\10\10\10\10\10\10\1) = 15
write(1, "failed - ", 9failed - ) = 9
so it does indeed look like the smaller request [0, 0x1000000] succeeded
while the larger [0x1000000, 0x80000000] failed with EIO. But I
instrumented my qemu-nbd server to output all of the incoming requests
it was receiving from the kernel, and I never saw a mention of
NBD_CMD_TRIM being received. Maybe it's dependent on what version of
the kernel and/or NBD kernel module you are running?
I also tried fallocate(1) on /dev/nbd1, as in:
# strace fallocate -o 0 -l 64k -p /dev/nbd1
but it looks like the kernel doesn't yet wire up fallocate(2) to forward
to the appropriate device ioctls and/or NBD_CMD_TRIM, where strace says:
open("/dev/nbd1", O_RDWR) = 3
fallocate(3, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 0, 65536) = -1
ENODEV (No such device)
The ENODEV is a rather unusual choice of error.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:41 ` [Qemu-devel] [Nbd] " Alex Bligh
@ 2016-05-10 19:13 ` Michał Belczyk
-1 siblings, 0 replies; 60+ messages in thread
From: Michał Belczyk @ 2016-05-10 19:13 UTC (permalink / raw)
To: Alex Bligh
Cc: Eric Blake, nbd-general@lists.sourceforge.net, qemu block,
qemu-trivial@nongnu.org, qemu-devel@nongnu.org,
qemu-stable@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On Tue, May 10, 2016 at 04:41:27PM +0100, Alex Bligh wrote:
>
>On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>
>> So the kernel is currently one of the clients that does NOT honor block
>> sizes, and as such, servers should be prepared for ANY size up to
>> UINT_MAX (other than DoS handling).
>
>Interesting followup question:
>
>If the kernel does not fragment TRIM requests at all (in the
>same way it fragments read and write requests), I suspect
>something bad may happen with TRIM requests over 2^31
>in size (particularly over 2^32 in size), as the length
>field in nbd only has 32 bits.
>
>Whether it supports block size constraints or not, it is
>going to need to do *some* breaking up of requests.
Doesn't the kernel break TRIM requests at max_discard_sectors?
I remember testing the following change in the nbd kmod long time ago:
#if 0
disk->queue->limits.max_discard_sectors = UINT_MAX;
#else
disk->queue->limits.max_discard_sectors = 65536;
#endif
The problem with large TRIM requests is exactly the same as with other
(READ/WRITE) large requests -- they _may_ take loads of time and if the client
wants to support a fast switch over to another replica it must put some
constraints on the timeout value... which may be very easily broken if you
allow things like a 1GB trim request on the server using DIO on the other side
(and say heavily fragmented sparse file over XFS, never mind).
I don't think it's the problem of QEMU NBD server to fix that, the constraint
on the server side is perfectly fine, the problem is to note that a change on
the client side is required to limit the maximum size of the TRIM requests.
The reason for the patch I pasted above was that at the time I looked into it
there was no other way to change the TRIM request size via
/sys/block/$dev/queue/max_discard_sectors, but maybe the more recent kernels
allow that?
Not to mention that the NBD client option to set that at NBD queue setup time
would be nice...
Just my 2p.
--
Michał Belczyk Snr
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-10 19:13 ` Michał Belczyk
0 siblings, 0 replies; 60+ messages in thread
From: Michał Belczyk @ 2016-05-10 19:13 UTC (permalink / raw)
To: Alex Bligh
Cc: Eric Blake, nbd-general@lists.sourceforge.net, qemu block,
qemu-trivial@nongnu.org, qemu-devel@nongnu.org,
qemu-stable@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On Tue, May 10, 2016 at 04:41:27PM +0100, Alex Bligh wrote:
>
>On 10 May 2016, at 16:29, Eric Blake <eblake@redhat.com> wrote:
>
>> So the kernel is currently one of the clients that does NOT honor block
>> sizes, and as such, servers should be prepared for ANY size up to
>> UINT_MAX (other than DoS handling).
>
>Interesting followup question:
>
>If the kernel does not fragment TRIM requests at all (in the
>same way it fragments read and write requests), I suspect
>something bad may happen with TRIM requests over 2^31
>in size (particularly over 2^32 in size), as the length
>field in nbd only has 32 bits.
>
>Whether it supports block size constraints or not, it is
>going to need to do *some* breaking up of requests.
Doesn't the kernel break TRIM requests at max_discard_sectors?
I remember testing the following change in the nbd kmod long time ago:
#if 0
disk->queue->limits.max_discard_sectors = UINT_MAX;
#else
disk->queue->limits.max_discard_sectors = 65536;
#endif
The problem with large TRIM requests is exactly the same as with other
(READ/WRITE) large requests -- they _may_ take loads of time and if the client
wants to support a fast switch over to another replica it must put some
constraints on the timeout value... which may be very easily broken if you
allow things like a 1GB trim request on the server using DIO on the other side
(and say heavily fragmented sparse file over XFS, never mind).
I don't think it's the problem of QEMU NBD server to fix that, the constraint
on the server side is perfectly fine, the problem is to note that a change on
the client side is required to limit the maximum size of the TRIM requests.
The reason for the patch I pasted above was that at the time I looked into it
there was no other way to change the TRIM request size via
/sys/block/$dev/queue/max_discard_sectors, but maybe the more recent kernels
allow that?
Not to mention that the NBD client option to set that at NBD queue setup time
would be nice...
Just my 2p.
--
Michał Belczyk Snr
^ permalink raw reply [flat|nested] 60+ messages in thread
* Re: [Qemu-trivial] [Nbd] [Qemu-devel] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
2016-05-10 15:29 ` [Qemu-devel] [Nbd] " Eric Blake
@ 2016-05-11 21:10 ` Wouter Verhelst
-1 siblings, 0 replies; 60+ messages in thread
From: Wouter Verhelst @ 2016-05-11 21:10 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, nbd-general@lists.sourceforge.net, qemu block,
qemu-trivial@nongnu.org, qemu-stable@nongnu.org,
qemu-devel@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On Tue, May 10, 2016 at 09:29:23AM -0600, Eric Blake wrote:
> On 05/10/2016 09:08 AM, Alex Bligh wrote:
> > Eric,
> >
> >> Hmm. The current wording of the experimental block size additions does
> >> NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
> >> maximum NBD_CMD_WRITE:
> >> https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints
> >
> > Correct
> >
> >> Maybe we should revisit that in the spec, and/or advertise yet another
> >> block size (since the maximum size for a trim and/or write_zeroes
> >> request may indeed be different than the maximum size for a read/write).
> >
> > I think it's up to the server to either handle large requests, or
> > for the client to break these up.
>
> But the question at hand here is whether we should permit servers to
> advertise multiple maximum block sizes (one for read/write, another one
> for trim/write_zero, or even two [at least qemu tracks a separate
> maximum trim vs. write_zero sizing in its generic block layer]), or
> merely stick with the current wording that requires clients that honor
> maximum block size to obey the same maximum for ALL commands, regardless
> of amount of data sent over the wire.
>
> >
> > The core problem here is that the kernel (and, ahem, most servers) are
> > ignorant of the block size extension, and need to guess how to break
> > things up. In my view the client (kernel in this case) should
> > be breaking the trim requests up into whatever size it uses as the
> > maximum size write requests. But then it would have to know about block
> > sizes which are in (another) experimental extension.
>
> Correct - no one has yet patched the kernel to honor block sizes
> advertised through what is currently an experimental extension. (We
> have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> minimum block size, but I haven't audited whether the kernel actually
> guarantees that all client requests are sent aligned to the value passed
> that way - but we have nothing to set the maximum size, and are at the
> mercy of however the kernel currently decides to split large requests).
I don't actually think it does that at all, tbh. There is an
"integrityhuge" test in the reference server test suite which performs a
number of large requests (up to 50M), and which was created by a script
that just does direct read requests to /dev/nbdX.
It just so happens that most upper layers (filesystems etc) don't make
requests larger than about 32MiB, but that's not related.
> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling). My question above only applies to
> clients that use the experimental block size extensions.
Right.
[...]
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12
^ permalink raw reply [flat|nested] 60+ messages in thread* Re: [Qemu-devel] [Nbd] [PATCH] nbd: fix trim/discard commands with a length bigger than NBD_MAX_BUFFER_SIZE
@ 2016-05-11 21:10 ` Wouter Verhelst
0 siblings, 0 replies; 60+ messages in thread
From: Wouter Verhelst @ 2016-05-11 21:10 UTC (permalink / raw)
To: Eric Blake
Cc: Alex Bligh, nbd-general@lists.sourceforge.net, qemu block,
qemu-trivial@nongnu.org, qemu-stable@nongnu.org,
qemu-devel@nongnu.org, Quentin Casasnovas, Paolo Bonzini
On Tue, May 10, 2016 at 09:29:23AM -0600, Eric Blake wrote:
> On 05/10/2016 09:08 AM, Alex Bligh wrote:
> > Eric,
> >
> >> Hmm. The current wording of the experimental block size additions does
> >> NOT allow the client to send a NBD_CMD_TRIM with a size larger than the
> >> maximum NBD_CMD_WRITE:
> >> https://github.com/yoe/nbd/blob/extension-info/doc/proto.md#block-size-constraints
> >
> > Correct
> >
> >> Maybe we should revisit that in the spec, and/or advertise yet another
> >> block size (since the maximum size for a trim and/or write_zeroes
> >> request may indeed be different than the maximum size for a read/write).
> >
> > I think it's up to the server to either handle large requests, or
> > for the client to break these up.
>
> But the question at hand here is whether we should permit servers to
> advertise multiple maximum block sizes (one for read/write, another one
> for trim/write_zero, or even two [at least qemu tracks a separate
> maximum trim vs. write_zero sizing in its generic block layer]), or
> merely stick with the current wording that requires clients that honor
> maximum block size to obey the same maximum for ALL commands, regardless
> of amount of data sent over the wire.
>
> >
> > The core problem here is that the kernel (and, ahem, most servers) are
> > ignorant of the block size extension, and need to guess how to break
> > things up. In my view the client (kernel in this case) should
> > be breaking the trim requests up into whatever size it uses as the
> > maximum size write requests. But then it would have to know about block
> > sizes which are in (another) experimental extension.
>
> Correct - no one has yet patched the kernel to honor block sizes
> advertised through what is currently an experimental extension. (We
> have ioctl(NBD_SET_BLKSIZE) which can be argued to set the kernel's
> minimum block size, but I haven't audited whether the kernel actually
> guarantees that all client requests are sent aligned to the value passed
> that way - but we have nothing to set the maximum size, and are at the
> mercy of however the kernel currently decides to split large requests).
I don't actually think it does that at all, tbh. There is an
"integrityhuge" test in the reference server test suite which performs a
number of large requests (up to 50M), and which was created by a script
that just does direct read requests to /dev/nbdX.
It just so happens that most upper layers (filesystems etc) don't make
requests larger than about 32MiB, but that's not related.
> So the kernel is currently one of the clients that does NOT honor block
> sizes, and as such, servers should be prepared for ANY size up to
> UINT_MAX (other than DoS handling). My question above only applies to
> clients that use the experimental block size extensions.
Right.
[...]
--
< ron> I mean, the main *practical* problem with C++, is there's like a dozen
people in the world who think they really understand all of its rules,
and pretty much all of them are just lying to themselves too.
-- #debian-devel, OFTC, 2016-02-12
^ permalink raw reply [flat|nested] 60+ messages in thread