* [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback
@ 2016-04-30 21:48 Eric Blake
2016-05-02 15:35 ` Kevin Wolf
0 siblings, 1 reply; 6+ messages in thread
From: Eric Blake @ 2016-04-30 21:48 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-stable, Stefan Hajnoczi, Fam Zheng, Kevin Wolf, Max Reitz,
open list:Block I/O path
NBD has situations where it can support FUA but not ZERO_WRITE;
when that happens, the generic block layer fallback was losing
the FUA flag. The problem of losing flags unrelated to
ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
aa7bfbff, but back then, it did not matter because there was no
FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(),
the loss of flags can impact correctness.
Compare to commit 9eeb6dd, which got it right in
bdrv_co_do_zero_pwritev().
Symptoms: I tested with qemu-io with default writethrough cache
(which is supposed to use FUA semantics on every write), and
targetted an NBD client connected to a server that intentionally
did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512',
the NBD client sent two operations (NBD_CMD_WRITE then
NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
'write -z 0 512', the NBD client sent only NBD_CMD_WRITE; the
missing flush meant that an ill-timed disconnect could leave
the zeroes unflushed.
CC: qemu-stable@nongnu.org
Signed-off-by: Eric Blake <eblake@redhat.com>
---
As written, this patch applies to 2.7 on top of Kevin's block-next
branch. Since it's (probably) too late for 2.6, we'll need to
backport it to there, but the backport will have to use
bdrv_co_writev_flags since 2.6 lacks bdrv_driver_pwritev().
block/io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/io.c b/block/io.c
index 0db1146..bd46e47 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
qemu_iovec_init_external(&qiov, &iov, 1);
ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE,
- num * BDRV_SECTOR_SIZE, &qiov, 0);
+ num * BDRV_SECTOR_SIZE, &qiov,
+ flags & ~BDRV_REQ_ZERO_WRITE);
/* Keep bounce buffer around if it is big enough for all
* all future requests.
--
2.5.5
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback
2016-04-30 21:48 [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback Eric Blake
@ 2016-05-02 15:35 ` Kevin Wolf
2016-05-02 15:42 ` Eric Blake
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2016-05-02 15:35 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-stable, Stefan Hajnoczi, Fam Zheng, Max Reitz,
open list:Block I/O path
Am 30.04.2016 um 23:48 hat Eric Blake geschrieben:
> NBD has situations where it can support FUA but not ZERO_WRITE;
> when that happens, the generic block layer fallback was losing
> the FUA flag. The problem of losing flags unrelated to
> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
> aa7bfbff, but back then, it did not matter because there was no
> FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(),
> the loss of flags can impact correctness.
>
> Compare to commit 9eeb6dd, which got it right in
> bdrv_co_do_zero_pwritev().
>
> Symptoms: I tested with qemu-io with default writethrough cache
> (which is supposed to use FUA semantics on every write), and
> targetted an NBD client connected to a server that intentionally
> did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512',
> the NBD client sent two operations (NBD_CMD_WRITE then
> NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing
> 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE; the
> missing flush meant that an ill-timed disconnect could leave
> the zeroes unflushed.
>
> CC: qemu-stable@nongnu.org
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>
> As written, this patch applies to 2.7 on top of Kevin's block-next
> branch. Since it's (probably) too late for 2.6, we'll need to
> backport it to there, but the backport will have to use
> bdrv_co_writev_flags since 2.6 lacks bdrv_driver_pwritev().
>
> block/io.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/block/io.c b/block/io.c
> index 0db1146..bd46e47 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
> qemu_iovec_init_external(&qiov, &iov, 1);
>
> ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE,
> - num * BDRV_SECTOR_SIZE, &qiov, 0);
> + num * BDRV_SECTOR_SIZE, &qiov,
> + flags & ~BDRV_REQ_ZERO_WRITE);
This is a good change, but it's only in the fallback code. If we succeed
here:
if (drv->bdrv_co_write_zeroes) {
ret = drv->bdrv_co_write_zeroes(bs, sector_num, num, flags);
}
then we still don't get the necessary flush unless the driver's
.bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
I know, we don't have any driver that implements FUA there.
Kevin
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback
2016-05-02 15:35 ` Kevin Wolf
@ 2016-05-02 15:42 ` Eric Blake
2016-05-02 17:14 ` Eric Blake
0 siblings, 1 reply; 6+ messages in thread
From: Eric Blake @ 2016-05-02 15:42 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-devel, qemu-stable, Stefan Hajnoczi, Fam Zheng, Max Reitz,
open list:Block I/O path
[-- Attachment #1: Type: text/plain, Size: 2141 bytes --]
On 05/02/2016 09:35 AM, Kevin Wolf wrote:
> Am 30.04.2016 um 23:48 hat Eric Blake geschrieben:
>> NBD has situations where it can support FUA but not ZERO_WRITE;
>> when that happens, the generic block layer fallback was losing
>> the FUA flag. The problem of losing flags unrelated to
>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
>> aa7bfbff, but back then, it did not matter because there was no
>> FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(),
>> the loss of flags can impact correctness.
>>
>> +++ b/block/io.c
>> @@ -1213,7 +1213,8 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
>> qemu_iovec_init_external(&qiov, &iov, 1);
>>
>> ret = bdrv_driver_pwritev(bs, sector_num * BDRV_SECTOR_SIZE,
>> - num * BDRV_SECTOR_SIZE, &qiov, 0);
>> + num * BDRV_SECTOR_SIZE, &qiov,
>> + flags & ~BDRV_REQ_ZERO_WRITE);
>
> This is a good change, but it's only in the fallback code. If we succeed
> here:
>
> if (drv->bdrv_co_write_zeroes) {
> ret = drv->bdrv_co_write_zeroes(bs, sector_num, num, flags);
> }
>
> then we still don't get the necessary flush unless the driver's
> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
> I know, we don't have any driver that implements FUA there.
NBD will, once we get to that part of my series.
But I see what you're saying: since we already emulate FUA for all
drivers where .supported_write_flags does not include BDRV_REQ_FUA
during bdrv_driver_pwritev(), we should also emulate it here for all the
same drivers (and any driver that DOES advertise BDRV_REQ_FUA as
supported as well as a write_zeroes callback should be fixed to honor
it). I'll do that in v2, which I guess means I should post it at the
same time as my work for making .supported_write_flags a per-bds rather
than per-driver designation.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback
2016-05-02 15:42 ` Eric Blake
@ 2016-05-02 17:14 ` Eric Blake
2016-05-03 7:55 ` Kevin Wolf
0 siblings, 1 reply; 6+ messages in thread
From: Eric Blake @ 2016-05-02 17:14 UTC (permalink / raw)
To: Kevin Wolf
Cc: Fam Zheng, open list:Block I/O path, qemu-stable, qemu-devel,
Max Reitz, Stefan Hajnoczi, Paolo Bonzini
[-- Attachment #1: Type: text/plain, Size: 2485 bytes --]
On 05/02/2016 09:42 AM, Eric Blake wrote:
> On 05/02/2016 09:35 AM, Kevin Wolf wrote:
>> Am 30.04.2016 um 23:48 hat Eric Blake geschrieben:
>>> NBD has situations where it can support FUA but not ZERO_WRITE;
>>> when that happens, the generic block layer fallback was losing
>>> the FUA flag. The problem of losing flags unrelated to
>>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
>>> aa7bfbff, but back then, it did not matter because there was no
>>> FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(),
>>> the loss of flags can impact correctness.
>>>
>
>> then we still don't get the necessary flush unless the driver's
>> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
>> I know, we don't have any driver that implements FUA there.
>
> NBD will, once we get to that part of my series.
And looking further, it looks like SCSI does NOT support FUA during
WRITESAME(10/16), only during WRITE(10/16). Which means we probably
want to start having both .supported_write_flags AND
.supported_write_zero flags, so that at least the iscsi driver can
properly report that it does NOT natively support FUA on a write_zeroes
request.
>
> But I see what you're saying: since we already emulate FUA for all
> drivers where .supported_write_flags does not include BDRV_REQ_FUA
> during bdrv_driver_pwritev(), we should also emulate it here for all the
> same drivers (and any driver that DOES advertise BDRV_REQ_FUA as
> supported as well as a write_zeroes callback should be fixed to honor
> it). I'll do that in v2, which I guess means I should post it at the
> same time as my work for making .supported_write_flags a per-bds rather
> than per-driver designation.
It also looks like SCSI would benefit from a per-bds
supported_write_flags, as our iscsi_modesense_sync() is setting
iscsilun->dpofua conditionally; likewise, iscsi_co_writev_flags blindly
ignores BDRV_REQ_FUA if iscsilun->dpofua is not set. Which means that
not all ISCSI devices support the FUA bit in WRITE(10/16) requests, and
that therefore we should not be blindly assuming that FUA worked on
those devices where it was not sensed. So just like NBD, iscsi has
cases where we need an automatic FUA fallback at the block layer,
regardless of whether we also need the fallback for WRITESAME(10/16).
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback
2016-05-02 17:14 ` Eric Blake
@ 2016-05-03 7:55 ` Kevin Wolf
2016-05-03 12:28 ` Eric Blake
0 siblings, 1 reply; 6+ messages in thread
From: Kevin Wolf @ 2016-05-03 7:55 UTC (permalink / raw)
To: Eric Blake
Cc: Fam Zheng, open list:Block I/O path, qemu-stable, qemu-devel,
Max Reitz, Stefan Hajnoczi, Paolo Bonzini
[-- Attachment #1: Type: text/plain, Size: 1934 bytes --]
Am 02.05.2016 um 19:14 hat Eric Blake geschrieben:
> On 05/02/2016 09:42 AM, Eric Blake wrote:
> > On 05/02/2016 09:35 AM, Kevin Wolf wrote:
> >> Am 30.04.2016 um 23:48 hat Eric Blake geschrieben:
> >>> NBD has situations where it can support FUA but not ZERO_WRITE;
> >>> when that happens, the generic block layer fallback was losing
> >>> the FUA flag. The problem of losing flags unrelated to
> >>> ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since
> >>> aa7bfbff, but back then, it did not matter because there was no
> >>> FUA flag. But ever since 93f5e6d8 added bdrv_co_writev_flags(),
> >>> the loss of flags can impact correctness.
> >>>
> >
>
> >> then we still don't get the necessary flush unless the driver's
> >> .bdrv_co_write_zeroes() implementation explicitly handles FUA. As far as
> >> I know, we don't have any driver that implements FUA there.
> >
> > NBD will, once we get to that part of my series.
>
> And looking further, it looks like SCSI does NOT support FUA during
> WRITESAME(10/16), only during WRITE(10/16). Which means we probably
> want to start having both .supported_write_flags AND
> .supported_write_zero flags, so that at least the iscsi driver can
> properly report that it does NOT natively support FUA on a write_zeroes
> request.
Hm, not sure if it makes sense, but let me write that thought down:
You're going to convert .supported_write_flags to a function anyway.
Instead of adding a second function, how about passing a set of flags
there and the function returns a subset that it can support? For write
zeroes with FUA you would pass in BDRV_REQ_ZERO_WRITE | BDRV_REQ_FUA and
in this case the iscsi driver would return only BDRV_REQ_ZERO_WRITE.
If we ever decided to get rid of .bdrv_co_write_zeroes() and instead use
.bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE, this would probably be the
most consistent interface.
Kevin
[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback
2016-05-03 7:55 ` Kevin Wolf
@ 2016-05-03 12:28 ` Eric Blake
0 siblings, 0 replies; 6+ messages in thread
From: Eric Blake @ 2016-05-03 12:28 UTC (permalink / raw)
To: Kevin Wolf
Cc: Fam Zheng, open list:Block I/O path, qemu-stable, qemu-devel,
Max Reitz, Stefan Hajnoczi, Paolo Bonzini
[-- Attachment #1: Type: text/plain, Size: 2174 bytes --]
On 05/03/2016 01:55 AM, Kevin Wolf wrote:
>> And looking further, it looks like SCSI does NOT support FUA during
>> WRITESAME(10/16), only during WRITE(10/16). Which means we probably
>> want to start having both .supported_write_flags AND
>> .supported_write_zero flags, so that at least the iscsi driver can
>> properly report that it does NOT natively support FUA on a write_zeroes
>> request.
>
> Hm, not sure if it makes sense, but let me write that thought down:
>
> You're going to convert .supported_write_flags to a function anyway.
> Instead of adding a second function, how about passing a set of flags
> there and the function returns a subset that it can support? For write
> zeroes with FUA you would pass in BDRV_REQ_ZERO_WRITE | BDRV_REQ_FUA and
> in this case the iscsi driver would return only BDRV_REQ_ZERO_WRITE.
>
> If we ever decided to get rid of .bdrv_co_write_zeroes() and instead use
> .bdrv_co_pwritev() with BDRV_REQ_ZERO_WRITE, this would probably be the
> most consistent interface.
I'm currently leaning towards two flags rather than a function (the
flags are set during .bdrv_open(), and don't change during the life of
the BDS):
bs->supported_write_flags = BDRV_REQ_FUA;
bs->supported_zero_flags = BDRV_REQ_MAY_UNMAP; // | BDRV_REQ_FUA
and keeping the BDRV_REQ_ZERO_WRITE as a special case in the block layer
that chooses between .bdrv_co_write_zeroes() or .bdrv_driver_pwrite().
MAY_UNMAP makes no sense on a normal write, just as iscsi doesn't allow
FUA on a zero write, so having the two sets of valid flags according to
operations, as well as the two driver entries, seems to make the most
sense. Then io.c, we ignore MAY_UNMAP where it is unsupported, and
emulate FUA on both write and write_zero as needed.
Of course, if we ever have a driver that can support everything through
a single entry, it could advertise supported_write_flags = BDRV_REQ_FUA
| BDRV_REQ_ZERO_WRITE | BDRV_REQ_MAY_UNMAP and provide only
.bdrv_driver_pwrite(), but I don't think that will be happening.
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-05-03 12:29 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-04-30 21:48 [Qemu-devel] [PATCH] block: Don't lose FUA flag during ZERO_WRITE fallback Eric Blake
2016-05-02 15:35 ` Kevin Wolf
2016-05-02 15:42 ` Eric Blake
2016-05-02 17:14 ` Eric Blake
2016-05-03 7:55 ` Kevin Wolf
2016-05-03 12:28 ` Eric Blake
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).