* [Qemu-devel] [PATCH for-2.9?] file-posix: Make bdrv_flush() failure permanent without O_DIRECT
From: Kevin Wolf @ 2017-03-22 21:00 UTC
To: qemu-block; +Cc: kwolf, qemu-devel
Success for bdrv_flush() means that all previously written data is safe
on disk. For fdatasync(), the best semantics we can hope for on Linux
(without O_DIRECT) is that all data that was written since the last call
was successfully written back. Therefore, and because we can't redo all
writes after a flush failure, we have to give up after a single
fdatasync() failure. After this failure, we would never be able to make
the promise that a successful bdrv_flush() makes.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
block/file-posix.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/block/file-posix.c b/block/file-posix.c
index 53febd3..beb7a4f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -144,6 +144,7 @@ typedef struct BDRVRawState {
     bool has_write_zeroes:1;
     bool discard_zeroes:1;
     bool use_linux_aio:1;
+    bool page_cache_inconsistent:1;
     bool has_fallocate;
     bool needs_alignment;
 } BDRVRawState;
@@ -824,10 +825,31 @@ static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 
 static ssize_t handle_aiocb_flush(RawPosixAIOData *aiocb)
 {
+    BDRVRawState *s = aiocb->bs->opaque;
     int ret;
 
+    if (s->page_cache_inconsistent) {
+        return -EIO;
+    }
+
     ret = qemu_fdatasync(aiocb->aio_fildes);
     if (ret == -1) {
+        /* There is no clear definition of the semantics of a failing fsync(),
+         * so we may have to assume the worst. The sad truth is that this
+         * assumption is correct for Linux. Some pages are now probably marked
+         * clean in the page cache even though they are inconsistent with the
+         * on-disk contents. The next fdatasync() call would succeed, but no
+         * further writeback attempt will be made. We can't get back to a state
+         * in which we know what is on disk (we would have to rewrite
+         * everything that was touched since the last fdatasync() at least), so
+         * make bdrv_flush() fail permanently. Given that the behaviour isn't
+         * really defined, I have little hope that other OSes are doing better.
+         *
+         * Obviously, this doesn't affect O_DIRECT, which bypasses the page
+         * cache. */
+        if ((s->open_flags & O_DIRECT) == 0) {
+            s->page_cache_inconsistent = true;
+        }
         return -errno;
     }
     return 0;
--
2.9.3
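The "fail permanently after the first failed sync" logic in the patch above
can be tried outside QEMU with a small standalone sketch. Everything named
here is hypothetical and for illustration only: FlushState stands in for
BDRVRawState, direct_io stands in for the O_DIRECT check on open_flags,
sync_fn is an injectable replacement for qemu_fdatasync(), and flaky_sync()
is a test double that fails a configurable number of times. It is not QEMU
code and only mirrors the control flow of handle_aiocb_flush().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for BDRVRawState; not QEMU's real structure. */
typedef struct FlushState {
    int fd;
    bool direct_io;               /* true if the file bypasses the page cache */
    bool page_cache_inconsistent; /* sticky: set after the first failed sync */
    int (*sync_fn)(int fd);       /* e.g. fdatasync(); injectable for testing */
} FlushState;

/* Returns 0 on success or a negative errno value, like the patched function. */
int sticky_flush(FlushState *s)
{
    assert(s != NULL && s->sync_fn != NULL);

    /* Once a sync has failed on a page-cache-backed file, never claim
     * success again: we no longer know what is on disk. */
    if (s->page_cache_inconsistent) {
        return -EIO;
    }

    if (s->sync_fn(s->fd) == -1) {
        int err = errno; /* save before anything else can clobber it */
        if (!s->direct_io) {
            s->page_cache_inconsistent = true;
        }
        return -err;
    }
    return 0;
}

/* Test double (assumption): fails with ENOSPC while the counter is positive,
 * then reports success, imitating a later fdatasync() that "succeeds". */
int flaky_sync_failures_left = 1;

int flaky_sync(int fd)
{
    (void)fd;
    if (flaky_sync_failures_left > 0) {
        flaky_sync_failures_left--;
        errno = ENOSPC;
        return -1;
    }
    return 0;
}
```

With direct_io set to false, the first sticky_flush() call reports the
original error and every later call returns -EIO without calling sync_fn
again. With direct_io set to true, the flag is never set, so a later call
can succeed.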
* Re: [Qemu-devel] [PATCH for-2.9?] file-posix: Make bdrv_flush() failure permanent without O_DIRECT
From: Fam Zheng @ 2017-03-22 23:49 UTC
To: Kevin Wolf; +Cc: qemu-block, qemu-devel
On Wed, 03/22 22:00, Kevin Wolf wrote:
> Success for bdrv_flush() means that all previously written data is safe
> on disk. For fdatasync(), the best semantics we can hope for on Linux
> (without O_DIRECT) is that all data that was written since the last call
> was successfully written back. Therefore, and because we can't redo all
> writes after a flush failure, we have to give up after a single
> fdatasync() failure. After this failure, we would never be able to make
> the promise that a successful bdrv_flush() makes.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/file-posix.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
>
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 53febd3..beb7a4f 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -144,6 +144,7 @@ typedef struct BDRVRawState {
>      bool has_write_zeroes:1;
>      bool discard_zeroes:1;
>      bool use_linux_aio:1;
> +    bool page_cache_inconsistent:1;
>      bool has_fallocate;
>      bool needs_alignment;
>  } BDRVRawState;
> @@ -824,10 +825,31 @@ static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
>
>  static ssize_t handle_aiocb_flush(RawPosixAIOData *aiocb)
>  {
> +    BDRVRawState *s = aiocb->bs->opaque;
>      int ret;
>
> +    if (s->page_cache_inconsistent) {
> +        return -EIO;
> +    }
> +
>      ret = qemu_fdatasync(aiocb->aio_fildes);
>      if (ret == -1) {
> +        /* There is no clear definition of the semantics of a failing fsync(),
> +         * so we may have to assume the worst. The sad truth is that this
> +         * assumption is correct for Linux. Some pages are now probably marked
> +         * clean in the page cache even though they are inconsistent with the
> +         * on-disk contents. The next fdatasync() call would succeed, but no
> +         * further writeback attempt will be made. We can't get back to a state
> +         * in which we know what is on disk (we would have to rewrite
> +         * everything that was touched since the last fdatasync() at least), so
> +         * make bdrv_flush() fail permanently. Given that the behaviour isn't
> +         * really defined, I have little hope that other OSes are doing better.
> +         *
> +         * Obviously, this doesn't affect O_DIRECT, which bypasses the page
> +         * cache. */
> +        if ((s->open_flags & O_DIRECT) == 0) {
> +            s->page_cache_inconsistent = true;
> +        }
>          return -errno;
>      }
>      return 0;
> --
> 2.9.3
>
>
Reviewed-by: Fam Zheng <famz@redhat.com>
* Re: [Qemu-devel] [PATCH for-2.9?] file-posix: Make bdrv_flush() failure permanent without O_DIRECT
From: Eric Blake @ 2017-03-23 0:12 UTC
To: Kevin Wolf, qemu-block; +Cc: qemu-devel
On 03/22/2017 04:00 PM, Kevin Wolf wrote:
> Success for bdrv_flush() means that all previously written data is safe
> on disk. For fdatasync(), the best semantics we can hope for on Linux
> (without O_DIRECT) is that all data that was written since the last call
> was successfully written back. Therefore, and because we can't redo all
> writes after a flush failure, we have to give up after a single
> fdatasync() failure. After this failure, we would never be able to make
> the promise that a successful bdrv_flush() makes.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/file-posix.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
Makes sense for 2.9 (it doesn't change the data loss, but it alerts the
user to the data loss a lot sooner, perhaps before things get even
worse).
Reviewed-by: Eric Blake <eblake@redhat.com>
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9?] file-posix: Make bdrv_flush() failure permanent without O_DIRECT
From: Stefan Hajnoczi @ 2017-03-23 18:08 UTC
To: Kevin Wolf; +Cc: qemu-block, qemu-devel
On Wed, Mar 22, 2017 at 10:00:05PM +0100, Kevin Wolf wrote:
> Success for bdrv_flush() means that all previously written data is safe
> on disk. For fdatasync(), the best semantics we can hope for on Linux
> (without O_DIRECT) is that all data that was written since the last call
> was successfully written back. Therefore, and because we can't redo all
> writes after a flush failure, we have to give up after a single
> fdatasync() failure. After this failure, we would never be able to make
> the promise that a successful bdrv_flush() makes.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/file-posix.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [Qemu-devel] [Qemu-block] [PATCH for-2.9?] file-posix: Make bdrv_flush() failure permanent without O_DIRECT
From: Max Reitz @ 2017-03-27 14:54 UTC
To: Kevin Wolf, qemu-block; +Cc: qemu-devel
On 22.03.2017 22:00, Kevin Wolf wrote:
> Success for bdrv_flush() means that all previously written data is safe
> on disk. For fdatasync(), the best semantics we can hope for on Linux
> (without O_DIRECT) is that all data that was written since the last call
> was successfully written back. Therefore, and because we can't redo all
> writes after a flush failure, we have to give up after a single
> fdatasync() failure. After this failure, we would never be able to make
> the promise that a successful bdrv_flush() makes.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> block/file-posix.c | 22 ++++++++++++++++++++++
> 1 file changed, 22 insertions(+)
Thanks, applied to my block branch for 2.9:
https://github.com/XanClic/qemu/commits/block
Max