From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, hreitz@redhat.com, kwolf@redhat.com,
vsementsov@yandex-team.ru, pbonzini@redhat.com,
eesposit@redhat.com, andrey.drobyshev@virtuozzo.com,
den@virtuozzo.com
Subject: [PATCH v3 1/3] block: zero data data corruption using prealloc-filter
Date: Tue, 16 Jul 2024 17:41:21 +0300 [thread overview]
Message-ID: <20240716144123.651476-2-andrey.drobyshev@virtuozzo.com> (raw)
In-Reply-To: <20240716144123.651476-1-andrey.drobyshev@virtuozzo.com>
From: "Denis V. Lunev" <den@openvz.org>
We have observed that some clusters in the QCOW2 files are zeroed
while preallocation filter is used.
We are able to trace down the following sequence when prealloc-filter
is used:
co=0x55e7cbed7680 qcow2_co_pwritev_task()
co=0x55e7cbed7680 preallocate_co_pwritev_part()
co=0x55e7cbed7680 handle_write()
co=0x55e7cbed7680 bdrv_co_do_pwrite_zeroes()
co=0x55e7cbed7680 raw_do_pwrite_zeroes()
co=0x7f9edb7fe500 do_fallocate()
Here coroutine 0x55e7cbed7680 is being blocked waiting while coroutine
0x7f9edb7fe500 will finish with fallocate of the file area. OK. It is
time to handle next coroutine, which
co=0x55e7cbee91b0 qcow2_co_pwritev_task()
co=0x55e7cbee91b0 preallocate_co_pwritev_part()
co=0x55e7cbee91b0 handle_write()
co=0x55e7cbee91b0 bdrv_co_do_pwrite_zeroes()
co=0x55e7cbee91b0 raw_do_pwrite_zeroes()
co=0x7f9edb7deb00 do_fallocate()
The trouble comes here. Coroutine 0x55e7cbed7680 has not advanced
file_end yet and coroutine 0x55e7cbee91b0 will start fallocate() for
the same area. This means that if (once fallocate is started inside
0x7f9edb7deb00) original fallocate could end and the real write will
be executed. In that case write() request is handled at the same time
as fallocate().
The patch moves s->file_lock assignment before fallocate and that is
crucial. The idea is that all subsequent requests into the area
being preallocation will be issued as just writes without fallocate
to this area and they will not proceed thanks to overlapping
requests mechanics. If preallocation will fail, we will just switch
to the normal expand-by-write behavior and that is not a problem
except performance.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
---
block/preallocate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/block/preallocate.c b/block/preallocate.c
index d215bc5d6d..ecf0aa4baa 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -383,6 +383,13 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
want_merge_zero = want_merge_zero && (prealloc_start <= offset);
+ /*
+ * Assign file_end before making actual preallocation. This will ensure
+ * that next request performed while preallocation is in progress will
+ * be passed without preallocation.
+ */
+ s->file_end = prealloc_end;
+
ret = bdrv_co_pwrite_zeroes(
bs->file, prealloc_start, prealloc_end - prealloc_start,
BDRV_REQ_NO_FALLBACK | BDRV_REQ_SERIALISING | BDRV_REQ_NO_WAIT);
@@ -391,7 +398,6 @@ handle_write(BlockDriverState *bs, int64_t offset, int64_t bytes,
return false;
}
- s->file_end = prealloc_end;
return want_merge_zero;
}
--
2.39.3
next prev parent reply other threads:[~2024-07-16 14:42 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-16 14:41 [PATCH v3 0/3] Fix data corruption within preallocation Andrey Drobyshev
2024-07-16 14:41 ` Andrey Drobyshev [this message]
2024-07-18 15:51 ` [PATCH v3 1/3] block: zero data data corruption using prealloc-filter Kevin Wolf
2024-07-18 15:52 ` Denis V. Lunev
2024-07-18 19:46 ` Denis V. Lunev
2024-08-05 11:59 ` Kevin Wolf
2024-08-05 12:13 ` Denis V. Lunev
2024-07-16 14:41 ` [PATCH v3 2/3] iotests/298: add testcase for async writes with preallocation filter Andrey Drobyshev
2024-08-05 12:04 ` Kevin Wolf
2024-08-05 12:56 ` Andrey Drobyshev
2024-08-05 13:50 ` Kevin Wolf
2024-07-16 14:41 ` [PATCH v3 3/3] scripts: add filev2p.py script for mapping virtual file offsets mapping Andrey Drobyshev
2024-08-05 12:29 ` Kevin Wolf
2024-08-05 13:02 ` Andrey Drobyshev
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240716144123.651476-2-andrey.drobyshev@virtuozzo.com \
--to=andrey.drobyshev@virtuozzo.com \
--cc=den@virtuozzo.com \
--cc=eesposit@redhat.com \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=vsementsov@yandex-team.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).