* [PATCH v2 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 02/21] preallocate: Factor out preallocate_truncate_to_real_size() Kevin Wolf
` (20 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
This field has been unused since commit 72373e40fbc ('block:
bdrv_reopen_multiple: refresh permissions on updated graph').
Remove it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/block.c b/block.c
index 8da89aaa62..9029ddd9ff 100644
--- a/block.c
+++ b/block.c
@@ -2115,7 +2115,6 @@ static int bdrv_fill_options(QDict **options, const char *filename,
typedef struct BlockReopenQueueEntry {
bool prepared;
- bool perms_checked;
BDRVReopenState state;
QTAILQ_ENTRY(BlockReopenQueueEntry) entry;
} BlockReopenQueueEntry;
--
2.41.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 02/21] preallocate: Factor out preallocate_truncate_to_real_size()
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 03/21] preallocate: Don't poll during permission updates Kevin Wolf
` (19 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
It's essentially the same code in preallocate_check_perm() and
preallocate_close(), except that the latter ignores errors.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/preallocate.c | 48 +++++++++++++++++++++------------------------
1 file changed, 22 insertions(+), 26 deletions(-)
diff --git a/block/preallocate.c b/block/preallocate.c
index 3d0f621003..3173d80534 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -162,26 +162,39 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
return 0;
}
-static void preallocate_close(BlockDriverState *bs)
+static int preallocate_truncate_to_real_size(BlockDriverState *bs, Error **errp)
{
- int ret;
BDRVPreallocateState *s = bs->opaque;
-
- if (s->data_end < 0) {
- return;
- }
+ int ret;
if (s->file_end < 0) {
s->file_end = bdrv_getlength(bs->file->bs);
if (s->file_end < 0) {
- return;
+ error_setg_errno(errp, -s->file_end, "Failed to get file length");
+ return s->file_end;
}
}
if (s->data_end < s->file_end) {
ret = bdrv_truncate(bs->file, s->data_end, true, PREALLOC_MODE_OFF, 0,
NULL);
- s->file_end = ret < 0 ? ret : s->data_end;
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Failed to drop preallocation");
+ s->file_end = ret;
+ return ret;
+ }
+ s->file_end = s->data_end;
+ }
+
+ return 0;
+}
+
+static void preallocate_close(BlockDriverState *bs)
+{
+ BDRVPreallocateState *s = bs->opaque;
+
+ if (s->data_end >= 0) {
+ preallocate_truncate_to_real_size(bs, NULL);
}
}
@@ -473,24 +486,7 @@ static int preallocate_check_perm(BlockDriverState *bs,
* We should truncate in check_perm, as in set_perm bs->file->perm will
* be already changed, and we should not violate it.
*/
- if (s->file_end < 0) {
- s->file_end = bdrv_getlength(bs->file->bs);
- if (s->file_end < 0) {
- error_setg(errp, "Failed to get file length");
- return s->file_end;
- }
- }
-
- if (s->data_end < s->file_end) {
- int ret = bdrv_truncate(bs->file, s->data_end, true,
- PREALLOC_MODE_OFF, 0, NULL);
- if (ret < 0) {
- error_setg(errp, "Failed to drop preallocation");
- s->file_end = ret;
- return ret;
- }
- s->file_end = s->data_end;
- }
+ return preallocate_truncate_to_real_size(bs, errp);
}
return 0;
--
2.41.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 03/21] preallocate: Don't poll during permission updates
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 01/21] block: Remove unused BlockReopenQueueEntry.perms_checked Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 02/21] preallocate: Factor out preallocate_truncate_to_real_size() Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-10-05 19:55 ` Vladimir Sementsov-Ogievskiy
2023-09-11 9:46 ` [PATCH v2 04/21] block: Take AioContext lock for bdrv_append() more consistently Kevin Wolf
` (18 subsequent siblings)
21 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
When the permission related BlockDriver callbacks are called, we are in
the middle of an operation traversing the block graph. Polling in such a
place is a very bad idea because the graph could change in unexpected
ways. In the future, callers will also hold the graph lock, which is
likely to turn polling into a deadlock.
So we need to get rid of calls to functions like bdrv_getlength() or
bdrv_truncate() there as these functions poll internally. They are
currently used so that when no parent has write/resize permissions on
the image any more, the preallocate filter drops the extra preallocated
area in the image file and gives up write/resize permissions itself.
In order to achieve this without polling in .bdrv_check_perm, don't
immediately truncate the image, but only schedule a BH to do so. The
filter keeps the write/resize permissions a bit longer now until the BH
has executed.
There is one case in which delaying doesn't work: Reopening the image
read-only. In this case, bs->file will likely be reopened read-only,
too, so keeping write permissions a bit longer on it doesn't work. But
we can already cover this case in preallocate_reopen_prepare() and not
rely on the permission updates for it.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block/preallocate.c | 89 +++++++++++++++++++++++++++++++++++----------
1 file changed, 69 insertions(+), 20 deletions(-)
diff --git a/block/preallocate.c b/block/preallocate.c
index 3173d80534..bfb638d8b1 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -75,8 +75,14 @@ typedef struct BDRVPreallocateState {
* be invalid (< 0) when we don't have both exclusive BLK_PERM_RESIZE and
* BLK_PERM_WRITE permissions on file child.
*/
+
+ /* Gives up the resize permission on children when parents don't need it */
+ QEMUBH *drop_resize_bh;
} BDRVPreallocateState;
+static int preallocate_drop_resize(BlockDriverState *bs, Error **errp);
+static void preallocate_drop_resize_bh(void *opaque);
+
#define PREALLOCATE_OPT_PREALLOC_ALIGN "prealloc-align"
#define PREALLOCATE_OPT_PREALLOC_SIZE "prealloc-size"
static QemuOptsList runtime_opts = {
@@ -142,6 +148,7 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
* For this to work, mark them invalid.
*/
s->file_end = s->zero_start = s->data_end = -EINVAL;
+ s->drop_resize_bh = qemu_bh_new(preallocate_drop_resize_bh, bs);
ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
if (ret < 0) {
@@ -193,6 +200,9 @@ static void preallocate_close(BlockDriverState *bs)
{
BDRVPreallocateState *s = bs->opaque;
+ qemu_bh_cancel(s->drop_resize_bh);
+ qemu_bh_delete(s->drop_resize_bh);
+
if (s->data_end >= 0) {
preallocate_truncate_to_real_size(bs, NULL);
}
@@ -211,6 +221,7 @@ static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue, Error **errp)
{
PreallocateOpts *opts = g_new0(PreallocateOpts, 1);
+ int ret;
if (!preallocate_absorb_opts(opts, reopen_state->options,
reopen_state->bs->file->bs, errp)) {
@@ -218,6 +229,19 @@ static int preallocate_reopen_prepare(BDRVReopenState *reopen_state,
return -EINVAL;
}
+ /*
+ * Drop the preallocation already here if reopening read-only. The child
+ * might also be reopened read-only and then scheduling a BH during the
+ * permission update is too late.
+ */
+ if ((reopen_state->flags & BDRV_O_RDWR) == 0) {
+ ret = preallocate_drop_resize(reopen_state->bs, errp);
+ if (ret < 0) {
+ g_free(opts);
+ return ret;
+ }
+ }
+
reopen_state->opaque = opts;
return 0;
@@ -475,41 +499,61 @@ preallocate_co_getlength(BlockDriverState *bs)
return ret;
}
-static int preallocate_check_perm(BlockDriverState *bs,
- uint64_t perm, uint64_t shared, Error **errp)
+static int preallocate_drop_resize(BlockDriverState *bs, Error **errp)
{
BDRVPreallocateState *s = bs->opaque;
+ int ret;
- if (s->data_end >= 0 && !can_write_resize(perm)) {
- /*
- * Lose permissions.
- * We should truncate in check_perm, as in set_perm bs->file->perm will
- * be already changed, and we should not violate it.
- */
- return preallocate_truncate_to_real_size(bs, errp);
+ if (s->data_end < 0) {
+ return 0;
+ }
+
+ /*
+ * Before switching children to be read-only, truncate them to remove
+ * the preallocation and let them have the real size.
+ */
+ ret = preallocate_truncate_to_real_size(bs, errp);
+ if (ret < 0) {
+ return ret;
}
+ /*
+ * We'll drop our permissions and will allow other users to take write and
+ * resize permissions (see preallocate_child_perm). Anyone will be able to
+ * change the child, so mark all states invalid. We'll regain control if a
+ * parent requests write access again.
+ */
+ s->data_end = s->file_end = s->zero_start = -EINVAL;
+
+ bdrv_graph_rdlock_main_loop();
+ bdrv_child_refresh_perms(bs, bs->file, NULL);
+ bdrv_graph_rdunlock_main_loop();
+
return 0;
}
+static void preallocate_drop_resize_bh(void *opaque)
+{
+ /*
+ * In case of errors, we'll simply keep the exclusive lock on the image
+ * indefinitely.
+ */
+ preallocate_drop_resize(opaque, NULL);
+}
+
static void preallocate_set_perm(BlockDriverState *bs,
uint64_t perm, uint64_t shared)
{
BDRVPreallocateState *s = bs->opaque;
if (can_write_resize(perm)) {
+ qemu_bh_cancel(s->drop_resize_bh);
if (s->data_end < 0) {
s->data_end = s->file_end = s->zero_start =
- bdrv_getlength(bs->file->bs);
+ bs->file->bs->total_sectors * BDRV_SECTOR_SIZE;
}
} else {
- /*
- * We drop our permissions, as well as allow shared
- * permissions (see preallocate_child_perm), anyone will be able to
- * change the child, so mark all states invalid. We'll regain control if
- * get good permissions back.
- */
- s->data_end = s->file_end = s->zero_start = -EINVAL;
+ qemu_bh_schedule(s->drop_resize_bh);
}
}
@@ -517,10 +561,16 @@ static void preallocate_child_perm(BlockDriverState *bs, BdrvChild *c,
BdrvChildRole role, BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared, uint64_t *nperm, uint64_t *nshared)
{
+ BDRVPreallocateState *s = bs->opaque;
+
bdrv_default_perms(bs, c, role, reopen_queue, perm, shared, nperm, nshared);
- if (can_write_resize(perm)) {
- /* This should come by default, but let's enforce: */
+ /*
+ * We need exclusive write and resize permissions on the child not only when
+ * the parent can write to it, but also after the parent gave up write
+ * permissions until preallocate_drop_resize() has completed.
+ */
+ if (can_write_resize(perm) || s->data_end != -EINVAL) {
*nperm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
/*
@@ -550,7 +600,6 @@ static BlockDriver bdrv_preallocate_filter = {
.bdrv_co_flush = preallocate_co_flush,
.bdrv_co_truncate = preallocate_co_truncate,
- .bdrv_check_perm = preallocate_check_perm,
.bdrv_set_perm = preallocate_set_perm,
.bdrv_child_perm = preallocate_child_perm,
--
2.41.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates
2023-09-11 9:46 ` [PATCH v2 03/21] preallocate: Don't poll during permission updates Kevin Wolf
@ 2023-10-05 19:55 ` Vladimir Sementsov-Ogievskiy
2023-10-06 8:56 ` Kevin Wolf
0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-10-05 19:55 UTC (permalink / raw)
To: Kevin Wolf, qemu-block; +Cc: stefanha, eblake, eesposit, pbonzini, qemu-devel
On 11.09.23 12:46, Kevin Wolf wrote:
> When the permission related BlockDriver callbacks are called, we are in
> the middle of an operation traversing the block graph. Polling in such a
> place is a very bad idea because the graph could change in unexpected
> ways. In the future, callers will also hold the graph lock, which is
> likely to turn polling into a deadlock.
>
> So we need to get rid of calls to functions like bdrv_getlength() or
> bdrv_truncate() there as these functions poll internally. They are
> currently used so that when no parent has write/resize permissions on
> the image any more, the preallocate filter drops the extra preallocated
> area in the image file and gives up write/resize permissions itself.
>
> In order to achieve this without polling in .bdrv_check_perm, don't
> immediately truncate the image, but only schedule a BH to do so. The
> filter keeps the write/resize permissions a bit longer now until the BH
> has executed.
>
> There is one case in which delaying doesn't work: Reopening the image
> read-only. In this case, bs->file will likely be reopened read-only,
> too, so keeping write permissions a bit longer on it doesn't work. But
> we can already cover this case in preallocate_reopen_prepare() and not
> rely on the permission updates for it.
Hmm, now I found one more "future" case.
I now try to rebase my "[PATCH v7 0/7] blockdev-replace" https://patchew.org/QEMU/20230421114102.884457-1-vsementsov@yandex-team.ru/
And it breaks after this commit.
By coincidence, the blockdev-replace series uses exactly the "preallocate" filter to test insertion/removal of filters. And removal is broken now.
Removing is done as follows:
1. We have filter inserted: disk0 -file-> filter -file-> file0
2. blockdev-replace, replaces file child of disk0, so we should get the picture*: disk0 -file-> file0 <-file- filter
3. blockdev-del filter
But step [2] fails, as the preallocate filter now doesn't drop permissions during the operation (it postpones this for a while) and the picture* is impossible. The permission check fails.
Hmmm... Any idea how blockdev-replace and preallocate filter should work :) ? Maybe, doing truncation in .drain_begin() will help? Will try
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates
2023-10-05 19:55 ` Vladimir Sementsov-Ogievskiy
@ 2023-10-06 8:56 ` Kevin Wolf
2023-10-06 18:10 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2023-10-06 8:56 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: qemu-block, stefanha, eblake, eesposit, pbonzini, qemu-devel
Am 05.10.2023 um 21:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
> On 11.09.23 12:46, Kevin Wolf wrote:
> > When the permission related BlockDriver callbacks are called, we are in
> > the middle of an operation traversing the block graph. Polling in such a
> > place is a very bad idea because the graph could change in unexpected
> > ways. In the future, callers will also hold the graph lock, which is
> > likely to turn polling into a deadlock.
> >
> > So we need to get rid of calls to functions like bdrv_getlength() or
> > bdrv_truncate() there as these functions poll internally. They are
> > currently used so that when no parent has write/resize permissions on
> > the image any more, the preallocate filter drops the extra preallocated
> > area in the image file and gives up write/resize permissions itself.
> >
> > In order to achieve this without polling in .bdrv_check_perm, don't
> > immediately truncate the image, but only schedule a BH to do so. The
> > filter keeps the write/resize permissions a bit longer now until the BH
> > has executed.
> >
> > There is one case in which delaying doesn't work: Reopening the image
> > read-only. In this case, bs->file will likely be reopened read-only,
> > too, so keeping write permissions a bit longer on it doesn't work. But
> > we can already cover this case in preallocate_reopen_prepare() and not
> > rely on the permission updates for it.
>
> Hmm, now I found one more "future" case.
>
> I now try to rebase my "[PATCH v7 0/7] blockdev-replace"
> https://patchew.org/QEMU/20230421114102.884457-1-vsementsov@yandex-team.ru/
>
> And it breaks after this commit.
>
> By coincidence, the blockdev-replace series uses exactly the "preallocate" filter
> to test insertion/removal of filters. And removal is broken now.
>
> Removing is done as follows:
>
> 1. We have filter inserted: disk0 -file-> filter -file-> file0
>
> 2. blockdev-replace, replaces file child of disk0, so we should get the picture*: disk0 -file-> file0 <-file- filter
>
> 3. blockdev-del filter
>
>
> But step [2] fails, as the preallocate filter now doesn't drop permissions
> during the operation (it postpones this for a while) and the picture* is
> impossible. The permission check fails.
>
> Hmmm... Any idea how blockdev-replace and preallocate filter should
> work :) ? Maybe, doing truncation in .drain_begin() will help? Will
> try
Hm... What preallocate tries to do is really tricky...
Of course, the error is correct, this is an invalid configuration if
preallocate can still resize the image. So it would have to truncate the
file earlier, but the first time that preallocate knows of the change is
already too late to run requests.
Truncating on drain_begin feels more like a hack, but as long as it does
the job... Of course, this will have the preallocation truncated away on
events that have nothing to do with removing the filter. It's not
necessarily a disaster because preallocation is only an optimisation,
but it doesn't feel great.
Maybe let's take a step back: Which scenario is the preallocate driver
meant for and why do we even need to truncate the image file after
removing the filter? I suppose the filter doesn't make sense with raw
images because these are fixed size anyway, and pretty much any other
image format should be able to tolerate a permanently rounded up file
size. As long as you don't write to the preallocated area, it shouldn't
take space either on any sane filesystem.
Hmm, actually both VHD and VMDK can have footers, better avoid it with
those... But if truncating the image file on close is critical, what do
you do on crashes? Maybe preallocate should just not be considered
compatible with these formats?
Kevin
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates
2023-10-06 8:56 ` Kevin Wolf
@ 2023-10-06 18:10 ` Vladimir Sementsov-Ogievskiy
2023-10-09 9:28 ` Denis V. Lunev
0 siblings, 1 reply; 31+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2023-10-06 18:10 UTC (permalink / raw)
To: Kevin Wolf
Cc: qemu-block, stefanha, eblake, eesposit, pbonzini, qemu-devel,
Denis V. Lunev
On 06.10.23 11:56, Kevin Wolf wrote:
> Am 05.10.2023 um 21:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> On 11.09.23 12:46, Kevin Wolf wrote:
>>> When the permission related BlockDriver callbacks are called, we are in
>>> the middle of an operation traversing the block graph. Polling in such a
>>> place is a very bad idea because the graph could change in unexpected
>>> ways. In the future, callers will also hold the graph lock, which is
>>> likely to turn polling into a deadlock.
>>>
>>> So we need to get rid of calls to functions like bdrv_getlength() or
>>> bdrv_truncate() there as these functions poll internally. They are
>>> currently used so that when no parent has write/resize permissions on
>>> the image any more, the preallocate filter drops the extra preallocated
>>> area in the image file and gives up write/resize permissions itself.
>>>
>>> In order to achieve this without polling in .bdrv_check_perm, don't
>>> immediately truncate the image, but only schedule a BH to do so. The
>>> filter keeps the write/resize permissions a bit longer now until the BH
>>> has executed.
>>>
>>> There is one case in which delaying doesn't work: Reopening the image
>>> read-only. In this case, bs->file will likely be reopened read-only,
>>> too, so keeping write permissions a bit longer on it doesn't work. But
>>> we can already cover this case in preallocate_reopen_prepare() and not
>>> rely on the permission updates for it.
>>
>> Hmm, now I found one more "future" case.
>>
>> I now try to rebase my "[PATCH v7 0/7] blockdev-replace"
>> https://patchew.org/QEMU/20230421114102.884457-1-vsementsov@yandex-team.ru/
>>
>> And it breaks after this commit.
>>
>> By accident, blockdev-replace series uses exactly "preallocate" filter
>> to test insertion/removing of filters. And removing is broken now.
>>
>> Removing is done as follows:
>>
>> 1. We have filter inserted: disk0 -file-> filter -file-> file0
>>
>> 2. blockdev-replace, replaces file child of disk0, so we should get the picture*: disk0 -file-> file0 <-file- filter
>>
>> 3. blockdev-del filter
>>
>>
>> But step [2] fails, as now preallocate filter doesn't drop permissions
>> during the operation (postponing this for a while) and the picture* is
>> impossible. Permission check fails.
>>
>> Hmmm... Any idea how blockdev-replace and preallocate filter should
>> work :) ? Maybe, doing truncation in .drain_begin() will help? Will
>> try
>
> Hm... What preallocate tries to do is really tricky...
>
> Of course, the error is correct, this is an invalid configuration if
> preallocate can still resize the image. So it would have to truncate the
> file earlier, but the first time that preallocate knows of the change is
> already too late to run requests.
>
> Truncating on drain_begin feels more like a hack, but as long as it does
> the job... Of course, this will have the preallocation truncated away on
> events that have nothing to do with removing the filter. It's not
> necessarily a disaster because preallocation is only an optimisation,
> but it doesn't feel great.
Hmm, yes, that's not good.
>
> Maybe let's take a step back: Which scenario is the preallocate driver
> meant for and why do we even need to truncate the image file after
> removing the filter? I suppose the filter doesn't make sense with raw
> images because these are fixed size anyway, and pretty much any other
> image format should be able to tolerate a permanently rounded up file
> size. As long as you don't write to the preallocated area, it shouldn't
> take space either on any sane filesystem.
>
> Hmm, actually both VHD and VMDK can have footers, better avoid it with
> those... But if truncating the image file on close is critical, what do
> you do on crashes? Maybe preallocate should just not be considered
> compatible with these formats?
>
Originally, the preallocate filter was made to be used with qcow2, on some proprietary storage, where:
1. Allocating one big chunk works a lot faster than allocating several smaller chunks
2. Holes are not free and/or file length is not free, so we really want to truncate the file back on close
Den, correct me if I'm wrong.
The good thing is that in this scenario we don't need to remove the filter at runtime, so there is no problem.
Now I think that the generic solution is to just add a new handler, .bdrv_pre_replace, so blockdev-replace may work as follows:
drain_begin
call .bdrv_pre_replace for all affected nodes
do the replace
drain_end
And the preallocate filter would do the truncation in this .bdrv_pre_replace handler and set some flag that we have nothing to truncate (the flag is automatically cleared in the .drained_end handler). Then, during the permission update, if we see that nothing-to-truncate flag, we can drop permissions immediately.
But this difficulty may be postponed, and I can just document that the preallocate filter doesn't support removal at runtime (and modify the test to use another filter, or just not remove the preallocate filter).
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates
2023-10-06 18:10 ` Vladimir Sementsov-Ogievskiy
@ 2023-10-09 9:28 ` Denis V. Lunev
0 siblings, 0 replies; 31+ messages in thread
From: Denis V. Lunev @ 2023-10-09 9:28 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy, Kevin Wolf
Cc: qemu-block, stefanha, eblake, eesposit, pbonzini, qemu-devel,
Denis V. Lunev
On 10/6/23 20:10, Vladimir Sementsov-Ogievskiy wrote:
> On 06.10.23 11:56, Kevin Wolf wrote:
>> Am 05.10.2023 um 21:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>> On 11.09.23 12:46, Kevin Wolf wrote:
>>>> When the permission related BlockDriver callbacks are called, we
>>>> are in
>>>> the middle of an operation traversing the block graph. Polling in
>>>> such a
>>>> place is a very bad idea because the graph could change in unexpected
>>>> ways. In the future, callers will also hold the graph lock, which is
>>>> likely to turn polling into a deadlock.
>>>>
>>>> So we need to get rid of calls to functions like bdrv_getlength() or
>>>> bdrv_truncate() there as these functions poll internally. They are
>>>> currently used so that when no parent has write/resize permissions on
>>>> the image any more, the preallocate filter drops the extra
>>>> preallocated
>>>> area in the image file and gives up write/resize permissions itself.
>>>>
>>>> In order to achieve this without polling in .bdrv_check_perm, don't
>>>> immediately truncate the image, but only schedule a BH to do so. The
>>>> filter keeps the write/resize permissions a bit longer now until
>>>> the BH
>>>> has executed.
>>>>
>>>> There is one case in which delaying doesn't work: Reopening the image
>>>> read-only. In this case, bs->file will likely be reopened read-only,
>>>> too, so keeping write permissions a bit longer on it doesn't work. But
>>>> we can already cover this case in preallocate_reopen_prepare() and not
>>>> rely on the permission updates for it.
>>>
>>> Hmm, now I found one more "future" case.
>>>
>>> I now try to rebase my "[PATCH v7 0/7] blockdev-replace"
>>> https://patchew.org/QEMU/20230421114102.884457-1-vsementsov@yandex-team.ru/
>>>
>>>
>>> And it breaks after this commit.
>>>
>> By coincidence, the blockdev-replace series uses exactly the "preallocate" filter
>> to test insertion/removal of filters. And removal is broken now.
>>>
>>> Removing is done as follows:
>>>
>>> 1. We have filter inserted: disk0 -file-> filter -file-> file0
>>>
>>> 2. blockdev-replace, replaces file child of disk0, so we should get
>>> the picture*: disk0 -file-> file0 <-file- filter
>>>
>>> 3. blockdev-del filter
>>>
>>>
>> But step [2] fails, as the preallocate filter now doesn't drop permissions
>> during the operation (it postpones this for a while) and the picture* is
>> impossible. The permission check fails.
>>>
>>> Hmmm... Any idea how blockdev-replace and preallocate filter should
>>> work :) ? Maybe, doing truncation in .drain_begin() will help? Will
>>> try
>>
>> Hm... What preallocate tries to do is really tricky...
>>
>> Of course, the error is correct, this is an invalid configuration if
>> preallocate can still resize the image. So it would have to truncate the
>> file earlier, but the first time that preallocate knows of the change is
>> already too late to run requests.
>>
>> Truncating on drain_begin feels more like a hack, but as long as it does
>> the job... Of course, this will have the preallocation truncated away on
>> events that have nothing to do with removing the filter. It's not
>> necessarily a disaster because preallocation is only an optimisation,
>> but it doesn't feel great.
>
> Hmm, yes, that's not good.
>
>>
>> Maybe let's take a step back: Which scenario is the preallocate driver
>> meant for and why do we even need to truncate the image file after
>> removing the filter? I suppose the filter doesn't make sense with raw
>> images because these are fixed size anyway, and pretty much any other
>> image format should be able to tolerate a permanently rounded up file
>> size. As long as you don't write to the preallocated area, it shouldn't
>> take space either on any sane filesystem.
>>
>> Hmm, actually both VHD and VMDK can have footers, better avoid it with
>> those... But if truncating the image file on close is critical, what do
>> you do on crashes? Maybe preallocate should just not be considered
>> compatible with these formats?
>>
>
> Originally, the preallocate filter was made to be used with qcow2, on some
> proprietary storage, where:
>
> 1. Allocating one big chunk works a lot faster than allocating several
> smaller chunks
> 2. Holes are not free and/or file length is not free, so we really
> want to truncate the file back on close
>
> Den, correct me if I'm wrong.
1. Absolutely correct. This is true when the file attributes
are stored in a centralized place, aka metadata storage,
and requests to it do not scale well.
2. This, in my opinion, has a different meaning. We have
tried to make local storage behavior and distributed
storage behavior the same when the VM is off, i.e.
the file should be in the same state (no free blocks
at the end of the file).
>
> The good thing is that in this scenario we don't need to remove the filter
> at runtime, so there is no problem.
>
Yes, this filter is not dynamic in that respect. It is either
here or not here.
>
> Now I think that the generic solution is to just add a new handler,
> .bdrv_pre_replace, so blockdev-replace may work as follows:
>
> drain_begin
>
> call .bdrv_pre_replace for all affected nodes
>
> do the replace
>
> drain_end
>
> And the preallocate filter would do the truncation in this .bdrv_pre_replace
> handler and set some flag that we have nothing to truncate (the flag
> is automatically cleared in the .drained_end handler). Then, during the
> permission update, if we see that nothing-to-truncate flag, we can drop
> permissions immediately.
>
> But this difficulty may be postponed, and I can just document that the
> preallocate filter doesn't support removal at runtime (and modify the
> test to use another filter, or just not remove the preallocate filter).
>
Den
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH v2 04/21] block: Take AioContext lock for bdrv_append() more consistently
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (2 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 03/21] preallocate: Don't poll during permission updates Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 05/21] block: Introduce bdrv_schedule_unref() Kevin Wolf
` (17 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
The documentation for bdrv_append() says that the caller must hold the
AioContext lock for bs_top. Change all callers to actually adhere to the
contract.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
tests/unit/test-bdrv-drain.c | 3 +++
tests/unit/test-bdrv-graph-mod.c | 6 ++++++
tests/unit/test-block-iothread.c | 3 +++
3 files changed, 12 insertions(+)
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index ccc453c29e..89c8fa6780 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1359,7 +1359,10 @@ static void test_append_to_drained(void)
g_assert_cmpint(base_s->drain_count, ==, 1);
g_assert_cmpint(base->in_flight, ==, 0);
+ aio_context_acquire(qemu_get_aio_context());
bdrv_append(overlay, base, &error_abort);
+ aio_context_release(qemu_get_aio_context());
+
g_assert_cmpint(base->in_flight, ==, 0);
g_assert_cmpint(overlay->in_flight, ==, 0);
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index 36eed4b464..d8503165b3 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -140,8 +140,10 @@ static void test_update_perm_tree(void)
bdrv_attach_child(filter, bs, "child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
+ aio_context_acquire(qemu_get_aio_context());
ret = bdrv_append(filter, bs, NULL);
g_assert_cmpint(ret, <, 0);
+ aio_context_release(qemu_get_aio_context());
bdrv_unref(filter);
blk_unref(root);
@@ -205,7 +207,9 @@ static void test_should_update_child(void)
g_assert(target->backing->bs == bs);
bdrv_attach_child(filter, target, "target", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
+ aio_context_acquire(qemu_get_aio_context());
bdrv_append(filter, bs, &error_abort);
+ aio_context_release(qemu_get_aio_context());
g_assert(target->backing->bs == bs);
bdrv_unref(filter);
@@ -410,7 +414,9 @@ static void test_append_greedy_filter(void)
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
+ aio_context_acquire(qemu_get_aio_context());
bdrv_append(fl, base, &error_abort);
+ aio_context_release(qemu_get_aio_context());
bdrv_unref(fl);
bdrv_unref(top);
}
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index d727a5fee8..9155547313 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -756,11 +756,14 @@ static void test_propagate_mirror(void)
&error_abort);
/* Start a mirror job */
+ aio_context_acquire(main_ctx);
mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false,
BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
&error_abort);
+ aio_context_release(main_ctx);
+
WITH_JOB_LOCK_GUARD() {
job = job_get_locked("job0");
}
--
2.41.0
* [PATCH v2 05/21] block: Introduce bdrv_schedule_unref()
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (3 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 04/21] block: Take AioContext lock for bdrv_append() more consistently Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-12 16:49 ` Stefan Hajnoczi
2023-09-11 9:46 ` [PATCH v2 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions Kevin Wolf
` (16 subsequent siblings)
21 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
bdrv_unref() is called by a lot of places that need to hold the graph
lock (it naturally happens in the context of operations that change the
graph). However, bdrv_unref() takes the graph writer lock internally, so
it can't actually be called while already holding a graph lock without
causing a deadlock.
bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
node before closing it, and draining requires that the graph is
unlocked.
The solution is to defer deleting the node until we don't hold the lock
any more and draining is possible again.
Note that keeping images open for longer than necessary can create
problems, too: you can't open an image again before it is really closed
(if image locking didn't prevent it, it would cause corruption).
Closing an image and immediately reopening it happens at least in
bdrv_open() and bdrv_co_create().
In order to solve this problem, make sure to run the deferred unref in
bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
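The deferral can be pictured with a small self-contained model (Python for brevity; the lock, BH queue and refcount below are simplified stand-ins for QEMU's C APIs, not actual QEMU code):

```python
# Self-contained model of bdrv_schedule_unref(): while the graph write
# lock is held, unref requests are only queued; the unlock path releases
# the lock first and then runs the deferred work, mirroring the
# aio_bh_poll() call added to bdrv_graph_wrunlock() in this patch.

class Node:
    def __init__(self):
        self.refcnt = 1
        self.closed = False

class GraphLock:
    def __init__(self):
        self.write_held = False
        self.deferred = []          # stand-in for the BH queue

    def wrlock(self):
        self.write_held = True

    def wrunlock(self):
        self.write_held = False     # release the lock first ...
        while self.deferred:        # ... then run the deferred BHs
            self.deferred.pop(0)()

    def unref(self, node):
        # Analogue of bdrv_unref(): closing a node may poll, so it must
        # never run while the write lock is held.
        assert not self.write_held, "deadlock: unref under write lock"
        node.refcnt -= 1
        if node.refcnt == 0:
            node.closed = True

    def schedule_unref(self, node):
        # Analogue of bdrv_schedule_unref(): defer instead of closing.
        assert self.write_held
        self.deferred.append(lambda n=node: self.unref(n))

lock = GraphLock()
node = Node()
lock.wrlock()
lock.schedule_unref(node)
assert not node.closed              # nothing happens under the lock
lock.wrunlock()
assert node.closed                  # deferred unref ran after unlock
```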
The output of iotest 051 is updated because the additional polling
changes the order of HMP output, resulting in a new "(qemu)" prompt in
the test output that was previously on a separate line and filtered out.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
include/block/block-global-state.h | 1 +
block.c | 17 +++++++++++++++++
block/graph-lock.c | 26 +++++++++++++++++++-------
tests/qemu-iotests/051.pc.out | 6 +++---
4 files changed, 40 insertions(+), 10 deletions(-)
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index f347199bff..e570799f85 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -224,6 +224,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
void bdrv_ref(BlockDriverState *bs);
void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
+void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
BlockDriverState *child_bs,
diff --git a/block.c b/block.c
index 9029ddd9ff..c8ac7cfac4 100644
--- a/block.c
+++ b/block.c
@@ -7044,6 +7044,23 @@ void bdrv_unref(BlockDriverState *bs)
}
}
+/*
+ * Release a BlockDriverState reference while holding the graph write lock.
+ *
+ * Calling bdrv_unref() directly is forbidden while holding the graph lock
+ * because bdrv_close() both involves polling and taking the graph lock
+ * internally. bdrv_schedule_unref() instead delays decreasing the refcount and
+ * possibly closing @bs until the graph lock is released.
+ */
+void bdrv_schedule_unref(BlockDriverState *bs)
+{
+ if (!bs) {
+ return;
+ }
+ aio_bh_schedule_oneshot(qemu_get_aio_context(),
+ (QEMUBHFunc *) bdrv_unref, bs);
+}
+
struct BdrvOpBlocker {
Error *reason;
QLIST_ENTRY(BdrvOpBlocker) list;
diff --git a/block/graph-lock.c b/block/graph-lock.c
index f357a2c0b1..58a799065f 100644
--- a/block/graph-lock.c
+++ b/block/graph-lock.c
@@ -163,17 +163,29 @@ void bdrv_graph_wrlock(BlockDriverState *bs)
void bdrv_graph_wrunlock(void)
{
GLOBAL_STATE_CODE();
- QEMU_LOCK_GUARD(&aio_context_list_lock);
assert(qatomic_read(&has_writer));
+ WITH_QEMU_LOCK_GUARD(&aio_context_list_lock) {
+ /*
+ * No need for memory barriers, this works in pair with
+ * the slow path of rdlock() and both take the lock.
+ */
+ qatomic_store_release(&has_writer, 0);
+
+ /* Wake up all coroutines that are waiting to read the graph */
+ qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
+ }
+
/*
- * No need for memory barriers, this works in pair with
- * the slow path of rdlock() and both take the lock.
+ * Run any BHs that were scheduled during the wrlock section and that
+ * callers might expect to have finished (in particular, this is important
+ * for bdrv_schedule_unref()).
+ *
+ * Do this only after restarting coroutines so that nested event loops in
+ * BHs don't deadlock if their condition relies on the coroutine making
+ * progress.
*/
- qatomic_store_release(&has_writer, 0);
-
- /* Wake up all coroutine that are waiting to read the graph */
- qemu_co_enter_all(&reader_queue, &aio_context_list_lock);
+ aio_bh_poll(qemu_get_aio_context());
}
void coroutine_fn bdrv_graph_co_rdlock(void)
diff --git a/tests/qemu-iotests/051.pc.out b/tests/qemu-iotests/051.pc.out
index 4d4af5a486..7e10c5fa1b 100644
--- a/tests/qemu-iotests/051.pc.out
+++ b/tests/qemu-iotests/051.pc.out
@@ -169,11 +169,11 @@ QEMU_PROG: -device scsi-hd,drive=disk: Device needs media, but drive is empty
Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device ide-hd,drive=disk,share-rw=on
QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
+(qemu) QEMU_PROG: -device ide-hd,drive=disk,share-rw=on: Cannot change iothread of active block backend
Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,share-rw=on
QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
+(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,share-rw=on: Cannot change iothread of active block backend
Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device lsi53c895a,id=lsi0 -device scsi-hd,bus=lsi0.0,drive=disk,share-rw=on
QEMU X.Y.Z monitor - type 'help' for more information
@@ -185,7 +185,7 @@ QEMU X.Y.Z monitor - type 'help' for more information
Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on
QEMU X.Y.Z monitor - type 'help' for more information
-QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
+(qemu) QEMU_PROG: -device virtio-blk-pci,drive=disk,iothread=thread0,share-rw=on: Cannot change iothread of active block backend
Testing: -drive file=TEST_DIR/t.qcow2,if=none,node-name=disk -object iothread,id=thread0 -device virtio-scsi,iothread=thread0,id=virtio-scsi0 -device scsi-hd,bus=virtio-scsi0.0,drive=disk,share-rw=on -device virtio-scsi,id=virtio-scsi1,iothread=thread0 -device scsi-hd,bus=virtio-scsi1.0,drive=disk,share-rw=on
QEMU X.Y.Z monitor - type 'help' for more information
--
2.41.0
* Re: [PATCH v2 05/21] block: Introduce bdrv_schedule_unref()
2023-09-11 9:46 ` [PATCH v2 05/21] block: Introduce bdrv_schedule_unref() Kevin Wolf
@ 2023-09-12 16:49 ` Stefan Hajnoczi
0 siblings, 0 replies; 31+ messages in thread
From: Stefan Hajnoczi @ 2023-09-12 16:49 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel
On Mon, Sep 11, 2023 at 11:46:04AM +0200, Kevin Wolf wrote:
> bdrv_unref() is called by a lot of places that need to hold the graph
> lock (it naturally happens in the context of operations that change the
> graph). However, bdrv_unref() takes the graph writer lock internally, so
> it can't actually be called while already holding a graph lock without
> causing a deadlock.
>
> bdrv_unref() also can't just become GRAPH_WRLOCK because it drains the
> node before closing it, and draining requires that the graph is
> unlocked.
>
> The solution is to defer deleting the node until we don't hold the lock
> any more and draining is possible again.
>
> Note that keeping images open for longer than necessary can create
> problems, too: you can't open an image again before it is really closed
> (if image locking didn't prevent it, it would cause corruption).
> Closing an image and immediately reopening it happens at least in
> bdrv_open() and bdrv_co_create().
>
> In order to solve this problem, make sure to run the deferred unref in
> bdrv_graph_wrunlock(), i.e. the first possible place where we can drain
> again. This is also why bdrv_schedule_unref() is marked GRAPH_WRLOCK.
>
> The output of iotest 051 is updated because the additional polling
> changes the order of HMP output, resulting in a new "(qemu)" prompt in
> the test output that was previously on a separate line and filtered out.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
> include/block/block-global-state.h | 1 +
> block.c | 17 +++++++++++++++++
> block/graph-lock.c | 26 +++++++++++++++++++-------
> tests/qemu-iotests/051.pc.out | 6 +++---
> 4 files changed, 40 insertions(+), 10 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* [PATCH v2 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (4 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 05/21] block: Introduce bdrv_schedule_unref() Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 07/21] block-coroutine-wrapper: Allow arbitrary parameter names Kevin Wolf
` (15 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Add a new wrapper type for GRAPH_WRLOCK functions that should be called
from coroutine context.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block-common.h | 4 ++++
scripts/block-coroutine-wrapper.py | 11 +++++++++++
2 files changed, 15 insertions(+)
diff --git a/include/block/block-common.h b/include/block/block-common.h
index df5ffc8d09..3bbc5d9294 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -66,10 +66,14 @@
* function. The coroutine yields after scheduling the BH and is reentered when
* the wrapped function returns.
*
+ * A no_co_wrapper_bdrv_wrlock function is a no_co_wrapper function that
+ * automatically takes the graph wrlock when calling the wrapped function.
+ *
* If the first parameter of the function is a BlockDriverState, BdrvChild or
* BlockBackend pointer, the AioContext lock for it is taken in the wrapper.
*/
#define no_co_wrapper
+#define no_co_wrapper_bdrv_wrlock
#include "block/blockjob.h"
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index d4a183db61..fa01c06567 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -71,10 +71,13 @@ def __init__(self, wrapper_type: str, return_type: str, name: str,
self.args = [ParamDecl(arg.strip()) for arg in args.split(',')]
self.create_only_co = 'mixed' not in variant
self.graph_rdlock = 'bdrv_rdlock' in variant
+ self.graph_wrlock = 'bdrv_wrlock' in variant
self.wrapper_type = wrapper_type
if wrapper_type == 'co':
+ if self.graph_wrlock:
+ raise ValueError(f"co function can't be wrlock: {self.name}")
subsystem, subname = self.name.split('_', 1)
self.target_name = f'{subsystem}_co_{subname}'
else:
@@ -250,6 +253,12 @@ def gen_no_co_wrapper(func: FuncDecl) -> str:
name = func.target_name
struct_name = func.struct_name
+ graph_lock=''
+ graph_unlock=''
+ if func.graph_wrlock:
+ graph_lock=' bdrv_graph_wrlock(NULL);'
+ graph_unlock=' bdrv_graph_wrunlock();'
+
return f"""\
/*
* Wrappers for {name}
@@ -266,9 +275,11 @@ def gen_no_co_wrapper(func: FuncDecl) -> str:
{struct_name} *s = opaque;
AioContext *ctx = {func.gen_ctx('s->')};
+{graph_lock}
aio_context_acquire(ctx);
{func.get_result}{name}({ func.gen_list('s->{name}') });
aio_context_release(ctx);
+{graph_unlock}
aio_co_wake(s->co);
}}
--
2.41.0
* [PATCH v2 07/21] block-coroutine-wrapper: Allow arbitrary parameter names
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (5 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 06/21] block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK Kevin Wolf
` (14 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Don't assume specific parameter names like 'bs' or 'blk' in the
generated code, but use the actual name.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
scripts/block-coroutine-wrapper.py | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py
index fa01c06567..685d0b4ed4 100644
--- a/scripts/block-coroutine-wrapper.py
+++ b/scripts/block-coroutine-wrapper.py
@@ -105,12 +105,13 @@ def __init__(self, wrapper_type: str, return_type: str, name: str,
def gen_ctx(self, prefix: str = '') -> str:
t = self.args[0].type
+ name = self.args[0].name
if t == 'BlockDriverState *':
- return f'bdrv_get_aio_context({prefix}bs)'
+ return f'bdrv_get_aio_context({prefix}{name})'
elif t == 'BdrvChild *':
- return f'bdrv_get_aio_context({prefix}child->bs)'
+ return f'bdrv_get_aio_context({prefix}{name}->bs)'
elif t == 'BlockBackend *':
- return f'blk_get_aio_context({prefix}blk)'
+ return f'blk_get_aio_context({prefix}{name})'
else:
return 'qemu_get_aio_context()'
--
2.41.0
* [PATCH v2 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (6 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 07/21] block-coroutine-wrapper: Allow arbitrary parameter names Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK Kevin Wolf
` (13 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_replace_child_noperm(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
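The pattern behind this change, converting a function that locks internally into one that asserts the caller already holds the lock, can be sketched as follows (a Python analogue with simplified names, not QEMU code):

```python
# Before this patch, the helper took the write lock itself, so a caller
# already holding it would deadlock (the lock is not recursive). After
# the patch, the helper is GRAPH_WRLOCK-style: it only asserts the
# precondition, and the caller locks once around the whole update.

class GraphLock:
    def __init__(self):
        self.write_held = False

    def wrlock(self):
        assert not self.write_held, "deadlock: write lock not recursive"
        self.write_held = True

    def wrunlock(self):
        self.write_held = False

lock = GraphLock()

def replace_child_noperm(child, new_bs):
    # GRAPH_WRLOCK-style function: the caller must hold the lock.
    assert lock.write_held
    child["bs"] = new_bs

def replace_child_tran(child, new_bs):
    # Caller side: take the lock once around the helper, as
    # bdrv_replace_child_tran() does after this series.
    lock.wrlock()
    replace_child_noperm(child, new_bs)
    lock.wrunlock()

child = {"bs": "old"}
replace_child_tran(child, "new")
assert child["bs"] == "new"
```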
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/block.c b/block.c
index c8ac7cfac4..61856f5c33 100644
--- a/block.c
+++ b/block.c
@@ -91,8 +91,9 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
static bool bdrv_recurse_has_child(BlockDriverState *bs,
BlockDriverState *child);
-static void bdrv_replace_child_noperm(BdrvChild *child,
- BlockDriverState *new_bs);
+static void GRAPH_WRLOCK
+bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs);
+
static void bdrv_remove_child(BdrvChild *child, Transaction *tran);
static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
@@ -2387,6 +2388,8 @@ static void bdrv_replace_child_abort(void *opaque)
BlockDriverState *new_bs = s->child->bs;
GLOBAL_STATE_CODE();
+ bdrv_graph_wrlock(s->old_bs);
+
/* old_bs reference is transparently moved from @s to @s->child */
if (!s->child->bs) {
/*
@@ -2403,6 +2406,8 @@ static void bdrv_replace_child_abort(void *opaque)
}
assert(s->child->quiesced_parent);
bdrv_replace_child_noperm(s->child, s->old_bs);
+
+ bdrv_graph_wrunlock();
bdrv_unref(new_bs);
}
@@ -2439,7 +2444,10 @@ static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
if (new_bs) {
bdrv_ref(new_bs);
}
+
+ bdrv_graph_wrlock(new_bs);
bdrv_replace_child_noperm(child, new_bs);
+ bdrv_graph_wrunlock();
/* old_bs reference is transparently moved from @child to @s */
}
@@ -2858,8 +2866,8 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
* If @new_bs is non-NULL, the parent of @child must already be drained through
* @child and the caller must hold the AioContext lock for @new_bs.
*/
-static void bdrv_replace_child_noperm(BdrvChild *child,
- BlockDriverState *new_bs)
+static void GRAPH_WRLOCK
+bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs)
{
BlockDriverState *old_bs = child->bs;
int new_bs_quiesce_counter;
@@ -2894,8 +2902,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
assert(bdrv_get_aio_context(old_bs) == bdrv_get_aio_context(new_bs));
}
- /* TODO Pull this up into the callers to avoid polling here */
- bdrv_graph_wrlock(new_bs);
if (old_bs) {
if (child->klass->detach) {
child->klass->detach(child);
@@ -2911,7 +2917,6 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
child->klass->attach(child);
}
}
- bdrv_graph_wrunlock();
/*
* If the parent was drained through this BdrvChild previously, but new_bs
@@ -2952,7 +2957,10 @@ static void bdrv_attach_child_common_abort(void *opaque)
BlockDriverState *bs = s->child->bs;
GLOBAL_STATE_CODE();
+
+ bdrv_graph_wrlock(NULL);
bdrv_replace_child_noperm(s->child, NULL);
+ bdrv_graph_wrunlock();
if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
@@ -3080,8 +3088,10 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
* a problem, we already did this), but it will still poll until the parent
* is fully quiesced, so it will not be negatively affected either.
*/
+ bdrv_graph_wrlock(child_bs);
bdrv_parent_drained_begin_single(new_child);
bdrv_replace_child_noperm(new_child, child_bs);
+ bdrv_graph_wrunlock();
BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
*s = (BdrvAttachChildCommonState) {
@@ -3225,7 +3235,9 @@ void bdrv_root_unref_child(BdrvChild *child)
BlockDriverState *child_bs = child->bs;
GLOBAL_STATE_CODE();
+ bdrv_graph_wrlock(NULL);
bdrv_replace_child_noperm(child, NULL);
+ bdrv_graph_wrunlock();
bdrv_child_free(child);
if (child_bs) {
--
2.41.0
* [PATCH v2 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (7 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 08/21] block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK Kevin Wolf
` (12 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_replace_child_tran(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
While a graph lock is held, polling is not allowed. Therefore draining
the necessary nodes can no longer be done in bdrv_remove_child() and
bdrv_replace_node_noperm(); instead, the callers must make sure that
the nodes are already drained.
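The resulting caller-side ordering, drain first, then lock, then the graph change, can be modelled as follows (a Python analogue with simplified names, not QEMU code):

```python
# Model of the invariant: draining polls, so it must happen before the
# write lock is taken; functions running under the lock only assert the
# quiesce counter instead of draining themselves.

class Node:
    def __init__(self):
        self.quiesce_counter = 0

class GraphLock:
    def __init__(self):
        self.write_held = False

lock = GraphLock()

def drained_begin(node):
    assert not lock.write_held, "polling not allowed under the lock"
    node.quiesce_counter += 1

def drained_end(node):
    assert not lock.write_held
    node.quiesce_counter -= 1

def replace_node_noperm(src, dst):
    # GRAPH_WRLOCK: caller holds the lock and has drained both nodes.
    assert lock.write_held
    assert src.quiesce_counter and dst.quiesce_counter

# Caller-side ordering, as in bdrv_replace_node_common() after the
# patch: drain, lock, modify, unlock, undrain.
src, dst = Node(), Node()
drained_begin(src)
drained_begin(dst)
lock.write_held = True
replace_node_noperm(src, dst)
lock.write_held = False
drained_end(dst)
drained_end(src)
assert src.quiesce_counter == 0 and dst.quiesce_counter == 0
```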
Note that the transaction callbacks still take the lock internally, so
tran_finalize() must be called without the lock held. This is because
bdrv_append() also calls bdrv_attach_child_noperm(), which currently
requires to be called unlocked. Once it changes, the transaction
callbacks can be changed, too.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block.c | 78 ++++++++++++++++++++++++++++++++++++---------------------
1 file changed, 50 insertions(+), 28 deletions(-)
diff --git a/block.c b/block.c
index 61856f5c33..0973b91d98 100644
--- a/block.c
+++ b/block.c
@@ -94,7 +94,8 @@ static bool bdrv_recurse_has_child(BlockDriverState *bs,
static void GRAPH_WRLOCK
bdrv_replace_child_noperm(BdrvChild *child, BlockDriverState *new_bs);
-static void bdrv_remove_child(BdrvChild *child, Transaction *tran);
+static void GRAPH_WRLOCK
+bdrv_remove_child(BdrvChild *child, Transaction *tran);
static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue,
@@ -2427,8 +2428,9 @@ static TransactionActionDrv bdrv_replace_child_drv = {
*
* The function doesn't update permissions, caller is responsible for this.
*/
-static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
- Transaction *tran)
+static void GRAPH_WRLOCK
+bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
+ Transaction *tran)
{
BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
@@ -2445,9 +2447,7 @@ static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
bdrv_ref(new_bs);
}
- bdrv_graph_wrlock(new_bs);
bdrv_replace_child_noperm(child, new_bs);
- bdrv_graph_wrunlock();
/* old_bs reference is transparently moved from @child to @s */
}
@@ -3439,8 +3439,14 @@ static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
}
if (child) {
+ bdrv_drained_begin(child->bs);
+ bdrv_graph_wrlock(NULL);
+
bdrv_unset_inherits_from(parent_bs, child, tran);
bdrv_remove_child(child, tran);
+
+ bdrv_graph_wrunlock();
+ bdrv_drained_end(child->bs);
}
if (!child_bs) {
@@ -5133,7 +5139,7 @@ void bdrv_close_all(void)
assert(QTAILQ_EMPTY(&all_bdrv_states));
}
-static bool should_update_child(BdrvChild *c, BlockDriverState *to)
+static bool GRAPH_RDLOCK should_update_child(BdrvChild *c, BlockDriverState *to)
{
GQueue *queue;
GHashTable *found;
@@ -5222,45 +5228,41 @@ static TransactionActionDrv bdrv_remove_child_drv = {
.commit = bdrv_remove_child_commit,
};
-/* Function doesn't update permissions, caller is responsible for this. */
-static void bdrv_remove_child(BdrvChild *child, Transaction *tran)
+/*
+ * Function doesn't update permissions, caller is responsible for this.
+ *
+ * @child->bs (if non-NULL) must be drained.
+ */
+static void GRAPH_WRLOCK bdrv_remove_child(BdrvChild *child, Transaction *tran)
{
if (!child) {
return;
}
if (child->bs) {
- BlockDriverState *bs = child->bs;
- bdrv_drained_begin(bs);
+ assert(child->quiesced_parent);
bdrv_replace_child_tran(child, NULL, tran);
- bdrv_drained_end(bs);
}
tran_add(tran, &bdrv_remove_child_drv, child);
}
-static void undrain_on_clean_cb(void *opaque)
-{
- bdrv_drained_end(opaque);
-}
-
-static TransactionActionDrv undrain_on_clean = {
- .clean = undrain_on_clean_cb,
-};
-
-static int bdrv_replace_node_noperm(BlockDriverState *from,
- BlockDriverState *to,
- bool auto_skip, Transaction *tran,
- Error **errp)
+/*
+ * Both @from and @to (if non-NULL) must be drained. @to must be kept drained
+ * until the transaction is completed.
+ */
+static int GRAPH_WRLOCK
+bdrv_replace_node_noperm(BlockDriverState *from,
+ BlockDriverState *to,
+ bool auto_skip, Transaction *tran,
+ Error **errp)
{
BdrvChild *c, *next;
GLOBAL_STATE_CODE();
- bdrv_drained_begin(from);
- bdrv_drained_begin(to);
- tran_add(tran, &undrain_on_clean, from);
- tran_add(tran, &undrain_on_clean, to);
+ assert(from->quiesce_counter);
+ assert(to->quiesce_counter);
QLIST_FOREACH_SAFE(c, &from->parents, next_parent, next) {
assert(c->bs == from);
@@ -5323,6 +5325,9 @@ static int bdrv_replace_node_common(BlockDriverState *from,
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
assert(bdrv_get_aio_context(from) == bdrv_get_aio_context(to));
bdrv_drained_begin(from);
+ bdrv_drained_begin(to);
+
+ bdrv_graph_wrlock(to);
/*
* Do the replacement without permission update.
@@ -5336,6 +5341,7 @@ static int bdrv_replace_node_common(BlockDriverState *from,
}
if (detach_subchain) {
+ /* to_cow_parent is already drained because from is drained */
bdrv_remove_child(bdrv_filter_or_cow_child(to_cow_parent), tran);
}
@@ -5350,8 +5356,10 @@ static int bdrv_replace_node_common(BlockDriverState *from,
ret = 0;
out:
+ bdrv_graph_wrunlock();
tran_finalize(tran, ret);
+ bdrv_drained_end(to);
bdrv_drained_end(from);
bdrv_unref(from);
@@ -5395,6 +5403,7 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
BdrvChild *child;
Transaction *tran = tran_new();
AioContext *old_context, *new_context = NULL;
+ bool drained = false;
GLOBAL_STATE_CODE();
@@ -5423,7 +5432,13 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
aio_context_acquire(new_context);
}
+ bdrv_drained_begin(bs_new);
+ bdrv_drained_begin(bs_top);
+ drained = true;
+
+ bdrv_graph_wrlock(bs_new);
ret = bdrv_replace_node_noperm(bs_top, bs_new, true, tran, errp);
+ bdrv_graph_wrunlock();
if (ret < 0) {
goto out;
}
@@ -5436,6 +5451,11 @@ out:
bdrv_refresh_limits(bs_top, NULL, NULL);
bdrv_graph_rdunlock_main_loop();
+ if (drained) {
+ bdrv_drained_end(bs_top);
+ bdrv_drained_end(bs_new);
+ }
+
if (new_context && old_context != new_context) {
aio_context_release(new_context);
aio_context_acquire(old_context);
@@ -5458,6 +5478,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
bdrv_ref(old_bs);
bdrv_drained_begin(old_bs);
bdrv_drained_begin(new_bs);
+ bdrv_graph_wrlock(new_bs);
bdrv_replace_child_tran(child, new_bs, tran);
@@ -5465,6 +5486,7 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
refresh_list = g_slist_prepend(refresh_list, new_bs);
ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
+ bdrv_graph_wrunlock();
tran_finalize(tran, ret);
--
2.41.0
* [PATCH v2 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (8 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 09/21] block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 11/21] block: Call transaction callbacks with lock held Kevin Wolf
` (11 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child_common(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Note that the transaction callbacks still take the lock internally, so
tran_finalize() must be called without the lock held. This is because
bdrv_append() also calls bdrv_replace_node_noperm(), which currently
requires the transaction callbacks to be called unlocked. In the next
step, both of them can be switched to locked tran_finalize() calls
together.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block.c | 133 +++++++++++++++++++++++++++++++------------------
block/stream.c | 20 ++++++--
2 files changed, 100 insertions(+), 53 deletions(-)
diff --git a/block.c b/block.c
index 0973b91d98..f6e7cf4fb9 100644
--- a/block.c
+++ b/block.c
@@ -3004,13 +3004,14 @@ static TransactionActionDrv bdrv_attach_child_common_drv = {
* @child_bs can move to a different AioContext in this function. Callers must
* make sure that their AioContext locking is still correct after this.
*/
-static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
- const char *child_name,
- const BdrvChildClass *child_class,
- BdrvChildRole child_role,
- uint64_t perm, uint64_t shared_perm,
- void *opaque,
- Transaction *tran, Error **errp)
+static BdrvChild * GRAPH_WRLOCK
+bdrv_attach_child_common(BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildClass *child_class,
+ BdrvChildRole child_role,
+ uint64_t perm, uint64_t shared_perm,
+ void *opaque,
+ Transaction *tran, Error **errp)
{
BdrvChild *new_child;
AioContext *parent_ctx, *new_child_ctx;
@@ -3088,10 +3089,8 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
* a problem, we already did this), but it will still poll until the parent
* is fully quiesced, so it will not be negatively affected either.
*/
- bdrv_graph_wrlock(child_bs);
bdrv_parent_drained_begin_single(new_child);
bdrv_replace_child_noperm(new_child, child_bs);
- bdrv_graph_wrunlock();
BdrvAttachChildCommonState *s = g_new(BdrvAttachChildCommonState, 1);
*s = (BdrvAttachChildCommonState) {
@@ -3116,13 +3115,14 @@ static BdrvChild *bdrv_attach_child_common(BlockDriverState *child_bs,
* @child_bs can move to a different AioContext in this function. Callers must
* make sure that their AioContext locking is still correct after this.
*/
-static BdrvChild *bdrv_attach_child_noperm(BlockDriverState *parent_bs,
- BlockDriverState *child_bs,
- const char *child_name,
- const BdrvChildClass *child_class,
- BdrvChildRole child_role,
- Transaction *tran,
- Error **errp)
+static BdrvChild * GRAPH_WRLOCK
+bdrv_attach_child_noperm(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildClass *child_class,
+ BdrvChildRole child_role,
+ Transaction *tran,
+ Error **errp)
{
uint64_t perm, shared_perm;
@@ -3167,6 +3167,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
GLOBAL_STATE_CODE();
+ bdrv_graph_wrlock(child_bs);
+
child = bdrv_attach_child_common(child_bs, child_name, child_class,
child_role, perm, shared_perm, opaque,
tran, errp);
@@ -3178,6 +3180,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
ret = bdrv_refresh_perms(child_bs, tran, errp);
out:
+ bdrv_graph_wrunlock();
tran_finalize(tran, ret);
bdrv_unref(child_bs);
@@ -3209,6 +3212,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
GLOBAL_STATE_CODE();
+ bdrv_graph_wrlock(child_bs);
+
child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
child_class, child_role, tran, errp);
if (!child) {
@@ -3222,6 +3227,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
}
out:
+ bdrv_graph_wrunlock();
tran_finalize(tran, ret);
bdrv_unref(child_bs);
@@ -3379,16 +3385,20 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
* Sets the bs->backing or bs->file link of a BDS. A new reference is created;
* callers which don't need their own reference any more must call bdrv_unref().
*
+ * If the respective child is already present (i.e. we're detaching a node),
+ * that child node must be drained.
+ *
* Function doesn't update permissions, caller is responsible for this.
*
* The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
* @child_bs can move to a different AioContext in this function. Callers must
* make sure that their AioContext locking is still correct after this.
*/
-static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
- BlockDriverState *child_bs,
- bool is_backing,
- Transaction *tran, Error **errp)
+static int GRAPH_WRLOCK
+bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ bool is_backing,
+ Transaction *tran, Error **errp)
{
bool update_inherits_from =
bdrv_inherits_from_recursive(child_bs, parent_bs);
@@ -3439,14 +3449,9 @@ static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
}
if (child) {
- bdrv_drained_begin(child->bs);
- bdrv_graph_wrlock(NULL);
-
+ assert(child->bs->quiesce_counter);
bdrv_unset_inherits_from(parent_bs, child, tran);
bdrv_remove_child(child, tran);
-
- bdrv_graph_wrunlock();
- bdrv_drained_end(child->bs);
}
if (!child_bs) {
@@ -3471,9 +3476,7 @@ static int bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
}
out:
- bdrv_graph_rdlock_main_loop();
bdrv_refresh_limits(parent_bs, tran, NULL);
- bdrv_graph_rdunlock_main_loop();
return 0;
}
@@ -3482,10 +3485,14 @@ out:
* The caller must hold the AioContext lock for @backing_hd. Both @bs and
* @backing_hd can move to a different AioContext in this function. Callers must
* make sure that their AioContext locking is still correct after this.
+ *
+ * If a backing child is already present (i.e. we're detaching a node), that
+ * child node must be drained.
*/
-static int bdrv_set_backing_noperm(BlockDriverState *bs,
- BlockDriverState *backing_hd,
- Transaction *tran, Error **errp)
+static int GRAPH_WRLOCK
+bdrv_set_backing_noperm(BlockDriverState *bs,
+ BlockDriverState *backing_hd,
+ Transaction *tran, Error **errp)
{
GLOBAL_STATE_CODE();
return bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
@@ -3500,6 +3507,10 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
GLOBAL_STATE_CODE();
assert(bs->quiesce_counter > 0);
+ if (bs->backing) {
+ assert(bs->backing->bs->quiesce_counter > 0);
+ }
+ bdrv_graph_wrlock(backing_hd);
ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp);
if (ret < 0) {
@@ -3508,6 +3519,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
ret = bdrv_refresh_perms(bs, tran, errp);
out:
+ bdrv_graph_wrunlock();
tran_finalize(tran, ret);
return ret;
}
@@ -3515,12 +3527,15 @@ out:
int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
Error **errp)
{
+ BlockDriverState *drain_bs = bs->backing ? bs->backing->bs : bs;
int ret;
GLOBAL_STATE_CODE();
- bdrv_drained_begin(bs);
+ bdrv_ref(drain_bs);
+ bdrv_drained_begin(drain_bs);
ret = bdrv_set_backing_hd_drained(bs, backing_hd, errp);
- bdrv_drained_end(bs);
+ bdrv_drained_end(drain_bs);
+ bdrv_unref(drain_bs);
return ret;
}
@@ -4597,6 +4612,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
abort:
tran_abort(tran);
+
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
if (bs_entry->prepared) {
ctx = bdrv_get_aio_context(bs_entry->state.bs);
@@ -4746,6 +4762,11 @@ static int bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
reopen_state->old_file_bs = old_child_bs;
}
+ if (old_child_bs) {
+ bdrv_ref(old_child_bs);
+ bdrv_drained_begin(old_child_bs);
+ }
+
old_ctx = bdrv_get_aio_context(bs);
ctx = bdrv_get_aio_context(new_child_bs);
if (old_ctx != ctx) {
@@ -4753,14 +4774,23 @@ static int bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
aio_context_acquire(ctx);
}
+ bdrv_graph_wrlock(new_child_bs);
+
ret = bdrv_set_file_or_backing_noperm(bs, new_child_bs, is_backing,
tran, errp);
+ bdrv_graph_wrunlock();
+
if (old_ctx != ctx) {
aio_context_release(ctx);
aio_context_acquire(old_ctx);
}
+ if (old_child_bs) {
+ bdrv_drained_end(old_child_bs);
+ bdrv_unref(old_child_bs);
+ }
+
return ret;
}
@@ -5403,13 +5433,28 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
BdrvChild *child;
Transaction *tran = tran_new();
AioContext *old_context, *new_context = NULL;
- bool drained = false;
GLOBAL_STATE_CODE();
assert(!bs_new->backing);
old_context = bdrv_get_aio_context(bs_top);
+ bdrv_drained_begin(bs_top);
+
+ /*
+ * bdrv_drained_begin() requires that only the AioContext of the drained
+ * node is locked, and at this point it can still differ from the AioContext
+ * of bs_top.
+ */
+ new_context = bdrv_get_aio_context(bs_new);
+ aio_context_release(old_context);
+ aio_context_acquire(new_context);
+ bdrv_drained_begin(bs_new);
+ aio_context_release(new_context);
+ aio_context_acquire(old_context);
+ new_context = NULL;
+
+ bdrv_graph_wrlock(bs_top);
child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
&child_of_bds, bdrv_backing_role(bs_new),
@@ -5420,10 +5465,9 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
}
/*
- * bdrv_attach_child_noperm could change the AioContext of bs_top.
- * bdrv_replace_node_noperm calls bdrv_drained_begin, so let's temporarily
- * hold the new AioContext, since bdrv_drained_begin calls BDRV_POLL_WHILE
- * that assumes the new lock is taken.
+ * bdrv_attach_child_noperm could change the AioContext of bs_top and
+ * bs_new, but at least they are in the same AioContext now. This is the
+ * AioContext that we need to lock for the rest of the function.
*/
new_context = bdrv_get_aio_context(bs_top);
@@ -5432,29 +5476,22 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
aio_context_acquire(new_context);
}
- bdrv_drained_begin(bs_new);
- bdrv_drained_begin(bs_top);
- drained = true;
-
- bdrv_graph_wrlock(bs_new);
ret = bdrv_replace_node_noperm(bs_top, bs_new, true, tran, errp);
- bdrv_graph_wrunlock();
if (ret < 0) {
goto out;
}
ret = bdrv_refresh_perms(bs_new, tran, errp);
out:
+ bdrv_graph_wrunlock();
tran_finalize(tran, ret);
bdrv_graph_rdlock_main_loop();
bdrv_refresh_limits(bs_top, NULL, NULL);
bdrv_graph_rdunlock_main_loop();
- if (drained) {
- bdrv_drained_end(bs_top);
- bdrv_drained_end(bs_new);
- }
+ bdrv_drained_end(bs_top);
+ bdrv_drained_end(bs_new);
if (new_context && old_context != new_context) {
aio_context_release(new_context);
diff --git a/block/stream.c b/block/stream.c
index e522bbdec5..e4da214f1f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -54,6 +54,7 @@ static int stream_prepare(Job *job)
{
StreamBlockJob *s = container_of(job, StreamBlockJob, common.job);
BlockDriverState *unfiltered_bs = bdrv_skip_filters(s->target_bs);
+ BlockDriverState *unfiltered_bs_cow = bdrv_cow_bs(unfiltered_bs);
BlockDriverState *base;
BlockDriverState *unfiltered_base;
Error *local_err = NULL;
@@ -64,13 +65,18 @@ static int stream_prepare(Job *job)
s->cor_filter_bs = NULL;
/*
- * bdrv_set_backing_hd() requires that unfiltered_bs is drained. Drain
- * already here and use bdrv_set_backing_hd_drained() instead because
- * the polling during drained_begin() might change the graph, and if we do
- * this only later, we may end up working with the wrong base node (or it
- * might even have gone away by the time we want to use it).
+ * bdrv_set_backing_hd() requires that the unfiltered_bs and the COW child
+ * of unfiltered_bs is drained. Drain already here and use
+ * bdrv_set_backing_hd_drained() instead because the polling during
+ * drained_begin() might change the graph, and if we do this only later, we
+ * may end up working with the wrong base node (or it might even have gone
+ * away by the time we want to use it).
*/
bdrv_drained_begin(unfiltered_bs);
+ if (unfiltered_bs_cow) {
+ bdrv_ref(unfiltered_bs_cow);
+ bdrv_drained_begin(unfiltered_bs_cow);
+ }
base = bdrv_filter_or_cow_bs(s->above_base);
unfiltered_base = bdrv_skip_filters(base);
@@ -100,6 +106,10 @@ static int stream_prepare(Job *job)
}
out:
+ if (unfiltered_bs_cow) {
+ bdrv_drained_end(unfiltered_bs_cow);
+ bdrv_unref(unfiltered_bs_cow);
+ }
bdrv_drained_end(unfiltered_bs);
return ret;
}
--
2.41.0
* [PATCH v2 11/21] block: Call transaction callbacks with lock held
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 10/21] block: Mark bdrv_attach_child_common() GRAPH_WRLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK Kevin Wolf
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
In previous patches, we changed some transactionable functions to be
marked as GRAPH_WRLOCK, but still required that tran_finalize() be
called without the lock. This was because all callbacks that can be in
the same transaction need to follow the same convention.
Now that we don't have conflicting requirements any more, we can switch
all of the transaction callbacks to be declared GRAPH_WRLOCK, too, and
call tran_finalize() with the lock held.
Document for each of these transactionable functions that the lock needs
to be held when completing the transaction, and make sure that all
callers down to the place where the transaction is finalised actually
have the writer lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block.c | 61 +++++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 44 insertions(+), 17 deletions(-)
diff --git a/block.c b/block.c
index f6e7cf4fb9..f06de58a3b 100644
--- a/block.c
+++ b/block.c
@@ -2375,21 +2375,21 @@ typedef struct BdrvReplaceChildState {
BlockDriverState *old_bs;
} BdrvReplaceChildState;
-static void bdrv_replace_child_commit(void *opaque)
+static void GRAPH_WRLOCK bdrv_replace_child_commit(void *opaque)
{
BdrvReplaceChildState *s = opaque;
GLOBAL_STATE_CODE();
- bdrv_unref(s->old_bs);
+ bdrv_schedule_unref(s->old_bs);
}
-static void bdrv_replace_child_abort(void *opaque)
+static void GRAPH_WRLOCK bdrv_replace_child_abort(void *opaque)
{
BdrvReplaceChildState *s = opaque;
BlockDriverState *new_bs = s->child->bs;
GLOBAL_STATE_CODE();
- bdrv_graph_wrlock(s->old_bs);
+ assert_bdrv_graph_writable();
/* old_bs reference is transparently moved from @s to @s->child */
if (!s->child->bs) {
@@ -2408,7 +2408,6 @@ static void bdrv_replace_child_abort(void *opaque)
assert(s->child->quiesced_parent);
bdrv_replace_child_noperm(s->child, s->old_bs);
- bdrv_graph_wrunlock();
bdrv_unref(new_bs);
}
@@ -2426,6 +2425,9 @@ static TransactionActionDrv bdrv_replace_child_drv = {
* Both @child->bs and @new_bs (if non-NULL) must be drained. @new_bs must be
* kept drained until the transaction is completed.
*
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
+ *
* The function doesn't update permissions, caller is responsible for this.
*/
static void GRAPH_WRLOCK
@@ -2951,16 +2953,15 @@ typedef struct BdrvAttachChildCommonState {
AioContext *old_child_ctx;
} BdrvAttachChildCommonState;
-static void bdrv_attach_child_common_abort(void *opaque)
+static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
{
BdrvAttachChildCommonState *s = opaque;
BlockDriverState *bs = s->child->bs;
GLOBAL_STATE_CODE();
+ assert_bdrv_graph_writable();
- bdrv_graph_wrlock(NULL);
bdrv_replace_child_noperm(s->child, NULL);
- bdrv_graph_wrunlock();
if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
@@ -2984,7 +2985,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
tran_commit(tran);
}
- bdrv_unref(bs);
+ bdrv_schedule_unref(bs);
bdrv_child_free(s->child);
}
@@ -2998,6 +2999,9 @@ static TransactionActionDrv bdrv_attach_child_common_drv = {
*
* Function doesn't update permissions, caller is responsible for this.
*
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
+ *
* Returns new created child.
*
* The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
@@ -3114,6 +3118,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
* The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
* @child_bs can move to a different AioContext in this function. Callers must
* make sure that their AioContext locking is still correct after this.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
*/
static BdrvChild * GRAPH_WRLOCK
bdrv_attach_child_noperm(BlockDriverState *parent_bs,
@@ -3180,8 +3187,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
ret = bdrv_refresh_perms(child_bs, tran, errp);
out:
- bdrv_graph_wrunlock();
tran_finalize(tran, ret);
+ bdrv_graph_wrunlock();
bdrv_unref(child_bs);
@@ -3227,8 +3234,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
}
out:
- bdrv_graph_wrunlock();
tran_finalize(tran, ret);
+ bdrv_graph_wrunlock();
bdrv_unref(child_bs);
@@ -3393,6 +3400,9 @@ static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
* The caller must hold the AioContext lock for @child_bs. Both @parent_bs and
* @child_bs can move to a different AioContext in this function. Callers must
* make sure that their AioContext locking is still correct after this.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
*/
static int GRAPH_WRLOCK
bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
@@ -3488,6 +3498,9 @@ out:
*
* If a backing child is already present (i.e. we're detaching a node), that
* child node must be drained.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
*/
static int GRAPH_WRLOCK
bdrv_set_backing_noperm(BlockDriverState *bs,
@@ -3519,8 +3532,8 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
ret = bdrv_refresh_perms(bs, tran, errp);
out:
- bdrv_graph_wrunlock();
tran_finalize(tran, ret);
+ bdrv_graph_wrunlock();
return ret;
}
@@ -4594,7 +4607,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
aio_context_release(ctx);
}
+ bdrv_graph_wrlock(NULL);
tran_commit(tran);
+ bdrv_graph_wrunlock();
QTAILQ_FOREACH_REVERSE(bs_entry, bs_queue, entry) {
BlockDriverState *bs = bs_entry->state.bs;
@@ -4611,7 +4626,9 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
goto cleanup;
abort:
+ bdrv_graph_wrlock(NULL);
tran_abort(tran);
+ bdrv_graph_wrunlock();
QTAILQ_FOREACH_SAFE(bs_entry, bs_queue, entry, next) {
if (bs_entry->prepared) {
@@ -4678,6 +4695,9 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
* true and reopen_state->new_backing_bs contains a pointer to the new
* backing BlockDriverState (or NULL).
*
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
+ *
* Return 0 on success, otherwise return < 0 and set @errp.
*
* The caller must hold the AioContext lock of @reopen_state->bs.
@@ -4811,6 +4831,9 @@ static int bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
* commit() for any other BDS that have been left in a prepare() state
*
* The caller must hold the AioContext lock of @reopen_state->bs.
+ *
+ * After calling this function, the transaction @change_child_tran may only be
+ * completed while holding a writer lock for the graph.
*/
static int bdrv_reopen_prepare(BDRVReopenState *reopen_state,
BlockReopenQueue *queue,
@@ -5262,6 +5285,9 @@ static TransactionActionDrv bdrv_remove_child_drv = {
* Function doesn't update permissions, caller is responsible for this.
*
* @child->bs (if non-NULL) must be drained.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
*/
static void GRAPH_WRLOCK bdrv_remove_child(BdrvChild *child, Transaction *tran)
{
@@ -5280,6 +5306,9 @@ static void GRAPH_WRLOCK bdrv_remove_child(BdrvChild *child, Transaction *tran)
/*
* Both @from and @to (if non-NULL) must be drained. @to must be kept drained
* until the transaction is completed.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a writer lock for the graph.
*/
static int GRAPH_WRLOCK
bdrv_replace_node_noperm(BlockDriverState *from,
@@ -5386,8 +5415,8 @@ static int bdrv_replace_node_common(BlockDriverState *from,
ret = 0;
out:
- bdrv_graph_wrunlock();
tran_finalize(tran, ret);
+ bdrv_graph_wrunlock();
bdrv_drained_end(to);
bdrv_drained_end(from);
@@ -5483,12 +5512,10 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
ret = bdrv_refresh_perms(bs_new, tran, errp);
out:
- bdrv_graph_wrunlock();
tran_finalize(tran, ret);
- bdrv_graph_rdlock_main_loop();
bdrv_refresh_limits(bs_top, NULL, NULL);
- bdrv_graph_rdunlock_main_loop();
+ bdrv_graph_wrunlock();
bdrv_drained_end(bs_top);
bdrv_drained_end(bs_new);
@@ -5523,10 +5550,10 @@ int bdrv_replace_child_bs(BdrvChild *child, BlockDriverState *new_bs,
refresh_list = g_slist_prepend(refresh_list, new_bs);
ret = bdrv_list_refresh_perms(refresh_list, NULL, tran, errp);
- bdrv_graph_wrunlock();
tran_finalize(tran, ret);
+ bdrv_graph_wrunlock();
bdrv_drained_end(old_bs);
bdrv_drained_end(new_bs);
bdrv_unref(old_bs);
--
2.41.0
* [PATCH v2 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 11/21] block: Call transaction callbacks with lock held Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK Kevin Wolf
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_attach_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block-global-state.h | 14 ++++++++------
block.c | 7 +++----
block/quorum.c | 2 ++
block/replication.c | 6 ++++++
tests/unit/test-bdrv-drain.c | 14 ++++++++++++++
tests/unit/test-bdrv-graph-mod.c | 10 ++++++++++
6 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index e570799f85..eb12a35439 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -226,12 +226,14 @@ void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
-BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
- BlockDriverState *child_bs,
- const char *child_name,
- const BdrvChildClass *child_class,
- BdrvChildRole child_role,
- Error **errp);
+
+BdrvChild * GRAPH_WRLOCK
+bdrv_attach_child(BlockDriverState *parent_bs,
+ BlockDriverState *child_bs,
+ const char *child_name,
+ const BdrvChildClass *child_class,
+ BdrvChildRole child_role,
+ Error **errp);
bool bdrv_op_is_blocked(BlockDriverState *bs, BlockOpType op, Error **errp);
void bdrv_op_block(BlockDriverState *bs, BlockOpType op, Error *reason);
diff --git a/block.c b/block.c
index f06de58a3b..369023872a 100644
--- a/block.c
+++ b/block.c
@@ -3219,8 +3219,6 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
GLOBAL_STATE_CODE();
- bdrv_graph_wrlock(child_bs);
-
child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
child_class, child_role, tran, errp);
if (!child) {
@@ -3235,9 +3233,8 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
out:
tran_finalize(tran, ret);
- bdrv_graph_wrunlock();
- bdrv_unref(child_bs);
+ bdrv_schedule_unref(child_bs);
return ret < 0 ? NULL : child;
}
@@ -3758,11 +3755,13 @@ BdrvChild *bdrv_open_child(const char *filename,
return NULL;
}
+ bdrv_graph_wrlock(NULL);
ctx = bdrv_get_aio_context(bs);
aio_context_acquire(ctx);
child = bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
errp);
aio_context_release(ctx);
+ bdrv_graph_wrunlock();
return child;
}
diff --git a/block/quorum.c b/block/quorum.c
index f28758cf2b..def0539fda 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1094,8 +1094,10 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
/* We can safely add the child now */
bdrv_ref(child_bs);
+ bdrv_graph_wrlock(child_bs);
child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
BDRV_CHILD_DATA, errp);
+ bdrv_graph_wrunlock();
if (child == NULL) {
s->next_child_index--;
goto out;
diff --git a/block/replication.c b/block/replication.c
index ea4bf1aa80..eec9819625 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -542,12 +542,15 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
return;
}
+ bdrv_graph_wrlock(bs);
+
bdrv_ref(hidden_disk->bs);
s->hidden_disk = bdrv_attach_child(bs, hidden_disk->bs, "hidden disk",
&child_of_bds, BDRV_CHILD_DATA,
&local_err);
if (local_err) {
error_propagate(errp, local_err);
+ bdrv_graph_wrunlock();
aio_context_release(aio_context);
return;
}
@@ -558,10 +561,13 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
BDRV_CHILD_DATA, &local_err);
if (local_err) {
error_propagate(errp, local_err);
+ bdrv_graph_wrunlock();
aio_context_release(aio_context);
return;
}
+ bdrv_graph_wrunlock();
+
/* start backup job now */
error_setg(&s->blocker,
"Block device is in use by internal backup job");
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 89c8fa6780..6d33249656 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1055,8 +1055,10 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
+ bdrv_graph_wrunlock();
/* This child will be the one to pass to requests through to, and
* it will stall until a drain occurs */
@@ -1064,17 +1066,21 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
&error_abort);
child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
/* Takes our reference to child_bs */
+ bdrv_graph_wrlock(NULL);
tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child",
&child_of_bds,
BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
&error_abort);
+ bdrv_graph_wrunlock();
/* This child is just there to be deleted
* (for detach_instead_of_delete == true) */
null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
&error_abort);
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
+ bdrv_graph_wrunlock();
blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
blk_insert_bs(blk, bs, &error_abort);
@@ -1159,9 +1165,11 @@ static void detach_indirect_bh(void *opaque)
bdrv_unref_child(data->parent_b, data->child_b);
bdrv_ref(data->c);
+ bdrv_graph_wrlock(NULL);
data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
&child_of_bds, BDRV_CHILD_DATA,
&error_abort);
+ bdrv_graph_wrunlock();
}
static void detach_by_parent_aio_cb(void *opaque, int ret)
@@ -1258,6 +1266,7 @@ static void test_detach_indirect(bool by_parent_cb)
/* Set child relationships */
bdrv_ref(b);
bdrv_ref(a);
+ bdrv_graph_wrlock(NULL);
child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_of_bds,
@@ -1267,6 +1276,7 @@ static void test_detach_indirect(bool by_parent_cb)
bdrv_attach_child(parent_a, a, "PA-A",
by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
BDRV_CHILD_DATA, &error_abort);
+ bdrv_graph_wrunlock();
g_assert_cmpint(parent_a->refcnt, ==, 1);
g_assert_cmpint(parent_b->refcnt, ==, 1);
@@ -1685,6 +1695,7 @@ static void test_drop_intermediate_poll(void)
* Establish the chain last, so the chain links are the first
* elements in the BDS.parents lists
*/
+ bdrv_graph_wrlock(NULL);
for (i = 0; i < 3; i++) {
if (i) {
/* Takes the reference to chain[i - 1] */
@@ -1692,6 +1703,7 @@ static void test_drop_intermediate_poll(void)
&chain_child_class, BDRV_CHILD_COW, &error_abort);
}
}
+ bdrv_graph_wrunlock();
job = block_job_create("job", &test_simple_job_driver, NULL, job_node,
0, BLK_PERM_ALL, 0, 0, NULL, NULL, &error_abort);
@@ -1936,8 +1948,10 @@ static void do_test_replace_child_mid_drain(int old_drain_count,
new_child_bs->total_sectors = 1;
bdrv_ref(old_child_bs);
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(parent_bs, old_child_bs, "child", &child_of_bds,
BDRV_CHILD_COW, &error_abort);
+ bdrv_graph_wrunlock();
parent_s->setup_completed = true;
for (i = 0; i < old_drain_count; i++) {
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index d8503165b3..621e616655 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -137,8 +137,10 @@ static void test_update_perm_tree(void)
blk_insert_bs(root, bs, &error_abort);
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(filter, bs, "child", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
+ bdrv_graph_wrunlock();
aio_context_acquire(qemu_get_aio_context());
ret = bdrv_append(filter, bs, NULL);
@@ -205,8 +207,10 @@ static void test_should_update_child(void)
bdrv_set_backing_hd(target, bs, &error_abort);
g_assert(target->backing->bs == bs);
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(filter, target, "target", &child_of_bds,
BDRV_CHILD_DATA, &error_abort);
+ bdrv_graph_wrunlock();
aio_context_acquire(qemu_get_aio_context());
bdrv_append(filter, bs, &error_abort);
aio_context_release(qemu_get_aio_context());
@@ -236,6 +240,7 @@ static void test_parallel_exclusive_write(void)
*/
bdrv_ref(base);
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(top, fl1, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
@@ -245,6 +250,7 @@ static void test_parallel_exclusive_write(void)
bdrv_attach_child(fl2, base, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
+ bdrv_graph_wrunlock();
bdrv_replace_node(fl1, fl2, &error_abort);
@@ -349,6 +355,7 @@ static void test_parallel_perm_update(void)
*/
bdrv_ref(base);
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(top, ws, "file", &child_of_bds, BDRV_CHILD_DATA,
&error_abort);
c_fl1 = bdrv_attach_child(ws, fl1, "first", &child_of_bds,
@@ -361,6 +368,7 @@ static void test_parallel_perm_update(void)
bdrv_attach_child(fl2, base, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
+ bdrv_graph_wrunlock();
/* Select fl1 as first child to be active */
s->selected = c_fl1;
@@ -410,9 +418,11 @@ static void test_append_greedy_filter(void)
BlockDriverState *base = no_perm_node("base");
BlockDriverState *fl = exclusive_writer_node("fl1");
+ bdrv_graph_wrlock(NULL);
bdrv_attach_child(top, base, "backing", &child_of_bds,
BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
&error_abort);
+ bdrv_graph_wrunlock();
aio_context_acquire(qemu_get_aio_context());
bdrv_append(fl, base, &error_abort);
--
2.41.0
* [PATCH v2 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 12/21] block: Mark bdrv_attach_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 14/21] block: Mark bdrv_get_cumulative_perm() " Kevin Wolf
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block_int-common.h | 6 ++---
include/block/block_int-global-state.h | 8 +++---
include/sysemu/block-backend-global-state.h | 4 +--
block.c | 28 +++++++++++++--------
block/block-backend.c | 26 ++++++++++++++-----
block/crypto.c | 6 +++--
block/mirror.c | 8 ++++++
block/vmdk.c | 2 ++
tests/unit/test-bdrv-graph-mod.c | 4 +++
9 files changed, 66 insertions(+), 26 deletions(-)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 85be256c09..fda9d8b5c8 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -311,7 +311,7 @@ struct BlockDriver {
*/
void (*bdrv_cancel_in_flight)(BlockDriverState *bs);
- int (*bdrv_inactivate)(BlockDriverState *bs);
+ int GRAPH_RDLOCK_PTR (*bdrv_inactivate)(BlockDriverState *bs);
int (*bdrv_snapshot_create)(BlockDriverState *bs,
QEMUSnapshotInfo *sn_info);
@@ -944,8 +944,8 @@ struct BdrvChildClass {
* when migration is completing) and it can start/stop requesting
* permissions and doing I/O on it.
*/
- void (*activate)(BdrvChild *child, Error **errp);
- int (*inactivate)(BdrvChild *child);
+ void GRAPH_RDLOCK_PTR (*activate)(BdrvChild *child, Error **errp);
+ int GRAPH_RDLOCK_PTR (*inactivate)(BdrvChild *child);
void GRAPH_WRLOCK_PTR (*attach)(BdrvChild *child);
void GRAPH_WRLOCK_PTR (*detach)(BdrvChild *child);
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index da5fb31089..bebcc08bce 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -212,8 +212,9 @@ void bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
* bdrv_child_refresh_perms() instead and make the parent's
* .bdrv_child_perm() implementation return the correct values.
*/
-int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
- Error **errp);
+int GRAPH_RDLOCK
+bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
+ Error **errp);
/**
* Calls bs->drv->bdrv_child_perm() and updates the child's permission
@@ -223,7 +224,8 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
* values than before, but which will not result in the block layer
* automatically refreshing the permissions.
*/
-int bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp);
+int GRAPH_RDLOCK
+bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp);
bool GRAPH_RDLOCK bdrv_recurse_can_replace(BlockDriverState *bs,
BlockDriverState *to_replace);
diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h
index 184e667ebd..d5f675493a 100644
--- a/include/sysemu/block-backend-global-state.h
+++ b/include/sysemu/block-backend-global-state.h
@@ -61,8 +61,8 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp);
int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp);
bool bdrv_has_blk(BlockDriverState *bs);
bool bdrv_is_root_node(BlockDriverState *bs);
-int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
- Error **errp);
+int GRAPH_UNLOCKED blk_set_perm(BlockBackend *blk, uint64_t perm,
+ uint64_t shared_perm, Error **errp);
void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm);
void blk_iostatus_enable(BlockBackend *blk);
diff --git a/block.c b/block.c
index 369023872a..6720bc4f8a 100644
--- a/block.c
+++ b/block.c
@@ -2202,7 +2202,8 @@ static bool bdrv_a_allow_b(BdrvChild *a, BdrvChild *b, Error **errp)
return false;
}
-static bool bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
+static bool GRAPH_RDLOCK
+bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
{
BdrvChild *a, *b;
GLOBAL_STATE_CODE();
@@ -2255,8 +2256,8 @@ static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
* simplest way to satisfy this criteria: use only result of
* bdrv_topological_dfs() or NULL as @list parameter.
*/
-static GSList *bdrv_topological_dfs(GSList *list, GHashTable *found,
- BlockDriverState *bs)
+static GSList * GRAPH_RDLOCK
+bdrv_topological_dfs(GSList *list, GHashTable *found, BlockDriverState *bs)
{
BdrvChild *child;
g_autoptr(GHashTable) local_found = NULL;
@@ -2532,8 +2533,9 @@ static int bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
* @list is a product of bdrv_topological_dfs() (may be called several times) -
* a topologically sorted subgraph.
*/
-static int bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q,
- Transaction *tran, Error **errp)
+static int GRAPH_RDLOCK
+bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
+ Error **errp)
{
int ret;
BlockDriverState *bs;
@@ -2560,8 +2562,9 @@ static int bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q,
* topologically sorted. It's not a problem if some node occurs in the @list
* several times.
*/
-static int bdrv_list_refresh_perms(GSList *list, BlockReopenQueue *q,
- Transaction *tran, Error **errp)
+static int GRAPH_RDLOCK
+bdrv_list_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
+ Error **errp)
{
g_autoptr(GHashTable) found = g_hash_table_new(NULL, NULL);
g_autoptr(GSList) refresh_list = NULL;
@@ -2621,8 +2624,8 @@ char *bdrv_perm_names(uint64_t perm)
/* @tran is allowed to be NULL. In this case no rollback is possible */
-static int bdrv_refresh_perms(BlockDriverState *bs, Transaction *tran,
- Error **errp)
+static int GRAPH_RDLOCK
+bdrv_refresh_perms(BlockDriverState *bs, Transaction *tran, Error **errp)
{
int ret;
Transaction *local_tran = NULL;
@@ -3247,7 +3250,6 @@ void bdrv_root_unref_child(BdrvChild *child)
GLOBAL_STATE_CODE();
bdrv_graph_wrlock(NULL);
bdrv_replace_child_noperm(child, NULL);
- bdrv_graph_wrunlock();
bdrv_child_free(child);
if (child_bs) {
@@ -3266,6 +3268,7 @@ void bdrv_root_unref_child(BdrvChild *child)
NULL);
}
+ bdrv_graph_wrunlock();
bdrv_unref(child_bs);
}
@@ -4585,7 +4588,10 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
* reconfiguring the fd and that's why it does it in raw_check_perm(), not
* in raw_reopen_prepare() which is called with "old" permissions.
*/
+ bdrv_graph_rdlock_main_loop();
ret = bdrv_list_refresh_perms(refresh_list, bs_queue, tran, errp);
+ bdrv_graph_rdunlock_main_loop();
+
if (ret < 0) {
goto abort;
}
@@ -6833,6 +6839,7 @@ int bdrv_activate(BlockDriverState *bs, Error **errp)
BdrvDirtyBitmap *bm;
GLOBAL_STATE_CODE();
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
if (!bs->drv) {
return -ENOMEDIUM;
@@ -6963,6 +6970,7 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
uint64_t cumulative_perms, cumulative_shared_perms;
GLOBAL_STATE_CODE();
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
if (!bs->drv) {
return -ENOMEDIUM;
diff --git a/block/block-backend.c b/block/block-backend.c
index 4009ed5fed..8d0282a5d9 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -121,6 +121,10 @@ static QTAILQ_HEAD(, BlockBackend) block_backends =
static QTAILQ_HEAD(, BlockBackend) monitor_block_backends =
QTAILQ_HEAD_INITIALIZER(monitor_block_backends);
+static int coroutine_mixed_fn GRAPH_RDLOCK
+blk_set_perm_locked(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+ Error **errp);
+
static void blk_root_inherit_options(BdrvChildRole role, bool parent_is_format,
int *child_flags, QDict *child_options,
int parent_flags, QDict *parent_options)
@@ -186,7 +190,7 @@ static void blk_vm_state_changed(void *opaque, bool running, RunState state)
*
* If an error is returned, the VM cannot be allowed to be resumed.
*/
-static void blk_root_activate(BdrvChild *child, Error **errp)
+static void GRAPH_RDLOCK blk_root_activate(BdrvChild *child, Error **errp)
{
BlockBackend *blk = child->opaque;
Error *local_err = NULL;
@@ -207,7 +211,7 @@ static void blk_root_activate(BdrvChild *child, Error **errp)
*/
saved_shared_perm = blk->shared_perm;
- blk_set_perm(blk, blk->perm, BLK_PERM_ALL, &local_err);
+ blk_set_perm_locked(blk, blk->perm, BLK_PERM_ALL, &local_err);
if (local_err) {
error_propagate(errp, local_err);
blk->disable_perm = true;
@@ -226,7 +230,7 @@ static void blk_root_activate(BdrvChild *child, Error **errp)
return;
}
- blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
+ blk_set_perm_locked(blk, blk->perm, blk->shared_perm, &local_err);
if (local_err) {
error_propagate(errp, local_err);
blk->disable_perm = true;
@@ -259,7 +263,7 @@ static bool blk_can_inactivate(BlockBackend *blk)
return blk->force_allow_inactivate;
}
-static int blk_root_inactivate(BdrvChild *child)
+static int GRAPH_RDLOCK blk_root_inactivate(BdrvChild *child)
{
BlockBackend *blk = child->opaque;
@@ -953,8 +957,9 @@ int blk_replace_bs(BlockBackend *blk, BlockDriverState *new_bs, Error **errp)
/*
* Sets the permission bitmasks that the user of the BlockBackend needs.
*/
-int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
- Error **errp)
+static int coroutine_mixed_fn GRAPH_RDLOCK
+blk_set_perm_locked(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+ Error **errp)
{
int ret;
GLOBAL_STATE_CODE();
@@ -972,6 +977,15 @@ int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
return 0;
}
+int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
+ Error **errp)
+{
+ GLOBAL_STATE_CODE();
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
+
+ return blk_set_perm_locked(blk, perm, shared_perm, errp);
+}
+
void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
{
GLOBAL_STATE_CODE();
diff --git a/block/crypto.c b/block/crypto.c
index 6ee8d46d30..c9c9a39fa3 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -777,7 +777,7 @@ block_crypto_get_specific_info_luks(BlockDriverState *bs, Error **errp)
return spec_info;
}
-static int
+static int GRAPH_RDLOCK
block_crypto_amend_prepare(BlockDriverState *bs, Error **errp)
{
BlockCrypto *crypto = bs->opaque;
@@ -793,7 +793,7 @@ block_crypto_amend_prepare(BlockDriverState *bs, Error **errp)
return ret;
}
-static void
+static void GRAPH_RDLOCK
block_crypto_amend_cleanup(BlockDriverState *bs)
{
BlockCrypto *crypto = bs->opaque;
@@ -841,6 +841,8 @@ block_crypto_amend_options_luks(BlockDriverState *bs,
QCryptoBlockAmendOptions *amend_options = NULL;
int ret = -EINVAL;
+ assume_graph_lock(); /* FIXME */
+
assert(crypto);
assert(crypto->block);
diff --git a/block/mirror.c b/block/mirror.c
index aae4bebbb6..3cc0757a03 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -702,8 +702,12 @@ static int mirror_exit_common(Job *job)
* mirror_top_bs from now on, so keep it drained. */
bdrv_drained_begin(mirror_top_bs);
bs_opaque->stop = true;
+
+ bdrv_graph_rdlock_main_loop();
bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
&error_abort);
+ bdrv_graph_rdunlock_main_loop();
+
if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
BlockDriverState *backing = s->is_none_mode ? src : s->base;
BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
@@ -1670,6 +1674,8 @@ static BlockJob *mirror_start_job(
uint64_t target_perms, target_shared_perms;
int ret;
+ GLOBAL_STATE_CODE();
+
if (granularity == 0) {
granularity = bdrv_get_default_bitmap_granularity(target);
}
@@ -1906,8 +1912,10 @@ fail:
}
bs_opaque->stop = true;
+ bdrv_graph_rdlock_main_loop();
bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
&error_abort);
+ bdrv_graph_rdunlock_main_loop();
bdrv_replace_node(mirror_top_bs, mirror_top_bs->backing->bs, &error_abort);
bdrv_unref(mirror_top_bs);
diff --git a/block/vmdk.c b/block/vmdk.c
index 58ce290e9c..84185b73a2 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1309,6 +1309,8 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
BDRVVmdkState *s = bs->opaque;
uint32_t magic;
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
+
ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
if (ret < 0) {
return ret;
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index 621e616655..8609f7f42b 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -372,6 +372,9 @@ static void test_parallel_perm_update(void)
/* Select fl1 as first child to be active */
s->selected = c_fl1;
+
+ bdrv_graph_rdlock_main_loop();
+
bdrv_child_refresh_perms(top, top->children.lh_first, &error_abort);
assert(c_fl1->perm & BLK_PERM_WRITE);
@@ -391,6 +394,7 @@ static void test_parallel_perm_update(void)
assert(c_fl1->perm & BLK_PERM_WRITE);
assert(!(c_fl2->perm & BLK_PERM_WRITE));
+ bdrv_graph_rdunlock_main_loop();
bdrv_unref(top);
}
--
2.41.0
* [PATCH v2 14/21] block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (12 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 13/21] block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK Kevin Wolf
` (7 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
The function reads the parents list, so it needs to hold the graph lock.
This happens to result in BlockDriver.bdrv_set_perm() being called with
the graph lock held. For consistency, do the same for all of the
BlockDriver callbacks for updating permissions and annotate the function
pointers with GRAPH_RDLOCK_PTR.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block_int-common.h | 9 ++++---
include/block/block_int-global-state.h | 4 +--
block.c | 35 ++++++++++++++++++++------
blockdev.c | 6 +++++
4 files changed, 40 insertions(+), 14 deletions(-)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index fda9d8b5c8..f82c14fb9c 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -413,8 +413,8 @@ struct BlockDriver {
* If both conditions are met, 0 is returned. Otherwise, -errno is returned
* and errp is set to an error describing the conflict.
*/
- int (*bdrv_check_perm)(BlockDriverState *bs, uint64_t perm,
- uint64_t shared, Error **errp);
+ int GRAPH_RDLOCK_PTR (*bdrv_check_perm)(BlockDriverState *bs, uint64_t perm,
+ uint64_t shared, Error **errp);
/**
* Called to inform the driver that the set of cumulative set of used
@@ -426,7 +426,8 @@ struct BlockDriver {
* This function is only invoked after bdrv_check_perm(), so block drivers
* may rely on preparations made in their .bdrv_check_perm implementation.
*/
- void (*bdrv_set_perm)(BlockDriverState *bs, uint64_t perm, uint64_t shared);
+ void GRAPH_RDLOCK_PTR (*bdrv_set_perm)(
+ BlockDriverState *bs, uint64_t perm, uint64_t shared);
/*
* Called to inform the driver that after a previous bdrv_check_perm()
@@ -436,7 +437,7 @@ struct BlockDriver {
* This function can be called even for nodes that never saw a
* bdrv_check_perm() call. It is a no-op then.
*/
- void (*bdrv_abort_perm_update)(BlockDriverState *bs);
+ void GRAPH_RDLOCK_PTR (*bdrv_abort_perm_update)(BlockDriverState *bs);
/**
* Returns in @nperm and @nshared the permissions that the driver for @bs
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index bebcc08bce..e2304db58b 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -204,8 +204,8 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
void *opaque, Error **errp);
void bdrv_root_unref_child(BdrvChild *child);
-void bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
- uint64_t *shared_perm);
+void GRAPH_RDLOCK bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
+ uint64_t *shared_perm);
/**
* Sets a BdrvChild's permissions. Avoid if the parent is a BDS; use
diff --git a/block.c b/block.c
index 6720bc4f8a..186efda70f 100644
--- a/block.c
+++ b/block.c
@@ -2320,7 +2320,7 @@ static void bdrv_child_set_perm(BdrvChild *c, uint64_t perm,
tran_add(tran, &bdrv_child_set_pem_drv, s);
}
-static void bdrv_drv_set_perm_commit(void *opaque)
+static void GRAPH_RDLOCK bdrv_drv_set_perm_commit(void *opaque)
{
BlockDriverState *bs = opaque;
uint64_t cumulative_perms, cumulative_shared_perms;
@@ -2333,7 +2333,7 @@ static void bdrv_drv_set_perm_commit(void *opaque)
}
}
-static void bdrv_drv_set_perm_abort(void *opaque)
+static void GRAPH_RDLOCK bdrv_drv_set_perm_abort(void *opaque)
{
BlockDriverState *bs = opaque;
GLOBAL_STATE_CODE();
@@ -2348,9 +2348,13 @@ TransactionActionDrv bdrv_drv_set_perm_drv = {
.commit = bdrv_drv_set_perm_commit,
};
-static int bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm,
- uint64_t shared_perm, Transaction *tran,
- Error **errp)
+/*
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
+ */
+static int GRAPH_RDLOCK
+bdrv_drv_set_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared_perm,
+ Transaction *tran, Error **errp)
{
GLOBAL_STATE_CODE();
if (!bs->drv) {
@@ -2457,9 +2461,13 @@ bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
/*
* Refresh permissions in @bs subtree. The function is intended to be called
* after some graph modification that was done without permission update.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
*/
-static int bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
- Transaction *tran, Error **errp)
+static int GRAPH_RDLOCK
+bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
+ Transaction *tran, Error **errp)
{
BlockDriver *drv = bs->drv;
BdrvChild *c;
@@ -2532,6 +2540,9 @@ static int bdrv_node_refresh_perm(BlockDriverState *bs, BlockReopenQueue *q,
/*
* @list is a product of bdrv_topological_dfs() (may be called several times) -
* a topologically sorted subgraph.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
*/
static int GRAPH_RDLOCK
bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
@@ -2561,6 +2572,9 @@ bdrv_do_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
* @list is any list of nodes. List is completed by all subtrees and
* topologically sorted. It's not a problem if some node occurs in the @list
* several times.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
*/
static int GRAPH_RDLOCK
bdrv_list_refresh_perms(GSList *list, BlockReopenQueue *q, Transaction *tran,
@@ -2623,7 +2637,12 @@ char *bdrv_perm_names(uint64_t perm)
}
-/* @tran is allowed to be NULL. In this case no rollback is possible */
+/*
+ * @tran is allowed to be NULL. In this case no rollback is possible.
+ *
+ * After calling this function, the transaction @tran may only be completed
+ * while holding a reader lock for the graph.
+ */
static int GRAPH_RDLOCK
bdrv_refresh_perms(BlockDriverState *bs, Transaction *tran, Error **errp)
{
diff --git a/blockdev.c b/blockdev.c
index e6eba61484..372eaf198c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1378,6 +1378,9 @@ static void external_snapshot_action(TransactionAction *action,
AioContext *aio_context;
uint64_t perm, shared;
+ /* TODO We'll eventually have to take a writer lock in this function */
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
+
tran_add(tran, &external_snapshot_drv, state);
/* 'blockdev-snapshot' and 'blockdev-snapshot-sync' have similar
@@ -2521,6 +2524,9 @@ void qmp_block_commit(const char *job_id, const char *device,
int job_flags = JOB_DEFAULT;
uint64_t top_perm, top_shared;
+ /* TODO We'll eventually have to take a writer lock in this function */
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
+
if (!has_speed) {
speed = 0;
}
--
2.41.0
* [PATCH v2 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (13 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 14/21] block: Mark bdrv_get_cumulative_perm() " Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK Kevin Wolf
` (6 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
This adds GRAPH_RDLOCK annotations to declare that callers of
bdrv_child_perm() need to hold a reader lock for the graph because
some implementations access the children list of a node.
The callers of bdrv_child_perm() conveniently already hold the lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block_int-common.h | 10 +++++-----
block.c | 11 ++++++-----
block/copy-before-write.c | 10 +++++-----
3 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index f82c14fb9c..3feb67ec4a 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -451,11 +451,11 @@ struct BlockDriver {
* permissions, but those that will be needed after applying the
* @reopen_queue.
*/
- void (*bdrv_child_perm)(BlockDriverState *bs, BdrvChild *c,
- BdrvChildRole role,
- BlockReopenQueue *reopen_queue,
- uint64_t parent_perm, uint64_t parent_shared,
- uint64_t *nperm, uint64_t *nshared);
+ void GRAPH_RDLOCK_PTR (*bdrv_child_perm)(
+ BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
+ BlockReopenQueue *reopen_queue,
+ uint64_t parent_perm, uint64_t parent_shared,
+ uint64_t *nperm, uint64_t *nshared);
/**
* Register/unregister a buffer for I/O. For example, when the driver is
diff --git a/block.c b/block.c
index 186efda70f..0f7f78f8de 100644
--- a/block.c
+++ b/block.c
@@ -2228,11 +2228,12 @@ bdrv_parent_perms_conflict(BlockDriverState *bs, Error **errp)
return false;
}
-static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
- BdrvChild *c, BdrvChildRole role,
- BlockReopenQueue *reopen_queue,
- uint64_t parent_perm, uint64_t parent_shared,
- uint64_t *nperm, uint64_t *nshared)
+static void GRAPH_RDLOCK
+bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
+ BdrvChild *c, BdrvChildRole role,
+ BlockReopenQueue *reopen_queue,
+ uint64_t parent_perm, uint64_t parent_shared,
+ uint64_t *nperm, uint64_t *nshared)
{
assert(bs->drv && bs->drv->bdrv_child_perm);
GLOBAL_STATE_CODE();
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 9a0e2b69d9..aeaff3bb82 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -341,11 +341,11 @@ static void cbw_refresh_filename(BlockDriverState *bs)
bs->file->bs->filename);
}
-static void cbw_child_perm(BlockDriverState *bs, BdrvChild *c,
- BdrvChildRole role,
- BlockReopenQueue *reopen_queue,
- uint64_t perm, uint64_t shared,
- uint64_t *nperm, uint64_t *nshared)
+static void GRAPH_RDLOCK
+cbw_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
+ BlockReopenQueue *reopen_queue,
+ uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
{
if (!(role & BDRV_CHILD_FILTERED)) {
/*
--
2.41.0
* [PATCH v2 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (14 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 15/21] block: Mark bdrv_child_perm() GRAPH_RDLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 17/21] block: Take graph rdlock in bdrv_drop_intermediate() Kevin Wolf
` (5 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
block.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block.c b/block.c
index 0f7f78f8de..06d2a1b256 100644
--- a/block.c
+++ b/block.c
@@ -3371,7 +3371,8 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
}
-static void bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
+static void GRAPH_RDLOCK
+bdrv_parent_cb_change_media(BlockDriverState *bs, bool load)
{
BdrvChild *c;
GLOBAL_STATE_CODE();
@@ -3969,6 +3970,9 @@ bdrv_open_inherit(const char *filename, const char *reference, QDict *options,
GLOBAL_STATE_CODE();
assert(!qemu_in_coroutine());
+ /* TODO We'll eventually have to take a writer lock in this function */
+ GRAPH_RDLOCK_GUARD_MAINLOOP();
+
if (reference) {
bool options_non_empty = options ? qdict_size(options) : false;
qobject_unref(options);
--
2.41.0
* [PATCH v2 17/21] block: Take graph rdlock in bdrv_drop_intermediate()
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (15 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 16/21] block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-12 16:49 ` Stefan Hajnoczi
2023-09-11 9:46 ` [PATCH v2 18/21] block: Take graph rdlock in bdrv_change_aio_context() Kevin Wolf
` (4 subsequent siblings)
21 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
block.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block.c b/block.c
index 06d2a1b256..e024a6ccec 100644
--- a/block.c
+++ b/block.c
@@ -5938,9 +5938,11 @@ int bdrv_drop_intermediate(BlockDriverState *top, BlockDriverState *base,
backing_file_str = base->filename;
}
+ bdrv_graph_rdlock_main_loop();
QLIST_FOREACH(c, &top->parents, next_parent) {
updated_children = g_slist_prepend(updated_children, c);
}
+ bdrv_graph_rdunlock_main_loop();
/*
* It seems correct to pass detach_subchain=true here, but it triggers
--
2.41.0
* Re: [PATCH v2 17/21] block: Take graph rdlock in bdrv_drop_intermediate()
2023-09-11 9:46 ` [PATCH v2 17/21] block: Take graph rdlock in bdrv_drop_intermediate() Kevin Wolf
@ 2023-09-12 16:49 ` Stefan Hajnoczi
0 siblings, 0 replies; 31+ messages in thread
From: Stefan Hajnoczi @ 2023-09-12 16:49 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel
On Mon, Sep 11, 2023 at 11:46:16AM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> block.c | 2 ++
> 1 file changed, 2 insertions(+)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* [PATCH v2 18/21] block: Take graph rdlock in bdrv_change_aio_context()
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (16 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 17/21] block: Take graph rdlock in bdrv_drop_intermediate() Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-12 16:49 ` Stefan Hajnoczi
2023-09-11 9:46 ` [PATCH v2 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK Kevin Wolf
` (3 subsequent siblings)
21 siblings, 1 reply; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
The function reads the parents list, so it needs to hold the graph lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
block.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block.c b/block.c
index e024a6ccec..8e589bb2e4 100644
--- a/block.c
+++ b/block.c
@@ -7688,17 +7688,21 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
return true;
}
+ bdrv_graph_rdlock_main_loop();
QLIST_FOREACH(c, &bs->parents, next_parent) {
if (!bdrv_parent_change_aio_context(c, ctx, visited, tran, errp)) {
+ bdrv_graph_rdunlock_main_loop();
return false;
}
}
QLIST_FOREACH(c, &bs->children, next) {
if (!bdrv_child_change_aio_context(c, ctx, visited, tran, errp)) {
+ bdrv_graph_rdunlock_main_loop();
return false;
}
}
+ bdrv_graph_rdunlock_main_loop();
state = g_new(BdrvStateSetAioContext, 1);
*state = (BdrvStateSetAioContext) {
--
2.41.0
* Re: [PATCH v2 18/21] block: Take graph rdlock in bdrv_change_aio_context()
2023-09-11 9:46 ` [PATCH v2 18/21] block: Take graph rdlock in bdrv_change_aio_context() Kevin Wolf
@ 2023-09-12 16:49 ` Stefan Hajnoczi
0 siblings, 0 replies; 31+ messages in thread
From: Stefan Hajnoczi @ 2023-09-12 16:49 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel
On Mon, Sep 11, 2023 at 11:46:17AM +0200, Kevin Wolf wrote:
> The function reads the parents list, so it needs to hold the graph lock.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
> block.c | 4 ++++
> 1 file changed, 4 insertions(+)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* [PATCH v2 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (17 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 18/21] block: Take graph rdlock in bdrv_change_aio_context() Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK Kevin Wolf
` (2 subsequent siblings)
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_root_unref_child(). These callers will
typically already hold the graph lock once the locking work is
completed, which means that they can't call functions that take it
internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block_int-global-state.h | 2 +-
block.c | 6 +++---
block/block-backend.c | 3 +++
blockjob.c | 2 ++
4 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index e2304db58b..074b677838 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -202,7 +202,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
BdrvChildRole child_role,
uint64_t perm, uint64_t shared_perm,
void *opaque, Error **errp);
-void bdrv_root_unref_child(BdrvChild *child);
+void GRAPH_WRLOCK bdrv_root_unref_child(BdrvChild *child);
void GRAPH_RDLOCK bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
uint64_t *shared_perm);
diff --git a/block.c b/block.c
index 8e589bb2e4..9ea8333a28 100644
--- a/block.c
+++ b/block.c
@@ -3268,7 +3268,6 @@ void bdrv_root_unref_child(BdrvChild *child)
BlockDriverState *child_bs = child->bs;
GLOBAL_STATE_CODE();
- bdrv_graph_wrlock(NULL);
bdrv_replace_child_noperm(child, NULL);
bdrv_child_free(child);
@@ -3288,8 +3287,7 @@ void bdrv_root_unref_child(BdrvChild *child)
NULL);
}
- bdrv_graph_wrunlock();
- bdrv_unref(child_bs);
+ bdrv_schedule_unref(child_bs);
}
typedef struct BdrvSetInheritsFrom {
@@ -3366,8 +3364,10 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
return;
}
+ bdrv_graph_wrlock(NULL);
bdrv_unset_inherits_from(parent, child, NULL);
bdrv_root_unref_child(child);
+ bdrv_graph_wrunlock();
}
diff --git a/block/block-backend.c b/block/block-backend.c
index 8d0282a5d9..c2636f4351 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -915,7 +915,10 @@ void blk_remove_bs(BlockBackend *blk)
blk_drain(blk);
root = blk->root;
blk->root = NULL;
+
+ bdrv_graph_wrlock(NULL);
bdrv_root_unref_child(root);
+ bdrv_graph_wrunlock();
}
/*
diff --git a/blockjob.c b/blockjob.c
index 25fe8e625d..58c5d64539 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,6 +198,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
* one to make sure that such a concurrent access does not attempt
* to process an already freed BdrvChild.
*/
+ bdrv_graph_wrlock(NULL);
while (job->nodes) {
GSList *l = job->nodes;
BdrvChild *c = l->data;
@@ -209,6 +210,7 @@ void block_job_remove_all_bdrv(BlockJob *job)
g_slist_free_1(l);
}
+ bdrv_graph_wrunlock();
}
bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs)
--
2.41.0
^ permalink raw reply related [flat|nested] 31+ messages in thread
* [PATCH v2 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (18 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 19/21] block: Mark bdrv_root_unref_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-11 9:46 ` [PATCH v2 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK Kevin Wolf
2023-09-12 16:49 ` [PATCH v2 00/21] Graph locking part 4 (node management) Stefan Hajnoczi
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
Instead of taking the writer lock internally, require callers to already
hold it when calling bdrv_unref_child(). These callers will typically
already hold the graph lock once the locking work is completed, which
means that they can't call functions that take it internally.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block-global-state.h | 7 ++++++-
block.c | 11 +++++++----
block/blklogwrites.c | 4 ++++
block/blkverify.c | 2 ++
block/qcow2.c | 4 +++-
block/quorum.c | 6 ++++++
block/replication.c | 3 +++
block/snapshot.c | 2 ++
block/vmdk.c | 11 +++++++++++
tests/unit/test-bdrv-drain.c | 8 ++++++--
10 files changed, 50 insertions(+), 8 deletions(-)
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index eb12a35439..0f6df8f1a2 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -225,7 +225,12 @@ void bdrv_ref(BlockDriverState *bs);
void no_coroutine_fn bdrv_unref(BlockDriverState *bs);
void coroutine_fn no_co_wrapper bdrv_co_unref(BlockDriverState *bs);
void GRAPH_WRLOCK bdrv_schedule_unref(BlockDriverState *bs);
-void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
+
+void GRAPH_WRLOCK
+bdrv_unref_child(BlockDriverState *parent, BdrvChild *child);
+
+void coroutine_fn no_co_wrapper_bdrv_wrlock
+bdrv_co_unref_child(BlockDriverState *parent, BdrvChild *child);
BdrvChild * GRAPH_WRLOCK
bdrv_attach_child(BlockDriverState *parent_bs,
diff --git a/block.c b/block.c
index 9ea8333a28..e7f349b25c 100644
--- a/block.c
+++ b/block.c
@@ -1701,7 +1701,9 @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name,
open_failed:
bs->drv = NULL;
if (bs->file != NULL) {
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, bs->file);
+ bdrv_graph_wrunlock();
assert(!bs->file);
}
g_free(bs->opaque);
@@ -3331,8 +3333,9 @@ static void bdrv_set_inherits_from(BlockDriverState *bs,
* @root that point to @root, where necessary.
* @tran is allowed to be NULL. In this case no rollback is possible
*/
-static void bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
- Transaction *tran)
+static void GRAPH_WRLOCK
+bdrv_unset_inherits_from(BlockDriverState *root, BdrvChild *child,
+ Transaction *tran)
{
BdrvChild *c;
@@ -3364,10 +3367,8 @@ void bdrv_unref_child(BlockDriverState *parent, BdrvChild *child)
return;
}
- bdrv_graph_wrlock(NULL);
bdrv_unset_inherits_from(parent, child, NULL);
bdrv_root_unref_child(child);
- bdrv_graph_wrunlock();
}
@@ -5164,9 +5165,11 @@ static void bdrv_close(BlockDriverState *bs)
bs->drv = NULL;
}
+ bdrv_graph_wrlock(NULL);
QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
bdrv_unref_child(bs, child);
}
+ bdrv_graph_wrunlock();
assert(!bs->backing);
assert(!bs->file);
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 3ea7141cb5..a0d70729bb 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -251,7 +251,9 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
ret = 0;
fail_log:
if (ret < 0) {
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, s->log_file);
+ bdrv_graph_wrunlock();
s->log_file = NULL;
}
fail:
@@ -263,8 +265,10 @@ static void blk_log_writes_close(BlockDriverState *bs)
{
BDRVBlkLogWritesState *s = bs->opaque;
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, s->log_file);
s->log_file = NULL;
+ bdrv_graph_wrunlock();
}
static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/blkverify.c b/block/blkverify.c
index 7326461f30..dae9716a26 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -151,8 +151,10 @@ static void blkverify_close(BlockDriverState *bs)
{
BDRVBlkverifyState *s = bs->opaque;
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, s->test_file);
s->test_file = NULL;
+ bdrv_graph_wrunlock();
}
static int64_t coroutine_fn GRAPH_RDLOCK
diff --git a/block/qcow2.c b/block/qcow2.c
index b48cd9ce63..071004b302 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1880,7 +1880,7 @@ qcow2_do_open(BlockDriverState *bs, QDict *options, int flags,
g_free(s->image_data_file);
if (open_data_file && has_data_file(bs)) {
bdrv_graph_co_rdunlock();
- bdrv_unref_child(bs, s->data_file);
+ bdrv_co_unref_child(bs, s->data_file);
bdrv_graph_co_rdlock();
s->data_file = NULL;
}
@@ -2790,7 +2790,9 @@ static void qcow2_do_close(BlockDriverState *bs, bool close_data_file)
g_free(s->image_backing_format);
if (close_data_file && has_data_file(bs)) {
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, s->data_file);
+ bdrv_graph_wrunlock();
s->data_file = NULL;
}
diff --git a/block/quorum.c b/block/quorum.c
index def0539fda..620a50ba2c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1037,12 +1037,14 @@ static int quorum_open(BlockDriverState *bs, QDict *options, int flags,
close_exit:
/* cleanup on error */
+ bdrv_graph_wrlock(NULL);
for (i = 0; i < s->num_children; i++) {
if (!opened[i]) {
continue;
}
bdrv_unref_child(bs, s->children[i]);
}
+ bdrv_graph_wrunlock();
g_free(s->children);
g_free(opened);
exit:
@@ -1055,9 +1057,11 @@ static void quorum_close(BlockDriverState *bs)
BDRVQuorumState *s = bs->opaque;
int i;
+ bdrv_graph_wrlock(NULL);
for (i = 0; i < s->num_children; i++) {
bdrv_unref_child(bs, s->children[i]);
}
+ bdrv_graph_wrunlock();
g_free(s->children);
}
@@ -1147,7 +1151,9 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
memmove(&s->children[i], &s->children[i + 1],
(s->num_children - i - 1) * sizeof(BdrvChild *));
s->children = g_renew(BdrvChild *, s->children, --s->num_children);
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, child);
+ bdrv_graph_wrunlock();
quorum_refresh_flags(bs);
bdrv_drained_end(bs);
diff --git a/block/replication.c b/block/replication.c
index eec9819625..dd166d2d82 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -672,10 +672,13 @@ static void replication_done(void *opaque, int ret)
if (ret == 0) {
s->stage = BLOCK_REPLICATION_DONE;
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, s->secondary_disk);
s->secondary_disk = NULL;
bdrv_unref_child(bs, s->hidden_disk);
s->hidden_disk = NULL;
+ bdrv_graph_wrunlock();
+
s->error = 0;
} else {
s->stage = BLOCK_REPLICATION_FAILOVER_FAILED;
diff --git a/block/snapshot.c b/block/snapshot.c
index e22ac3eac6..b86b5b24ad 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -281,7 +281,9 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
}
/* .bdrv_open() will re-attach it */
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, fallback);
+ bdrv_graph_wrunlock();
ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
diff --git a/block/vmdk.c b/block/vmdk.c
index 84185b73a2..78baa04c0c 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -272,6 +272,7 @@ static void vmdk_free_extents(BlockDriverState *bs)
BDRVVmdkState *s = bs->opaque;
VmdkExtent *e;
+ bdrv_graph_wrlock(NULL);
for (i = 0; i < s->num_extents; i++) {
e = &s->extents[i];
g_free(e->l1_table);
@@ -282,6 +283,8 @@ static void vmdk_free_extents(BlockDriverState *bs)
bdrv_unref_child(bs, e->file);
}
}
+ bdrv_graph_wrunlock();
+
g_free(s->extents);
}
@@ -1220,7 +1223,9 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
ret = vmdk_add_extent(bs, extent_file, true, sectors,
0, 0, 0, 0, 0, &extent, errp);
if (ret < 0) {
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, extent_file);
+ bdrv_graph_wrunlock();
goto out;
}
extent->flat_start_offset = flat_offset << 9;
@@ -1235,20 +1240,26 @@ static int vmdk_parse_extents(const char *desc, BlockDriverState *bs,
}
g_free(buf);
if (ret) {
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, extent_file);
+ bdrv_graph_wrunlock();
goto out;
}
extent = &s->extents[s->num_extents - 1];
} else if (!strcmp(type, "SESPARSE")) {
ret = vmdk_open_se_sparse(bs, extent_file, bs->open_flags, errp);
if (ret) {
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, extent_file);
+ bdrv_graph_wrunlock();
goto out;
}
extent = &s->extents[s->num_extents - 1];
} else {
error_setg(errp, "Unsupported extent type '%s'", type);
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(bs, extent_file);
+ bdrv_graph_wrunlock();
ret = -ENOTSUP;
goto out;
}
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 6d33249656..b040a73bb9 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -967,9 +967,12 @@ typedef struct BDRVTestTopState {
static void bdrv_test_top_close(BlockDriverState *bs)
{
BdrvChild *c, *next_c;
+
+ bdrv_graph_wrlock(NULL);
QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
bdrv_unref_child(bs, c);
}
+ bdrv_graph_wrunlock();
}
static int coroutine_fn GRAPH_RDLOCK
@@ -1024,7 +1027,7 @@ static void coroutine_fn test_co_delete_by_drain(void *opaque)
} else {
BdrvChild *c, *next_c;
QLIST_FOREACH_SAFE(c, &bs->children, next, next_c) {
- bdrv_unref_child(bs, c);
+ bdrv_co_unref_child(bs, c);
}
}
@@ -1162,10 +1165,11 @@ static void detach_indirect_bh(void *opaque)
struct detach_by_parent_data *data = opaque;
bdrv_dec_in_flight(data->child_b->bs);
+
+ bdrv_graph_wrlock(NULL);
bdrv_unref_child(data->parent_b, data->child_b);
bdrv_ref(data->c);
- bdrv_graph_wrlock(NULL);
data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
&child_of_bds, BDRV_CHILD_DATA,
&error_abort);
--
2.41.0
* [PATCH v2 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (19 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 20/21] block: Mark bdrv_unref_child() GRAPH_WRLOCK Kevin Wolf
@ 2023-09-11 9:46 ` Kevin Wolf
2023-09-12 16:49 ` [PATCH v2 00/21] Graph locking part 4 (node management) Stefan Hajnoczi
21 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-11 9:46 UTC (permalink / raw)
To: qemu-block
Cc: kwolf, stefanha, eblake, eesposit, pbonzini, vsementsov,
qemu-devel
The functions read the parents list in the generic block layer, so we
need to hold the graph lock already there. The BlockDriver
implementations actually modify the graph, so it has to be a writer
lock.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
include/block/block-global-state.h | 8 +++++---
include/block/block_int-common.h | 9 +++++----
block/quorum.c | 23 ++++++-----------------
blockdev.c | 17 +++++++++++------
4 files changed, 27 insertions(+), 30 deletions(-)
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index 0f6df8f1a2..f31660c7b1 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -276,9 +276,11 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
int bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz);
int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
-void bdrv_add_child(BlockDriverState *parent, BlockDriverState *child,
- Error **errp);
-void bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
+void GRAPH_WRLOCK
+bdrv_add_child(BlockDriverState *parent, BlockDriverState *child, Error **errp);
+
+void GRAPH_WRLOCK
+bdrv_del_child(BlockDriverState *parent, BdrvChild *child, Error **errp);
/**
*
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 3feb67ec4a..2ca3758cb8 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -393,10 +393,11 @@ struct BlockDriver {
*/
int (*bdrv_probe_geometry)(BlockDriverState *bs, HDGeometry *geo);
- void (*bdrv_add_child)(BlockDriverState *parent, BlockDriverState *child,
- Error **errp);
- void (*bdrv_del_child)(BlockDriverState *parent, BdrvChild *child,
- Error **errp);
+ void GRAPH_WRLOCK_PTR (*bdrv_add_child)(
+ BlockDriverState *parent, BlockDriverState *child, Error **errp);
+
+ void GRAPH_WRLOCK_PTR (*bdrv_del_child)(
+ BlockDriverState *parent, BdrvChild *child, Error **errp);
/**
* Informs the block driver that a permission change is intended. The
diff --git a/block/quorum.c b/block/quorum.c
index 620a50ba2c..05220cab7f 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1066,8 +1066,8 @@ static void quorum_close(BlockDriverState *bs)
g_free(s->children);
}
-static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
- Error **errp)
+static void GRAPH_WRLOCK
+quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs, Error **errp)
{
BDRVQuorumState *s = bs->opaque;
BdrvChild *child;
@@ -1093,29 +1093,22 @@ static void quorum_add_child(BlockDriverState *bs, BlockDriverState *child_bs,
}
s->next_child_index++;
- bdrv_drained_begin(bs);
-
/* We can safely add the child now */
bdrv_ref(child_bs);
- bdrv_graph_wrlock(child_bs);
child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
BDRV_CHILD_DATA, errp);
- bdrv_graph_wrunlock();
if (child == NULL) {
s->next_child_index--;
- goto out;
+ return;
}
s->children = g_renew(BdrvChild *, s->children, s->num_children + 1);
s->children[s->num_children++] = child;
quorum_refresh_flags(bs);
-
-out:
- bdrv_drained_end(bs);
}
-static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
- Error **errp)
+static void GRAPH_WRLOCK
+quorum_del_child(BlockDriverState *bs, BdrvChild *child, Error **errp)
{
BDRVQuorumState *s = bs->opaque;
char indexstr[INDEXSTR_LEN];
@@ -1145,18 +1138,14 @@ static void quorum_del_child(BlockDriverState *bs, BdrvChild *child,
s->next_child_index--;
}
- bdrv_drained_begin(bs);
-
/* We can safely remove this child now */
memmove(&s->children[i], &s->children[i + 1],
(s->num_children - i - 1) * sizeof(BdrvChild *));
s->children = g_renew(BdrvChild *, s->children, --s->num_children);
- bdrv_graph_wrlock(NULL);
+
bdrv_unref_child(bs, child);
- bdrv_graph_wrunlock();
quorum_refresh_flags(bs);
- bdrv_drained_end(bs);
}
static void quorum_gather_child_options(BlockDriverState *bs, QDict *target,
diff --git a/blockdev.c b/blockdev.c
index 372eaf198c..325b7a3bef 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3545,8 +3545,8 @@ out:
aio_context_release(aio_context);
}
-static BdrvChild *bdrv_find_child(BlockDriverState *parent_bs,
- const char *child_name)
+static BdrvChild * GRAPH_RDLOCK
+bdrv_find_child(BlockDriverState *parent_bs, const char *child_name)
{
BdrvChild *child;
@@ -3565,9 +3565,11 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
BlockDriverState *parent_bs, *new_bs = NULL;
BdrvChild *p_child;
+ bdrv_graph_wrlock(NULL);
+
parent_bs = bdrv_lookup_bs(parent, parent, errp);
if (!parent_bs) {
- return;
+ goto out;
}
if (!child == !node) {
@@ -3576,7 +3578,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
} else {
error_setg(errp, "Either child or node must be specified");
}
- return;
+ goto out;
}
if (child) {
@@ -3584,7 +3586,7 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
if (!p_child) {
error_setg(errp, "Node '%s' does not have child '%s'",
parent, child);
- return;
+ goto out;
}
bdrv_del_child(parent_bs, p_child, errp);
}
@@ -3593,10 +3595,13 @@ void qmp_x_blockdev_change(const char *parent, const char *child,
new_bs = bdrv_find_node(node);
if (!new_bs) {
error_setg(errp, "Node '%s' not found", node);
- return;
+ goto out;
}
bdrv_add_child(parent_bs, new_bs, errp);
}
+
+out:
+ bdrv_graph_wrunlock();
}
BlockJobInfoList *qmp_query_block_jobs(Error **errp)
--
2.41.0
* Re: [PATCH v2 00/21] Graph locking part 4 (node management)
2023-09-11 9:45 [PATCH v2 00/21] Graph locking part 4 (node management) Kevin Wolf
` (20 preceding siblings ...)
2023-09-11 9:46 ` [PATCH v2 21/21] block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK Kevin Wolf
@ 2023-09-12 16:49 ` Stefan Hajnoczi
2023-09-14 13:12 ` Kevin Wolf
21 siblings, 1 reply; 31+ messages in thread
From: Stefan Hajnoczi @ 2023-09-12 16:49 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel
On Mon, Sep 11, 2023 at 11:45:59AM +0200, Kevin Wolf wrote:
> The previous parts of the graph locking changes focussed mostly on the
> BlockDriver side and taking reader locks while performing I/O. This
> series focusses more on the functions managing the graph structure, i.e
> adding, removing and replacing nodes and updating their permissions.
>
> Many of these places actually need to take the writer lock to avoid
> readers seeing an inconsistent half-updated graph state. Therefore
> taking the writer lock is now moved from the very low-level function
> bdrv_replace_child_noperm() into its more high level callers.
>
> v2:
> - Patch 5: Improved comments, added one for bdrv_schedule_unref()
>
> Kevin Wolf (21):
> block: Remove unused BlockReopenQueueEntry.perms_checked
> preallocate: Factor out preallocate_truncate_to_real_size()
> preallocate: Don't poll during permission updates
> block: Take AioContext lock for bdrv_append() more consistently
> block: Introduce bdrv_schedule_unref()
> block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
> block-coroutine-wrapper: Allow arbitrary parameter names
> block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
> block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
> block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
> block: Call transaction callbacks with lock held
> block: Mark bdrv_attach_child() GRAPH_WRLOCK
> block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
> block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
> block: Mark bdrv_child_perm() GRAPH_RDLOCK
> block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
> block: Take graph rdlock in bdrv_drop_intermediate()
> block: Take graph rdlock in bdrv_change_aio_context()
> block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
> block: Mark bdrv_unref_child() GRAPH_WRLOCK
> block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
>
> include/block/block-common.h | 4 +
> include/block/block-global-state.h | 30 +-
> include/block/block_int-common.h | 34 +-
> include/block/block_int-global-state.h | 14 +-
> include/sysemu/block-backend-global-state.h | 4 +-
> block.c | 348 ++++++++++++++------
> block/blklogwrites.c | 4 +
> block/blkverify.c | 2 +
> block/block-backend.c | 29 +-
> block/copy-before-write.c | 10 +-
> block/crypto.c | 6 +-
> block/graph-lock.c | 26 +-
> block/mirror.c | 8 +
> block/preallocate.c | 133 +++++---
> block/qcow2.c | 4 +-
> block/quorum.c | 23 +-
> block/replication.c | 9 +
> block/snapshot.c | 2 +
> block/stream.c | 20 +-
> block/vmdk.c | 13 +
> blockdev.c | 23 +-
> blockjob.c | 2 +
> tests/unit/test-bdrv-drain.c | 23 +-
> tests/unit/test-bdrv-graph-mod.c | 20 ++
> tests/unit/test-block-iothread.c | 3 +
> scripts/block-coroutine-wrapper.py | 18 +-
> tests/qemu-iotests/051.pc.out | 6 +-
> 27 files changed, 591 insertions(+), 227 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 00/21] Graph locking part 4 (node management)
2023-09-12 16:49 ` [PATCH v2 00/21] Graph locking part 4 (node management) Stefan Hajnoczi
@ 2023-09-14 13:12 ` Kevin Wolf
0 siblings, 0 replies; 31+ messages in thread
From: Kevin Wolf @ 2023-09-14 13:12 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-block, eblake, eesposit, pbonzini, vsementsov, qemu-devel
Am 12.09.2023 um 18:49 hat Stefan Hajnoczi geschrieben:
> On Mon, Sep 11, 2023 at 11:45:59AM +0200, Kevin Wolf wrote:
> > The previous parts of the graph locking changes focussed mostly on the
> > BlockDriver side and taking reader locks while performing I/O. This
> > series focusses more on the functions managing the graph structure, i.e
> > adding, removing and replacing nodes and updating their permissions.
> >
> > Many of these places actually need to take the writer lock to avoid
> > readers seeing an inconsistent half-updated graph state. Therefore
> > taking the writer lock is now moved from the very low-level function
> > bdrv_replace_child_noperm() into its more high level callers.
> >
> > v2:
> > - Patch 5: Improved comments, added one for bdrv_schedule_unref()
> >
> > Kevin Wolf (21):
> > block: Remove unused BlockReopenQueueEntry.perms_checked
> > preallocate: Factor out preallocate_truncate_to_real_size()
> > preallocate: Don't poll during permission updates
> > block: Take AioContext lock for bdrv_append() more consistently
> > block: Introduce bdrv_schedule_unref()
> > block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
> > block-coroutine-wrapper: Allow arbitrary parameter names
> > block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
> > block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
> > block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
> > block: Call transaction callbacks with lock held
> > block: Mark bdrv_attach_child() GRAPH_WRLOCK
> > block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
> > block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
> > block: Mark bdrv_child_perm() GRAPH_RDLOCK
> > block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
> > block: Take graph rdlock in bdrv_drop_intermediate()
> > block: Take graph rdlock in bdrv_change_aio_context()
> > block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
> > block: Mark bdrv_unref_child() GRAPH_WRLOCK
> > block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
> >
> > include/block/block-common.h | 4 +
> > include/block/block-global-state.h | 30 +-
> > include/block/block_int-common.h | 34 +-
> > include/block/block_int-global-state.h | 14 +-
> > include/sysemu/block-backend-global-state.h | 4 +-
> > block.c | 348 ++++++++++++++------
> > block/blklogwrites.c | 4 +
> > block/blkverify.c | 2 +
> > block/block-backend.c | 29 +-
> > block/copy-before-write.c | 10 +-
> > block/crypto.c | 6 +-
> > block/graph-lock.c | 26 +-
> > block/mirror.c | 8 +
> > block/preallocate.c | 133 +++++---
> > block/qcow2.c | 4 +-
> > block/quorum.c | 23 +-
> > block/replication.c | 9 +
> > block/snapshot.c | 2 +
> > block/stream.c | 20 +-
> > block/vmdk.c | 13 +
> > blockdev.c | 23 +-
> > blockjob.c | 2 +
> > tests/unit/test-bdrv-drain.c | 23 +-
> > tests/unit/test-bdrv-graph-mod.c | 20 ++
> > tests/unit/test-block-iothread.c | 3 +
> > scripts/block-coroutine-wrapper.py | 18 +-
> > tests/qemu-iotests/051.pc.out | 6 +-
> > 27 files changed, 591 insertions(+), 227 deletions(-)
>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Thanks, applied to the block branch.
Kevin