* [PATCH 0/2] block/preallocate: fix image truncation logic
From: Denis V. Lunev @ 2024-10-09 13:58 UTC
To: qemu-devel
Cc: qemu-block, Denis V. Lunev, Andrey Drobyshev, Vladimir Sementsov-Ogievskiy, Kevin Wolf

Recent QEMU changes around preallocate_set_perm() mandate that it is no
longer possible to poll on the aio_context inside this function. Thus the
truncate operation has been moved into a bottom half. This bottom half
is scheduled from preallocate_set_perm() and that is all.

This approach has proven to be problematic in a lot of places once
additional operations are executed over the preallocate filter in
production. The code validates that permissions have really been changed
just after the call to the set operation.

All permission operations and block driver graph changes are performed
inside a quiescent state in terms of the block layer. This means that
there are no in-flight requests, which is guaranteed by passing
through a bdrv_drain() section.

The idea is that we should effectively disable the preallocate filter
inside bdrv_drain() and unblock permission changes. This section is
definitely not on the hot path, and a single additional truncate
operation will not hurt.

Unfortunately, according to the documentation, the bdrv_drain_begin()
callback also disallows waiting inside. Thus the original approach with
the bottom half is kept. bdrv_drain_begin() schedules the operation and,
to ensure that it has really been executed before the section completes,
increments the number of in-flight requests.

In addition, we should disallow lifting the WRITE permission while the
truncate() operation has not fully completed yet.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Kevin Wolf <kwolf@redhat.com>
* [PATCH 1/2] preallocate: do not allow to change BDS permission improperly
From: Denis V. Lunev @ 2024-10-09 13:58 UTC
To: qemu-devel
Cc: qemu-block, Denis V. Lunev, Andrey Drobyshev, Vladimir Sementsov-Ogievskiy, Kevin Wolf

RW permissions must not be lifted from the preallocate filter while the
truncate operation has not finished. Otherwise a WRITE operation (the
image truncate) would be issued after the inactivate call has returned.
This is definitely a contract violation.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/preallocate.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/block/preallocate.c b/block/preallocate.c
index bfb638d8b1..1cf854966c 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -581,6 +581,17 @@ static void preallocate_child_perm(BlockDriverState *bs, BdrvChild *c,
     }
 }
 
+static int preallocate_check_perm(BlockDriverState *bs, uint64_t perm,
+                                  uint64_t shared, Error **errp)
+{
+    BDRVPreallocateState *s = bs->opaque;
+    if (!can_write_resize(perm) && s->data_end != -EINVAL) {
+        error_setg_errno(errp, EPERM, "Write access is required for truncate");
+        return -EPERM;
+    }
+    return 0;
+}
+
 static BlockDriver bdrv_preallocate_filter = {
     .format_name = "preallocate",
     .instance_size = sizeof(BDRVPreallocateState),
@@ -602,6 +613,7 @@ static BlockDriver bdrv_preallocate_filter = {
 
     .bdrv_set_perm = preallocate_set_perm,
     .bdrv_child_perm = preallocate_child_perm,
+    .bdrv_check_perm = preallocate_check_perm,
 
     .is_filter = true,
 };
-- 
2.43.5
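The decision made by preallocate_check_perm() in the patch above can be illustrated outside QEMU with a minimal sketch. It is not QEMU code: the `toy_*` names, the `ToyPreallocState` struct and the `TOY_PERM_*` bit values are hypothetical stand-ins, and `toy_can_write_resize()` is only assumed to behave like can_write_resize() (both WRITE and RESIZE must be requested).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits; QEMU's real ones are the BLK_PERM_* flags. */
enum {
    TOY_PERM_WRITE  = 1u << 1,
    TOY_PERM_RESIZE = 1u << 2,
};

/* Toy stand-in for BDRVPreallocateState: only the field the check reads. */
typedef struct ToyPreallocState {
    int64_t data_end; /* -EINVAL while the filter is inactive */
} ToyPreallocState;

/* Assumption: mirrors can_write_resize(), i.e. both bits must be set. */
static bool toy_can_write_resize(uint64_t perm)
{
    return (perm & TOY_PERM_WRITE) && (perm & TOY_PERM_RESIZE);
}

/*
 * Same condition as in the patch: refuse to give up write+resize while the
 * filter is still active (data_end != -EINVAL), because the pending truncate
 * would otherwise have to write after the permissions are gone.
 */
static int toy_check_perm(const ToyPreallocState *s, uint64_t perm)
{
    if (!toy_can_write_resize(perm) && s->data_end != -EINVAL) {
        return -EPERM;
    }
    return 0;
}
```

With an active filter (`data_end` set to a real offset) any permission set lacking WRITE or RESIZE is rejected with -EPERM; once the filter has been deactivated (`data_end == -EINVAL`) the same request is accepted.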
* Re: [PATCH 1/2] preallocate: do not allow to change BDS permission improperly
From: Andrey Drobyshev @ 2024-10-09 14:54 UTC
To: Denis V. Lunev, qemu-devel
Cc: qemu-block, Vladimir Sementsov-Ogievskiy, Kevin Wolf

On 10/9/24 4:58 PM, Denis V. Lunev wrote:
> [...]

Reviewed-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
* [PATCH 2/2] block/preallocate: fix image truncation logic
From: Denis V. Lunev @ 2024-10-09 13:58 UTC
To: qemu-devel
Cc: qemu-block, Denis V. Lunev, Andrey Drobyshev, Vladimir Sementsov-Ogievskiy, Kevin Wolf

Recent QEMU changes around preallocate_set_perm() mandate that it is no
longer possible to poll on the aio_context inside this function. Thus the
truncate operation has been moved into a bottom half. This bottom half
is scheduled from preallocate_set_perm() and that is all.

This approach has proven to be problematic in a lot of places once
additional operations are executed over the preallocate filter in
production. The code validates that permissions have really been changed
just after the call to the set operation.

All permission operations and block driver graph changes are performed
inside a quiescent state in terms of the block layer. This means that
there are no in-flight requests, which is guaranteed by passing
through a bdrv_drain() section.

The idea is that we should effectively disable the preallocate filter
inside bdrv_drain() and unblock permission changes. This section is
definitely not on the hot path, and a single additional truncate
operation will not hurt.

Unfortunately, according to the documentation, the bdrv_drain_begin()
callback also disallows waiting inside. Thus the original approach with
the bottom half is kept. bdrv_drain_begin() schedules the operation and,
to ensure that it has really been executed before the section completes,
increments the number of in-flight requests.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/preallocate.c    | 38 ++++++++++++++++++++++++++++++++++----
 tests/qemu-iotests/298 |  6 ++++--
 2 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/block/preallocate.c b/block/preallocate.c
index 1cf854966c..d78ef0b045 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -78,6 +78,7 @@ typedef struct BDRVPreallocateState {
 
     /* Gives up the resize permission on children when parents don't need it */
     QEMUBH *drop_resize_bh;
+    bool drop_resize_armed;
 } BDRVPreallocateState;
 
 static int preallocate_drop_resize(BlockDriverState *bs, Error **errp);
@@ -149,6 +150,7 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
      */
     s->file_end = s->zero_start = s->data_end = -EINVAL;
     s->drop_resize_bh = qemu_bh_new(preallocate_drop_resize_bh, bs);
+    s->drop_resize_armed = false;
 
     ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
     if (ret < 0) {
@@ -200,7 +202,7 @@ static void preallocate_close(BlockDriverState *bs)
 {
     BDRVPreallocateState *s = bs->opaque;
 
-    qemu_bh_cancel(s->drop_resize_bh);
+    assert(!s->drop_resize_armed);
     qemu_bh_delete(s->drop_resize_bh);
 
     if (s->data_end >= 0) {
@@ -504,6 +506,8 @@ static int preallocate_drop_resize(BlockDriverState *bs, Error **errp)
     BDRVPreallocateState *s = bs->opaque;
     int ret;
 
+    s->drop_resize_armed = false;
+
     if (s->data_end < 0) {
         return 0;
     }
@@ -534,11 +538,15 @@ static int preallocate_drop_resize(BlockDriverState *bs, Error **errp)
 
 static void preallocate_drop_resize_bh(void *opaque)
 {
+    BlockDriverState *bs = opaque;
+
     /*
      * In case of errors, we'll simply keep the exclusive lock on the image
      * indefinitely.
      */
-    preallocate_drop_resize(opaque, NULL);
+    preallocate_drop_resize(bs, NULL);
+
+    bdrv_dec_in_flight(bs);
 }
 
 static void preallocate_set_perm(BlockDriverState *bs,
@@ -547,13 +555,13 @@ static void preallocate_set_perm(BlockDriverState *bs,
     BDRVPreallocateState *s = bs->opaque;
 
     if (can_write_resize(perm)) {
-        qemu_bh_cancel(s->drop_resize_bh);
         if (s->data_end < 0) {
             s->data_end = s->file_end = s->zero_start =
                 bs->file->bs->total_sectors * BDRV_SECTOR_SIZE;
         }
     } else {
-        qemu_bh_schedule(s->drop_resize_bh);
+        assert(!s->drop_resize_armed);
+        assert(s->data_end < 0);
     }
 }
 
@@ -592,6 +600,26 @@ static int preallocate_check_perm(BlockDriverState *bs, uint64_t perm,
     return 0;
 }
 
+static void preallocate_drain_begin(BlockDriverState *bs)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (s->data_end < 0) {
+        return;
+    }
+    if (s->drop_resize_armed) {
+        return;
+    }
+    if (s->data_end == s->file_end) {
+        s->file_end = s->zero_start = s->data_end = -EINVAL;
+        return;
+    }
+
+    s->drop_resize_armed = true;
+    bdrv_inc_in_flight(bs);
+    qemu_bh_schedule(s->drop_resize_bh);
+}
+
 static BlockDriver bdrv_preallocate_filter = {
     .format_name = "preallocate",
     .instance_size = sizeof(BDRVPreallocateState),
@@ -600,6 +628,8 @@ static BlockDriver bdrv_preallocate_filter = {
     .bdrv_open = preallocate_open,
     .bdrv_close = preallocate_close,
 
+    .bdrv_drain_begin = preallocate_drain_begin,
+
     .bdrv_reopen_prepare = preallocate_reopen_prepare,
     .bdrv_reopen_commit = preallocate_reopen_commit,
     .bdrv_reopen_abort = preallocate_reopen_abort,
diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
index 9e75ac6975..41f12685a7 100755
--- a/tests/qemu-iotests/298
+++ b/tests/qemu-iotests/298
@@ -94,8 +94,10 @@ class TestPreallocateFilter(TestPreallocateBase):
         self.assert_qmp(result, 'return', {})
         self.complete_and_wait()
 
-        # commit of new megabyte should trigger preallocation
-        self.check_big()
+        # commit of new megabyte should trigger preallocation, but drain
+        # will make file smaller
+        self.check_small()
+
 
     def test_reopen_opts(self):
         result = self.vm.qmp('blockdev-reopen', options=[{
-- 
2.43.5
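The interplay described in this patch (drain_begin may not wait, so it schedules a bottom half and bumps the in-flight counter so the drained section cannot complete before the bottom half has run) can be modelled with a small standalone sketch. Everything here is a hypothetical toy: `ToyNode`, the one-slot `pending_bh` queue, `disk_size` and the `toy_*` functions are illustrative stand-ins and not QEMU APIs, and `toy_drain_wait()` only approximates what a real drained section does while polling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyNode ToyNode;

/* Toy model of the filter node; all fields are illustrative assumptions. */
struct ToyNode {
    int64_t data_end;              /* end of real data; < 0: filter inactive */
    int64_t file_end;              /* cached end of file incl. preallocation */
    int64_t disk_size;             /* simulated on-disk file length */
    unsigned in_flight;            /* what the drained section waits on */
    bool drop_resize_armed;        /* a drop-resize bottom half is pending */
    void (*pending_bh)(ToyNode *); /* one-slot "bottom half" queue */
};

/* Deferred work: truncate to the real size, then deactivate the filter. */
static void toy_drop_resize_bh(ToyNode *n)
{
    n->drop_resize_armed = false;
    if (n->data_end >= 0) {
        if (n->file_end > n->data_end) {
            n->disk_size = n->data_end; /* stands in for the truncate */
        }
        n->file_end = n->data_end = -EINVAL;
    }
    assert(n->in_flight > 0);
    n->in_flight--; /* only now may the drained section complete */
}

/* Mirrors the early returns of preallocate_drain_begin() in the patch. */
static void toy_drain_begin(ToyNode *n)
{
    if (n->data_end < 0) {
        return; /* filter inactive, nothing to undo */
    }
    if (n->drop_resize_armed) {
        return; /* already scheduled by an earlier drain_begin */
    }
    if (n->data_end == n->file_end) {
        n->file_end = n->data_end = -EINVAL; /* nothing preallocated */
        return;
    }
    n->drop_resize_armed = true;
    n->in_flight++; /* keep the drained section from completing early */
    n->pending_bh = toy_drop_resize_bh;
}

/* Toy poll loop: run scheduled work until nothing is in flight. */
static void toy_drain_wait(ToyNode *n)
{
    while (n->in_flight > 0) {
        void (*bh)(ToyNode *) = n->pending_bh;
        assert(bh != NULL); /* otherwise this toy loop would never end */
        n->pending_bh = NULL;
        bh(n);
    }
}
```

Calling `toy_drain_begin()` twice before the bottom half runs increments `in_flight` only once, which is the role the `drop_resize_armed` flag plays in the patch; after `toy_drain_wait()` returns, the simulated file has been cut back to `data_end` and the filter is inactive.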
* Re: [PATCH 2/2] block/preallocate: fix image truncation logic
From: Andrey Drobyshev @ 2024-10-09 14:54 UTC
To: Denis V. Lunev, qemu-devel
Cc: qemu-block, Vladimir Sementsov-Ogievskiy, Kevin Wolf

On 10/9/24 4:58 PM, Denis V. Lunev wrote:
> [...]

This patch doesn't seem to apply cleanly to the current master branch.
* Re: [PATCH 2/2] block/preallocate: fix image truncation logic
From: Denis V. Lunev @ 2024-10-09 14:54 UTC
To: Andrey Drobyshev, Denis V. Lunev, qemu-devel
Cc: qemu-block, Vladimir Sementsov-Ogievskiy, Kevin Wolf

On 10/9/24 16:54, Andrey Drobyshev wrote:
> On 10/9/24 4:58 PM, Denis V. Lunev wrote:
>> [...]
> This patch doesn't seem to be applying cleanly to the current master branch

This is my fault. Thanks. I will resend.

Den
* [PATCH v2 0/2] block/preallocate: fix image truncation logic
From: Denis V. Lunev @ 2024-10-09 15:37 UTC
To: qemu-devel
Cc: qemu-block, Denis V. Lunev, Andrey Drobyshev, Vladimir Sementsov-Ogievskiy, Kevin Wolf

Recent QEMU changes around preallocate_set_perm() mandate that it is no
longer possible to poll on the aio_context inside this function. Thus the
truncate operation has been moved into a bottom half. This bottom half
is scheduled from preallocate_set_perm() and that is all.

This approach has proven to be problematic in a lot of places once
additional operations are executed over the preallocate filter in
production. The code validates that permissions have really been changed
just after the call to the set operation.

All permission operations and block driver graph changes are performed
inside a quiescent state in terms of the block layer. This means that
there are no in-flight requests, which is guaranteed by passing
through a bdrv_drain() section.

The idea is that we should effectively disable the preallocate filter
inside bdrv_drain() and unblock permission changes. This section is
definitely not on the hot path, and a single additional truncate
operation will not hurt.

Unfortunately, according to the documentation, the bdrv_drain_begin()
callback also disallows waiting inside. Thus the original approach with
the bottom half is kept. bdrv_drain_begin() schedules the operation and,
to ensure that it has really been executed before the section completes,
increments the number of in-flight requests.

In addition, we should disallow lifting the WRITE permission while the
truncate() operation has not fully completed yet.

Changes from v1:
- rebased to the latest master

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Kevin Wolf <kwolf@redhat.com>
* [PATCH 2/2] block/preallocate: fix image truncation logic
From: Denis V. Lunev @ 2024-10-09 15:37 UTC
To: qemu-devel
Cc: qemu-block, Denis V. Lunev, Andrey Drobyshev, Vladimir Sementsov-Ogievskiy, Kevin Wolf

Recent QEMU changes around preallocate_set_perm() mandate that it is no
longer possible to poll on the aio_context inside this function. Thus the
truncate operation has been moved into a bottom half. This bottom half
is scheduled from preallocate_set_perm() and that is all.

This approach has proven to be problematic in a lot of places once
additional operations are executed over the preallocate filter in
production. The code validates that permissions have really been changed
just after the call to the set operation.

All permission operations and block driver graph changes are performed
inside a quiescent state in terms of the block layer. This means that
there are no in-flight requests, which is guaranteed by passing
through a bdrv_drain() section.

The idea is that we should effectively disable the preallocate filter
inside bdrv_drain() and unblock permission changes. This section is
definitely not on the hot path, and a single additional truncate
operation will not hurt.

Unfortunately, according to the documentation, the bdrv_drain_begin()
callback also disallows waiting inside. Thus the original approach with
the bottom half is kept. bdrv_drain_begin() schedules the operation and,
to ensure that it has really been executed before the section completes,
increments the number of in-flight requests.

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
CC: Kevin Wolf <kwolf@redhat.com>
---
 block/preallocate.c    | 42 ++++++++++++++++++++++++++++++++++++++----
 tests/qemu-iotests/298 |  6 ++++--
 2 files changed, 42 insertions(+), 6 deletions(-)

diff --git a/block/preallocate.c b/block/preallocate.c
index 1016c511cb..16a92a2e0d 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -78,6 +78,7 @@ typedef struct BDRVPreallocateState {
 
     /* Gives up the resize permission on children when parents don't need it */
     QEMUBH *drop_resize_bh;
+    bool drop_resize_armed;
 } BDRVPreallocateState;
 
 static int preallocate_drop_resize(BlockDriverState *bs, Error **errp);
@@ -151,6 +152,7 @@ static int preallocate_open(BlockDriverState *bs, QDict *options, int flags,
      */
     s->file_end = s->zero_start = s->data_end = -EINVAL;
     s->drop_resize_bh = qemu_bh_new(preallocate_drop_resize_bh, bs);
+    s->drop_resize_armed = false;
 
     ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
     if (ret < 0) {
@@ -208,7 +210,7 @@ static void preallocate_close(BlockDriverState *bs)
     GLOBAL_STATE_CODE();
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
-    qemu_bh_cancel(s->drop_resize_bh);
+    assert(!s->drop_resize_armed);
     qemu_bh_delete(s->drop_resize_bh);
 
     if (s->data_end >= 0) {
@@ -516,6 +518,8 @@ preallocate_drop_resize(BlockDriverState *bs, Error **errp)
     BDRVPreallocateState *s = bs->opaque;
     int ret;
 
+    s->drop_resize_armed = false;
+
     if (s->data_end < 0) {
         return 0;
     }
@@ -544,6 +548,12 @@ preallocate_drop_resize(BlockDriverState *bs, Error **errp)
 
 static void preallocate_drop_resize_bh(void *opaque)
 {
+    BlockDriverState *bs = opaque;
+
+    /*
+     * In case of errors, we'll simply keep the exclusive lock on the image
+     * indefinitely.
+     */
     GLOBAL_STATE_CODE();
     GRAPH_RDLOCK_GUARD_MAINLOOP();
 
@@ -551,7 +561,9 @@ static void preallocate_drop_resize_bh(void *opaque)
      * In case of errors, we'll simply keep the exclusive lock on the image
      * indefinitely.
      */
-    preallocate_drop_resize(opaque, NULL);
+    preallocate_drop_resize(bs, NULL);
+
+    bdrv_dec_in_flight(bs);
 }
 
 static void GRAPH_RDLOCK
@@ -560,13 +572,13 @@ preallocate_set_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared)
     BDRVPreallocateState *s = bs->opaque;
 
     if (can_write_resize(perm)) {
-        qemu_bh_cancel(s->drop_resize_bh);
         if (s->data_end < 0) {
             s->data_end = s->file_end = s->zero_start =
                 bs->file->bs->total_sectors * BDRV_SECTOR_SIZE;
         }
     } else {
-        qemu_bh_schedule(s->drop_resize_bh);
+        assert(!s->drop_resize_armed);
+        assert(s->data_end < 0);
     }
 }
 
@@ -605,6 +617,26 @@ static int preallocate_check_perm(BlockDriverState *bs, uint64_t perm,
     return 0;
 }
 
+static void preallocate_drain_begin(BlockDriverState *bs)
+{
+    BDRVPreallocateState *s = bs->opaque;
+
+    if (s->data_end < 0) {
+        return;
+    }
+    if (s->drop_resize_armed) {
+        return;
+    }
+    if (s->data_end == s->file_end) {
+        s->file_end = s->zero_start = s->data_end = -EINVAL;
+        return;
+    }
+
+    s->drop_resize_armed = true;
+    bdrv_inc_in_flight(bs);
+    qemu_bh_schedule(s->drop_resize_bh);
+}
+
 static BlockDriver bdrv_preallocate_filter = {
     .format_name = "preallocate",
     .instance_size = sizeof(BDRVPreallocateState),
@@ -613,6 +645,8 @@ static BlockDriver bdrv_preallocate_filter = {
     .bdrv_open = preallocate_open,
     .bdrv_close = preallocate_close,
 
+    .bdrv_drain_begin = preallocate_drain_begin,
+
     .bdrv_reopen_prepare = preallocate_reopen_prepare,
     .bdrv_reopen_commit = preallocate_reopen_commit,
     .bdrv_reopen_abort = preallocate_reopen_abort,
diff --git a/tests/qemu-iotests/298 b/tests/qemu-iotests/298
index 09c9290711..fe03d29802 100755
--- a/tests/qemu-iotests/298
+++ b/tests/qemu-iotests/298
@@ -92,8 +92,10 @@ class TestPreallocateFilter(TestPreallocateBase):
         self.vm.cmd('block-commit', device='overlay')
         self.complete_and_wait()
 
-        # commit of new megabyte should trigger preallocation
-        self.check_big()
+        # commit of new megabyte should trigger preallocation, but drain
+        # will make file smaller
+        self.check_small()
+
 
     def test_reopen_opts(self):
         self.vm.cmd('blockdev-reopen', options=[{
-- 
2.43.5