* [Qemu-devel] [PATCH for-2.11 0/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Alberto Garcia
Date: 2017-11-29 17:56 UTC
To: qemu-devel
Cc: Alberto Garcia, qemu-block, Kevin Wolf, Max Reitz, Jeff Cody, Anton Nefedov

Hi,

this patch fixes the crash reported by Anton Nefedov here:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00159.html

I can reproduce it easily with the change he mentions there, or by
tweaking iotest 030 as I show here:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

I'm not convinced that this is the best solution, though. As Fam says,
the block layer is getting complex, and I think this can be solved in
a different way if the code is properly rewritten.

Even with this solution I think it would make sense to assert that the
block job's pause count is always 0 when the job is about to be
destroyed, and perhaps to keep a reference while it is being paused.
But that's a bigger change and we're close to the release, so I opted
for this more conservative approach.

Regards,

Berto

Alberto Garcia (1):
  blockjob: Make block_job_pause_all() keep a reference to the jobs

 blockjob.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--
2.11.0
* [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Alberto Garcia
Date: 2017-11-29 17:56 UTC
To: qemu-devel
Cc: Alberto Garcia, qemu-block, Kevin Wolf, Max Reitz, Jeff Cody, Anton Nefedov

Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
pausing all block jobs during bdrv_reopen_multiple() to prevent any of
them from finishing and removing nodes from the graph while they are
being reopened.

It turns out that pausing a block job doesn't necessarily prevent it
from finishing: a paused block job can still run its exit function
from the main loop and call block_job_completed(). The mirror block
job in particular always goes to the main loop while it is paused (by
virtue of the bdrv_drained_begin() call in mirror_run()).

Destroying a paused block job during bdrv_reopen_multiple() has two
consequences:

   1) The references to the nodes involved in the job are released,
      possibly destroying some of them. If those nodes were in the
      reopen queue this would trigger the problem originally described
      in commit 40840e419be, crashing QEMU.

   2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
      not be doing all necessary bdrv_parent_drained_end() calls.

I can reproduce problem 1) easily with iotest 030 by increasing
STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
the iotest like in this example:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

This patch keeps an additional reference to all block jobs between
block_job_pause_all() and block_job_resume_all(), guaranteeing that
they are kept alive.
Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 blockjob.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 0ed50b953b..715c2c2680 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -730,6 +730,7 @@ void block_job_pause_all(void)
         AioContext *aio_context = blk_get_aio_context(job->blk);

         aio_context_acquire(aio_context);
+        block_job_ref(job);
         block_job_pause(job);
         aio_context_release(aio_context);
     }
@@ -808,12 +809,14 @@ void coroutine_fn block_job_pause_point(BlockJob *job)

 void block_job_resume_all(void)
 {
-    BlockJob *job = NULL;
-    while ((job = block_job_next(job))) {
+    BlockJob *job, *next;
+
+    QLIST_FOREACH_SAFE(job, &block_jobs, job_list, next) {
         AioContext *aio_context = blk_get_aio_context(job->blk);

         aio_context_acquire(aio_context);
         block_job_resume(job);
+        block_job_unref(job);
         aio_context_release(aio_context);
     }
 }
--
2.11.0
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Kevin Wolf
Date: 2017-11-30 12:27 UTC
To: Alberto Garcia
Cc: qemu-devel, qemu-block, Max Reitz, Jeff Cody, Anton Nefedov

Am 29.11.2017 um 18:56 hat Alberto Garcia geschrieben:
> Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
> pausing all block jobs during bdrv_reopen_multiple() to prevent any of
> them from finishing and removing nodes from the graph while they are
> being reopened.
>
> It turns out that pausing a block job doesn't necessarily prevent it
> from finishing: a paused block job can still run its exit function
> from the main loop and call block_job_completed(). The mirror block
> job in particular always goes to the main loop while it is paused (by
> virtue of the bdrv_drained_begin() call in mirror_run()).
>
> Destroying a paused block job during bdrv_reopen_multiple() has two
> consequences:
>
> 1) The references to the nodes involved in the job are released,
>    possibly destroying some of them. If those nodes were in the
>    reopen queue this would trigger the problem originally described
>    in commit 40840e419be, crashing QEMU.

This specific problem could be avoided by making the BDS reference in
the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
and bdrv_unref() only at the end of bdrv_reopen_multiple().

> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>    not be doing all necessary bdrv_parent_drained_end() calls.

If I understand correctly, you don't have a reproducer here. I'm not
convinced that this one actually exists, because the functions that do
the graph modifications (specifically bdrv_replace_child_noperm)
automatically drain and undrain nodes as necessary.
> I can reproduce problem 1) easily with iotest 030 by increasing
> STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
> the iotest like in this example:
>
>    https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
>
> This patch keeps an additional reference to all block jobs between
> block_job_pause_all() and block_job_resume_all(), guaranteeing that
> they are kept alive.

This might be okay as a stopgap solution if this is a real problem in
practice, because we don't have a better solution right now. However,
I haven't seen any sign of there being an -rc4, so we're probably too
late for 2.11 anyway.

It's certainly not a full solution because keeping a reference to a
block job does not prevent it from completing, but only from being
freed. Most block jobs do graph modifications, including dropping the
references to nodes, already when they complete, not only when they
are freed.

Now, while the above might have sounded like we have an easy solution
in bdrv_reopen(), I see another problem that looks quite tough:

3) bdrv_reopen_queue() is a recursive operation that involves all
   children that inherited options. If the graph changes between the
   bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the
   calculated options (and today possibly permissions) don't actually
   match the graph any more.

The requirement we really have is that between bdrv_reopen_queue()
and bdrv_reopen_multiple() no graph changes are made. Calling
bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late.

Maybe that actually gives us another relatively simple solution: we
need to push up the drain into the callers, and possibly just assert
that the nodes are drained in the reopen functions.

Kevin
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Alberto Garcia
Date: 2017-11-30 14:35 UTC
To: Kevin Wolf
Cc: qemu-devel, qemu-block, Max Reitz, Jeff Cody, Anton Nefedov

On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:

>> Destroying a paused block job during bdrv_reopen_multiple() has two
>> consequences:
>>
>> 1) The references to the nodes involved in the job are released,
>>    possibly destroying some of them. If those nodes were in the
>>    reopen queue this would trigger the problem originally described
>>    in commit 40840e419be, crashing QEMU.
>
> This specific problem could be avoided by making the BDS reference in
> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
> and bdrv_unref() only at the end of bdrv_reopen_multiple().

That is correct.

>> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>>    not be doing all necessary bdrv_parent_drained_end() calls.
>
> If I understand correctly, you don't have a reproducer here.

That's unfortunately not correct. You can use the very test case that
I mentioned in the commit message:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

With that one, QEMU master crashes easily because of problem (1). If I
hold strong references in the reopen queue as you mentioned, the test
case hangs because of problem (2).

> It's certainly not a full solution because keeping a reference to a
> block job does not prevent it from completing, but only from being
> freed. Most block jobs do graph modifications, including dropping the
> references to nodes, already when they complete, not only when they
> are freed.

Yes, but the block job itself holds additional references (thanks to
block_job_add_bdrv()).
> Now, while the above might have sounded like we have an easy solution
> in bdrv_reopen(), I see another problem that looks quite tough:
>
> 3) bdrv_reopen_queue() is a recursive operation that involves all
>    children that inherited options. If the graph changes between the
>    bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the
>    calculated options (and today possibly permissions) don't actually
>    match the graph any more.
>
> The requirement we really have is that between bdrv_reopen_queue()
> and bdrv_reopen_multiple() no graph changes are made. Calling
> bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late.

Yes, I agree that that should be the requirement.

Berto
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Kevin Wolf
Date: 2017-11-30 14:43 UTC
To: Alberto Garcia
Cc: qemu-devel, qemu-block, Max Reitz, Jeff Cody, Anton Nefedov

Am 30.11.2017 um 15:35 hat Alberto Garcia geschrieben:
> On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:
>
>>> Destroying a paused block job during bdrv_reopen_multiple() has two
>>> consequences:
>>>
>>> 1) The references to the nodes involved in the job are released,
>>>    possibly destroying some of them. If those nodes were in the
>>>    reopen queue this would trigger the problem originally described
>>>    in commit 40840e419be, crashing QEMU.
>>
>> This specific problem could be avoided by making the BDS reference in
>> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
>> and bdrv_unref() only at the end of bdrv_reopen_multiple().
>
> That is correct.
>
>>> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>>>    not be doing all necessary bdrv_parent_drained_end() calls.
>>
>> If I understand correctly, you don't have a reproducer here.
>
> That's unfortunately not correct. You can use the very test case that
> I mentioned in the commit message:
>
>    https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
>
> With that one, QEMU master crashes easily because of problem (1). If I
> hold strong references in the reopen queue as you mentioned, the test
> case hangs because of problem (2).

Ok, thanks. I'll try to play with this a bit myself later.

>> It's certainly not a full solution because keeping a reference to a
>> block job does not prevent it from completing, but only from being
>> freed. Most block jobs do graph modifications, including dropping the
>> references to nodes, already when they complete, not only when they
>> are freed.
>
> Yes, but the block job itself holds additional references (thanks to
> block_job_add_bdrv()).

Mirror and commit call block_job_remove_all_bdrv() during their
completion. So yes, it does help for streaming, but not for all block
jobs.

Kevin
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Jeff Cody
Date: 2017-11-30 14:53 UTC
To: Kevin Wolf
Cc: Alberto Garcia, qemu-devel, qemu-block, Max Reitz, Anton Nefedov

On Thu, Nov 30, 2017 at 03:43:35PM +0100, Kevin Wolf wrote:
> Am 30.11.2017 um 15:35 hat Alberto Garcia geschrieben:
>> On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:
>>
>>>> Destroying a paused block job during bdrv_reopen_multiple() has two
>>>> consequences:
>>>>
>>>> 1) The references to the nodes involved in the job are released,
>>>>    possibly destroying some of them. If those nodes were in the
>>>>    reopen queue this would trigger the problem originally described
>>>>    in commit 40840e419be, crashing QEMU.
>>>
>>> This specific problem could be avoided by making the BDS reference in
>>> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
>>> and bdrv_unref() only at the end of bdrv_reopen_multiple().
>>
>> That is correct.
>>
>>>> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>>>>    not be doing all necessary bdrv_parent_drained_end() calls.
>>>
>>> If I understand correctly, you don't have a reproducer here.
>>
>> That's unfortunately not correct. You can use the very test case that
>> I mentioned in the commit message:
>>
>>    https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
>>
>> With that one, QEMU master crashes easily because of problem (1). If I
>> hold strong references in the reopen queue as you mentioned, the test
>> case hangs because of problem (2).
>
> Ok, thanks. I'll try to play with this a bit myself later.

Another data point: I'm able to reproduce that crash on -rc3, both by
increasing STREAM_BUFFER_SIZE as mentioned and by using the new test
case.

>>> It's certainly not a full solution because keeping a reference to a
>>> block job does not prevent it from completing, but only from being
>>> freed. Most block jobs do graph modifications, including dropping the
>>> references to nodes, already when they complete, not only when they
>>> are freed.
>>
>> Yes, but the block job itself holds additional references (thanks to
>> block_job_add_bdrv()).
>
> Mirror and commit call block_job_remove_all_bdrv() during their
> completion. So yes, it does help for streaming, but not for all block
> jobs.
>
> Kevin
end of thread, other threads: [~2017-11-30 14:54 UTC | newest]

Thread overview: 6+ messages
2017-11-29 17:56 [Qemu-devel] [PATCH for-2.11 0/1] blockjob: Make block_job_pause_all() keep a reference to the jobs Alberto Garcia
2017-11-29 17:56 ` [Qemu-devel] [PATCH for-2.11 1/1] " Alberto Garcia
2017-11-30 12:27   ` Kevin Wolf
2017-11-30 14:35     ` Alberto Garcia
2017-11-30 14:43       ` Kevin Wolf
2017-11-30 14:53         ` Jeff Cody