* [Qemu-devel] [PATCH for-2.11 0/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Alberto Garcia
Date: 2017-11-29 17:56 UTC
To: qemu-devel
Cc: Alberto Garcia, qemu-block, Kevin Wolf, Max Reitz, Jeff Cody, Anton Nefedov

Hi,

this patch fixes the crash reported by Anton Nefedov here:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00159.html

I can reproduce it easily with the change he mentions there, or by
tweaking iotest 030 as I show here:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

I'm not convinced that this is the best solution, though. As Fam says,
the block layer is getting complex, and I think this can be solved in
a different way if the code is properly rewritten.

Even with this solution I think it would make sense to assert that the
block job's pause count is always 0 when the job is about to be
destroyed, and perhaps to keep a reference while it is being paused.
But that's a bigger change and we're close to the release, so I opted
for this more conservative approach.

Regards,

Berto

Alberto Garcia (1):
  blockjob: Make block_job_pause_all() keep a reference to the jobs

 blockjob.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--
2.11.0
* [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Alberto Garcia
Date: 2017-11-29 17:56 UTC
To: qemu-devel
Cc: Alberto Garcia, qemu-block, Kevin Wolf, Max Reitz, Jeff Cody, Anton Nefedov

Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
pausing all block jobs during bdrv_reopen_multiple() to prevent any of
them from finishing and removing nodes from the graph while they are
being reopened.

It turns out that pausing a block job doesn't necessarily prevent it
from finishing: a paused block job can still run its exit function
from the main loop and call block_job_completed(). The mirror block
job in particular always goes to the main loop while it is paused (by
virtue of the bdrv_drained_begin() call in mirror_run()).

Destroying a paused block job during bdrv_reopen_multiple() has two
consequences:

   1) The references to the nodes involved in the job are released,
      possibly destroying some of them. If those nodes were in the
      reopen queue this would trigger the problem originally described
      in commit 40840e419be, crashing QEMU.

   2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
      not be doing all necessary bdrv_parent_drained_end() calls.

I can reproduce problem 1) easily with iotest 030 by increasing
STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
the iotest like in this example:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

This patch keeps an additional reference to all block jobs between
block_job_pause_all() and block_job_resume_all(), guaranteeing that
they are kept alive.
Signed-off-by: Alberto Garcia <berto@igalia.com>
---
 blockjob.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 0ed50b953b..715c2c2680 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -730,6 +730,7 @@ void block_job_pause_all(void)
         AioContext *aio_context = blk_get_aio_context(job->blk);

         aio_context_acquire(aio_context);
+        block_job_ref(job);
         block_job_pause(job);
         aio_context_release(aio_context);
     }
@@ -808,12 +809,14 @@ void coroutine_fn block_job_pause_point(BlockJob *job)

 void block_job_resume_all(void)
 {
-    BlockJob *job = NULL;
-    while ((job = block_job_next(job))) {
+    BlockJob *job, *next;
+
+    QLIST_FOREACH_SAFE(job, &block_jobs, job_list, next) {
         AioContext *aio_context = blk_get_aio_context(job->blk);

         aio_context_acquire(aio_context);
         block_job_resume(job);
+        block_job_unref(job);
         aio_context_release(aio_context);
     }
 }
--
2.11.0
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Kevin Wolf
Date: 2017-11-30 12:27 UTC
To: Alberto Garcia
Cc: qemu-devel, qemu-block, Max Reitz, Jeff Cody, Anton Nefedov

Am 29.11.2017 um 18:56 hat Alberto Garcia geschrieben:
> Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
> pausing all block jobs during bdrv_reopen_multiple() to prevent any of
> them from finishing and removing nodes from the graph while they are
> being reopened.
>
> It turns out that pausing a block job doesn't necessarily prevent it
> from finishing: a paused block job can still run its exit function
> from the main loop and call block_job_completed(). The mirror block
> job in particular always goes to the main loop while it is paused (by
> virtue of the bdrv_drained_begin() call in mirror_run()).
>
> Destroying a paused block job during bdrv_reopen_multiple() has two
> consequences:
>
> 1) The references to the nodes involved in the job are released,
>    possibly destroying some of them. If those nodes were in the
>    reopen queue this would trigger the problem originally described
>    in commit 40840e419be, crashing QEMU.

This specific problem could be avoided by making the BDS reference in
the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
and bdrv_unref() only at the end of bdrv_reopen_multiple().

> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>    not be doing all necessary bdrv_parent_drained_end() calls.

If I understand correctly, you don't have a reproducer here. I'm not
convinced that this one actually exists, because the functions that do
the graph modifications (specifically bdrv_replace_child_noperm)
automatically drain and undrain nodes as necessary.
> I can reproduce problem 1) easily with iotest 030 by increasing
> STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
> the iotest like in this example:
>
>    https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
>
> This patch keeps an additional reference to all block jobs between
> block_job_pause_all() and block_job_resume_all(), guaranteeing that
> they are kept alive.

This might be okay as a stopgap solution if this is a real problem in
practice, because we don't have a better solution right now. However,
I haven't seen any sign of there being an -rc4, so we're probably too
late for 2.11 anyway.

It's certainly not a full solution because keeping a reference to a
block job does not prevent it from completing, but only from being
freed. Most block jobs do graph modifications, including dropping the
references to nodes, already when they complete, not only when they
are freed.

Now, while the above might have sounded like we have an easy solution
in bdrv_reopen(), I see another problem that looks quite tough:

3) bdrv_reopen_queue() is a recursive operation that involves all
   children that inherited options. If the graph changes between the
   bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the
   calculated options (and today possibly permissions) don't actually
   match the graph any more.

The requirement we really have is that between bdrv_reopen_queue()
and bdrv_reopen_multiple() no graph changes are made. Calling
bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late.

Maybe that actually gives us another relatively simple solution: we
need to push up the drain into the callers, and possibly just assert
that the nodes are drained in the reopen functions.

Kevin
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Alberto Garcia
Date: 2017-11-30 14:35 UTC
To: Kevin Wolf
Cc: qemu-devel, qemu-block, Max Reitz, Jeff Cody, Anton Nefedov

On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:

>> Destroying a paused block job during bdrv_reopen_multiple() has two
>> consequences:
>>
>> 1) The references to the nodes involved in the job are released,
>>    possibly destroying some of them. If those nodes were in the
>>    reopen queue this would trigger the problem originally described
>>    in commit 40840e419be, crashing QEMU.
>
> This specific problem could be avoided by making the BDS reference in
> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
> and bdrv_unref() only at the end of bdrv_reopen_multiple().

That is correct.

>> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>>    not be doing all necessary bdrv_parent_drained_end() calls.
>
> If I understand correctly, you don't have a reproducer here.

That's unfortunately not correct. You can use the very test case that
I mentioned in the commit message:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

With that one, QEMU master crashes easily because of problem (1). If I
hold strong references in the reopen queue as you mentioned, the test
case hangs because of problem (2).

> It's certainly not a full solution because keeping a reference to a
> block job does not prevent it from completing, but only from being
> freed. Most block jobs do graph modifications, including dropping the
> references to nodes, already when they complete, not only when they
> are freed.

Yes, but the block job itself holds additional references (thanks to
block_job_add_bdrv()).
> Now, while the above might have sounded like we have an easy solution
> in bdrv_reopen(), I see another problem that looks quite tough:
>
> 3) bdrv_reopen_queue() is a recursive operation that involves all
>    children that inherited options. If the graph changes between the
>    bdrv_reopen_queue() call and the end of bdrv_reopen_multiple(), the
>    calculated options (and today possibly permissions) don't actually
>    match the graph any more.
>
> The requirement we really have is that between bdrv_reopen_queue()
> and bdrv_reopen_multiple() no graph changes are made. Calling
> bdrv_drain_all_begin() in bdrv_reopen_multiple() is too late.

Yes, I agree that that should be the requirement.

Berto
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Kevin Wolf
Date: 2017-11-30 14:43 UTC
To: Alberto Garcia
Cc: qemu-devel, qemu-block, Max Reitz, Jeff Cody, Anton Nefedov

Am 30.11.2017 um 15:35 hat Alberto Garcia geschrieben:
> On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:
>
>>> Destroying a paused block job during bdrv_reopen_multiple() has two
>>> consequences:
>>>
>>> 1) The references to the nodes involved in the job are released,
>>>    possibly destroying some of them. If those nodes were in the
>>>    reopen queue this would trigger the problem originally described
>>>    in commit 40840e419be, crashing QEMU.
>>
>> This specific problem could be avoided by making the BDS reference in
>> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
>> and bdrv_unref() only at the end of bdrv_reopen_multiple().
>
> That is correct.
>
>>> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>>>    not be doing all necessary bdrv_parent_drained_end() calls.
>>
>> If I understand correctly, you don't have a reproducer here.
>
> That's unfortunately not correct. You can use the very test case that
> I mentioned in the commit message:
>
>    https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
>
> With that one, QEMU master crashes easily because of problem (1). If I
> hold strong references in the reopen queue as you mentioned, the test
> case hangs because of problem (2).

Ok, thanks. I'll try to play with this a bit myself later.

>> It's certainly not a full solution because keeping a reference to a
>> block job does not prevent it from completing, but only from being
>> freed. Most block jobs do graph modifications, including dropping the
>> references to nodes, already when they complete, not only when they
>> are freed.
>
> Yes, but the block job itself holds additional references (thanks to
> block_job_add_bdrv()).

Mirror and commit call block_job_remove_all_bdrv() during their
completion. So yes, it does help for streaming, but not for all block
jobs.

Kevin
* Re: [Qemu-devel] [PATCH for-2.11 1/1] blockjob: Make block_job_pause_all() keep a reference to the jobs
From: Jeff Cody
Date: 2017-11-30 14:53 UTC
To: Kevin Wolf
Cc: Alberto Garcia, qemu-devel, qemu-block, Max Reitz, Anton Nefedov

On Thu, Nov 30, 2017 at 03:43:35PM +0100, Kevin Wolf wrote:
> Am 30.11.2017 um 15:35 hat Alberto Garcia geschrieben:
>> On Thu 30 Nov 2017 01:27:32 PM CET, Kevin Wolf wrote:
>>
>>>> Destroying a paused block job during bdrv_reopen_multiple() has two
>>>> consequences:
>>>>
>>>> 1) The references to the nodes involved in the job are released,
>>>>    possibly destroying some of them. If those nodes were in the
>>>>    reopen queue this would trigger the problem originally described
>>>>    in commit 40840e419be, crashing QEMU.
>>>
>>> This specific problem could be avoided by making the BDS reference in
>>> the reopen queue strong, i.e. bdrv_ref() in bdrv_reopen_queue_child()
>>> and bdrv_unref() only at the end of bdrv_reopen_multiple().
>>
>> That is correct.
>>
>>>> 2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
>>>>    not be doing all necessary bdrv_parent_drained_end() calls.
>>>
>>> If I understand correctly, you don't have a reproducer here.
>>
>> That's unfortunately not correct. You can use the very test case that
>> I mentioned in the commit message:
>>
>>    https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
>>
>> With that one, QEMU master crashes easily because of problem (1). If I
>> hold strong references in the reopen queue as you mentioned, the test
>> case hangs because of problem (2).
>
> Ok, thanks. I'll try to play with this a bit myself later.

Another data point: I'm able to reproduce that crash on -rc3, both by
increasing STREAM_BUFFER_SIZE as mentioned and by using the new test
case.

>>> It's certainly not a full solution because keeping a reference to a
>>> block job does not prevent it from completing, but only from being
>>> freed. Most block jobs do graph modifications, including dropping the
>>> references to nodes, already when they complete, not only when they
>>> are freed.
>>
>> Yes, but the block job itself holds additional references (thanks to
>> block_job_add_bdrv()).
>
> Mirror and commit call block_job_remove_all_bdrv() during their
> completion. So yes, it does help for streaming, but not for all block
> jobs.
>
> Kevin
end of thread, other threads: [~2017-11-30 14:54 UTC | newest]

Thread overview: 6+ messages
2017-11-29 17:56 [Qemu-devel] [PATCH for-2.11 0/1] blockjob: Make block_job_pause_all() keep a reference to the jobs Alberto Garcia
2017-11-29 17:56 ` [Qemu-devel] [PATCH for-2.11 1/1] " Alberto Garcia
2017-11-30 12:27   ` Kevin Wolf
2017-11-30 14:35     ` Alberto Garcia
2017-11-30 14:43       ` Kevin Wolf
2017-11-30 14:53         ` Jeff Cody