From: Max Reitz <mreitz@redhat.com>
To: Fam Zheng <famz@redhat.com>
Cc: Qemu-block <qemu-block@nongnu.org>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] Intermittent hang of iotest 194 (bdrv_drain_all after non-shared storage migration)
Date: Fri, 10 Nov 2017 18:48:53 +0100
Message-ID: <46b65050-d4f1-496c-abff-4228f57aaa2b@redhat.com>
In-Reply-To: <20171110023604.GA4849@lemon>

On 2017-11-10 03:36, Fam Zheng wrote:
> On Thu, 11/09 20:31, Max Reitz wrote:
>> On 2017-11-09 16:30, Fam Zheng wrote:
>>> On Thu, 11/09 16:14, Max Reitz wrote:

[...]

>>>> *sigh*
>>>>
>>>> OK, I'll look into it...
>>>
>>> OK, I'll let you.. Just one more thing: could it relate to the use-after-free
>>> bug reported on block_job_defer_to_main_loop()?
>>>
>>> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01144.html
>>
>> Thanks for the heads-up; I think it's a different issue, though.
>>
>> What appears to be happening is that the mirror job completes and then
>> drains its BDS.  While that is happening, a bdrv_drain_all() comes in
>> from block_migration_cleanup().
>>
>> That now tries to drain the mirror node.  However, that node cannot be
>> drained until the job is truly gone, so the drain ends up waiting for
>> exactly that: mirror_exit() is called, cleans up, destroys the mirror
>> node, and returns.
>>
>> Now bdrv_drain_all() can go on, specifically the BDRV_POLL_WHILE() on
>> the mirror node.  However, oops, that node is gone now...  So that's
>> where the issue seems to be. :-/
>>
>> Maybe all that we need to do is wrap the bdrv_drain_recurse() call in
>> bdrv_drain_all_begin() in a bdrv_ref()/bdrv_unref() pair?  Having run
>> 194 for a couple of minutes, that seems to indeed work -- until it dies
>> because of an invalid BB pointer in bdrv_next().  I guess that is
>> because bdrv_next() does not guard against deleted BDSs.
>>
>> Copying all BDSs into a separate list (in both bdrv_drain_all_begin() and
>> bdrv_drain_all_end()), with a strong reference to every single one, and
>> then draining them really seems to work, though.  (It survived 9000
>> iterations, which seems good enough for something that usually fails
>> after, like, 5.)
> 
> Yes, that makes sense. I'm curious if the patch in
> 
> https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg01649.html
> 
> would also work?

No, unfortunately it did not.

(Or maybe fortunately so, since that means I didn't do a whole lot of
work for nothing :-))
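
For illustration only, the list-with-strong-references idea quoted above
can be sketched as a small standalone toy model.  This is not QEMU code:
Node, node_ref(), node_unref(), poll_once(), drain_all() and run_demo()
are hypothetical stand-ins for a BDS, bdrv_ref()/bdrv_unref(), one
BDRV_POLL_WHILE() iteration and the drain-all loop.  The assumption is
that the global node list holds no references itself, so a node can
vanish while a job exits during polling unless the drain takes its own
reference first.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a BlockDriverState; not QEMU's real structure. */
typedef struct Node {
    int refcnt;          /* strong references held on this node */
    int in_flight;       /* pending work a drain has to wait for */
    struct Node *next;   /* linkage in the global node list */
} Node;

Node *all_nodes;     /* global list; holds no references itself */
Node *pending_job;   /* node owned by a job that has yet to exit */
int freed_count;     /* nodes destroyed so far */

Node *node_new(int in_flight)
{
    Node *n = calloc(1, sizeof(*n));
    if (!n) abort();
    n->refcnt = 1;               /* reference owned by the creator */
    n->in_flight = in_flight;
    n->next = all_nodes;
    all_nodes = n;
    return n;
}

void node_ref(Node *n) { n->refcnt++; }

void node_unref(Node *n)
{
    assert(n->refcnt > 0);
    if (--n->refcnt > 0) return;
    /* Last reference gone: unlink from the global list and free. */
    Node **p = &all_nodes;
    while (*p && *p != n) p = &(*p)->next;
    if (*p) *p = n->next;
    free(n);
    freed_count++;
}

/* One event-loop iteration: the job exits and drops its node reference. */
void poll_once(void)
{
    Node *n = pending_job;
    if (n) {
        pending_job = NULL;
        n->in_flight = 0;
        node_unref(n);
    }
}

/* Snapshot every node with a strong reference, drain, then release. */
int drain_all(void)
{
    size_t count = 0, i = 0;
    for (Node *n = all_nodes; n; n = n->next) count++;
    Node **snap = calloc(count ? count : 1, sizeof(*snap));
    if (!snap) abort();
    for (Node *n = all_nodes; n; n = n->next) { node_ref(n); snap[i++] = n; }

    int drained = 0;
    for (i = 0; i < count; i++) {
        /* Safe even if the job exits here: snap[i] keeps the node alive. */
        while (snap[i]->in_flight > 0) poll_once();
        drained++;
    }
    for (i = 0; i < count; i++) node_unref(snap[i]);
    free(snap);
    return drained;
}

/* Scenario: three nodes, one of them owned by a job that exits mid-drain. */
int run_demo(void)
{
    freed_count = 0;
    Node *a = node_new(0);
    Node *mirror = node_new(1);
    Node *b = node_new(0);
    pending_job = mirror;        /* the job holds mirror's only reference */

    int drained = drain_all();
    assert(freed_count == 1);    /* mirror is freed only after the drain */
    node_unref(a);
    node_unref(b);
    return drained;
}
```

Calling run_demo() from a main() walks through the scenario.  Without
the node_ref() in drain_all(), the node_unref() in poll_once() would
free the node while the polling loop still reads snap[i]->in_flight,
which is the toy equivalent of the hang/crash described above.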

Max



