From: Kevin Wolf <kwolf@redhat.com>
To: Sergio Lopez <slp@redhat.com>
Cc: Fam Zheng <fam@euphon.net>,
Stefano Stabellini <sstabellini@kernel.org>,
qemu-block@nongnu.org, Paul Durrant <paul@xen.org>,
"Michael S. Tsirkin" <mst@redhat.com>,
qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Anthony Perard <anthony.perard@citrix.com>,
xen-devel@lists.xenproject.org
Subject: Re: [PATCH v2 2/4] block: Avoid processing BDS twice in bdrv_set_aio_context_ignore()
Date: Wed, 16 Dec 2020 13:35:14 +0100
Message-ID: <20201216123514.GD7548@merkur.fritz.box>
In-Reply-To: <20201215172337.w7vcn2woze2ejgco@mhamilton>
On 15.12.2020 at 18:23, Sergio Lopez wrote:
> On Tue, Dec 15, 2020 at 04:01:19PM +0100, Kevin Wolf wrote:
> > On 15.12.2020 at 14:15, Sergio Lopez wrote:
> > > On Tue, Dec 15, 2020 at 01:12:33PM +0100, Kevin Wolf wrote:
> > > > On 14.12.2020 at 18:05, Sergio Lopez wrote:
> > > > > While processing the parents of a BDS, one of the parents may process
> > > > > the child that's doing the tail recursion, which leads to a BDS being
> > > > > processed twice. This is especially problematic for the aio_notifiers,
> > > > > as they might attempt to work on both the old and the new AIO
> > > > > contexts.
> > > > >
> > > > > To avoid this, add the BDS pointer to the ignore list, and check the
> > > > > child BDS pointer while iterating over the children.
> > > > >
> > > > > Signed-off-by: Sergio Lopez <slp@redhat.com>
> > > >
> > > > Ugh, so we get a mixed list of BdrvChild and BlockDriverState? :-/
> > >
> > > I know, it's effective but quite ugly...
> > >
> > > > What is the specific scenario where you saw this breaking? Did you have
> > > > multiple BdrvChild connections between two nodes so that we would go to
> > > > the parent node through one and then come back to the child node through
> > > > the other?
> > >
> > > I don't think this is a corner case. If the graph is walked top->down,
> > > there's no problem since children are added to the ignore list before
> > > getting processed, and siblings don't process each other. But, if the
> > > graph is walked bottom->up, a BDS will start processing its parents
> > > without adding itself to the ignore list, so there's nothing
> > > preventing them from processing it again.
> >
> > I don't understand. child is added to ignore before calling the parent
> > callback on it, so how can we come back through the same BdrvChild?
> >
> >     QLIST_FOREACH(child, &bs->parents, next_parent) {
> >         if (g_slist_find(*ignore, child)) {
> >             continue;
> >         }
> >         assert(child->klass->set_aio_ctx);
> >         *ignore = g_slist_prepend(*ignore, child);
> >         child->klass->set_aio_ctx(child, new_context, ignore);
> >     }
>
> Perhaps I'm missing something, but the way I understand it, that loop
> is adding the BdrvChild pointer of each of its parents, but not the
> BdrvChild pointer of the BDS that was passed as an argument to
> b_s_a_c_i.
Generally, the caller has already done that.

In the theoretical case that it was the outermost call in the recursion
and it hasn't (I couldn't find any such case), I think we should still
call the callback for the passed BdrvChild like we currently do.
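
To make the convention above concrete, here is a minimal sketch of "the caller lists the edge before calling into the other side". It is not QEMU or GLib code: `struct edge`, `struct ignore_item`, `ignored()` and `follow_edge()` are hypothetical stand-ins for BdrvChild, the GSList ignore list, g_slist_find() and the prepend-then-call step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BdrvChild: one edge of the graph. */
struct edge {
    const char *name;
};

/* Hypothetical stand-in for one element of the ignore list. */
struct ignore_item {
    const struct edge *edge;
    struct ignore_item *next;
};

/* Linear pointer search over the list. */
static bool ignored(const struct ignore_item *list, const struct edge *e)
{
    for (; list != NULL; list = list->next) {
        if (list->edge == e) {
            return true;
        }
    }
    return false;
}

/* Callee side: checks whether the edge it was reached through is listed. */
static bool callee_sees_incoming_edge(const struct ignore_item *list,
                                      const struct edge *incoming)
{
    return ignored(list, incoming);
}

/*
 * Caller side: skip an edge that is already listed; otherwise prepend it
 * to the list first and only then call into the other end of the edge.
 * Returns true if the edge was followed.
 */
bool follow_edge(struct ignore_item **list, struct ignore_item *slot,
                 const struct edge *e, bool *callee_saw_edge)
{
    assert(list != NULL && slot != NULL && e != NULL);
    if (ignored(*list, e)) {
        return false;
    }
    slot->edge = e;
    slot->next = *list;
    *list = slot;
    *callee_saw_edge = callee_sees_incoming_edge(*list, e);
    return true;
}
```

With this ordering the callee always finds its incoming edge on the list, so it has no need to add it again, and a second attempt to follow the same edge is skipped.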
> > You didn't dump the BdrvChild here. I think that would add some
> > information on why we re-entered 0x555ee2fbf660. Maybe you can also add
> > bs->drv->format_name for each node to make the scenario less abstract?
>
> I've generated another trace with more data:
>
> bs=0x565505e48030 (backup-top) enter
> bs=0x565505e48030 (backup-top) processing children
> bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e42090 (child->bs=0x565505e5d420)
> bs=0x565505e5d420 (qcow2) enter
> bs=0x565505e5d420 (qcow2) processing children
> bs=0x565505e5d420 (qcow2) calling bsaci child=0x565505e41ea0 (child->bs=0x565505e52060)
> bs=0x565505e52060 (file) enter
> bs=0x565505e52060 (file) processing children
> bs=0x565505e52060 (file) processing parents
> bs=0x565505e52060 (file) processing itself
> bs=0x565505e5d420 (qcow2) processing parents
> bs=0x565505e5d420 (qcow2) calling set_aio_ctx child=0x5655066a34d0
> bs=0x565505fbf660 (qcow2) enter
> bs=0x565505fbf660 (qcow2) processing children
> bs=0x565505fbf660 (qcow2) calling bsaci child=0x565505e41d20 (child->bs=0x565506bc0c00)
> bs=0x565506bc0c00 (file) enter
> bs=0x565506bc0c00 (file) processing children
> bs=0x565506bc0c00 (file) processing parents
> bs=0x565506bc0c00 (file) processing itself
> bs=0x565505fbf660 (qcow2) processing parents
> bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x565505fc7aa0
> bs=0x565505fbf660 (qcow2) calling set_aio_ctx child=0x5655068b8510
> bs=0x565505e48030 (backup-top) enter
> bs=0x565505e48030 (backup-top) processing children
> bs=0x565505e48030 (backup-top) calling bsaci child=0x565505e3c450 (child->bs=0x565505fbf660)
> bs=0x565505fbf660 (qcow2) enter
> bs=0x565505fbf660 (qcow2) processing children
> bs=0x565505fbf660 (qcow2) processing parents
> bs=0x565505fbf660 (qcow2) processing itself
> bs=0x565505e48030 (backup-top) processing parents
> bs=0x565505e48030 (backup-top) calling set_aio_ctx child=0x565505e402d0
> bs=0x565505e48030 (backup-top) processing itself
> bs=0x565505fbf660 (qcow2) processing itself
Hm, is this complete? I see no "processing itself" for
bs=0x565505e5d420. Or is this because it crashed before getting there?

Anyway, trying to reconstruct the block graph with BdrvChild pointers
annotated at the edges:
BlockBackend
      |
      v
  backup-top ------------------------+
      |   |                          |
      |   +-----------------------+  |
      |            0x5655068b8510 |  | 0x565505e3c450
      |                           |  |
      | 0x565505e42090            |  |
      v                           |  |
    qcow2 ---------------------+  |  |
      |                        |  |  |
      | 0x565505e52060         |  |  | ??? [1]
      |                        |  |  |  |
      v         0x5655066a34d0 |  |  |  | 0x565505fc7aa0
    file                       v  v  v  v
                             qcow2 (backing)
                                    |
                                    | 0x565505e41d20
                                    v
                                  file
[1] This seems to be a BdrvChild with a non-BDS parent. Probably a
BdrvChild directly owned by the backup job.
> So it seems this is happening:
>
> backup-top (5e48030) <---------| (5)
>    |    |                      |
>    |    | (6) ------------> qcow2 (5fbf660)
>    |                           ^    |
>    |                       (3) |    | (4)
>    |-> (1) qcow2 (5e5d420) -----    |-> file (6bc0c00)
>    |
>    |-> (2) file (5e52060)
>
> backup-top (5e48030), the BDS that was passed as argument in the first
> bdrv_set_aio_context_ignore() call, is re-entered when qcow2 (5fbf660)
> is processing its parents, and the latter is also re-entered when the
> first one starts processing its children again.
Yes, but look at the BdrvChild pointers: it is through different edges
that we come back to the same node. No BdrvChild is used twice.

If backup-top had added all of its children to the ignore list before
calling into the overlay qcow2, the backing qcow2 wouldn't eventually
have called back into backup-top.
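
The point can be tried out on a toy model. What follows is only a sketch under stated assumptions, not the QEMU implementation: the three-node graph (top, overlay, backing) is a simplification of the one drawn above, all names (`struct walk`, `visit()`, `run_walk()`) are hypothetical, and the early return for a node that has already been moved is assumed behaviour. The `preadd` flag switches between marking each child edge just before following it and marking all child edges up front.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name here is hypothetical, none is a QEMU type. */
enum { TOP, OVERLAY, BACKING, N_NODES };
enum { N_EDGES = 3 };

/* One entry per BdrvChild-like edge, pointing from parent to child. */
static const struct { int parent, child; } edges[N_EDGES] = {
    { TOP, OVERLAY },      /* edge 0 */
    { TOP, BACKING },      /* edge 1 */
    { OVERLAY, BACKING },  /* edge 2 */
};

struct walk {
    bool ignore[N_EDGES];    /* stands in for the ignore list */
    bool moved[N_NODES];     /* node already switched to the new context */
    int processed[N_NODES];  /* how often "processing itself" ran */
    int edge_uses[N_EDGES];  /* how often each edge was followed */
    bool preadd;             /* mark all child edges before recursing? */
};

static void visit(struct walk *w, int node)
{
    bool todo[N_EDGES] = { false };

    assert(node >= 0 && node < N_NODES);
    if (w->moved[node]) {
        return; /* assumed early exit: nothing left to do for this node */
    }

    /* Children: mark and follow one by one, or mark all of them first. */
    for (int e = 0; e < N_EDGES; e++) {
        if (edges[e].parent != node || w->ignore[e]) {
            continue;
        }
        w->ignore[e] = true;
        if (w->preadd) {
            todo[e] = true; /* followed below, once all are marked */
        } else {
            w->edge_uses[e]++;
            visit(w, edges[e].child);
        }
    }
    for (int e = 0; e < N_EDGES; e++) {
        if (todo[e]) {
            w->edge_uses[e]++;
            visit(w, edges[e].child);
        }
    }

    /* Parents: mark each edge, then follow it upwards. */
    for (int e = 0; e < N_EDGES; e++) {
        if (edges[e].child != node || w->ignore[e]) {
            continue;
        }
        w->ignore[e] = true;
        w->edge_uses[e]++;
        visit(w, edges[e].parent);
    }

    /* Finally "process itself". */
    w->moved[node] = true;
    w->processed[node]++;
}

/* Runs one walk starting at the top node and returns the counters. */
struct walk run_walk(bool preadd)
{
    struct walk w = { .preadd = preadd };

    visit(&w, TOP);
    return w;
}
```

In this model, `run_walk(false)` processes the top node twice although every edge is followed exactly once: the backing node reaches the top node through edge 1 while the top node is still busy with edge 0. With `run_walk(true)` edge 1 is already marked when the backing node looks at its parents, so each node is processed once.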
Kevin