From: "Jürgen Groß" <jgross@suse.com>
To: "Marek Marczykowski-Górecki" <marmarek@invisiblethingslab.com>,
xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: race condition when re-connecting vif after backend died
Date: Wed, 8 Oct 2025 14:32:02 +0200
Message-ID: <66d8febb-568b-40db-bbe3-d8dfdc43444c@suse.com>
In-Reply-To: <aOZJhD6_F_ceHoCb@mail-itl>
On 08.10.25 13:22, Marek Marczykowski-Górecki wrote:
> Hi,
>
> I have the following scenario:
> 1. Start backend domain (call it netvm1)
> 2. Start frontend domain (call it vm1), with
> vif=['backend=netvm1,mac=00:16:3e:5e:6c:00,script=vif-route-qubes,ip=10.138.17.244']
> 3. Pause vm1 (not strictly required, but makes reproducing much easier)
> 4. Crash/shutdown/destroy netvm1, then start another backend domain
> (call it netvm2)
> 5. In quick succession:
> 5.1. unpause vm1
> 5.2. detach (or actually clean up) the vif from vm1 (connected to the
> now dead netvm1)
> 5.3. attach similar vif with backend=netvm2
>
> Sometimes it ends up with eth0 being present in vm1, but its xenstore
> state key is still XenbusStateInitializing. And the backend state is at
> XenbusStateInitWait.
> In step 5.2, normally libxl waits for the backend to transition to state
> XenbusStateClosed, and IIUC backend waits for the frontend to do the
> same too. But when the backend is gone, libxl seems to simply remove
> frontend xenstore entries without any coordination with the frontend
> domain itself.
> What I suspect happens is that xenstore events generated at 5.2 are
> getting handled by the frontend's kernel only after 5.3. At this stage,
> frontend sees a device that was in XenbusStateConnected transitioning to
> XenbusStateInitializing (the frontend does not really expect somebody
> else to change its state key) and (I guess) doesn't notice the device vanished
> for a moment (xenbus_dev_changed() doesn't hit the !exists path). I
> haven't verified it, but I guess it also doesn't notice backend path
> change, so it's still watching the old one (gone at this point).
>
> If my diagnosis is correct, what should be the solution here? Add
> handling for XenbusStateUnknown in xen-netfront.c:netback_changed()? If
> so, it should probably carefully clean up the old device while not
> touching xenstore entries (which belong to the new instance already) and
> then re-initialize the device (a xennet_connect() call?).
> Or maybe it should be done in a generic way in xenbus_probe.c, in
> xenbus_dev_changed()? Not sure how exactly - maybe by checking if
> backend path (or just backend-id?) changed? And then call both
> device_unregister() (again, being careful to not change xenstore,
> especially not set XenbusStateClosed) and then xenbus_probe_node()?
>
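As a toy illustration of the ordering problem suspected above (not kernel code, and not a confirmation of the diagnosis): the sketch below assumes that a watch event carries only a path, and that the handler reads the current store contents when it finally runs. `ToyStore` and `frontend_handle_events` are hypothetical names made up for this example.

```python
from collections import deque


class ToyStore:
    """Minimal stand-in for xenstore: a dict plus a queue of watch events."""

    def __init__(self):
        self.nodes = {}
        self.pending = deque()  # queued events carry only the path

    def write(self, path, value):
        self.nodes[path] = value
        self.pending.append(path)

    def remove(self, path):
        self.nodes.pop(path, None)
        self.pending.append(path)


def frontend_handle_events(store, seen):
    """Drain queued events; each one triggers a read of the *current* store."""
    while store.pending:
        path = store.pending.popleft()
        exists = path in store.nodes
        seen.append((path, exists, store.nodes.get(path)))


STATE = "device/vif/0/state"
store = ToyStore()
seen = []

store.write(STATE, "XenbusStateConnected")  # vif connected to netvm1
frontend_handle_events(store, seen)         # frontend is in sync
seen.clear()

# Steps 5.2 and 5.3 both happen before the frontend handles any event:
store.remove(STATE)                            # old vif cleaned up
store.write(STATE, "XenbusStateInitializing")  # new vif attached

frontend_handle_events(store, seen)
for entry in seen:
    print(entry)
```

In this model both queued events find the node present and already in XenbusStateInitializing, so a handler written this way never observes the removal. Whether the real frontend behaves like this is exactly what the trace requested below should show.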
I think we need to know what is going on here.
Can you repeat the test with Xenstore tracing enabled? Just do:
    xenstore-control logfile /tmp/xs-trace
before point 3. in your list above and then perform steps 3. - 5.3. and
then send the logfile. Please make sure not to have any additional actions
causing Xenstore traffic in between, as this would make it much harder to
analyze the log.
In case the problem doesn't appear, please delete the logfile before
starting a new attempt (xenstored is appending new trace data to an
existing file).
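One way to keep stray Xenstore traffic out of the trace is to script the steps so nothing else runs in between. The following is only a sketch under assumptions: it uses plain `xl` commands (the original setup may drive libxl through another toolstack), and the config file name `netvm2.cfg` and device id `0` are placeholders. It defaults to a dry run that just prints the commands.

```python
import os
import shlex
import subprocess

TRACE_FILE = "/tmp/xs-trace"  # path from the instructions above


def trace_steps(frontend="vm1", dead_backend="netvm1",
                new_backend_cfg="netvm2.cfg", devid="0"):
    """Return the command sequence: enable tracing, then steps 3 to 5.3."""
    # vif components from the report, with the backend switched to netvm2
    vif = ["backend=netvm2", "mac=00:16:3e:5e:6c:00",
           "script=vif-route-qubes", "ip=10.138.17.244"]
    return [
        ["xenstore-control", "logfile", TRACE_FILE],  # start tracing
        ["xl", "pause", frontend],                    # step 3
        ["xl", "destroy", dead_backend],              # old backend goes away
        ["xl", "create", new_backend_cfg],            # start netvm2 (assumed cfg)
        ["xl", "unpause", frontend],                  # step 5.1
        ["xl", "network-detach", frontend, devid],    # step 5.2
        ["xl", "network-attach", frontend, *vif],     # step 5.3
    ]


def run(steps, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    if not dry_run and os.path.exists(TRACE_FILE):
        os.remove(TRACE_FILE)  # xenstored appends, so start from a clean file
    for argv in steps:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)


run(trace_steps())  # dry run: only prints the commands
```

Calling `run(trace_steps(), dry_run=False)` in dom0 would execute the sequence. If the problem does not reproduce, the next call removes the old logfile before tracing starts again.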
Juergen