From: Marc Zyngier <maz@kernel.org>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: Sean Christopherson <seanjc@google.com>,
kvmarm@lists.linux.dev, linux-arm-kernel@lists.infradead.org,
Joey Gouly <joey.gouly@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Zenghui Yu <yuzenghui@huawei.com>,
stable@vger.kernel.org, Alexander Potapenko <glider@google.com>
Subject: Re: [PATCH] KVM: arm64: Don't eagerly teardown the vgic on init error
Date: Thu, 10 Oct 2024 13:47:05 +0100 [thread overview]
Message-ID: <86plo85dme.wl-maz@kernel.org> (raw)
In-Reply-To: <ZweUiHMC3RNwNXzY@linux.dev>
On Thu, 10 Oct 2024 09:47:04 +0100,
Oliver Upton <oliver.upton@linux.dev> wrote:
>
> On Thu, Oct 10, 2024 at 08:54:43AM +0100, Marc Zyngier wrote:
> > On Thu, 10 Oct 2024 00:27:46 +0100, Oliver Upton <oliver.upton@linux.dev> wrote:
> > > Then if we can't register the MMIO region for the distributor
> > > everything comes crashing down and a vCPU has made it into the KVM_RUN
> > > loop w/ the VGIC-shaped rug pulled out from under it. There's definitely
> > > another functional bug here where a vCPU's attempts to poke the
> > > distributor wind up reaching userspace as MMIO exits. But we can worry
> > > about that another day.
> >
> > I don't think that one is that bad. Userspace got us here, and they
> > now see an MMIO exit for something that it is not prepared to handle.
> > Suck it up and die (on a black size M t-shirt, please).
>
> LOL, I'll remember that.
>
> The situation I have in mind is a bit harder to blame on userspace,
> though. Supposing that the whole VM was set up correctly, multiple vCPUs
> entering KVM_RUN concurrently could cause this race and have 'unexpected'
> MMIO exits go out to userspace.
>
> vcpu-0 vcpu-1
> ====== ======
> kvm_vgic_map_resources()
> dist->ready = true
> mutex_unlock(config_lock)
> kvm_vgic_map_resources()
> if (vgic_ready())
> return 0
>
> < enter guest >
> typer = writel(0, GICD_CTLR)
>
> < data abort >
> kvm_io_bus_write(...) <= No GICD, out to userspace
>
> vgic_register_dist_iodev()
>
> A small but stupid window to race with.
Ah, gotcha. I guess getting rid of the early-out in
kvm_vgic_map_resources() would plug that one. Want to post a fix for
that?
>
> > > If memory serves, kvm_vgic_map_resources() used to do all of this behind
> > > the config_lock to cure the race, but that wound up inverting lock
> > > ordering on srcu.
> >
> > Probably something like that. We also used to hold the kvm lock, which
> > made everything much simpler, but awfully wrong.
> >
> > > Note to self: Impose strict ordering on GIC initialization v. vCPU
> > > creation if/when we get a new flavor of irqchip.
> >
> > One of the things we should have done when introducing GICv3 is to
> > impose that at KVM_DEV_ARM_VGIC_CTRL_INIT, the GIC memory map is
> > final. I remember some push-back on the QEMU side of things, as they
> > like to decouple things, but this has proved to be a nightmare.
>
> Pushing more of the initialization complexity into userspace feels like
> the right thing. Since we clearly have no idea what we're doing :)
KVM APIv2?
>
> > > The crappy assumption here is kvm_arch_vcpu_run_pid_change() and its
> > > callees are allowed to destroy VM-scoped structures in error handling.
> >
> > I think this is symptomatic of more general issue: we perform VM-wide
> > configuration in the context of a vcpu. We have tons of this stuff to
> > paper over the lack of a "this VM is fully configured" barrier.
> >
> > I wonder whether we could sidestep things by punting the finalisation
> > of the VM to a different context (workqueue?) and simply return
> > -EAGAIN or -EINTR to userspace while we're processing it. That doesn't
> > solve the "I'm missing parts of the address map and I'm going to die"
> > part though.
>
> Throwing it back at userspace would be nice, but unfortunately for ABI I
> think we need to block/spin vCPUs in the kernel til the VM is in fully
> working condition. A fragile userspace could explode for a 'spurious'
> EAGAIN/EINTR where there wasn't one before.
EINTR needs to be handled already, as this is how you report
preemption by a signal. But yeah, overall, I'm not enthralled with
much so far...
M.
--
Without deviation from the norm, progress is not possible.
next prev parent reply other threads:[~2024-10-10 12:47 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-09 18:36 [PATCH] KVM: arm64: Don't eagerly teardown the vgic on init error Marc Zyngier
2024-10-09 19:25 ` Oliver Upton
2024-10-09 19:36 ` Sean Christopherson
2024-10-09 23:27 ` Oliver Upton
2024-10-09 23:30 ` Oliver Upton
2024-10-10 7:54 ` Marc Zyngier
2024-10-10 8:47 ` Oliver Upton
2024-10-10 12:47 ` Marc Zyngier [this message]
2024-10-10 16:47 ` Oliver Upton
2024-10-11 13:20 ` Marc Zyngier
2024-10-24 16:12 ` Mark Brown
2024-10-24 18:05 ` Marc Zyngier
2024-10-25 10:54 ` Mark Brown
2024-10-25 12:18 ` Eric Auger
2024-10-25 12:59 ` Mark Brown
2024-10-25 13:05 ` Eric Auger
2024-10-25 13:05 ` Marc Zyngier
2024-10-25 13:43 ` Mark Brown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=86plo85dme.wl-maz@kernel.org \
--to=maz@kernel.org \
--cc=glider@google.com \
--cc=joey.gouly@arm.com \
--cc=kvmarm@lists.linux.dev \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=oliver.upton@linux.dev \
--cc=seanjc@google.com \
--cc=stable@vger.kernel.org \
--cc=suzuki.poulose@arm.com \
--cc=yuzenghui@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.