* randomly firing kvm_arch_timer_handler
@ 2020-06-04 11:21 Janne Karhunen
  2020-06-04 12:37 ` Marc Zyngier
  0 siblings, 1 reply; 4+ messages in thread
From: Janne Karhunen @ 2020-06-04 11:21 UTC (permalink / raw)
  To: kvmarm

Hi,

I have an issue on one particular piece of hardware with GICv3 and the old
stable (4.9.22x) kernel, where the timer interrupt randomly leaks to
the host after guest exit and kvm_arch_timer_handler gets
triggered. The guest does run, but the whine (unexpected interrupt) is
annoying and it seems to hinder performance drastically, for
both the host and the guest. This behavior can even lead to the host
watchdog biting, as the firing timer prevents forward progress, especially
during very early boot when the guest is doing heavy paging
anyway.

Based on the comment in the latest code, the fundamental issue is that
the interrupt controller does not recognize the timer disable fast
enough on guest exit. Has anyone worked on a proper fix for the issue
for the old stable series? The 5.x kernels seem to have reworked this
area quite drastically. A plain 'isb' after the timer disable did not
seem to do the trick...


--
Janne
_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


* Re: randomly firing kvm_arch_timer_handler
  2020-06-04 11:21 randomly firing kvm_arch_timer_handler Janne Karhunen
@ 2020-06-04 12:37 ` Marc Zyngier
  2020-06-04 13:14   ` Janne Karhunen
  2020-06-09  5:49   ` Janne Karhunen
  0 siblings, 2 replies; 4+ messages in thread
From: Marc Zyngier @ 2020-06-04 12:37 UTC (permalink / raw)
  To: Janne Karhunen; +Cc: kvmarm

Hi Janne,

On 2020-06-04 12:21, Janne Karhunen wrote:
> Hi,
> 
> I have an issue on one particular hardware with gicv3 and the old
> stable (4.9.22x) kernel where the timer interrupt randomly leaks to
> the host after the guest exit and the kvm_arch_timer_handler gets
> triggered. The guest does run, but the whine (unexpected interrupt) is
> annoying and it seems to be hindering the performance drastically - of
> both the host and the guest. This behavior can even lead to the host
> watchdog biting as the firing timer prevents the progress, especially
> during the very early boot when the guest is doing heavy paging
> anyway.

The only system on which I have witnessed this is a Cavium TX1. It seems
incredibly bad at retiring an interrupt that has been masked at the source.
Which hardware is that?

> Based on the comment in the latest code the fundamental issue is that
> the interrupt controller does not recognize the timer disable fast
> enough on guest exit.
> 
> Has anyone worked on a proper fix for the issue
> for the old stable series? 5+ kernels seem to have quite a drastic
> rework on this front. Plain 'isb' did not seem to do the trick after
> the timer disable...

ISB really has no bearing on how an interrupt gets retired from the
redistributor. The flow we use on mainline these days makes it less
susceptible to this kind of brokenness, but it is still possible to
hit it.

In general, 4.9 is getting pretty old, and only gets things like
security fixes. Quality-of-emulation issues are definitely not
getting backported.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

* Re: randomly firing kvm_arch_timer_handler
  2020-06-04 12:37 ` Marc Zyngier
@ 2020-06-04 13:14   ` Janne Karhunen
  2020-06-09  5:49   ` Janne Karhunen
  1 sibling, 0 replies; 4+ messages in thread
From: Janne Karhunen @ 2020-06-04 13:14 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm

On Thu, Jun 4, 2020 at 3:37 PM Marc Zyngier <maz@kernel.org> wrote:

> > triggered. The guest does run, but the whine (unexpected interrupt) is
> > annoying and it seems to be hindering the performance drastically - of
> > both the host and the guest. This behavior can even lead to the host
> > watchdog biting as the firing timer prevents the progress, especially
> > during the very early boot when the guest is doing heavy paging
> > anyway.
>
> The only system I witnessed this was a Cavium TX1. It seems incredibly
> bad at retiring an interrupt that has been masked at the source.
> Which hardware is that?

Unreleased one


> > Based on the comment in the latest code the fundamental issue is that
> > the interrupt controller does not recognize the timer disable fast
> > enough on guest exit.
> >
> > Has anyone worked on a proper fix for the issue
> > for the old stable series? 5+ kernels seem to have quite a drastic
> > rework on this front. Plain 'isb' did not seem to do the trick after
> > the timer disable...
>
> ISB really has no bearing on how an interrupt gets retired from the
> redistributor. The flow we use on mainline these days makes it less
> susceptible to this kind of brokenness, but it is still possible to
> hit it.

Thanks. I'll try to see if I can backport some of that.

That said, if you remember my previous email, we got the kernel-external
hypervisor going. I have multiple kernel versions running OK
(4.9 ... 5.4) under the external hypervisor. It will still need some
polishing rounds and security related work, but in general it seems to
run fine. 95%+ of the kvmarm code is intact and I suppose if this kind
of thing is of interest to others, we could present the work and
potentially some of the hooks required in the kvm code to make it all
run for others, too. The guest and the host can both be VMs and the
guests can be just untouchable holes in the host memory. Kind of like
AMD SEV, just no encryption needed.


--
Janne

* Re: randomly firing kvm_arch_timer_handler
  2020-06-04 12:37 ` Marc Zyngier
  2020-06-04 13:14   ` Janne Karhunen
@ 2020-06-09  5:49   ` Janne Karhunen
  1 sibling, 0 replies; 4+ messages in thread
From: Janne Karhunen @ 2020-06-09  5:49 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: kvmarm

On Thu, Jun 4, 2020 at 3:37 PM Marc Zyngier <maz@kernel.org> wrote:

> > I have an issue on one particular hardware with gicv3 and the old
> > stable (4.9.22x) kernel where the timer interrupt randomly leaks to
> > the host after the guest exit and the kvm_arch_timer_handler gets
> > triggered. The guest does run, but the whine (unexpected interrupt) is
> > annoying and it seems to be hindering the performance drastically - of
> > both the host and the guest. This behavior can even lead to the host
> > watchdog biting as the firing timer prevents the progress, especially
> > during the very early boot when the guest is doing heavy paging
> > anyway.
>
> The only system I witnessed this was a Cavium TX1. It seems incredibly
> bad at retiring an interrupt that has been masked at the source.

It looks like something along these lines works as a workaround. On guest
exit, lift the timer value prior to turning the timer off (the setting
happens atomically even if the timer retire does not), and then backport
the irq disable logic from the newer kernels into the host irq handler:

+       if (kvm_timer_should_fire(vcpu))
+               kvm_timer_update_irq(vcpu, true);
+
+       timer = &vcpu->arch.timer_cpu;
+       if (timer->irq.level)
+               disable_percpu_irq(irq);
+       else
+               enable_percpu_irq(irq, 0);

Now the hits are still happening, but very, very rarely.


--
Janne