From: Marc Zyngier <maz@kernel.org>
To: Kunkun Jiang <jiangkunkun@huawei.com>
Cc: Oliver Upton <oliver.upton@linux.dev>,
	Joey Gouly <joey.gouly@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	"moderated list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)"
	<linux-arm-kernel@lists.infradead.org>,
	"open list:KERNEL VIRTUAL MACHINE FOR ARM64 (KVM/arm64)"
	<kvmarm@lists.linux.dev>,
	open list <linux-kernel@vger.kernel.org>,
	"wanghaibin.wang@huawei.com" <wanghaibin.wang@huawei.com>
Subject: Re: [Question] Received vtimer interrupt but ISTATUS is 0
Date: Tue, 21 Oct 2025 15:46:57 +0100
Message-ID: <86a51kwbvi.wl-maz@kernel.org>
In-Reply-To: <f9a37a7d-2141-ee82-c7d6-23d8de9db2c1@huawei.com>

On Tue, 21 Oct 2025 14:38:26 +0100,
Kunkun Jiang <jiangkunkun@huawei.com> wrote:
> 
> Hi Marc,
> 
> On 2025/10/15 0:32, Marc Zyngier wrote:
> > On Tue, 14 Oct 2025 15:45:37 +0100,
> > Kunkun Jiang <jiangkunkun@huawei.com> wrote:
> >> 
> >> Hi all,
> >> 
> >> I'm having a very strange problem that can be simplified to a vtimer
> >> interrupt being received but ISTATUS is 0. Why does this happen?
> >> According to my analysis, the timer condition may be met and the
> >> interrupt generated. Then some action in the VM (cancelling the
> >> timer?) makes ISTATUS become 0, and the hardware needs to clear the
> >> interrupt. But the clear is propagated too slowly, and the OS has
> >> already read ICC_IAR_EL1. So the hypervisor executes
> >> kvm_arch_timer_handler but ISTATUS is 0.
> > 
> > If what you describe is accurate, and that the HW takes so long to
> > retire the timer interrupt that we cannot trust having taken an
> > interrupt, how long until we can trust that what we have is actually
> > correct?
> > 
> > Given that it takes a full exit from the guest before we can handle
> > the interrupt, I am rather puzzled that you observe this sort of bad
> > behaviours on modern HW. You either have an insanely fast CPU with a
> > very slow GIC, or a very bizarre machine (a bit like a ThunderX -- not
> > a compliment).
> 
> I added dump_stack in the exception branch, and the following is the
> stack when the problem occurred.
> 
> [ 2669.521569] Call trace:
> [ 2669.521577]  dump_backtrace+0x0/0x220
> [ 2669.521579]  show_stack+0x20/0x2c
> [ 2669.521583]  dump_stack+0xf0/0x138
> [ 2669.521588]  kvm_arch_timer_handler+0x138/0x194
> [ 2669.521592]  handle_percpu_devid_irq+0x90/0x1f4
> [ 2669.521598]  __handle_domain_irq+0x84/0xfc
> [ 2669.521600]  gic_handle_irq+0xfc/0x320
> [ 2669.521601]  el1_irq+0xb8/0x140
> [ 2669.521604]  kvm_arch_vcpu_ioctl_run+0x258/0x6fc
> [ 2669.521607]  kvm_vcpu_ioctl+0x334/0xa94
> [ 2669.521612]  __arm64_sys_ioctl+0xb0/0xf4
> [ 2669.521614]  el0_svc_common.constprop.0+0x7c/0x1bc
> [ 2669.521616]  do_el0_svc+0x2c/0xa4
> [ 2669.521619]  el0_svc+0x20/0x30
> [ 2669.521620]  el0_sync_handler+0xb0/0xb4
> [ 2669.521621]  el0_sync+0x160/0x180
> 
> Judging by this stack, it does indeed take a full exit from the
> guest. Do you think this is a hardware issue?

Of course this is a HW issue. Your GIC is slow to retire a pending
interrupt, you pay the consequences.

> > 
> > How does it work when context-switching from a vcpu that has a pending
> > timer interrupt to one that doesn't? Do you also see spurious
> > interrupts?
> I added a log under the 'if(!vcpu)' branch and tested it, but it did
> not go to this branch. In addition, I have set the vcpu to be bound to
> the core, and only one vcpu is running on one core.

Well, that's hardly testing the conditions I have outlined.

> > 
> >> The code flow is as follows:
> >> kvm_arch_timer_handler
> >>      ->if (kvm_timer_should_fire)
> >>          ->the value of SYS_CNTV_CTL is 0b001(ISTATUS=0,IMASK=0,ENABLE=1)
> >>      ->return IRQ_HANDLED
> >> 
> >> Because ISTATUS is 0, kvm_timer_update_irq will not be executed to
> >> inject this interrupt into the VM. Since EOImode is 1 and the vtimer
> >> interrupt has the IRQD_FORWARDED_TO_VCPU flag, the hypervisor will not
> >> write ICC_DIR_EL1 to deactivate the interrupt. This interrupt remains
> >> in the active state, blocking subsequent interrupts from being
> >> processed. Fortunately, kvm_timer_vcpu_load determines again whether
> >> an interrupt needs to be injected into the VM. But the delay will
> >> definitely increase.
> > 
> > Right, so you are at most a context switch away from your next
> > interrupt, just like in the !vcpu case. While not ideal, that's not
> > fatal.
> > 
> >> 
> >> What I want to discuss is the solution to this problem. My solution is
> >> to add a deactivation action:
> >> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> >> index dbd74e4885e2..46baba531d51 100644
> >> --- a/arch/arm64/kvm/arch_timer.c
> >> +++ b/arch/arm64/kvm/arch_timer.c
> >> @@ -228,8 +228,13 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >>          else
> >>                  ctx = map.direct_ptimer;
> >> 
> >> -       if (kvm_timer_should_fire(ctx))
> >> +       if (kvm_timer_should_fire(ctx)) {
> >>                  kvm_timer_update_irq(vcpu, true, ctx);
> >> +       } else {
> >> +               struct vgic_irq *irq;
> >> +               irq = vgic_get_vcpu_irq(vcpu, timer_irq(timer_ctx));
> >> +               gic_write_dir(irq->hwintid);
> >> +       }
> >> 
> >>          if (userspace_irqchip(vcpu->kvm) &&
> >>              !static_branch_unlikely(&has_gic_active_state))
> >> 
> >> If you have any new ideas or other solutions to this problem, please
> >> let me know.
> > 
> > That's not right.
> > 
> > For a start, this is GICv3 specific, and will break on everything
> > else. Also, why the round-trip via the vgic_irq when you already have
> > the interrupt number that has fired *as a parameter*?
> > 
> > Finally, this breaks with NV, as you could have switched between EL1
> > and EL2 timers, and since you cannot trust you are in the correct
> > interrupt context (interrupt firing out of context), you can't trust
> > irq->hwintid either, as the mappings will have changed.
> > 
> > > Something like the patchlet below should do the trick, but I'm
> > > definitely not happy about this sort of sorry hack.
> > 
> > 	M.
> > 
> > diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> > index dbd74e4885e24..3db7c6bdffbc0 100644
> > --- a/arch/arm64/kvm/arch_timer.c
> > +++ b/arch/arm64/kvm/arch_timer.c
> > @@ -206,6 +206,13 @@ static void soft_timer_cancel(struct hrtimer *hrt)
> >   	hrtimer_cancel(hrt);
> >   }
> >   +static void set_timer_irq_phys_active(struct arch_timer_context
> > *ctx, bool active)
> > +{
> > +	int r;
> > +	r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> > +	WARN_ON(r);
> > +}
> > +
> >   static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >   {
> >   	struct kvm_vcpu *vcpu = *(struct kvm_vcpu **)dev_id;
> > @@ -230,6 +237,8 @@ static irqreturn_t kvm_arch_timer_handler(int irq, void *dev_id)
> >     	if (kvm_timer_should_fire(ctx))
> >   		kvm_timer_update_irq(vcpu, true, ctx);
> > +	else
> > +		set_timer_irq_phys_active(ctx, false);
> >     	if (userspace_irqchip(vcpu->kvm) &&
> >   	    !static_branch_unlikely(&has_gic_active_state))
> > @@ -659,13 +668,6 @@ static void timer_restore_state(struct arch_timer_context *ctx)
> >   	local_irq_restore(flags);
> >   }
> >   -static inline void set_timer_irq_phys_active(struct
> > arch_timer_context *ctx, bool active)
> > -{
> > -	int r;
> > -	r = irq_set_irqchip_state(ctx->host_timer_irq, IRQCHIP_STATE_ACTIVE, active);
> > -	WARN_ON(r);
> > -}
> > -
> >   static void kvm_timer_vcpu_load_gic(struct arch_timer_context *ctx)
> >   {
> >   	struct kvm_vcpu *vcpu = ctx->vcpu;
> > 
> After extensive testing, this patch was able to resolve the issue I
> encountered.
> Tested-by: Kunkun Jiang <jiangkunkun@huawei.com>

Just to be clear: a similar discussion took place over 5 years ago on
the same subject[1], and I was pretty clear about the conclusion.

There is no bug here. Only a slow HW implementation that leads to
suboptimal behaviours. There is no state loss, no lack of forward
progress, and the interrupts still get delivered in finite time. As
far as I am concerned, things work OK.

Thanks,

	M.

[1] https://lore.kernel.org/r/1595584037-6877-1-git-send-email-zhangshaokun@hisilicon.com
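
For reference, the SYS_CNTV_CTL value 0b001 reported in the code flow
above can be decoded with a small standalone sketch. The bit positions
(ENABLE = bit 0, IMASK = bit 1, ISTATUS = bit 2) follow the
architectural CNTV_CTL_EL0 layout. The macro and function names below
are illustrative only and are not the kernel's identifiers, and the
firing condition (enabled, unmasked, ISTATUS set) is an assumption that
is merely consistent with the behaviour described in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CNTV_CTL_EL0 control bits (architectural layout). */
#define TIMER_CTL_ENABLE   (UINT32_C(1) << 0)  /* timer enabled */
#define TIMER_CTL_IMASK    (UINT32_C(1) << 1)  /* interrupt masked */
#define TIMER_CTL_ISTATUS  (UINT32_C(1) << 2)  /* timer condition met */

/*
 * Hypothetical user-space model of the "should fire" check:
 * the timer output is asserted only when the timer is enabled,
 * not masked, and its condition is currently met.
 */
static inline bool timer_ctl_should_fire(uint32_t ctl)
{
	return (ctl & TIMER_CTL_ENABLE) &&
	       (ctl & TIMER_CTL_ISTATUS) &&
	       !(ctl & TIMER_CTL_IMASK);
}

/* Walk through the case from the report and two contrasting ones. */
static inline void timer_ctl_selfcheck(void)
{
	/* 0b001: ENABLE=1, IMASK=0, ISTATUS=0 -> nothing to inject. */
	assert(!timer_ctl_should_fire(0x1));
	/* 0b101: ENABLE=1, IMASK=0, ISTATUS=1 -> interrupt is due. */
	assert(timer_ctl_should_fire(0x5));
	/* 0b111: condition met but masked -> output not asserted. */
	assert(!timer_ctl_should_fire(0x7));
}
```

With the reported value 0x1 the check is false, which is the situation
where the handler has been entered but has no interrupt to inject.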

-- 
Without deviation from the norm, progress is not possible.

