Re: [PATCH for-4.21 03/10] x86/HPET: use single, global, low-priority vector for broadcast IRQ

From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Oleksii Kurochko <oleksii.kurochko@gmail.com>
Subject: Re: [PATCH for-4.21 03/10] x86/HPET: use single, global, low-priority vector for broadcast IRQ
Date: Fri, 17 Oct 2025 10:20:41 +0200
Message-ID: <aPH8Waqi5hJyCuzO@Mac.lan>
In-Reply-To: <39f00b12-a3f7-4185-a8fa-2c99c43695d9@suse.com>

On Fri, Oct 17, 2025 at 09:15:08AM +0200, Jan Beulich wrote:
> On 16.10.2025 18:27, Roger Pau Monné wrote:
> > On Thu, Oct 16, 2025 at 09:32:04AM +0200, Jan Beulich wrote:
> >> @@ -307,15 +309,13 @@ static void cf_check hpet_msi_set_affini
> >>      struct hpet_event_channel *ch = desc->action->dev_id;
> >>      struct msi_msg msg = ch->msi.msg;
> >>  
> >> -    msg.dest32 = set_desc_affinity(desc, mask);
> >> -    if ( msg.dest32 == BAD_APICID )
> >> -        return;
> >> +    /* This really is only for dump_irqs(). */
> >> +    cpumask_copy(desc->arch.cpu_mask, mask);
> > 
> > If you no longer call set_desc_affinity(), could you adjust the second
> > parameter of hpet_msi_set_affinity() to be unsigned int cpu instead of
> > a cpumask?
> 
> Looks like I could, yes. But then we need to split the function, as it's
> also used as the .set_affinity hook.

I see, I wasn't taking that into account.

> > And here just clear desc->arch.cpu_mask and set the passed CPU.
> 
> Which would still better be a cpumask_copy(), just given cpumask_of(cpu)
> as input.

As it is, yes.

> >> -    msg.data &= ~MSI_DATA_VECTOR_MASK;
> >> -    msg.data |= MSI_DATA_VECTOR(desc->arch.vector);
> >> +    msg.dest32 = cpu_mask_to_apicid(mask);
> > 
> > And here you can just use cpu_physical_id().
> 
> Right. All of which (up to here; but see below) perhaps better a separate,
> follow-on cleanup change.

Yes, it's too much fuss, and I also have plans in that area to deal
with it myself anyway.  I just wanted to avoid changing this now only
to have it changed again later.  But it's too unrelated to put in this
change.

> >>      msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> >>      msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> >> -    if ( msg.data != ch->msi.msg.data || msg.dest32 != ch->msi.msg.dest32 )
> >> +    if ( msg.dest32 != ch->msi.msg.dest32 )
> >>          hpet_msi_write(ch, &msg);
> > 
> > A further note here, which ties to my comment on the previous patch
> > about losing the interrupt during the masked window.  If the vector
> > is the same across all CPUs, we no longer need to update the MSI data
> > field, just the address one, which can be done atomically.  We also
> > have signaling from the IOMMU whether the MSI fields need writing.
> 
> Hmm, yes, we can leverage that, as long as we're willing to make assumptions
> here about what exactly iommu_update_ire_from_msi() does: We'd then rely on
> not only the original (untranslated) msg->data not changing, but also the
> translated one. That looks to hold for both Intel and AMD, but it's still
> something we want to be sure we actually want to make the code dependent
> upon. (I'm intending to at least add an assertion to that effect.)

We could still mask when needed, but the masking would be
conditionally done in hpet_msi_write().

It seems however this might be better done as a followup change.
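
To make the idea concrete, here is a minimal standalone sketch (not Xen
code) of masking conditionally inside the write helper.  All model_*
names are made up for illustration, and the model assumes that an
address-only update is a single atomic write, which is the property
discussed above rather than something the sketch verifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the real Xen types; not the actual layout. */
struct model_msi_msg {
    uint32_t address_lo;
    uint32_t data;
};

struct model_channel {
    struct model_msi_msg msg;   /* value currently programmed */
    bool masked;                /* whether the channel is masked right now */
    unsigned int mask_count;    /* how often the channel had to be masked */
};

/*
 * Program a new message.  Masking is only done when the data field
 * changes, because then two fields need updating and the update as a
 * whole isn't atomic.  With a single global vector the data field stays
 * the same, so the common path never masks.
 */
static void model_msi_write(struct model_channel *ch,
                            const struct model_msi_msg *msg)
{
    bool need_mask = msg->data != ch->msg.data;

    if ( need_mask )
    {
        ch->masked = true;
        ch->mask_count++;
    }

    ch->msg.data = msg->data;
    ch->msg.address_lo = msg->address_lo;

    if ( need_mask )
        ch->masked = false;
}
```

In this model a destination-only change leaves mask_count untouched,
while a vector change still goes through the masked path.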

> > We can avoid the masking, and the possible drop of interrupts.
> 
> Hmm, right. There's nothing wrong with the caller relying on the write
> being atomic now. (Really, continuing to use hpet_msi_write() wouldn't
> be a problem, as re-writing the low half of HPET_Tn_ROUTE() with the
> same value is going to be benign. Unless of course that write was the
> source of the extra IRQs I'm seeing.)

Oh, yes, that's right, we don't even need to avoid the write.

> Taking together with what you said further up, having
> set_channel_irq_affinity() no longer use hpet_msi_set_affinity() as it
> is to ...
> 
> >> @@ -328,7 +328,7 @@ static hw_irq_controller hpet_msi_type =
> >>      .shutdown   = hpet_msi_shutdown,
> >>      .enable	    = hpet_msi_unmask,
> >>      .disable    = hpet_msi_mask,
> >> -    .ack        = ack_nonmaskable_msi_irq,
> >> +    .ack        = irq_actor_none,
> >>      .end        = end_nonmaskable_irq,
> >>      .set_affinity   = hpet_msi_set_affinity,
> 
> ... satisfy the use here would then probably be desirable right away.
> The little bit that's left of hpet_msi_set_affinity() would then be
> open-coded in set_channel_irq_affinity().

As you see fit; I'm not going to insist if the changes become too
unrelated to the fix itself.  It can always be done as a followup
patch, especially taking into account we are in hard code freeze.

> Getting rid of the masking would (hopefully) also get rid of the stray
> IRQs that I'm observing, assuming my guessing towards the reason there
> is correct.
> 
> >> @@ -497,6 +503,7 @@ static void set_channel_irq_affinity(str
> >>      spin_lock(&desc->lock);
> >>      hpet_msi_mask(desc);
> >>      hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> >> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> > 
> > I would set the vector table ahead of setting the affinity, in case we
> > can drop the mask calls around this block of code.
> 
> Isn't there a problematic window either way round? I can make the change,
> but I don't see that addressing anything. The new comparator value will
> be written later anyway, and interrupts up to that point aren't of any
> interest anyway. I.e. it doesn't matter which of the CPUs gets to handle
> them.

It's preferable to get a silent stray interrupt (if the per-CPU vector
table is correctly set up) rather than a message from Xen that an
unknown vector has been received.

If a vector is injected ahead of vector_irq being set, Xen would
complain in do_IRQ() that there's no handler for such vector.
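
A small standalone model of that ordering argument, not Xen code: the
model_* names, the table layout and the log message are invented for
illustration.  The only fact taken from the architecture is that #AC
(X86_EXC_AC) is vector 17.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_NR_CPUS     4
#define MODEL_NR_VECTORS  256
#define MODEL_VECTOR      0x11   /* #AC is vector 17 */

/* Per-CPU vector -> IRQ table; a negative entry means "no handler". */
static int model_vector_irq[MODEL_NR_CPUS][MODEL_NR_VECTORS];

/* CPU the (modelled) interrupt is currently routed to. */
static unsigned int model_dest;

static void model_init(void)
{
    for ( unsigned int c = 0; c < MODEL_NR_CPUS; c++ )
        for ( unsigned int v = 0; v < MODEL_NR_VECTORS; v++ )
            model_vector_irq[c][v] = -1;
    model_dest = 0;
}

/* Returns the IRQ handled, or -1 after logging an unknown vector. */
static int model_do_irq(unsigned int cpu, unsigned int vector)
{
    int irq = model_vector_irq[cpu][vector];

    if ( irq < 0 )
        fprintf(stderr, "CPU%u: no handler for vector %#x\n", cpu, vector);

    return irq;
}

/* Retarget with the table entry written before the route changes. */
static void model_retarget(unsigned int cpu, int irq)
{
    model_vector_irq[cpu][MODEL_VECTOR] = irq;  /* table first ... */
    model_dest = cpu;                           /* ... then the route */
}
```

With this order, an interrupt arriving at the new CPU right after the
route changes already finds a table entry, so at worst it is a silent
stray one.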

> > I also wonder, do you really need the bind_irq_vector() if you
> > manually set the affinity afterwards, and the vector table plus
> > desc->arch.cpu_mask are also set here?
> 
> At the very least I'd then also need to open-code the setting of
> desc->arch.vector and desc->arch.used. Possibly also the setting of the
> bit in desc->arch.used_vectors. And strictly speaking also the
> trace_irq_mask() invocation.

Let's keep it as-is.

> >> --- a/xen/arch/x86/include/asm/irq-vectors.h
> >> +++ b/xen/arch/x86/include/asm/irq-vectors.h
> >> @@ -18,6 +18,15 @@
> >>  /* IRQ0 (timer) is statically allocated but must be high priority. */
> >>  #define IRQ0_VECTOR             0xf0
> >>  
> >> +/*
> >> + * Low-priority (for now statically allocated) vectors, sharing entry
> >> + * points with exceptions in the 0x10 ... 0x1f range, as long as the
> >> + * respective exception has an error code.
> >> + */
> >> +#define FIRST_LOPRIORITY_VECTOR 0x10
> >> +#define HPET_BROADCAST_VECTOR   X86_EXC_AC
> >> +#define LAST_LOPRIORITY_VECTOR  0x1f
> > 
> > I wonder if it won't be clearer to simply reserve a vector if the HPET
> > is used, instead of hijacking the AC one.  It's one vector less, but
> > arguably now that we unconditionally use physical destination mode our
> > pool of vectors has expanded considerably.
> 
> Well, I'd really like to avoid consuming an otherwise usable vector, if
> at all possible (as per Andrew's FRED plans, that won't be possible
> there anymore then).

If re-using the AC vector is not possible with FRED we might want to
do this uniformly and always consume a vector then?

> >> --- a/xen/arch/x86/irq.c
> >> +++ b/xen/arch/x86/irq.c
> >> @@ -755,8 +755,9 @@ void setup_vector_irq(unsigned int cpu)
> >>          if ( !irq_desc_initialized(desc) )
> >>              continue;
> >>          vector = irq_to_vector(irq);
> >> -        if ( vector >= FIRST_HIPRIORITY_VECTOR &&
> >> -             vector <= LAST_HIPRIORITY_VECTOR )
> >> +        if ( vector <= (vector >= FIRST_HIPRIORITY_VECTOR
> >> +                        ? LAST_HIPRIORITY_VECTOR
> >> +                        : LAST_LOPRIORITY_VECTOR) )
> >>              cpumask_set_cpu(cpu, desc->arch.cpu_mask);
> > 
> > I think this is wrong.  The low priority vector used by the HPET will
> > only target a single CPU at a time, and hence adding extra CPUs to
> > that mask as part of AP bringup is not correct.
> 
> I'm not sure about "wrong". It's not strictly necessary for the HPET one,
> I expect, but it's generally what would be necessary. For the HPET one,
> hpet_msi_set_affinity() replaces the value anyway. (I can add a sentence
> to this effect to the description, if that helps.)

I do think it's wrong; it's just not harmful per se, apart from showing
up in the output of dump_irqs().  The value in desc->arch.cpu_mask
should be the CPU that's the destination of the interrupt.  In this
case, the HPET interrupt does have a single destination at a given
time, and adding another one will make the output of dump_irqs() show
two destinations, when the interrupt will target a single CPU.

If anything you should add the CPU to the affinity set
(desc->affinity), but that's not needed since you already init the
affinity mask with cpumask_setall().
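
To show the effect on the mask, a standalone sketch (not Xen code) using
a plain bitmask in place of a cpumask.  The model_* helpers are
hypothetical, and the MODEL_*_HIPRIORITY_VECTOR values are placeholders
assumed for the example; only the condition itself is copied from the
hunk above.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LAST_LOPRIORITY_VECTOR   0x1f
#define MODEL_FIRST_HIPRIORITY_VECTOR  0xf1   /* assumed placeholder */
#define MODEL_LAST_HIPRIORITY_VECTOR   0xf8   /* assumed placeholder */

/* Same shape as the condition in the quoted setup_vector_irq() hunk. */
static bool model_is_static_vector(unsigned int vector)
{
    return vector <= (vector >= MODEL_FIRST_HIPRIORITY_VECTOR
                      ? MODEL_LAST_HIPRIORITY_VECTOR
                      : MODEL_LAST_LOPRIORITY_VECTOR);
}

/* What AP bringup does to the mask in this model: add the new CPU. */
static void model_setup_vector_irq(unsigned int cpu, unsigned int vector,
                                   unsigned long *cpu_mask)
{
    if ( model_is_static_vector(vector) )
        *cpu_mask |= 1UL << cpu;
}

/* What retargeting does in this model: replace the mask with one CPU. */
static void model_set_affinity(unsigned int cpu, unsigned long *cpu_mask)
{
    *cpu_mask = 1UL << cpu;
}
```

Bringing up CPUs 1 and 2 with vector 0x11 leaves three bits set in the
mask until the next retarget collapses it back to a single CPU, which is
the transient multi-destination view dump_irqs() would show.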

FWIW, I'm working on tentatively getting rid of the
desc->arch.{cpu,old_cpu,pending}_mask fields and converting them to
plain unsigned ints after we have dropped logical interrupt delivery
for external interrupts.

Thanks, Roger.

Thread overview: 41+ messages
2025-10-16  7:30 [PATCH for-4.21 00/10] x86/HPET: broadcast IRQ and other improvements Jan Beulich
2025-10-16  7:31 ` [PATCH for-4.21 01/10] x86/HPET: limit channel changes Jan Beulich
2025-10-16 10:24   ` Roger Pau Monné
2025-10-16 11:47     ` Jan Beulich
2025-10-16 15:07       ` Roger Pau Monné
2025-10-16 15:16         ` Jan Beulich
2025-10-16 15:25           ` Roger Pau Monné
2025-10-17  9:23   ` Roger Pau Monné
2025-10-17  9:55     ` Jan Beulich
2025-10-16  7:31 ` [PATCH for-4.21 02/10] x86/HPET: disable unused channels Jan Beulich
2025-10-16 11:42   ` Roger Pau Monné
2025-10-16 11:57     ` Jan Beulich
2025-10-16 15:34       ` Roger Pau Monné
2025-10-16 15:55         ` Jan Beulich
2025-10-16 16:28           ` Roger Pau Monné
2025-10-16 16:31   ` Roger Pau Monné
2025-10-17  6:08     ` Jan Beulich
2025-10-17  6:10       ` Jan Beulich
2025-10-16  7:32 ` [PATCH for-4.21 03/10] x86/HPET: use single, global, low-priority vector for broadcast IRQ Jan Beulich
2025-10-16 16:27   ` Roger Pau Monné
2025-10-17  7:15     ` Jan Beulich
2025-10-17  8:20       ` Roger Pau Monné [this message]
2025-10-20  5:53         ` Jan Beulich
2025-10-20 15:49           ` Roger Pau Monné
2025-10-20 16:05             ` Jan Beulich
2025-10-21  8:37               ` Roger Pau Monné
2025-10-16 17:01   ` Andrew Cooper
2025-10-17  6:23     ` Jan Beulich
2025-10-16  7:32 ` [PATCH for-4.21 04/10] x86/HPET: ignore "stale" IRQs Jan Beulich
2025-10-17  9:19   ` Roger Pau Monné
2025-10-17  9:57     ` Jan Beulich
2025-10-17 12:13       ` Roger Pau Monné
2025-10-16  7:32 ` [PATCH 05/10] x86/HPET: avoid indirect call to event handler Jan Beulich
2025-10-16  7:33 ` [PATCH 06/10] x86/HPET: make another channel flags update atomic Jan Beulich
2025-10-16  7:33 ` [PATCH 07/10] x86/HPET: move legacy tick IRQ count adjustment Jan Beulich
2025-10-16  7:34 ` [PATCH 08/10] x86/HPET: shrink IRQ-descriptor locked region in set_channel_irq_affinity() Jan Beulich
2025-10-16  7:34 ` [PATCH 09/10] x86/HPET: reduce hpet_next_event() call sites Jan Beulich
2025-10-16  7:35 ` [PATCH 10/10] x86/HPET: don't use hardcoded 0 for "long timeout" Jan Beulich
2025-10-16 10:05 ` [PATCH for-4.21 00/10] x86/HPET: broadcast IRQ and other improvements Roger Pau Monné
2025-10-16 10:41   ` Jan Beulich
2025-10-17 16:03 ` Oleksii Kurochko
