All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: "xen-devel@lists.xenproject.org" <xen-devel@lists.xenproject.org>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Oleksii Kurochko <oleksii.kurochko@gmail.com>
Subject: Re: [PATCH v3 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ
Date: Mon, 27 Oct 2025 12:33:36 +0100	[thread overview]
Message-ID: <aP9YkLo782XbfMQM@Mac.lan> (raw)
In-Reply-To: <6428217d-b5f6-4948-aff2-b007a6cfcfc0@suse.com>

On Mon, Oct 27, 2025 at 11:23:58AM +0100, Jan Beulich wrote:
> On 24.10.2025 15:24, Roger Pau Monné wrote:
> > On Thu, Oct 23, 2025 at 05:50:17PM +0200, Jan Beulich wrote:
> >> @@ -343,6 +347,12 @@ static int __init hpet_setup_msi_irq(str
> >>      u32 cfg = hpet_read32(HPET_Tn_CFG(ch->idx));
> >>      irq_desc_t *desc = irq_to_desc(ch->msi.irq);
> >>  
> >> +    clear_irq_vector(ch->msi.irq);
> >> +    ret = bind_irq_vector(ch->msi.irq, HPET_BROADCAST_VECTOR, &cpu_online_map);
> > 
> > By passing cpu_online_map here, it leads to _bind_irq_vector() doing:
> > 
> > cpumask_copy(desc->arch.cpu_mask, &cpu_online_map);
> > 
> > Which strictly speaking is wrong.  However this is just a cosmetic
> > issue until the irq is used for the first time, at which point it will
> > be assigned to a concrete CPU.
> > 
> > You could do:
> > 
> > cpumask_clear(desc->arch.cpu_mask);
> > cpumask_set_cpu(cpumask_any(&cpu_online_map), desc->arch.cpu_mask);
> > 
> > (Or equivalent)
> > 
> > To assign the interrupt to a concrete CPU and reflex it on the
> > cpu_mask after the bind_irq_vector() call, but I can live with it
> > being like this.  I have patches to adjust _bind_irq_vector() myself,
> > which I hope I will be able to post soon.
> 
> Hmm, I wrongly memorized hpet_broadcast_init() as being pre-SMP-init only.
> It has three call sites:
> - mwait_idle_init(), called from cpuidle_presmp_init(),
> - amd_cpuidle_init(), calling in only when invoked the very first time,
>   which is again from cpuidle_presmp_init(),
> - _disable_pit_irq(), called from the regular initcall disable_pit_irq().
> I.e. for the latter you're right that the CPU mask is too broad (in only a
> cosmetic way though). Would be you okay if I used cpumask_of(0) in place
> of &cpu_online_map?

Using cpumask_of(0) would be OK, as the per-cpu vector_irq array will
be updated ahead of assigning the interrupt to a CPU, and hence it
doesn't need to be done for all possible online CPUs in
_bind_irq_vector().

In the context here it would be more accurate to provide an empty CPU
mask, as the interrupt is not yet targeting any CPU.  Using CPU 0
would be a placeholder, which seems fine for the purpose.

> >> @@ -472,19 +482,50 @@ static struct hpet_event_channel *hpet_g
> >>  static void set_channel_irq_affinity(struct hpet_event_channel *ch)
> >>  {
> >>      struct irq_desc *desc = irq_to_desc(ch->msi.irq);
> >> +    struct msi_msg msg = ch->msi.msg;
> >>  
> >>      ASSERT(!local_irq_is_enabled());
> >>      spin_lock(&desc->lock);
> >> -    hpet_msi_mask(desc);
> >> -    hpet_msi_set_affinity(desc, cpumask_of(ch->cpu));
> >> -    hpet_msi_unmask(desc);
> >> +
> >> +    per_cpu(vector_irq, ch->cpu)[HPET_BROADCAST_VECTOR] = ch->msi.irq;
> >> +
> >> +    /*
> >> +     * Open-coding a reduced form of hpet_msi_set_affinity() here.  With the
> >> +     * actual update below (either of the IRTE or of [just] message address;
> >> +     * with interrupt remapping message address/data don't change) now being
> >> +     * atomic, we can avoid masking the IRQ around the update.  As a result
> >> +     * we're no longer at risk of missing IRQs (provided hpet_broadcast_enter()
> >> +     * keeps setting the new deadline only afterwards).
> >> +     */
> >> +    cpumask_copy(desc->arch.cpu_mask, cpumask_of(ch->cpu));
> >> +
> >>      spin_unlock(&desc->lock);
> >>  
> >> -    spin_unlock(&ch->lock);
> >> +    msg.dest32 = cpu_physical_id(ch->cpu);
> >> +    msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK;
> >> +    msg.address_lo |= MSI_ADDR_DEST_ID(msg.dest32);
> >> +    if ( msg.dest32 != ch->msi.msg.dest32 )
> >> +    {
> >> +        ch->msi.msg = msg;
> >>  
> >> -    /* We may have missed an interrupt due to the temporary masking. */
> >> -    if ( ch->event_handler && ch->next_event < NOW() )
> >> -        ch->event_handler(ch);
> >> +        if ( iommu_intremap != iommu_intremap_off )
> >> +        {
> >> +            int rc = iommu_update_ire_from_msi(&ch->msi, &msg);
> >> +
> >> +            ASSERT(rc <= 0);
> >> +            if ( rc >= 0 )
> > 
> > I don't think the rc > 0 part of this check is meaningful, as any rc
> > value > 0 will trigger the ASSERT(rc <= 0) ahead of it.  The code
> > inside of the if block itself only contains ASSERTs, so it's only
> > relevant for debug=y builds that will also have the rc <= 0 ASSERT.
> > 
> > You could possibly use:
> > 
> > ASSERT(rc <= 0);
> > if ( !rc )
> > {
> >     ASSERT(...
> > 
> > And achieve the same result?
> 
> Yes, except that I'd like to keep the >= to cover the case if the first
> assertion was dropped / commented out, as well as to have a doc effect.

Oh, OK.  Fair enough, I wasn't taking into account that this could be
done in case code is modified.

> >> @@ -991,6 +997,13 @@ void alloc_direct_apic_vector(uint8_t *v
> >>      spin_unlock(&lock);
> >>  }
> >>  
> >> +/* This could free any vectors, but is needed only for low-prio ones. */
> >> +void __init free_lopriority_vector(uint8_t vector)
> >> +{
> >> +    ASSERT(vector < FIRST_HIPRIORITY_VECTOR);
> >> +    clear_bit(vector, used_vectors);
> >> +}
> > 
> > I'm undecided whether we want to have such helper.  This is all very
> > specific to the single use by the HPET vector, and hence might be best
> > to simply put the clear_bit() inside of hpet_broadcast_late_init()
> > itself.
> 
> I wanted to avoid making used_vectors non-static.
> 
> > I could see for example other callers wanting to use this also
> > requiring cleanup of the per cpu vector_irq arrays.  Given it's (so
> > far) very limited usage it might be clearer to open-code the
> > clear_bit().
> 
> Dealing with vector_irq[] is a separate thing, though, isn't it?

Possibly, that's part of the binding, rather than the allocation
itself (which is what you cover here).

> >> --- a/xen/drivers/passthrough/amd/iommu_intr.c
> >> +++ b/xen/drivers/passthrough/amd/iommu_intr.c
> >> @@ -551,6 +551,13 @@ int cf_check amd_iommu_msi_msg_update_ir
> >>          for ( i = 1; i < nr; ++i )
> >>              msi_desc[i].remap_index = msi_desc->remap_index + i;
> >>          msg->data = data;
> >> +        /*
> >> +         * While the low address bits don't matter, "canonicalize" the address
> >> +         * by zapping the bits that were transferred to the IRTE.  This way
> >> +         * callers can check for there actually needing to be an update to
> >> +         * wherever the address is put.
> >> +         */
> >> +        msg->address_lo &= ~(MSI_ADDR_DESTMODE_MASK | MSI_ADDR_DEST_ID_MASK);
> > 
> > You might want to mention this change on the commit message also, as
> > it could look unrelated to the rest of the code?
> 
> I thought the comment here provided enough context and detail. I've added
> "AMD interrupt remapping code so far didn't "return" a consistent MSI
>  address when translating an MSI message. Clear respective fields there, to
>  keep the respective assertion in set_channel_irq_affinity() from
>  triggering."

LGTM, I would possibly remove the last "respective" for being
repetitive given the previous one in the sentence.

Thanks, Roger.


  reply	other threads:[~2025-10-27 11:34 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-23 15:48 [PATCH v3 for-4.21 0/9] x86/HPET: broadcast IRQ and other improvements Jan Beulich
2025-10-23 15:49 ` [PATCH v3 for-4.21 1/9] x86/HPET: deal with unused channels Jan Beulich
2025-10-24 12:16   ` Roger Pau Monné
2025-10-23 15:50 ` [PATCH v3 for-4.21 2/9] x86/HPET: use single, global, low-priority vector for broadcast IRQ Jan Beulich
2025-10-24 13:24   ` Roger Pau Monné
2025-10-27 10:23     ` Jan Beulich
2025-10-27 11:33       ` Roger Pau Monné [this message]
2025-10-27 11:53         ` Jan Beulich
2025-10-27 11:57           ` Jan Beulich
2025-10-27 11:57           ` Roger Pau Monné
2025-10-23 15:51 ` [PATCH v3 for-4.21 3/9] x86/HPET: replace handle_hpet_broadcast()'s on-stack cpumask_t Jan Beulich
2025-10-27 12:25   ` Jan Beulich
2025-10-27 15:20     ` Oleksii Kurochko
2025-10-23 15:51 ` [PATCH v3 4/9] x86/HPET: avoid indirect call to event handler Jan Beulich
2025-10-23 15:51 ` [PATCH v3 5/9] x86/HPET: make another channel flags update atomic Jan Beulich
2025-10-23 15:52 ` [PATCH v3 6/9] x86/HPET: move legacy tick IRQ count adjustment Jan Beulich
2025-10-23 15:52 ` [PATCH v3 7/9] x86/HPET: reduce hpet_next_event() call sites Jan Beulich
2025-10-23 15:52 ` [PATCH v3 8/9] x86/HPET: drop "long timeout" handling from reprogram_hpet_evt_channel() Jan Beulich
2025-10-23 15:53 ` [PATCH v3 9/9] x86/HPET: simplify "expire" check a little in reprogram_hpet_evt_channel() Jan Beulich

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aP9YkLo782XbfMQM@Mac.lan \
    --to=roger.pau@citrix.com \
    --cc=andrew.cooper3@citrix.com \
    --cc=jbeulich@suse.com \
    --cc=oleksii.kurochko@gmail.com \
    --cc=xen-devel@lists.xenproject.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.