From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: LKML <linux-kernel@vger.kernel.org>, X86 Kernel <x86@kernel.org>,
iommu@lists.linux.dev, Thomas Gleixner <tglx@linutronix.de>,
Lu Baolu <baolu.lu@linux.intel.com>,
kvm@vger.kernel.org, Dave Hansen <dave.hansen@intel.com>,
Joerg Roedel <joro@8bytes.org>, "H. Peter Anvin" <hpa@zytor.com>,
Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
Raj Ashok <ashok.raj@intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
maz@kernel.org, seanjc@google.com,
Robin Murphy <robin.murphy@arm.com>,
jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH RFC 09/13] x86/irq: Install posted MSI notification handler
Date: Wed, 15 Nov 2023 12:04:01 -0800
Message-ID: <20231115120401.3e02d977@jacob-builder>
In-Reply-To: <20231115125624.GF3818@noisy.programming.kicks-ass.net>
Hi Peter,
On Wed, 15 Nov 2023 13:56:24 +0100, Peter Zijlstra <peterz@infradead.org>
wrote:
> On Sat, Nov 11, 2023 at 08:16:39PM -0800, Jacob Pan wrote:
>
> > +static __always_inline inline void handle_pending_pir(struct pi_desc
> > *pid, struct pt_regs *regs) +{
>
> __always_inline means that... (A)
>
> > + int i, vec = FIRST_EXTERNAL_VECTOR;
> > + u64 pir_copy[4];
> > +
> > + /*
> > + * Make a copy of PIR which contains IRQ pending bits for
> > vectors,
> > + * then invoke IRQ handlers for each pending vector.
> > + * If any new interrupts were posted while we are processing,
> > will
> > + * do again before allowing new notifications. The idea is to
> > + * minimize the number of the expensive notifications if IRQs
> > come
> > + * in a high frequency burst.
> > + */
> > + for (i = 0; i < 4; i++)
> > + pir_copy[i] = raw_atomic64_xchg((atomic64_t
> > *)&pid->pir_l[i], 0); +
> > + /*
> > + * Ideally, we should start from the high order bits set in
> > the PIR
> > + * since each bit represents a vector. Higher order bit
> > position means
> > + * the vector has higher priority. But external vectors are
> > allocated
> > + * based on availability not priority.
> > + *
> > + * EOI is included in the IRQ handlers call to apic_ack_irq,
> > which
> > + * allows higher priority system interrupt to get in between.
> > + */
> > + for_each_set_bit_from(vec, (unsigned long *)&pir_copy[0], 256)
> > + call_irq_handler(vec, regs);
> > +
> > +}
> > +
> > +/*
> > + * Performance data shows that 3 is good enough to harvest 90+% of the
> > benefit
> > + * on high IRQ rate workload.
> > + * Alternatively, could make this tunable, use 3 as default.
> > + */
> > +#define MAX_POSTED_MSI_COALESCING_LOOP 3
> > +
> > +/*
> > + * For MSIs that are delivered as posted interrupts, the CPU
> > notifications
> > + * can be coalesced if the MSIs arrive in high frequency bursts.
> > + */
> > +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> > +{
> > + struct pt_regs *old_regs = set_irq_regs(regs);
> > + struct pi_desc *pid;
> > + int i = 0;
> > +
> > + pid = this_cpu_ptr(&posted_interrupt_desc);
> > +
> > + inc_irq_stat(posted_msi_notification_count);
> > + irq_enter();
> > +
> > + while (i++ < MAX_POSTED_MSI_COALESCING_LOOP) {
> > + handle_pending_pir(pid, regs);
> > +
> > + /*
> > + * If there are new interrupts posted in PIR, do
> > again. If
> > + * nothing pending, no need to wait for more
> > interrupts.
> > + */
> > + if (is_pir_pending(pid))
>
> So this reads those same 4 words we xchg in handle_pending_pir(), right?
>
> > + continue;
> > + else
> > + break;
> > + }
> > +
> > + /*
> > + * Clear outstanding notification bit to allow new IRQ
> > notifications,
> > + * do this last to maximize the window of interrupt coalescing.
> > + */
> > + pi_clear_on(pid);
> > +
> > + /*
> > + * There could be a race of PI notification and the clearing
> > of ON bit,
> > + * process PIR bits one last time such that handling the new
> > interrupts
> > + * are not delayed until the next IRQ.
> > + */
> > + if (unlikely(is_pir_pending(pid)))
> > + handle_pending_pir(pid, regs);
>
> (A) ... we get _two_ copies of that thing in this function. Does that
> make sense ?
>
> > +
> > + apic_eoi();
> > + irq_exit();
> > + set_irq_regs(old_regs);
> > +}
> > #endif /* X86_POSTED_MSI */
>
> Would it not make more sense to write things something like:
>
It is a great idea; we can skip the expensive xchg if pir[i] is 0. But I had
to tweak it a little to make it perform better.
> bool handle_pending_pir()
> {
> bool handled = false;
> u64 pir_copy[4];
>
> for (i = 0; i < 4; i++) {
> if (!pid-pir_l[i]) {
> pir_copy[i] = 0;
> continue;
> }
>
> pir_copy[i] = arch_xchg(&pir->pir_l[i], 0);
Here we are interleaving the cacheline read and the xchg, so I changed it to
read all four words first and then xchg only the non-zero ones:

	for (i = 0; i < 4; i++) {
		pir_copy[i] = pid->pir_l[i];
	}

	for (i = 0; i < 4; i++) {
		if (pir_copy[i]) {
			pir_copy[i] = arch_xchg(&pid->pir_l[i], 0);
			handled = true;
		}
	}
With the DSA MEMFILL test using just one queue and one MSI, we save 3 xchg
per loop. Here is the performance comparison in IRQ rate:

	Original RFC:			9.29 m/sec
	Optimized as in your email:	8.82 m/sec
	Tweaked as above:		9.54 m/sec

I still need to test with more MSI vectors spread across all 4 u64 words. I
suspect the benefit will decrease, since we then need to do both the read and
the xchg for every non-zero entry.
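To illustrate the two-pass idea outside the kernel, here is a minimal
userspace sketch using C11 atomics. It is not the patch code: struct fake_pid,
harvest_pir(), pir_copy_test() and the nr_xchg counter are hypothetical names
made up for this example, and atomic_exchange() merely stands in for
arch_xchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PIR_WORDS 4

/* Hypothetical stand-in for the PIR part of struct pi_desc. */
struct fake_pid {
	_Atomic uint64_t pir[PIR_WORDS];
};

/* Counts exchanges, for illustration only. */
static unsigned int nr_xchg;

/*
 * Pass 1: plain loads of all four words.
 * Pass 2: atomically exchange only the words that looked non-zero.
 * Returns true if any pending bit was harvested into pir_copy[].
 */
static bool harvest_pir(struct fake_pid *pid, uint64_t pir_copy[PIR_WORDS])
{
	bool handled = false;
	int i;

	for (i = 0; i < PIR_WORDS; i++)
		pir_copy[i] = atomic_load_explicit(&pid->pir[i],
						   memory_order_relaxed);

	for (i = 0; i < PIR_WORDS; i++) {
		if (!pir_copy[i])
			continue;
		/* Re-fetch atomically so bits posted since pass 1 are kept. */
		pir_copy[i] = atomic_exchange(&pid->pir[i], 0);
		nr_xchg++;
		handled = true;
	}

	return handled;
}

/* Test whether a vector is set in a harvested copy. */
static bool pir_copy_test(const uint64_t pir_copy[PIR_WORDS], unsigned int vec)
{
	assert(vec < 64 * PIR_WORDS);
	return pir_copy[vec / 64] & (1ULL << (vec % 64));
}
```

With a single pending vector, only one of the four words is exchanged, which
corresponds to the "saving 3 xchg per loop" case described above.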
> handled |= true;
> }
>
> if (!handled)
> return handled;
>
> for_each_set_bit()
> ....
>
> return handled.
> }
>
> sysvec_posted_blah_blah()
> {
> bool done = false;
> bool handled;
>
> for (;;) {
> handled = handle_pending_pir();
> if (done)
> break;
> if (!handled || ++loops > MAX_LOOPS) {
> pi_clear_on(pid);
> /* once more after clear_on */
> done = true;
> }
> }
> }
>
>
> Hmm?
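
For reference, the control flow of the loop suggested above can be modelled
in userspace roughly as follows. This is only a sketch under stated
assumptions: struct model_pid, model_handle_pending_pir(),
model_notification() and nr_dispatched are hypothetical names, an atomic_bool
stands in for the ON bit, and counting set bits stands in for calling the IRQ
handlers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LOOPS 3	/* mirrors MAX_POSTED_MSI_COALESCING_LOOP */

/* Hypothetical userspace model of a posted interrupt descriptor. */
struct model_pid {
	_Atomic uint64_t pir[4];
	atomic_bool on;		/* outstanding notification bit */
};

/* Number of vectors "handled" so far, for illustration only. */
static unsigned int nr_dispatched;

/* Harvest and dispatch all pending bits; true if anything was pending. */
static bool model_handle_pending_pir(struct model_pid *pid)
{
	bool handled = false;

	for (int i = 0; i < 4; i++) {
		uint64_t w = atomic_load(&pid->pir[i]);

		if (!w)
			continue;
		w = atomic_exchange(&pid->pir[i], 0);
		for (; w; w &= w - 1)	/* one iteration per set bit */
			nr_dispatched++;
		handled = true;
	}

	return handled;
}

/* Returns the number of passes made over PIR. */
static int model_notification(struct model_pid *pid)
{
	bool done = false;
	int loops = 0, passes = 0;

	for (;;) {
		bool handled = model_handle_pending_pir(pid);

		passes++;
		if (done)
			break;
		if (!handled || ++loops > MAX_LOOPS) {
			atomic_store(&pid->on, false);
			/* once more after clearing ON */
			done = true;
		}
	}

	/* At most MAX_LOOPS + 1 busy passes plus the final pass. */
	assert(passes <= MAX_LOOPS + 2);
	return passes;
}
```

In this model the harvesting function appears at a single call site, and the
final pass after clearing ON is the same code path as the coalescing passes.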
Thanks,
Jacob