From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Zeng Guang <guang.zeng@intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>, X86 Kernel <x86@kernel.org>,
Peter Zijlstra <peterz@infradead.org>,
"iommu@lists.linux.dev" <iommu@lists.linux.dev>,
Thomas Gleixner <tglx@linutronix.de>,
Lu Baolu <baolu.lu@linux.intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"Hansen, Dave" <dave.hansen@intel.com>,
Joerg Roedel <joro@8bytes.org>, "H. Peter Anvin" <hpa@zytor.com>,
Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
"Luse, Paul E" <paul.e.luse@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
Jens Axboe <axboe@kernel.dk>, "Raj, Ashok" <ashok.raj@intel.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"maz@kernel.org" <maz@kernel.org>,
"seanjc@google.com" <seanjc@google.com>,
Robin Murphy <robin.murphy@arm.com>,
jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH 09/15] x86/irq: Install posted MSI notification handler
Date: Tue, 2 Apr 2024 19:43:55 -0700 [thread overview]
Message-ID: <20240402194355.72b2ade8@jacob-builder> (raw)
In-Reply-To: <a4f169fa-663d-4a94-878b-d783f67d48c9@intel.com>
Hi Zeng,
On Fri, 29 Mar 2024 15:32:00 +0800, Zeng Guang <guang.zeng@intel.com> wrote:
> On 1/27/2024 7:42 AM, Jacob Pan wrote:
> > @@ -353,6 +360,111 @@ void intel_posted_msi_init(void)
> > pid->nv = POSTED_MSI_NOTIFICATION_VECTOR;
> > pid->ndst = this_cpu_read(x86_cpu_to_apicid);
> > }
> > +
> > +/*
> > + * De-multiplexing posted interrupts is on the performance path; the code
> > + * below is written to optimize cache performance based on the following
> > + * considerations:
> > + * 1. Posted interrupt descriptor (PID) fits in a cache line that is
> > + *    frequently accessed by both CPU and IOMMU.
> > + * 2. During posted MSI processing, the CPU needs to do 64-bit read and xchg
> > + *    for checking and clearing posted interrupt request (PIR), a 256-bit
> > + *    field within the PID.
> > + * 3. On the other side, the IOMMU does atomic swaps of the entire PID cache
> > + *    line when posting interrupts and setting control bits.
> > + * 4. The CPU can access the cache line an order of magnitude faster than
> > + *    the IOMMU.
> > + * 5. Each time the IOMMU posts an interrupt to the PIR, it will evict the
> > + *    PID cache line. The cache line states after each operation are as
> > + *    follows:
> > + *	CPU		IOMMU			PID cache line state
> > + *	---------------------------------------------------------------
> > + *	read64					exclusive
> > + *	lock xchg64				modified
> > + *			post/atomic swap	invalid
> > + *	---------------------------------------------------------------
> > + *
> > + * To reduce L1 data cache misses, it is important to avoid contention with
> > + * the IOMMU's interrupt posting/atomic swap. Therefore, a copy of the PIR
> > + * is used to dispatch interrupt handlers.
> > + *
> > + * In addition, the code tries to keep the cache line state consistent as
> > + * much as possible. E.g. when making a copy and clearing the PIR (assuming
> > + * non-zero PIR bits are present in the entire PIR), it does:
> > + *	read, read, read, read, xchg, xchg, xchg, xchg
> > + * instead of:
> > + *	read, xchg, read, xchg, read, xchg, read, xchg
> > + */
> > +static __always_inline inline bool handle_pending_pir(u64 *pir, struct pt_regs *regs)
> > +{
> > +	int i, vec = FIRST_EXTERNAL_VECTOR;
> > +	unsigned long pir_copy[4];
> > +	bool handled = false;
> > +
> > +	for (i = 0; i < 4; i++)
> > +		pir_copy[i] = pir[i];
> > +
> > +	for (i = 0; i < 4; i++) {
> > +		if (!pir_copy[i])
> > +			continue;
> > +
> > +		pir_copy[i] = arch_xchg(pir, 0);
>
> Here is a problem: pir_copy[i] will always be written with the value of
> pir[0]. This leads to handling spurious posted MSIs later.
Yes, you are right. It should be
pir_copy[i] = arch_xchg(&pir[i], 0);
Will fix in v2, really appreciated.
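For the archives, here is a hypothetical userspace sketch of the corrected
copy-and-clear sequence, with C11 atomic_exchange() standing in for the
kernel's arch_xchg(). All names and types below are illustrative, not the
kernel's; the point is the &pir[i] target and the read-then-xchg ordering:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdatomic.h>
#include <assert.h>

/* Stand-in for the kernel's arch_xchg(): atomically exchange a
 * 64-bit word and return the previous value. */
static inline uint64_t xchg64(_Atomic uint64_t *p, uint64_t v)
{
	return atomic_exchange(p, v);
}

/* Corrected copy-and-clear of a 256-bit PIR (4 x 64-bit words). */
static bool copy_and_clear_pir(_Atomic uint64_t pir[4], uint64_t pir_copy[4])
{
	bool handled = false;
	int i;

	/* Pass 1: plain reads only, keeping the cache line shared/exclusive. */
	for (i = 0; i < 4; i++)
		pir_copy[i] = atomic_load(&pir[i]);

	/* Pass 2: xchg (dirtying the line) only for words with pending bits.
	 * Note &pir[i], not pir -- the v1 bug exchanged pir[0] every time. */
	for (i = 0; i < 4; i++) {
		if (!pir_copy[i])
			continue;

		pir_copy[i] = xchg64(&pir[i], 0);
		handled = true;
	}
	return handled;
}
```

After the call, pir_copy holds the harvested bits for dispatch and the live
PIR words are zeroed, matching the read,read,...,xchg,xchg ordering the
comment above describes.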
> > +		handled = true;
> > +	}
> > +
> > +	if (handled) {
> > +		for_each_set_bit_from(vec, pir_copy, FIRST_SYSTEM_VECTOR)
> > +			call_irq_handler(vec, regs);
> > +	}
> > +
> > +	return handled;
> > +}
> > +
> > +/*
> > + * Performance data shows that 3 is good enough to harvest 90+% of the
> > + * benefit on a high IRQ rate workload.
> > + */
> > +#define MAX_POSTED_MSI_COALESCING_LOOP	3
> > +
> > +/*
> > + * For MSIs that are delivered as posted interrupts, the CPU notifications
> > + * can be coalesced if the MSIs arrive in high frequency bursts.
> > + */
> > +DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
> > +{
> > +	struct pt_regs *old_regs = set_irq_regs(regs);
> > +	struct pi_desc *pid;
> > +	int i = 0;
> > +
> > +	pid = this_cpu_ptr(&posted_interrupt_desc);
> > +
> > +	inc_irq_stat(posted_msi_notification_count);
> > +	irq_enter();
> > +
> > +	/*
> > +	 * Max coalescing count includes the extra round of handle_pending_pir
> > +	 * after clearing the outstanding notification bit. Hence, at most
> > +	 * MAX_POSTED_MSI_COALESCING_LOOP - 1 loops are executed here.
> > +	 */
> > +	while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
> > +		if (!handle_pending_pir(pid->pir64, regs))
> > +			break;
> > +	}
> > +
> > +	/*
> > +	 * Clear the outstanding notification bit to allow new IRQ
> > +	 * notifications; do this last to maximize the window of interrupt
> > +	 * coalescing.
> > +	 */
> > +	pi_clear_on(pid);
> > +
> > +	/*
> > +	 * There could be a race between the PI notification and the clearing
> > +	 * of the ON bit; process the PIR bits one last time so that handling
> > +	 * the new interrupts is not delayed until the next IRQ.
> > +	 */
> > +	handle_pending_pir(pid->pir64, regs);
> > +
> > +	apic_eoi();
> > +	irq_exit();
> > +	set_irq_regs(old_regs);
> > }
> > #endif /* X86_POSTED_MSI */
> >
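As a side note on the loop bound: with the structure above,
handle_pending_pir() runs at most MAX_POSTED_MSI_COALESCING_LOOP times per
notification, i.e. up to MAX - 1 passes inside the while loop plus the final
pass after pi_clear_on(). A hypothetical userspace model of just that control
flow (names below are mine, not kernel code):

```c
#include <stdbool.h>

#define MAX_POSTED_MSI_COALESCING_LOOP	3

/* Count how many times a mocked handle_pending_pir() would run,
 * mirroring the handler's loop plus the final post-clear scan. */
static int posted_msi_rounds(bool (*pending)(void))
{
	int i = 0, calls = 0;

	while (++i < MAX_POSTED_MSI_COALESCING_LOOP) {
		calls++;			/* handle_pending_pir() in the loop */
		if (!pending())
			break;
	}

	/* pi_clear_on() would go here, then one last scan to close
	 * the race with a notification arriving before ON is cleared. */
	calls++;
	return calls;
}

static bool always_pending(void) { return true; }
static bool never_pending(void)  { return false; }
```

So with back-to-back interrupts the PIR is scanned 3 times per notification,
and in the quiet case twice (one empty loop pass, one post-clear pass).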
Thanks,
Jacob