Re: [PATCH RFC 09/13] x86/irq: Install posted MSI notification handler

From: Jacob Pan <jacob.jun.pan@linux.intel.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>, X86 Kernel <x86@kernel.org>,
	iommu@lists.linux.dev, Lu Baolu <baolu.lu@linux.intel.com>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@intel.com>,
	Joerg Roedel <joro@8bytes.org>, "H. Peter Anvin" <hpa@zytor.com>,
	Borislav Petkov <bp@alien8.de>, Ingo Molnar <mingo@redhat.com>,
	Raj Ashok <ashok.raj@intel.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	maz@kernel.org, seanjc@google.com,
	Robin Murphy <robin.murphy@arm.com>,
	jacob.jun.pan@linux.intel.com
Subject: Re: [PATCH RFC 09/13] x86/irq: Install posted MSI notification handler
Date: Thu, 7 Dec 2023 20:46:07 -0800
Message-ID: <20231207204607.2d2a3b72@jacob-builder>
In-Reply-To: <87cyvjun3z.ffs@tglx>

Hi Thomas,

On Wed, 06 Dec 2023 20:50:24 +0100, Thomas Gleixner <tglx@linutronix.de>
wrote:

> On Wed, Nov 15 2023 at 13:56, Peter Zijlstra wrote:
> >
> > Would it not make more sense to write things something like:
> >
> > bool handle_pending_pir()
> > {
> > 	bool handled = false;
> > 	u64 pir_copy[4];
> >
> > 	for (i = 0; i < 4; i++) {
> > 		if (!pid-pir_l[i]) {
> > 			pir_copy[i] = 0;
> > 			continue;
> > 		}
> >
> > 		pir_copy[i] = arch_xchg(&pir->pir_l[i], 0);
> > 		handled |= true;
> > 	}
> >
> > 	if (!handled)
> > 		return handled;
> >
> > 	for_each_set_bit()
> > 		....
> >
> > 	return handled.
> > }  
> 
> I don't understand what the whole copy business is about. It's
> absolutely not required.
> 
> static bool handle_pending_pir(unsigned long *pir)
> {
>         unsigned int idx, vec;
> 	bool handled = false;
>         unsigned long pend;
>         
>         for (idx = 0; idx < 4; idx++) {
>                 if (!pir[idx])
>                 	continue;
> 		pend = arch_xchg(pir + idx, 0);
>                 for_each_set_bit(vec, &pend, 64)
> 			call_irq_handler(vec + idx * 64, NULL);
>                 handled = true;
> 	}
>         return handled;
> }
> 
My thinking is the following:
The PIR cache line is contended between the CPU and the IOMMU, and the CPU
can access PIR much faster. Nevertheless, when the IOMMU does an atomic swap
of the PID (PIR included), the line is evicted from the CPU's L1 cache, so
the subsequent CPU read or xchg has to deal with an invalidated, cold line.

By making a copy of PIR as quickly as possible and clearing PIR with xchg,
we minimize the chance that the IOMMU does an atomic swap in the middle,
and therefore take fewer L1D misses.

The code above does the read, the xchg, and call_irq_handler() in one loop,
handling the four 64-bit PIR words one at a time. The IOMMU has a greater
chance to do an atomic xchg on the PIR cache line while call_irq_handler()
is running, which causes more L1D misses.

I might be missing something?
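
To make the ordering argument concrete outside the kernel, here is a minimal
userspace sketch of the snapshot approach. It assumes C11 atomics as a
stand-in for arch_xchg() and the GCC/Clang builtin __builtin_ctzll() in place
of for_each_set_bit(). The names drain_pir_snapshot(), vec_handler_t and
record_vec() are hypothetical and exist only for this illustration; it says
nothing about cache behavior, only about the order of operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PIR_WORDS 4	/* 256 vectors, 64 bits per word */

typedef void (*vec_handler_t)(unsigned int vec, void *ctx);

/*
 * Hypothetical userspace model: fetch and clear every pending word first,
 * then dispatch from the private snapshot, so the shared words are not
 * touched while handlers run.
 */
static bool drain_pir_snapshot(_Atomic uint64_t *pir, vec_handler_t fn, void *ctx)
{
	uint64_t copy[PIR_WORDS];
	bool handled = false;
	int i;

	assert(pir && fn);

	/* Phase 1a: plain reads of all four words. */
	for (i = 0; i < PIR_WORDS; i++)
		copy[i] = atomic_load_explicit(&pir[i], memory_order_relaxed);

	/* Phase 1b: atomic exchange only on the words that looked non-zero. */
	for (i = 0; i < PIR_WORDS; i++) {
		if (!copy[i])
			continue;
		copy[i] = atomic_exchange(&pir[i], 0);
		handled = true;
	}

	/* Phase 2: dispatch from the snapshot only. */
	for (i = 0; i < PIR_WORDS; i++) {
		while (copy[i]) {
			unsigned int bit = (unsigned int)__builtin_ctzll(copy[i]);

			fn(i * 64 + bit, ctx);
			copy[i] &= copy[i] - 1;	/* clear the lowest set bit */
		}
	}

	return handled;
}

/* Example handler: record each delivered vector in a 256-bit bitmap. */
static void record_vec(unsigned int vec, void *ctx)
{
	uint64_t *seen = ctx;

	seen[vec / 64] |= 1ULL << (vec % 64);
}
```

Calling drain_pir_snapshot(pir, record_vec, seen) on an array with a few bits
set should leave pir all zero and the same bits set in seen; a second call
with nothing pending returns false.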

I tested the two versions below with my DSA memory fill test and measured
DMA bandwidth and perf cache misses:

#ifdef NO_PIR_COPY
static __always_inline bool handle_pending_pir(u64 *pir, struct pt_regs *regs)
{
	int i, vec;
	bool handled = false;
	unsigned long pending;

	for (i = 0; i < 4; i++) {
		if (!pir[i])
			continue;

		pending = arch_xchg(pir + i, 0);
		for_each_set_bit(vec, &pending, 64)
			call_irq_handler(i * 64 + vec, regs);
		handled = true;
	}

	return handled;
}
#else
static __always_inline bool handle_pending_pir(u64 *pir, struct pt_regs *regs)
{
	int i, vec = FIRST_EXTERNAL_VECTOR;
	bool handled = false;
	unsigned long pir_copy[4];

	/* Snapshot all four PIR words up front, before any xchg or dispatch. */
	for (i = 0; i < 4; i++)
		pir_copy[i] = pir[i];

	for (i = 0; i < 4; i++) {
		if (!pir_copy[i])
			continue;

		/* Atomically fetch and clear word i. */
		pir_copy[i] = arch_xchg(pir + i, 0);
		handled = true;
	}

	if (handled) {
		for_each_set_bit_from(vec, pir_copy, FIRST_SYSTEM_VECTOR)
			call_irq_handler(vec, regs);
	}

	return handled;
}
#endif

DEFINE_IDTENTRY_SYSVEC(sysvec_posted_msi_notification)
{
	struct pt_regs *old_regs = set_irq_regs(regs);
	struct pi_desc *pid;
	int i = 0;

	pid = this_cpu_ptr(&posted_interrupt_desc);

	inc_irq_stat(posted_msi_notification_count);
	irq_enter();

	while (i++ < MAX_POSTED_MSI_COALESCING_LOOP) {
		if (!handle_pending_pir(pid->pir64, regs))
			break;
	}

	/*
	 * Clear outstanding notification bit to allow new IRQ notifications,
	 * do this last to maximize the window of interrupt coalescing.
	 */
	pi_clear_on(pid);

	/*
	 * A PI notification can race with the clearing of the ON bit. Process
	 * the PIR bits one last time so that handling of new interrupts is not
	 * delayed until the next IRQ.
	 */
	handle_pending_pir(pid->pir64, regs);

	apic_eoi();
	irq_exit();
	set_irq_regs(old_regs);
}
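
The control flow of the handler above (a bounded coalescing loop, then
clearing ON, then one final pass) can be modelled in userspace. This is only
a sketch under stated assumptions: struct pid_model, drain_once(),
notification_handler() and the bound of 3 are hypothetical stand-ins, C11
atomics replace arch_xchg() and pi_clear_on(), and dispatch is reduced to
counting bits with the GCC/Clang builtin __builtin_popcountll().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_COALESCING_LOOP 3	/* hypothetical bound for this sketch */

/* Hypothetical model of a posted interrupt descriptor. */
struct pid_model {
	_Atomic uint64_t pir[4];	/* pending vector bits */
	atomic_bool on;			/* outstanding notification */
	unsigned int dispatched;	/* stands in for call_irq_handler() */
};

/* One pass over the four PIR words; returns true if any word was pending. */
static bool drain_once(struct pid_model *pid)
{
	bool handled = false;
	int i;

	for (i = 0; i < 4; i++) {
		uint64_t pend;

		if (!atomic_load_explicit(&pid->pir[i], memory_order_relaxed))
			continue;

		pend = atomic_exchange(&pid->pir[i], 0);
		pid->dispatched += (unsigned int)__builtin_popcountll(pend);
		handled = true;
	}

	return handled;
}

static void notification_handler(struct pid_model *pid)
{
	int i = 0;

	assert(pid);

	/* Keep draining while new bits arrive, up to the loop bound. */
	while (i++ < MAX_COALESCING_LOOP) {
		if (!drain_once(pid))
			break;
	}

	/* Clear ON last to keep the coalescing window as long as possible. */
	atomic_store(&pid->on, false);

	/* Final pass catches bits posted before ON was cleared. */
	drain_once(pid);
}
```

In a single-threaded run with two bits set in pir[1] and ON set,
notification_handler() counts two dispatches and leaves both PIR and ON
clear. The final drain_once() call only matters when a producer posts bits
concurrently, which this sketch does not simulate.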

Without PIR copy:

DMA memfill bandwidth: 4.944 Gbps
Performance counter stats for './run_intr.sh 512 30':

    77,313,298,506      L1-dcache-loads                                               (79.98%)
         8,279,458      L1-dcache-load-misses     #    0.01% of all L1-dcache accesses  (80.03%)
    41,654,221,245      L1-dcache-stores                                              (80.01%)
            10,476      LLC-load-misses           #    0.31% of all LL-cache accesses  (79.99%)
         3,332,748      LLC-loads                                                     (80.00%)

      30.212055434 seconds time elapsed

       0.002149000 seconds user
      30.183292000 seconds sys

With PIR copy:
DMA memfill bandwidth: 5.029 Gbps
Performance counter stats for './run_intr.sh 512 30':

    78,327,247,423      L1-dcache-loads                                               (80.01%)
         7,762,311      L1-dcache-load-misses     #    0.01% of all L1-dcache accesses  (80.01%)
    42,203,221,466      L1-dcache-stores                                              (79.99%)
            23,691      LLC-load-misses           #    0.67% of all LL-cache accesses  (80.01%)
         3,561,890      LLC-loads                                                     (80.00%)

      30.201065706 seconds time elapsed

       0.005950000 seconds user
      30.167885000 seconds sys
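
For a quick side-by-side of the two runs, the relative differences can be
computed directly from the figures reported above. The helper names
pct_change() and print_pir_copy_deltas() are hypothetical; the inputs are
copied verbatim from the two perf outputs and the bandwidth lines.

```c
#include <assert.h>
#include <stdio.h>

/* Relative change of val against base, in percent. */
static double pct_change(double base, double val)
{
	assert(base != 0.0);
	return (val - base) / base * 100.0;
}

/* Base is the run without PIR copy, val is the run with PIR copy. */
static void print_pir_copy_deltas(void)
{
	printf("DMA bandwidth:    %+.2f%%\n", pct_change(4.944, 5.029));
	printf("L1D load misses:  %+.2f%%\n", pct_change(8279458.0, 7762311.0));
	printf("LLC load misses:  %+.2f%%\n", pct_change(10476.0, 23691.0));
}
```

With these figures the copy variant comes out roughly 1.7% ahead in
bandwidth with about 6% fewer L1D load misses, while LLC load misses roughly
double from a very small absolute count.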


> No?
> 
> > sysvec_posted_blah_blah()
> > {
> > 	bool done = false;
> > 	bool handled;
> >
> > 	for (;;) {
> > 		handled = handle_pending_pir();
> > 		if (done)
> > 			break;
> > 		if (!handled || ++loops > MAX_LOOPS) {  
> 
> That does one loop too many. Should be ++loops == MAX_LOOPS. No?
> 
> > 			pi_clear_on(pid);
> > 			/* once more after clear_on */
> > 			done = true;
> > 		}
> > 	}
> > }
> >
> >
> > Hmm?  
> 
> I think that can be done less convoluted.
> 
> {
> 	struct pi_desc *pid = this_cpu_ptr(&posted_interrupt_desc);
> 	struct pt_regs *old_regs = set_irq_regs(regs);
>         int loops;
> 
> 	for (loops = 0;;) {
>         	bool handled = handle_pending_pir((unsigned long)pid->pir);
> 
>                 if (++loops > MAX_LOOPS)
>                 	break;
> 
>                 if (!handled || loops == MAX_LOOPS) {
>                 	pi_clear_on(pid);
>                         /* Break the loop after handle_pending_pir()! */
>                         loops = MAX_LOOPS;
>                 }
> 	}
> 
> 	...
> 	set_irq_regs(old_regs);
> }
> 
> Hmm? :)


Thanks,

Jacob
