From: Mathias Nyman <mathias.nyman@linux.intel.com>
To: x86@kernel.org
Cc: linux-pci <linux-pci@vger.kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	LKML <linux-kernel@vger.kernel.org>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Evan Green <evgreen@chromium.org>,
	"Ghorai, Sukumar" <sukumar.ghorai@intel.com>,
	"Amara, Madhusudanarao" <madhusudanarao.amara@intel.com>,
	"Nandamuri, Srikanth" <srikanth.nandamuri@intel.com>
Subject: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug
Date: Wed, 18 Mar 2020 21:25:39 +0200
Message-ID: <806c51fa-992b-33ac-61a9-00a606f82edb@linux.intel.com>

Hi

I can still reproduce the lost MSI interrupt issue on 5.6-rc6, which includes
the "Plug non-maskable MSI affinity race" patch.

I see this on a couple of platforms. I'm running a script that first generates
a lot of USB traffic and then, in a busy loop, sets IRQ affinity and turns
CPUs off and on:

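# Write one affinity mask to every IRQ. A quoted glob is not expanded by the
# shell, so loop over the matches. Some IRQs reject the write; ignore those.
set_all_affinity() {
	for f in /proc/irq/*/smp_affinity; do
		echo "$1" > "$f"
	done 2>/dev/null
}

for i in 1 3 5 7; do
	echo "1" > /sys/devices/system/cpu/cpu$i/online
done
set_all_affinity "A"
set_all_affinity "A"
set_all_affinity "F"
for i in 1 3 5 7; do
	echo "0" > /sys/devices/system/cpu/cpu$i/online
done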
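
For readers unfamiliar with the mask values: smp_affinity takes a hexadecimal
CPU bitmask, where bit N selects CPU N. The sketch below decodes a mask into
CPU numbers. The helper name mask_to_cpus is hypothetical and not part of the
reproduction script, and it only handles masks that fit in a shell integer.

```shell
#!/bin/sh
# Hypothetical helper: list the CPU numbers selected by a hex affinity mask.
mask_to_cpus() {
	mask=$((0x$1))
	cpu=0
	out=""
	while [ "$mask" -gt 0 ]; do
		# Bit N set means CPU N is allowed to service the interrupt.
		if [ $((mask & 1)) -eq 1 ]; then
			out="${out:+$out }$cpu"
		fi
		mask=$((mask >> 1))
		cpu=$((cpu + 1))
	done
	echo "$out"
}

mask_to_cpus A   # prints "1 3"
mask_to_cpus F   # prints "0 1 2 3"
```

So "A" restricts every IRQ to CPUs 1 and 3, two of the CPUs the script then
takes offline, while "F" allows CPUs 0 through 3.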

I added some very simple debugging, but I don't really know what to look for.
The xhci interrupts (IRQ 122) just stop after an MSI affinity change, even though
the interrupt survived many similar msi_set_affinity() calls before this.

I'm not that familiar with the inner workings of this, but I'll be happy to
help out with adding debugging and testing patches.

Details:

 cat /proc/interrupts
            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5       CPU6       CPU7
   0:         26          0          0          0          0          0          0          0   IO-APIC    2-edge      timer
   1:          0          0          0          0          0          7          0          0   IO-APIC    1-edge      i8042
   4:          0          4      59941          0          0          0          0          0   IO-APIC    4-edge      ttyS0
   8:          0          0          0          0          0          0          1          0   IO-APIC    8-edge      rtc0
   9:          0         40          8          0          0          0          0          0   IO-APIC    9-fasteoi   acpi
  16:          0          0          0          0          0          0          0          0   IO-APIC   16-fasteoi   i801_smbus
 120:          0          0        293          0          0          0          0          0   PCI-MSI 32768-edge      i915
 121:        728          0          0         58          0          0          0          0   PCI-MSI 520192-edge      enp0s31f6
 122:      63575       2271          0       1957       7262          0          0          0   PCI-MSI 327680-edge      xhci_hcd
 123:          0          0          0          0          0          0          0          0   PCI-MSI 514048-edge      snd_hda_intel:card0
 NMI:          0          0          0          0          0          0          0          0   Non-maskable interrupts
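
A stalled interrupt shows up as a row whose per-CPU counters no longer change.
As a minimal sketch (irq_total is a hypothetical helper, not something used in
this report), the counters for one IRQ can be summed like this, assuming the
usual /proc/interrupts layout where the numeric per-CPU columns come before
the chip name:

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU counters of one IRQ row.
# $1 = IRQ number, $2 = file to read (defaults to /proc/interrupts).
irq_total() {
	awk -v irq="$1:" '$1 == irq {
		sum = 0
		# Add numeric fields until the first non-numeric one (the chip name).
		for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
			sum += $i
		print sum
	}' "${2:-/proc/interrupts}"
}

# Example use on a live system (not run here):
#   before=$(irq_total 122); sleep 1; after=$(irq_total 122)
#   [ "$before" = "$after" ] && echo "IRQ 122 made no progress"
```

For the xhci_hcd row above, the sum is 63575 + 2271 + 1957 + 7262 = 75065.
An unchanged total while USB traffic is still being generated would point to
the lost-interrupt state.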
 
trace snippet: 
      <idle>-0     [001] d.h.   129.676900: xhci_irq: xhci irq
      <idle>-0     [001] d.h.   129.677507: xhci_irq: xhci irq
      <idle>-0     [001] d.h.   129.677556: xhci_irq: xhci irq
      <idle>-0     [001] d.h.   129.677647: xhci_irq: xhci irq
      <...>-14    [001] d..1   129.679802: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 2 -> 6
      <idle>-0     [003] d.h.   129.682639: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.682769: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.682908: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.683552: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.683677: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.683819: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689017: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689140: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689307: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.689984: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.690107: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.690278: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.695541: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.695674: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.695839: xhci_irq: xhci irq
      <idle>-0     [003] d.H.   129.696667: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.696797: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.696973: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.702288: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.702380: xhci_irq: xhci irq
      <idle>-0     [003] d.h.   129.702493: xhci_irq: xhci irq
 migration/3-24    [003] d..1   129.703150: msi_set_affinity: direct update msi 122, vector 33 -> 33, apicid: 6 -> 0
 kworker/0:0-5     [000] d.h.   131.328790: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   133.312704: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   135.360786: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
      <idle>-0     [000] d.h.   137.344694: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   139.128679: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   141.312686: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   143.360703: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
 kworker/0:0-5     [000] d.h.   145.344791: msi_set_affinity: direct update msi 121, vector 34 -> 34, apicid: 0 -> 0
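
To make the pattern in the trace easier to spot, the following sketch counts
how many handler events follow each affinity update of one MSI. The function
name events_after_updates is hypothetical, and the matching assumes the
"direct update msi N," message format produced by the debug patch below; the
"twostep" message uses a different format and is not matched.

```shell
#!/bin/sh
# Hypothetical helper: for one MSI, count handler events after each
# "direct update" line. $1 = IRQ number, $2 = handler trace event name,
# $3 = trace file (defaults to standard input).
events_after_updates() {
	awk -v irq="$1" -v handler="$2:" '
	/msi_set_affinity:/ && $0 ~ ("update msi " irq ",") {
		# A new update closes the previous interval.
		if (seen) print label ": " n " events"
		seen = 1; n = 0
		label = $0; sub(/.*apicid: /, "apicid ", label)
	}
	# Count handler events only once the first update has been seen.
	seen && index($0, " " handler " ") { n++ }
	END { if (seen) print label ": " n " events" }
	' "${3:--}"
}

# Example use (not run here):
#   events_after_updates 122 xhci_irq trace.txt
```

Run against the snippet above, the last line reported for msi 122 would be the
"apicid 6 -> 0" update with 0 events, which matches the observation that xhci
interrupts stop after that msi_set_affinity() call.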


--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -92,6 +92,10 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
            cfg->vector == old_cfg.vector ||
            old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR ||
            cfg->dest_apicid == old_cfg.dest_apicid) {
+               trace_printk("direct update msi %u, vector %u -> %u, apicid: %u -> %u\n",
+                    irqd->irq,
+                    old_cfg.vector, cfg->vector,
+                    old_cfg.dest_apicid, cfg->dest_apicid);
                irq_msi_update_msg(irqd, cfg);
                return ret;
        }
@@ -134,7 +138,10 @@ msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
         */
        if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
                this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
-
+       trace_printk("twostep update msi, irq %u, vector %u -> %u, apicid: %u -> %u\n",
+                    irqd->irq,
+                    old_cfg.vector, cfg->vector,
+                    old_cfg.dest_apicid, cfg->dest_apicid);
        /* Redirect it to the new vector on the local CPU temporarily */
        old_cfg.vector = cfg->vector;
        irq_msi_update_msg(irqd, &old_cfg);

Thanks
-Mathias
