From: Thomas Gleixner
To: "Raj, Ashok"
Cc: "Raj, Ashok", Evan Green, Mathias Nyman, x86@kernel.org, linux-pci, LKML, Bjorn Helgaas, "Ghorai, Sukumar", "Amara, Madhusudanarao", "Nandamuri, Srikanth", Ashok Raj
Subject: Re: MSI interrupt for xhci still lost on 5.6-rc6 after cpu hotplug
In-Reply-To: <20200507175715.GA22426@otc-nc-03>
References: <20200501184326.GA17961@araj-mobl1.jf.intel.com> <878si6rx7f.fsf@nanos.tec.linutronix.de> <20200505201616.GA15481@otc-nc-03> <875zdarr4h.fsf@nanos.tec.linutronix.de> <20200507121850.GB85463@otc-nc-03> <87wo5nj48a.fsf@nanos.tec.linutronix.de> <20200507175715.GA22426@otc-nc-03>
Date: Thu, 07 May 2020 21:41:36 +0200
Message-ID: <87blmzedn3.fsf@nanos.tec.linutronix.de>

Ashok,

"Raj, Ashok" writes:
>
> I think i got mixed up with logical apic id and logical cpu :-(

Stuff happens.

> -0 [000] d.h. 44.376659: msi_set_affinity: quirk[1] new vector allocated, new apic = 2 vector = 33 this apic = 0
> -0 [000] d.h. 44.376684: msi_set_affinity: Direct Update: irq 123 Ovec=33 Oapic 0 Nvec 33 Napic 2
> -0 [000] d.h. 44.376685: xhci_irq: xhci irq
> -0 [001] d.h. 44.376750: msi_set_affinity: quirk[1] new vector allocated, new apic = 2 vector = 33 this apic = 2
> -0 [001] d.h. 44.376774: msi_set_affinity: Direct Update: irq 123 Ovec=33 Oapic 2 Nvec 33 Napic 2
> -0 [001] d.h. 44.376776: xhci_irq: xhci irq
> -0 [001] d.h. 44.395824: xhci_irq: xhci irq
> <...>-14 [001] d..1 44.400666: msi_set_affinity: quirk[1] new vector allocated, new apic = 6 vector = 33 this apic = 2
> <...>-14 [001] d..1 44.400691: msi_set_affinity: Direct Update: irq 123 Ovec=33 Oapic 2 Nvec 33 Napic 6
> -0 [003] d.h. 44.421021: xhci_irq: xhci irq
> -0 [003] d.h. 44.421135: xhci_irq: xhci irq
> migration/3-24 [003] d..1 44.421784: msi_set_affinity: quirk[1] new vector allocated, new apic = 0 vector = 33 this apic = 6
> migration/3-24 [003] d..1 44.421803: msi_set_affinity: Direct Update: irq 123 Ovec=33 Oapic 6 Nvec 33 Napic 0

So this last one is a direct update.
Straightforward: it moves from one CPU to the other on the same vector
number.

And that's the case where we expect the interrupt to come in either on
CPU3 or on CPU0.

There is actually an example in the trace:

-0 [000] d.h. 40.616467: msi_set_affinity: quirk[1] new vector allocated, new apic = 2 vector = 33 this apic = 0
-0 [000] d.h. 40.616488: msi_set_affinity: Direct Update: irq 123 Ovec=33 Oapic 0 Nvec 33 Napic 2
-0 [000] d.h. 40.616488: xhci_irq: xhci irq
-0 [001] d.h. 40.616504: xhci_irq: xhci irq

> migration/3-24 [003] d..1 44.421784: msi_set_affinity: quirk[1] new vector allocated, new apic = 0 vector = 33 this apic = 6
> migration/3-24 [003] d..1 44.421803: msi_set_affinity: Direct Update: irq 123 Ovec=33 Oapic 6 Nvec 33 Napic 0

But as this last one is the migration thread, aka stomp machine, I
assume this is a hotplug operation, which means the CPU cannot handle
interrupts anymore. In that case we check the old vector on the
unplugged CPU in fixup_irqs() and do the retrigger from there.

Can you please add tracing to that one as well?

Thanks,

        tglx
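
One way to check the expectation above against a captured trace (after a
direct update the next xhci_irq should show up on either the old or the
new CPU) is a small script along the following lines. This is only a
sketch and is not part of the kernel or of any patch in this thread. The
names classify, LINE, THIS_APIC and UPDATE are made up for illustration.
It assumes the trace lines have exactly the format quoted in this mail,
and it derives the APIC-ID-to-CPU mapping from the "this apic" field of
the quirk[1] lines, which is an assumption that happens to hold for the
lines shown here.

```python
import re

# Assumed line shape, taken from the trace quoted in this mail:
#   "<task> [cpu] flags timestamp: function: message"
LINE = re.compile(
    r"\[(?P<cpu>\d+)\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+(?P<func>\w+):\s+(?P<msg>.*)"
)
THIS_APIC = re.compile(r"this apic = (\d+)")
UPDATE = re.compile(
    r"Direct Update: irq \d+ Ovec=\d+ Oapic (\d+) Nvec \d+ Napic (\d+)"
)


def classify(lines):
    """For each 'Direct Update', report where the next xhci_irq landed.

    Returns a list of (timestamp, where) tuples, where 'where' is one of
    "old", "new", "unknown" (CPU not mappable) or "none seen".
    """
    events = []       # (cpu, timestamp, kind, payload) in trace order
    apic_to_cpu = {}  # learned from "this apic = N" seen on CPU [cpu]

    for raw in lines:
        m = LINE.search(raw)
        if m is None:
            continue
        cpu, ts, msg = int(m["cpu"]), m["ts"], m["msg"]
        if m["func"] == "msi_set_affinity":
            here = THIS_APIC.search(msg)
            if here:
                apic_to_cpu[int(here.group(1))] = cpu
            upd = UPDATE.search(msg)
            if upd:
                payload = (int(upd.group(1)), int(upd.group(2)))
                events.append((cpu, ts, "update", payload))
        elif m["func"] == "xhci_irq":
            events.append((cpu, ts, "irq", None))

    results = []
    for i, (_, ts, kind, payload) in enumerate(events):
        if kind != "update":
            continue
        old_apic, new_apic = payload
        # First xhci_irq that follows this update in trace order.
        irq_cpu = next((c for c, _, k, _ in events[i + 1:] if k == "irq"), None)
        if irq_cpu is None:
            where = "none seen"
        elif irq_cpu == apic_to_cpu.get(new_apic):
            where = "new"
        elif irq_cpu == apic_to_cpu.get(old_apic):
            where = "old"
        else:
            where = "unknown"
        results.append((ts, where))
    return results


if __name__ == "__main__":
    import sys
    for ts, where in classify(sys.stdin):
        print(ts, where)
```

Fed the trace on stdin, it prints one line per direct update. A "none
seen" result for an update issued from the migration thread is the
hotplug case described above, where the retrigger would have to come
from fixup_irqs() rather than from msi_set_affinity().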