From: Lyude Paul <lyude@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>,
"hpa@zytor.com" <hpa@zytor.com>,
"keith.busch@intel.com" <keith.busch@intel.com>,
"mingo@kernel.org" <mingo@kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
Borislav Petkov <bp@alien8.de>
Subject: Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau in mainline kernel
Date: Thu, 25 Jan 2018 13:23:58 -0500
Message-ID: <1516904638.5161.1.camel@redhat.com>
In-Reply-To: <alpine.DEB.2.20.1801250945080.2020@nanos>
I think you are right, apologies. Glad to know this isn't a regression in the
IRQ handling code :). It looks like our nouveau problems are probably coming
from the fact that we free our IRQs on suspend and request them again on
resume instead of leaving them set up, which, as far as I can tell, is
probably not the correct thing to do.
Going to get some patches onto the mailing list for this, thanks for the help!
On Thu, 2018-01-25 at 09:54 +0100, Thomas Gleixner wrote:
> On Wed, 24 Jan 2018, Lyude Paul wrote:
> > Sorry about that! Let me clarify a little bit: this is a problem that
> > shows up on mainline. Normally when we suspend the GPU in nouveau, we
> > free the IRQs it's using before going into suspend
> > (drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:88), then reserve IRQs
> > again on resume (drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:134).
> > Since this patch got pushed to mainline, the IRQ we get from
> > request_irq() ends up having the same MSI vector as another device on
> > the system:
>
> It's not the same.
>
> > nouveau:
> > parent:
> > domain: VECTOR
> > hwirq: 0x2f
> > chip: APIC
> > flags: 0x0
> > Vector: 35
> > Target: 1
>
> Vector 35 on CPU1
>
> > After resume and allocating the interrupt for nouveau again, we get a
> > message from the kernel saying:
> >
> > [ 217.150787] do_IRQ: 1.35 No irq handler for vector
>
> That's because there is a pending irq on the old vector for unknown reasons.
>
> > As well, nouveau ends up getting no interrupts from the card and as a
> > result fails to come back up:
> >
> > [ 219.153049] nouveau 0000:22:00.0: DRM: EVO timeout
> > [ 220.226254] r8169 0000:1e:00.0 enp30s0: link up
> > [ 221.153054] nouveau 0000:22:00.0: DRM: base-0: timeout
> > [ 223.153528] nouveau 0000:22:00.0: DRM: base-0: timeout
> >
> > If we look through all of the other IRQ allocations, we'll find that now
> > two devices have the MSI vector 35:
> >
> > nouveau:
> > parent:
> > domain: VECTOR
> > hwirq: 0x2f
> > chip: APIC
> > flags: 0x0
> > Vector: 35
> > Target: 1
>
> Vector 35 on CPU1
>
> > and the PCI bridge (00:01.3 PCI bridge: Advanced Micro Devices, Inc.
> > [AMD] Family 17h (Models 00h-0fh) PCIe GPP Bridge):
> >
> > parent:
> > domain: VECTOR
> > hwirq: 0x19
> > chip: APIC
> > flags: 0x0
> > Vector: 35
> > Target: 0
>
> Vector 35 on CPU0. Same vector but different CPUs. So it's NOT the same
> thing.
>
> The real issue is something completely different and the revert of this
> patch merely papers over the underlying problem. I'm pretty sure that you
> can trigger this even with the revert in place. Do the following before
> suspend:
>
> echo 2 >/proc/irq/$NOUVEAUIRQ/smp_affinity_list
>
> Then do suspend/resume and you should end up with the same situation.
>
> I can't tell from your dmesg, but I'm pretty confident that
>
> > [ 217.150787] do_IRQ: 1.35 No irq handler for vector
>
> happens _before_ the nouveau driver requests the irq again. Can you please
> add some printk to the code in question to verify that?
>
> Thanks,
>
> tglx
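
For readers following the thread, the point made in the quoted reply (an
interrupt is identified by its CPU and vector together, not by the vector
number alone) can be sketched in a few lines of Python. This is only an
illustration, not kernel code: the helper names parse_irq_dump and
parse_do_irq are hypothetical, the input is the text quoted above, and it
assumes that "do_IRQ: 1.35" means CPU 1, vector 35, as the reply reads it.

```python
import re

# Matches the x86 warning quoted in the thread:
#   do_IRQ: <cpu>.<vector> No irq handler for vector
DO_IRQ_RE = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def parse_do_irq(line):
    """Return (cpu, vector) from a do_IRQ warning line, or None if absent."""
    m = DO_IRQ_RE.search(line)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_irq_dump(text):
    """Return (target_cpu, vector) from a 'key: value' dump as quoted above."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return (int(fields["target"]), int(fields["vector"]))


# Dumps copied from the quoted message.
nouveau = parse_irq_dump("""
domain: VECTOR
hwirq: 0x2f
chip: APIC
flags: 0x0
Vector: 35
Target: 1
""")

bridge = parse_irq_dump("""
domain: VECTOR
hwirq: 0x19
chip: APIC
flags: 0x0
Vector: 35
Target: 0
""")

stray = parse_do_irq("[ 217.150787] do_IRQ: 1.35 No irq handler for vector")

# Same vector number, different CPUs: the two allocations do not collide.
print(nouveau == bridge)  # False
# The stray interrupt arrived on the CPU/vector pair that nouveau used.
print(stray == nouveau)   # True
```

Comparing (CPU, vector) pairs rather than bare vector numbers is what
separates the PCI bridge's allocation from nouveau's in the dumps above.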