From: Lyude Paul <lyude@redhat.com>
To: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>,
	Thomas Gleixner <tglx@linutronix.de>
Cc: "hpa@zytor.com" <hpa@zytor.com>,
	"keith.busch@intel.com" <keith.busch@intel.com>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Borislav Petkov <bp@alien8.de>
Subject: Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau in mainline kernel
Date: Wed, 24 Jan 2018 14:56:50 -0500
Message-ID: <1516823810.4109.26.camel@redhat.com>
In-Reply-To: <DM5PR12MB19161E66F534B7050B8CCBF0F8E20@DM5PR12MB1916.namprd12.prod.outlook.com>

On Wed, 2018-01-24 at 19:13 +0000, Ghannam, Yazen wrote:
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> > owner@vger.kernel.org] On Behalf Of Lyude Paul
> > Sent: Wednesday, January 24, 2018 12:49 PM
> > To: Thomas Gleixner <tglx@linutronix.de>
> > Cc: hpa@zytor.com; keith.busch@intel.com; mingo@kernel.org; linux-
> > kernel@vger.kernel.org
> > Subject: Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau
> > in
> > mainline kernel
> > 
> > Hi, please ignore the warning: it happens before and after the regressing
> > commit (I didn't actually mean to include it on the log I gave here,
> > whoops).
> > As for how I determined nouveau is getting assigned the same IRQ vector as
> > another device, I checked using /sys/kernel/debug/irq. Additionally, when
> > nouveau does initialize properly after resume (e.g. after reverting this
> > patch) I see it get assigned a separate vector from the other devices.
> > 
> 
> +Boris. This thread seems to have split.
> 
> Lyude,
> Does the warning show on mainline or does it only show when bisecting?
> 
> Sorry, I'm not sure what you mean by "it happens before and after the
> regressing commit".
Sorry about that! Let me clarify a little bit: this is a problem that shows up
on mainline. Normally when we suspend the GPU in nouveau, we free the IRQs
it's using before going into suspend
(drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:88), then reserve IRQs again
on resume (drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:134). Since this
patch got pushed to mainline, the IRQ we get from request_irq() ends up having
the same MSI vector as another device on the system:

Before suspend, nouveau's IRQ allocation:

    handler:  handle_edge_irq
    device:   0000:22:00.0
    status:   0x00000000
    istate:   0x00000000
    ddepth:   0
    wdepth:   0
    dstate:   0x01400200
                IRQD_ACTIVATED
                IRQD_IRQ_STARTED
                IRQD_SINGLE_TARGET
    node:     0
    affinity: 0-7
    effectiv: 1
    pending:  
    domain:  PCI-MSI-2
     hwirq:   0x1100000
     chip:    PCI-MSI
      flags:   0x10
                 IRQCHIP_SKIP_SET_WAKE
     parent:
        domain:  VECTOR
         hwirq:   0x2f
         chip:    APIC
          flags:   0x0
         Vector:    35
         Target:     1

After resume and allocating the interrupt for nouveau again, we get a message
from the kernel saying:

    [  217.150787] do_IRQ: 1.35 No irq handler for vector

In addition, nouveau ends up getting no interrupts from the card and as a
result fails to come back up:

    [  219.153049] nouveau 0000:22:00.0: DRM: EVO timeout
    [  220.226254] r8169 0000:1e:00.0 enp30s0: link up
    [  221.153054] nouveau 0000:22:00.0: DRM: base-0: timeout
    [  223.153528] nouveau 0000:22:00.0: DRM: base-0: timeout
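
For reading the do_IRQ line above: on x86 the kernel prints that pair as
CPU.vector, so "1.35" is vector 35 arriving on CPU 1. A minimal sketch in
Python for pulling the pair out of a dmesg line follows; the function name
and regex are illustrative and not part of any kernel tooling.

```python
import re

# Matches the x86 "No irq handler" message; the pair is printed as CPU.vector.
_NO_HANDLER = re.compile(r"do_IRQ: (\d+)\.(\d+) No irq handler for vector")


def parse_no_handler(line):
    """Return (cpu, vector) from a do_IRQ 'No irq handler' line, else None."""
    match = _NO_HANDLER.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    print(parse_no_handler("[  217.150787] do_IRQ: 1.35 No irq handler for vector"))
```

The resulting CPU and vector can then be compared against the Target and
Vector fields in the debugfs dumps.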

If we look through all of the other IRQ allocations, we find that two devices
now have MSI vector 35:

nouveau:
    handler:  handle_edge_irq
    device:   0000:22:00.0
    status:   0x00000000
    istate:   0x00000000
    ddepth:   0
    wdepth:   0
    dstate:   0x01400200
                IRQD_ACTIVATED
                IRQD_IRQ_STARTED
                IRQD_SINGLE_TARGET
    node:     0
    affinity: 0-7
    effectiv: 1
    pending:  
    domain:  PCI-MSI-2
     hwirq:   0x1100000
     chip:    PCI-MSI
      flags:   0x10
                 IRQCHIP_SKIP_SET_WAKE
     parent:
        domain:  VECTOR
         hwirq:   0x2f
         chip:    APIC
          flags:   0x0
         Vector:    35
         Target:     1

and the PCI bridge (00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD]
Family 17h (Models 00h-0fh) PCIe GPP Bridge):

    handler:  handle_edge_irq
    device:   0000:00:01.3
    status:   0x00000000
    istate:   0x00000000
    ddepth:   0
    wdepth:   0
    dstate:   0x03400200
                IRQD_ACTIVATED
                IRQD_IRQ_STARTED
                IRQD_SINGLE_TARGET
    node:     0
    affinity: 0-7
    effectiv: 0
    pending:  
    domain:  PCI-MSI-2
     hwirq:   0x5800
     chip:    PCI-MSI
      flags:   0x10
                 IRQCHIP_SKIP_SET_WAKE
     parent:
        domain:  VECTOR
         hwirq:   0x19
         chip:    APIC
          flags:   0x0
         Vector:    35
         Target:     0
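
For anyone who wants to repeat the comparison above without reading each dump
by hand, here is a minimal sketch in Python. It assumes the per-IRQ dumps use
the field layout shown in this mail, and that they live under
/sys/kernel/debug/irq/irqs/ (an assumed path; it needs root, a mounted debugfs
and CONFIG_GENERIC_IRQ_DEBUGFS). All function names are hypothetical. It
groups devices by Vector and keeps the Target CPU next to each device, since
the two dumps above report the same Vector but different Targets.

```python
import re
from collections import defaultdict
from pathlib import Path

# Field names follow the dump layout shown above; the regex is illustrative.
_FIELD = re.compile(r"^[ \t]*(device|Vector|Target):[ \t]*(\S+)[ \t]*$",
                    re.MULTILINE)


def parse_irq_dump(text):
    """Extract device, vector and target CPU from one debugfs IRQ dump."""
    fields = {}
    for key, value in _FIELD.findall(text):
        fields.setdefault(key, value)  # keep the first occurrence of each field
    if "Vector" not in fields:
        return None  # this IRQ has no vector assigned in the dump
    return {
        "device": fields.get("device", "?"),
        "vector": int(fields["Vector"]),
        "target": int(fields["Target"]) if "Target" in fields else None,
    }


def group_by_vector(dumps):
    """Return {vector: [(device, target), ...]} for vectors seen more than once."""
    groups = defaultdict(list)
    for text in dumps:
        entry = parse_irq_dump(text)
        if entry is not None:
            groups[entry["vector"]].append((entry["device"], entry["target"]))
    return {vec: devs for vec, devs in groups.items() if len(devs) > 1}


def scan_debugfs(root="/sys/kernel/debug/irq/irqs"):
    """Assumed debugfs location; reads every per-IRQ file found under it."""
    dumps = [p.read_text() for p in sorted(Path(root).iterdir()) if p.is_file()]
    return group_by_vector(dumps)
```

Calling scan_debugfs() as root before suspend and again after resume gives two
snapshots that can be compared directly.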

Hope this helps clarify. I will keep looking at this from my end as well.
> 
> Boris,
> In any case, I like your idea on saving the block addresses. I can look into
> this.
> 
> Thanks,
> Yazen
-- 
Cheers,
	Lyude Paul
