From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752665AbeAXT4y (ORCPT ); Wed, 24 Jan 2018 14:56:54 -0500
Received: from mail-qt0-f169.google.com ([209.85.216.169]:37223 "EHLO
	mail-qt0-f169.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752518AbeAXT4x (ORCPT );
	Wed, 24 Jan 2018 14:56:53 -0500
X-Google-Smtp-Source: AH8x226a1CrVKG+N4JRzFnFLNCQ2V8qgARlUgJ22zZ4CDt9bZWIDH+NDCU3PIH/5QFdDFtkj+8Z0cg==
Message-ID: <1516823810.4109.26.camel@redhat.com>
Subject: Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau in mainline kernel
From: Lyude Paul
To: "Ghannam, Yazen", Thomas Gleixner
Cc: "hpa@zytor.com", "keith.busch@intel.com", "mingo@kernel.org",
	"linux-kernel@vger.kernel.org", Borislav Petkov
Date: Wed, 24 Jan 2018 14:56:50 -0500
In-Reply-To:
References: <1516744873.29151.3.camel@redhat.com>
	<1516757219.29151.7.camel@redhat.com>
	<1516816150.4109.2.camel@redhat.com>
Organization: Red Hat
Content-Type: text/plain; charset="UTF-8"
X-Mailer: Evolution 3.26.4 (3.26.4-1.fc27)
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 2018-01-24 at 19:13 +0000, Ghannam, Yazen wrote:
> > -----Original Message-----
> > From: linux-kernel-owner@vger.kernel.org
> > [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Lyude Paul
> > Sent: Wednesday, January 24, 2018 12:49 PM
> > To: Thomas Gleixner
> > Cc: hpa@zytor.com; keith.busch@intel.com; mingo@kernel.org;
> > linux-kernel@vger.kernel.org
> > Subject: Re: "irq/matrix: Spread interrupts on allocation" breaks nouveau
> > in mainline kernel
> >
> > Hi, please ignore the warning: it happens before and after the regressing
> > commit (I didn't actually mean to include it on the log I gave here,
> > whoops).
> > As for how I determined nouveau is getting assigned the same IRQ vector as
> > another device, I checked using /sys/kernel/debug/irq. Additionally, when
> > nouveau does initialize properly after resume (e.g. after reverting this
> > patch) I see it get assigned a separate vector from the other devices.
> >
>
> +Boris. This thread seems to have split.
>
> Lyude,
> Does the warning show on mainline or does it only show when bisecting?
>
> Sorry, I'm not sure what you mean by "it happens before and after the
> regressing commit".

Sorry about that! Let me clarify a little bit: this is a problem that shows
up on mainline. Normally when we suspend the GPU in nouveau, we free the IRQs
it's using before going into suspend
(drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:88), then reserve IRQs again
on resume (drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c:134). Since this
patch got pushed to mainline, the IRQ we get from request_irq() ends up
having the same MSI vector as another device on the system.

Before suspend, nouveau's IRQ allocation:

	handler:  handle_edge_irq
	device:   0000:22:00.0
	status:   0x00000000
	istate:   0x00000000
	ddepth:   0
	wdepth:   0
	dstate:   0x01400200
	            IRQD_ACTIVATED
	            IRQD_IRQ_STARTED
	            IRQD_SINGLE_TARGET
	node:     0
	affinity: 0-7
	effectiv: 1
	pending:
	domain:  PCI-MSI-2
	 hwirq:   0x1100000
	 chip:    PCI-MSI
	  flags:   0x10
	             IRQCHIP_SKIP_SET_WAKE
	 parent:
	    domain:  VECTOR
	     hwirq:   0x2f
	     chip:    APIC
	      flags:   0x0
	     Vector:    35
	     Target:     1

After resume and allocating the interrupt for nouveau again, we get a message
from the kernel saying:

	[  217.150787] do_IRQ: 1.35 No irq handler for vector

As well, nouveau ends up getting no interrupts from the card and as a result
fails to come back up:

	[  219.153049] nouveau 0000:22:00.0: DRM: EVO timeout
	[  220.226254] r8169 0000:1e:00.0 enp30s0: link up
	[  221.153054] nouveau 0000:22:00.0: DRM: base-0: timeout
	[  223.153528] nouveau 0000:22:00.0: DRM: base-0: timeout

If we look through all of the other IRQ allocations, we'll find that now two
devices have the
MSI vector 35:

nouveau:

	handler:  handle_edge_irq
	device:   0000:22:00.0
	status:   0x00000000
	istate:   0x00000000
	ddepth:   0
	wdepth:   0
	dstate:   0x01400200
	            IRQD_ACTIVATED
	            IRQD_IRQ_STARTED
	            IRQD_SINGLE_TARGET
	node:     0
	affinity: 0-7
	effectiv: 1
	pending:
	domain:  PCI-MSI-2
	 hwirq:   0x1100000
	 chip:    PCI-MSI
	  flags:   0x10
	             IRQCHIP_SKIP_SET_WAKE
	 parent:
	    domain:  VECTOR
	     hwirq:   0x2f
	     chip:    APIC
	      flags:   0x0
	     Vector:    35
	     Target:     1

and the PCI bridge (00:01.3 PCI bridge: Advanced Micro Devices, Inc. [AMD]
Family 17h (Models 00h-0fh) PCIe GPP Bridge):

	handler:  handle_edge_irq
	device:   0000:00:01.3
	status:   0x00000000
	istate:   0x00000000
	ddepth:   0
	wdepth:   0
	dstate:   0x03400200
	            IRQD_ACTIVATED
	            IRQD_IRQ_STARTED
	            IRQD_SINGLE_TARGET
	node:     0
	affinity: 0-7
	effectiv: 0
	pending:
	domain:  PCI-MSI-2
	 hwirq:   0x5800
	 chip:    PCI-MSI
	  flags:   0x10
	             IRQCHIP_SKIP_SET_WAKE
	 parent:
	    domain:  VECTOR
	     hwirq:   0x19
	     chip:    APIC
	      flags:   0x0
	     Vector:    35
	     Target:     0

Hope this helps clarify. I will keep looking at this from my end as well.

>
>
> Boris,
> In any case, I like your idea on saving the block addresses. I can look into
> this.
>
> Thanks,
> Yazen
-- 
Cheers,
	Lyude Paul
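
For reference, the comparison described in this message (looking through the
per-IRQ debugfs dumps for two devices that report the same Vector) could be
sketched in Python roughly as follows. This is only an illustration under
assumptions: the field layout is taken from the dumps quoted above, the two
sample strings are abbreviated copies of those dumps, and `parse_entry` and
`find_shared_vectors` are hypothetical helper names, not part of the kernel
or of nouveau.

```python
import re
from collections import defaultdict

# Field names as they appear in the dumps quoted above.
# Matching is case-sensitive, so IRQD_SINGLE_TARGET is not picked up.
FIELD_RE = re.compile(r"\b(Vector|Target):\s*(\d+)")
DEVICE_RE = re.compile(r"\bdevice:\s*(\S+)")


def parse_entry(text):
    """Return (device, vector, target) parsed from one IRQ dump."""
    fields = {name: int(value) for name, value in FIELD_RE.findall(text)}
    device = DEVICE_RE.search(text)
    return (
        device.group(1) if device else None,
        fields.get("Vector"),
        fields.get("Target"),
    )


def find_shared_vectors(entries):
    """Map each vector number used by more than one entry to its users."""
    by_vector = defaultdict(list)
    for text in entries:
        device, vector, target = parse_entry(text)
        if vector is not None:
            by_vector[vector].append((device, target))
    return {v: users for v, users in by_vector.items() if len(users) > 1}


# Abbreviated copies of the two dumps above (only the fields used here).
NOUVEAU = """device:   0000:22:00.0
domain:  VECTOR
 hwirq:   0x2f
 Vector:    35
 Target:     1
"""

BRIDGE = """device:   0000:00:01.3
domain:  VECTOR
 hwirq:   0x19
 Vector:    35
 Target:     0
"""

shared = find_shared_vectors([NOUVEAU, BRIDGE])
```

The target CPU is kept next to each device because the two dumps above report
different targets (1 for nouveau, 0 for the bridge). Feeding the sketch real
data would mean reading the entries under /sys/kernel/debug/irq, which
assumes debugfs is mounted and readable; the exact file layout there is not
shown in this message, so it is left out.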