* Severe reproducible nouveau breakage in 2.6.36 (and maybe .35)
@ 2010-11-10 19:28 Andrew Lutomirski
  2010-11-10 20:06 ` Andrew Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Lutomirski @ 2010-11-10 19:28 UTC (permalink / raw)
  To: linux-kernel, dri-devel, Ben Skeggs

Hi all-

Somewhere between 2.6.34-fedora-whatever and 2.6.36, Nouveau became
extremely broken on my hardware.  It appears to be triggered by a bug
in my monitor (HP LP2475w), which causes the monitor to disappear from
DVI when it goes to sleep.  Every time the console blanks (in X or
otherwise AFAICT) the system crashes oddly but unrecoverably.  This is
100% reproducible by Ctrl-Alt-F2 followed by
'echo 1 >/sys/class/graphics/fb0/blank' *from SSH* and waiting a few
seconds for the monitor to go to sleep, but it also happens if I just
walk away from the computer long enough for it to blank itself.  This is
present on F14's kernel and on 2.6.36 from kernel.org.  This may or
may not be related to the unreproducible crashes that I used to get
rarely on 2.6.34.

The symptoms are:
 - netconsole becomes very unreliable.  (This makes it rather hard to
   get any good debugging info because I don't have a real serial port.)
 - system doesn't answer pings.  userspace seems dead as well.
 - capslock will work intermittently
 - the lockup detector doesn't say anything.
 - After a few seconds, the system thinks that the tsc is massively
   unstable and switches clocksources.  (I think this is because the
   clocksource watchdog fails to schedule for awhile and then somehow
   ends up running and thinking it detected a clocksource failure.)
 - SysRq-c will give me my console back and spew (useless?) garbage.
   Usually it also causes a panic and I get nothing else out of the
   system.

The most recent time I triggered this, I got an amazing amount of
console spew about unexpected NMIs.  None of it made it to serial
console, and the part left on the screen was so far down as to be
pretty much useless.

lockdep shows nothing interesting (or at least nothing interesting
that stays on the screen long enough for me to read).

The best hint I have is from this patch (sorry for whitespace damage):

diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 612fa6d..6823a4d 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -1014,6 +1014,8 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 	uint32_t unplug_mask, plug_mask, change_mask;
 	uint32_t hpd0, hpd1 = 0;
 
+	printk(KERN_ERR "in nv50_display_irq_hotplug_bh\n");
+
 	hpd0 = nv_rd32(dev, 0xe054) & nv_rd32(dev, 0xe050);
 	if (dev_priv->chipset >= 0x90)
 		hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
@@ -1062,6 +1064,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 	if (dev_priv->chipset >= 0x90)
 		nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
 
+	printk(KERN_ERR "about to drm_helper_hpd_irq_event\n");
 	drm_helper_hpd_irq_event(dev);
 }
 
@@ -1072,6 +1075,7 @@ nv50_display_irq_handler(struct drm_device *dev)
 	uint32_t delayed = 0;
 
 	if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
+		printk(KERN_ERR "nv50 got hpd irq\n");
 		if (!work_pending(&dev_priv->hpd_work))
 			queue_work(dev_priv->wq, &dev_priv->hpd_work);
 	}

which spews "nv50 got hpd irq" once the display blanks.

Nouveau startup says:

[ 15.646535] nouveau 0000:04:00.0: PCI INT A -> GSI 24 (level, low) -> IRQ 24
[ 15.646540] nouveau 0000:04:00.0: setting latency timer to 64
[ 15.650606] [drm] nouveau 0000:04:00.0: Detected an NV50 generation card (0x086f00a2)
[ 15.657126] [drm] nouveau 0000:04:00.0: Attempting to load BIOS image from PRAMIN
[ 15.714410] [drm] nouveau 0000:04:00.0: ... appears to be valid
[ 15.714413] [drm] nouveau 0000:04:00.0: BIT BIOS found
[ 15.714415] [drm] nouveau 0000:04:00.0: Bios version 60.86.5b.00
[ 15.714418] [drm] nouveau 0000:04:00.0: TMDS table version 2.0
[ 15.714420] [drm] nouveau 0000:04:00.0: Found Display Configuration Block version 4.0
[ 15.714423] [drm] nouveau 0000:04:00.0: Raw DCB entry 0: 02011300 00000028
[ 15.714425] [drm] nouveau 0000:04:00.0: Raw DCB entry 1: 01011302 00000010
[ 15.714427] [drm] nouveau 0000:04:00.0: Raw DCB entry 2: 01000310 00000028
[ 15.714429] [drm] nouveau 0000:04:00.0: Raw DCB entry 3: 02000312 00000010
[ 15.714430] [drm] nouveau 0000:04:00.0: Raw DCB entry 4: 0000000e 00000000
[ 15.714433] [drm] nouveau 0000:04:00.0: DCB connector table: VHER 0x40 5 14 2
[ 15.714435] [drm] nouveau 0000:04:00.0: 0: 0x00002030: type 0x30 idx 0 tag 0x08
[ 15.714438] [drm] nouveau 0000:04:00.0: 1: 0x00001130: type 0x30 idx 1 tag 0x07
[ 15.714441] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 0 at offset 0xC34B
[ 15.740011] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 1 at offset 0xC6B5
[ 15.758892] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 2 at offset 0xD2F6
[ 15.758903] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 3 at offset 0xD3E8
[ 15.760960] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table 4 at offset 0xD5E2
[ 15.760965] [drm] nouveau 0000:04:00.0: Parsing VBIOS init table at offset 0xD647
[ 15.781884] [drm] nouveau 0000:04:00.0: 0xD647: Condition still not met after 20ms, skipping following opcodes
[ 15.781953] [drm] nouveau 0000:04:00.0: Detected 256MiB VRAM
[ 15.873252] [TTM] Zone kernel: Available graphics memory: 3055420 kiB.
[ 15.873256] [TTM] Zone dma32: Available graphics memory: 2097152 kiB.
[ 15.873259] [TTM] Initializing pool allocator.
[ 15.948218] [drm] nouveau 0000:04:00.0: 512 MiB GART (aperture)
[ 15.983208] [drm] nouveau 0000:04:00.0: Allocating FIFO number 1
[ 15.998872] [drm] nouveau 0000:04:00.0: nouveau_channel_alloc: initialised FIFO 1
[ 16.158101] [drm] nouveau 0000:04:00.0: allocated 1920x1200 fb: 0x40230000, bo ffff8801b48a5000
[ 16.158315] fbcon: nouveaufb (fb0) is primary device
[ 16.165464] Console: switching to colour frame buffer device 240x75
[ 16.168574] fb0: nouveaufb frame buffer device
[ 16.168576] drm: registered panic notifier
[ 16.168601] [drm] Initialized nouveau 0.0.16 20090420 for 0000:04:00.0 on minor 0

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: Severe reproducible nouveau breakage in 2.6.36 (and maybe .35)
  2010-11-10 19:28 Severe reproducible nouveau breakage in 2.6.36 (and maybe .35) Andrew Lutomirski
@ 2010-11-10 20:06 ` Andrew Lutomirski
  2010-11-10 21:21   ` [PATCH 0/2] Fix nouveau-related freezes Andy Lutomirski
  2010-11-10 21:32   ` Andy Lutomirski
  0 siblings, 2 replies; 14+ messages in thread
From: Andrew Lutomirski @ 2010-11-10 20:06 UTC (permalink / raw)
  To: linux-kernel, dri-devel, Ben Skeggs

On Wed, Nov 10, 2010 at 2:28 PM, Andrew Lutomirski <andy@luto.us> wrote:
> Hi all-
>
> Somewhere between 2.6.34-fedora-whatever and 2.6.36, Nouveau became
> extremely broken on my hardware.  It appears to be triggered by a bug
> in my monitor (HP LP2475w), which causes the monitor to disappear from
> DVI when it goes to sleep.  Every time the console blanks (in X or
> otherwise AFAICT) the system crashes oddly but unrecoverably.  This is
> 100% reproducible by Ctrl-Alt-F2 followed by
> 'echo 1 >/sys/class/graphics/fb0/blank' *from SSH* and waiting a few
> seconds for the monitor to go to sleep, but it also happens if I just
> walk away from the computer long enough for it to blank itself.  This is
> present on F14's kernel and on 2.6.36 from kernel.org.  This may or
> may not be related to the unreproducible crashes that I used to get
> rarely on 2.6.34.
>
> The best hint I have is from this patch (sorry for whitespace damage):
>
>
> which spews "nv50 got hpd irq" once the display blanks.

I tracked it down.  The interrupt code in 2.6.36 is totally broken ---
it acknowledges the interrupt *in the bottom half*.  This might work
by accident if the bottom half gets queued on a different CPU, but
something probably changed (concurrency-managed workqueues?) that
makes the BH end up on the same cpu.  So the cpu starves the BH and
there goes a cpu.  Then the clocksource watchdog hits and takes the
whole system down when it calls stop_machine, which also gets starved
on that cpu.

Patch coming.

--Andy

^ permalink raw reply	[flat|nested] 14+ messages in thread
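The starvation described in the message above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: `struct fake_dev`, `run_cpu()` and the other names are hypothetical, the interrupt line is assumed to be level-triggered (asserted while any status bit is pending), and the single simulated CPU is assumed to re-enter the handler for as long as the line stays asserted, so deferred work only runs once the line drops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a device behind a level-triggered IRQ line. */
struct fake_dev {
	uint32_t pending;     /* latched status; line is asserted while nonzero */
	uint32_t saved;       /* bits handed from top half to bottom half */
	bool     work_queued; /* bottom half scheduled but not yet run */
};

static bool irq_asserted(const struct fake_dev *d)
{
	return d->pending != 0;
}

/* Top half that only queues work; the ack is left to the bottom half. */
static void top_half_no_ack(struct fake_dev *d)
{
	d->work_queued = true;
}

/* Top half that saves the status bits and acks them immediately. */
static void top_half_ack(struct fake_dev *d)
{
	d->saved |= d->pending;
	d->pending = 0;
	d->work_queued = true;
}

/* Bottom half: optionally acks, then consumes the saved bits. */
static void bottom_half(struct fake_dev *d, bool ack_here)
{
	assert(d->work_queued);
	if (ack_here)
		d->pending = 0;
	d->saved = 0;
	d->work_queued = false;
}

/*
 * One simulated CPU.  The handler runs again for as long as the line is
 * asserted; the bottom half only gets the CPU once the line drops.
 * Returns the number of handler invocations before the bottom half ran,
 * or -1 if max_irqs was reached first (the bottom half was starved).
 */
static int run_cpu(struct fake_dev *d, bool ack_in_top_half, int max_irqs)
{
	int irqs = 0;

	while (irq_asserted(d)) {
		if (irqs == max_irqs)
			return -1;
		if (ack_in_top_half)
			top_half_ack(d);
		else
			top_half_no_ack(d);
		irqs++;
	}
	if (d->work_queued)
		bottom_half(d, !ack_in_top_half);
	return irqs;
}
```

With one pending bit, `run_cpu(&dev, false, 1000)` gives up after 1000 handler invocations without ever reaching the bottom half, while `run_cpu(&dev, true, 1000)` takes a single interrupt and then runs the bottom half.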
* [PATCH 0/2] Fix nouveau-related freezes
  2010-11-10 20:06 ` Andrew Lutomirski
@ 2010-11-10 21:21   ` Andy Lutomirski
  2010-11-10 21:32   ` Andy Lutomirski
  1 sibling, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2010-11-10 21:21 UTC (permalink / raw)
  To: Ben Skeggs, dri-devel; +Cc: linux-kernel, Andy Lutomirski

Nouveau takes down my system quite reliably when any hotplug event
occurs.  The bug happens because the IRQ handler didn't acknowledge
the hotplug state until the bottom half, so the card generated a new
interrupt immediately, starving the bottom half and permanently
starving that CPU (and hence the bottom half).

Even with this fix, a lot of the IRQ code looks rather broken.

This is tested on 2.6.36 (and makes the system stable for me), but it
also applies cleanly to 2.6.37 (untested, but surely also necessary).

Fedora 14's 2.6.35 kernels seem to have the same problem for me, so I
suspect that 2.6.35 needs this fix as well.  (All of my tests are on
an NV50 card.)

Andy Lutomirski (2):
  Use existing defines for NV50 hotplug registers
  nouveau: Acknowledge HPD irq in handler, not bottom half

 drivers/gpu/drm/nouveau/nouveau_drv.h  |    5 +++++
 drivers/gpu/drm/nouveau/nouveau_irq.c  |    1 +
 drivers/gpu/drm/nouveau/nv50_display.c |   21 +++++++++++++++------
 3 files changed, 21 insertions(+), 6 deletions(-)

-- 
1.7.3.2

>From 8055e8485f28491fe6219c512e379b4b89bcd465 Mon Sep 17 00:00:00 2001
Message-Id: <8055e8485f28491fe6219c512e379b4b89bcd465.1289423199.git.luto@mit.edu>
In-Reply-To: <cover.1289423199.git.luto@mit.edu>
References: <AANLkTimcEiBJtWx2tA=dqm6881g0B7NomXFsZauzfgy8@mail.gmail.com> <cover.1289423199.git.luto@mit.edu>
From: Andy Lutomirski <luto@mit.edu>
Date: Wed, 10 Nov 2010 14:49:12 -0500
Subject: [PATCH 1/2] Use existing defines for NV50 hotplug registers

This doesn't change code at all, but it makes it a lot easier to
understand.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: <stable@kernel.org>
---
 drivers/gpu/drm/nouveau/nv50_display.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 612fa6d..83a7d27 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -453,8 +453,8 @@ static int nv50_display_disable(struct drm_device *dev)
 	nv_wr32(dev, NV50_PDISPLAY_INTR_EN, 0x00000000);
 
 	/* disable hotplug interrupts */
-	nv_wr32(dev, 0xe054, 0xffffffff);
-	nv_wr32(dev, 0xe050, 0x00000000);
+	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, 0xffffffff);
+	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_INTR, 0x00000000);
 	if (dev_priv->chipset >= 0x90) {
 		nv_wr32(dev, 0xe074, 0xffffffff);
 		nv_wr32(dev, 0xe070, 0x00000000);
@@ -1014,7 +1014,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 	uint32_t unplug_mask, plug_mask, change_mask;
 	uint32_t hpd0, hpd1 = 0;
 
-	hpd0 = nv_rd32(dev, 0xe054) & nv_rd32(dev, 0xe050);
+	hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
 	if (dev_priv->chipset >= 0x90)
 		hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
 
@@ -1058,7 +1058,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 			helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
 	}
 
-	nv_wr32(dev, 0xe054, nv_rd32(dev, 0xe054));
+	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
 	if (dev_priv->chipset >= 0x90)
 		nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
 
-- 
1.7.3.2

>From cb559f4c96f82d5bf0c132b3330aecd4885a0dda Mon Sep 17 00:00:00 2001
Message-Id: <cb559f4c96f82d5bf0c132b3330aecd4885a0dda.1289423199.git.luto@mit.edu>
In-Reply-To: <cover.1289423199.git.luto@mit.edu>
References: <AANLkTimcEiBJtWx2tA=dqm6881g0B7NomXFsZauzfgy8@mail.gmail.com> <cover.1289423199.git.luto@mit.edu>
From: Andy Lutomirski <luto@mit.edu>
Date: Wed, 10 Nov 2010 15:08:39 -0500
Subject: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half

The old code generated an interrupt storm bad enough to completely
take down my system.

This only fixes the bits that are defined in nouveau_regs.h.  Newer
hardware uses another register that isn't described, and I don't have
that hardware to test.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: <stable@kernel.org>
---
 drivers/gpu/drm/nouveau/nouveau_drv.h  |    5 +++++
 drivers/gpu/drm/nouveau/nouveau_irq.c  |    1 +
 drivers/gpu/drm/nouveau/nv50_display.c |   17 +++++++++++++----
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b1be617..b6c62cc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -531,6 +531,11 @@ struct drm_nouveau_private {
 	struct work_struct irq_work;
 	struct work_struct hpd_work;
 
+	struct {
+		spinlock_t lock;
+		uint32_t hpd0_bits;
+	} hpd_state;
+
 	struct list_head vbl_waiting;
 
 	struct {
diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
index 794b0ee..b62a601 100644
--- a/drivers/gpu/drm/nouveau/nouveau_irq.c
+++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
@@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
 	if (dev_priv->card_type >= NV_50) {
 		INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
 		INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
+		spin_lock_init(&dev_priv->hpd_state.lock);
 		INIT_LIST_HEAD(&dev_priv->vbl_waiting);
 	}
 }
diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 83a7d27..0df08e3 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 	uint32_t unplug_mask, plug_mask, change_mask;
 	uint32_t hpd0, hpd1 = 0;
 
-	hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
+	spin_lock_irq(&dev_priv->hpd_state.lock);
+	hpd0 = dev_priv->hpd_state.hpd0_bits;
+	dev_priv->hpd_state.hpd0_bits = 0;
+	spin_unlock_irq(&dev_priv->hpd_state.lock);
+
+	hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
 	if (dev_priv->chipset >= 0x90)
 		hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
 
@@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 			helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
 	}
 
-	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
 	if (dev_priv->chipset >= 0x90)
 		nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
 
@@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
 	uint32_t delayed = 0;
 
 	if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
-		if (!work_pending(&dev_priv->hpd_work))
-			queue_work(dev_priv->wq, &dev_priv->hpd_work);
+		uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
+		nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
+		spin_lock(&dev_priv->hpd_state.lock);
+		dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
+		spin_unlock(&dev_priv->hpd_state.lock);
+
+		queue_work(dev_priv->wq, &dev_priv->hpd_work);
 	}
 
 	while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH 0/2] Fix nouveau-related freezes
  2010-11-10 20:06 ` Andrew Lutomirski
  2010-11-10 21:21   ` [PATCH 0/2] Fix nouveau-related freezes Andy Lutomirski
@ 2010-11-10 21:32   ` Andy Lutomirski
  2010-11-10 21:32     ` [PATCH 1/2] Use existing defines for NV50 hotplug registers Andy Lutomirski
  2010-11-10 21:32     ` [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half Andy Lutomirski
  1 sibling, 2 replies; 14+ messages in thread
From: Andy Lutomirski @ 2010-11-10 21:32 UTC (permalink / raw)
  To: Ben Skeggs, dri-devel; +Cc: linux-kernel, Andy Lutomirski

[sorry for resend -- apparently git-send-email doesn't like mbox files]

Nouveau takes down my system quite reliably when any hotplug event
occurs.  The bug happens because the IRQ handler didn't acknowledge
the hotplug state until the bottom half, so the card generated a new
interrupt immediately, starving the bottom half and permanently
starving that CPU (and hence the bottom half).

Even with this fix, a lot of the IRQ code looks rather broken.

This is tested on 2.6.36 (and makes the system stable for me), but it
also applies cleanly to 2.6.37 (untested, but surely also necessary).

Fedora 14's 2.6.35 kernels seem to have the same problem for me, so I
suspect that 2.6.35 needs this fix as well.  (All of my tests are on
an NV50 card.)

Andy Lutomirski (2):
  Use existing defines for NV50 hotplug registers
  nouveau: Acknowledge HPD irq in handler, not bottom half

 drivers/gpu/drm/nouveau/nouveau_drv.h  |    5 +++++
 drivers/gpu/drm/nouveau/nouveau_irq.c  |    1 +
 drivers/gpu/drm/nouveau/nv50_display.c |   21 +++++++++++++++------
 3 files changed, 21 insertions(+), 6 deletions(-)

-- 
1.7.3.2

^ permalink raw reply	[flat|nested] 14+ messages in thread
* [PATCH 1/2] Use existing defines for NV50 hotplug registers
  2010-11-10 21:32   ` Andy Lutomirski
@ 2010-11-10 21:32     ` Andy Lutomirski
  2010-11-10 21:32     ` [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half Andy Lutomirski
  1 sibling, 0 replies; 14+ messages in thread
From: Andy Lutomirski @ 2010-11-10 21:32 UTC (permalink / raw)
  To: Ben Skeggs, dri-devel; +Cc: linux-kernel, Andy Lutomirski

This doesn't change code at all, but it makes it a lot easier to
understand.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: <stable@kernel.org>
---
 drivers/gpu/drm/nouveau/nv50_display.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 612fa6d..83a7d27 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -453,8 +453,8 @@ static int nv50_display_disable(struct drm_device *dev)
 	nv_wr32(dev, NV50_PDISPLAY_INTR_EN, 0x00000000);
 
 	/* disable hotplug interrupts */
-	nv_wr32(dev, 0xe054, 0xffffffff);
-	nv_wr32(dev, 0xe050, 0x00000000);
+	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, 0xffffffff);
+	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_INTR, 0x00000000);
 	if (dev_priv->chipset >= 0x90) {
 		nv_wr32(dev, 0xe074, 0xffffffff);
 		nv_wr32(dev, 0xe070, 0x00000000);
@@ -1014,7 +1014,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 	uint32_t unplug_mask, plug_mask, change_mask;
 	uint32_t hpd0, hpd1 = 0;
 
-	hpd0 = nv_rd32(dev, 0xe054) & nv_rd32(dev, 0xe050);
+	hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
 	if (dev_priv->chipset >= 0x90)
 		hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
 
@@ -1058,7 +1058,7 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 			helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
 	}
 
-	nv_wr32(dev, 0xe054, nv_rd32(dev, 0xe054));
+	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
 	if (dev_priv->chipset >= 0x90)
 		nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
 
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half
  2010-11-10 21:32   ` Andy Lutomirski
  2010-11-10 21:32     ` [PATCH 1/2] Use existing defines for NV50 hotplug registers Andy Lutomirski
@ 2010-11-10 21:32     ` Andy Lutomirski
  2010-11-10 22:10       ` Ben Skeggs
  1 sibling, 1 reply; 14+ messages in thread
From: Andy Lutomirski @ 2010-11-10 21:32 UTC (permalink / raw)
  To: Ben Skeggs, dri-devel; +Cc: linux-kernel, Andy Lutomirski

The old code generated an interrupt storm bad enough to completely
take down my system.

This only fixes the bits that are defined in nouveau_regs.h.  Newer
hardware uses another register that isn't described, and I don't have
that hardware to test.

Signed-off-by: Andy Lutomirski <luto@mit.edu>
Cc: <stable@kernel.org>
---
 drivers/gpu/drm/nouveau/nouveau_drv.h  |    5 +++++
 drivers/gpu/drm/nouveau/nouveau_irq.c  |    1 +
 drivers/gpu/drm/nouveau/nv50_display.c |   17 +++++++++++++----
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
index b1be617..b6c62cc 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drv.h
+++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
@@ -531,6 +531,11 @@ struct drm_nouveau_private {
 	struct work_struct irq_work;
 	struct work_struct hpd_work;
 
+	struct {
+		spinlock_t lock;
+		uint32_t hpd0_bits;
+	} hpd_state;
+
 	struct list_head vbl_waiting;
 
 	struct {
diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
index 794b0ee..b62a601 100644
--- a/drivers/gpu/drm/nouveau/nouveau_irq.c
+++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
@@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
 	if (dev_priv->card_type >= NV_50) {
 		INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
 		INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
+		spin_lock_init(&dev_priv->hpd_state.lock);
 		INIT_LIST_HEAD(&dev_priv->vbl_waiting);
 	}
 }
diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
index 83a7d27..0df08e3 100644
--- a/drivers/gpu/drm/nouveau/nv50_display.c
+++ b/drivers/gpu/drm/nouveau/nv50_display.c
@@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 	uint32_t unplug_mask, plug_mask, change_mask;
 	uint32_t hpd0, hpd1 = 0;
 
-	hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
+	spin_lock_irq(&dev_priv->hpd_state.lock);
+	hpd0 = dev_priv->hpd_state.hpd0_bits;
+	dev_priv->hpd_state.hpd0_bits = 0;
+	spin_unlock_irq(&dev_priv->hpd_state.lock);
+
+	hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
 	if (dev_priv->chipset >= 0x90)
 		hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
 
@@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
 			helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
 	}
 
-	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
 	if (dev_priv->chipset >= 0x90)
 		nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
 
@@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
 	uint32_t delayed = 0;
 
 	if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
-		if (!work_pending(&dev_priv->hpd_work))
-			queue_work(dev_priv->wq, &dev_priv->hpd_work);
+		uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
+		nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
+		spin_lock(&dev_priv->hpd_state.lock);
+		dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
+		spin_unlock(&dev_priv->hpd_state.lock);
+
+		queue_work(dev_priv->wq, &dev_priv->hpd_work);
 	}
 
 	while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half
  2010-11-10 21:32     ` [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half Andy Lutomirski
@ 2010-11-10 22:10       ` Ben Skeggs
  2010-11-10 22:25         ` Andrew Lutomirski
  0 siblings, 1 reply; 14+ messages in thread
From: Ben Skeggs @ 2010-11-10 22:10 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: dri-devel, linux-kernel

On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
> The old code generated an interrupt storm bad enough to completely
> take down my system.
>
> This only fixes the bits that are defined nouveau_regs.h.  Newer hardware
> uses another register that isn't described, and I don't have that hardware
> to test.
Thanks for looking at this.  I'll take a closer look at the problem
today and see what I can come up with too, that'll work with the newer
hardware too.

Ben.
>
> Signed-off-by: Andy Lutomirski <luto@mit.edu>
> Cc: <stable@kernel.org>
> ---
>  drivers/gpu/drm/nouveau/nouveau_drv.h  |    5 +++++
>  drivers/gpu/drm/nouveau/nouveau_irq.c  |    1 +
>  drivers/gpu/drm/nouveau/nv50_display.c |   17 +++++++++++++----
>  3 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
> index b1be617..b6c62cc 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
> @@ -531,6 +531,11 @@ struct drm_nouveau_private {
>  	struct work_struct irq_work;
>  	struct work_struct hpd_work;
>
> +	struct {
> +		spinlock_t lock;
> +		uint32_t hpd0_bits;
> +	} hpd_state;
> +
>  	struct list_head vbl_waiting;
>
>  	struct {
> diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
> index 794b0ee..b62a601 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_irq.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
> @@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
>  	if (dev_priv->card_type >= NV_50) {
>  		INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
>  		INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
> +		spin_lock_init(&dev_priv->hpd_state.lock);
>  		INIT_LIST_HEAD(&dev_priv->vbl_waiting);
>  	}
>  }
> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
> index 83a7d27..0df08e3 100644
> --- a/drivers/gpu/drm/nouveau/nv50_display.c
> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
> @@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
>  	uint32_t unplug_mask, plug_mask, change_mask;
>  	uint32_t hpd0, hpd1 = 0;
>
> -	hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
> +	spin_lock_irq(&dev_priv->hpd_state.lock);
> +	hpd0 = dev_priv->hpd_state.hpd0_bits;
> +	dev_priv->hpd_state.hpd0_bits = 0;
> +	spin_unlock_irq(&dev_priv->hpd_state.lock);
> +
> +	hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
>  	if (dev_priv->chipset >= 0x90)
>  		hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
>
> @@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
>  			helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
>  	}
>
> -	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
>  	if (dev_priv->chipset >= 0x90)
>  		nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
>
> @@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
>  	uint32_t delayed = 0;
>
>  	if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
> -		if (!work_pending(&dev_priv->hpd_work))
> -			queue_work(dev_priv->wq, &dev_priv->hpd_work);
> +		uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
> +		nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
> +		spin_lock(&dev_priv->hpd_state.lock);
> +		dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
> +		spin_unlock(&dev_priv->hpd_state.lock);
> +
> +		queue_work(dev_priv->wq, &dev_priv->hpd_work);
>  	}
>
>  	while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {

^ permalink raw reply	[flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half
  2010-11-10 22:10       ` Ben Skeggs
@ 2010-11-10 22:25         ` Andrew Lutomirski
  2010-11-10 22:35           ` Ben Skeggs
  0 siblings, 1 reply; 14+ messages in thread
From: Andrew Lutomirski @ 2010-11-10 22:25 UTC (permalink / raw)
  To: bskeggs; +Cc: dri-devel, linux-kernel

On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <bskeggs@redhat.com> wrote:
> On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote:
>> The old code generated an interrupt storm bad enough to completely
>> take down my system.
>>
>> This only fixes the bits that are defined nouveau_regs.h.  Newer hardware
>> uses another register that isn't described, and I don't have that hardware
>> to test.
> Thanks for looking at this.  I'll take a closer look at the problem
> today and see what I can come up with too, that'll work with the newer
> hardware too.

It should be as simple as adding an hpd1 field to the hpd_state and
making exactly the same change.  (It would be nice to put the register
definitions into nouveau_regs.h as well -- I didn't really want to
muck around with a bunch of magic numbers that I can't test.)

I tried writing 0xffffffff to the display IRQ control in the handler
to explicitly acknowledge the IRQ, but either I did it wrong or it had
no effect.

I imagine that this explains the unreproducible crashes I had on F13 as well.

--Andy

>
> Ben.
>>
>> Signed-off-by: Andy Lutomirski <luto@mit.edu>
>> Cc: <stable@kernel.org>
>> ---
>>  drivers/gpu/drm/nouveau/nouveau_drv.h  |    5 +++++
>>  drivers/gpu/drm/nouveau/nouveau_irq.c  |    1 +
>>  drivers/gpu/drm/nouveau/nv50_display.c |   17 +++++++++++++----
>>  3 files changed, 19 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h
>> index b1be617..b6c62cc 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h
>> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h
>> @@ -531,6 +531,11 @@ struct drm_nouveau_private {
>>  	struct work_struct irq_work;
>>  	struct work_struct hpd_work;
>>
>> +	struct {
>> +		spinlock_t lock;
>> +		uint32_t hpd0_bits;
>> +	} hpd_state;
>> +
>>  	struct list_head vbl_waiting;
>>
>>  	struct {
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c
>> index 794b0ee..b62a601 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_irq.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_irq.c
>> @@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev)
>>  	if (dev_priv->card_type >= NV_50) {
>>  		INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh);
>>  		INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh);
>> +		spin_lock_init(&dev_priv->hpd_state.lock);
>>  		INIT_LIST_HEAD(&dev_priv->vbl_waiting);
>>  	}
>>  }
>> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c
>> index 83a7d27..0df08e3 100644
>> --- a/drivers/gpu/drm/nouveau/nv50_display.c
>> +++ b/drivers/gpu/drm/nouveau/nv50_display.c
>> @@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
>>  	uint32_t unplug_mask, plug_mask, change_mask;
>>  	uint32_t hpd0, hpd1 = 0;
>>
>> -	hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
>> +	spin_lock_irq(&dev_priv->hpd_state.lock);
>> +	hpd0 = dev_priv->hpd_state.hpd0_bits;
>> +	dev_priv->hpd_state.hpd0_bits = 0;
>> +	spin_unlock_irq(&dev_priv->hpd_state.lock);
>> +
>> +	hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR);
>>  	if (dev_priv->chipset >= 0x90)
>>  		hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070);
>>
>> @@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work)
>>  			helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF);
>>  	}
>>
>> -	nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL));
>>  	if (dev_priv->chipset >= 0x90)
>>  		nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074));
>>
>> @@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev)
>>  	uint32_t delayed = 0;
>>
>>  	if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) {
>> -		if (!work_pending(&dev_priv->hpd_work))
>> -			queue_work(dev_priv->wq, &dev_priv->hpd_work);
>> +		uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL);
>> +		nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits);
>> +		spin_lock(&dev_priv->hpd_state.lock);
>> +		dev_priv->hpd_state.hpd0_bits |= hpd0_bits;
>> +		spin_unlock(&dev_priv->hpd_state.lock);
>> +
>> +		queue_work(dev_priv->wq, &dev_priv->hpd_work);
>>  	}
>>
>>  	while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) {
>
>
>

^ permalink raw reply	[flat|nested] 14+ messages in thread
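The suggestion in the message above (an hpd1 field in hpd_state, handled exactly like hpd0) might look roughly like the following userspace sketch. It is not the patch that was eventually applied. Everything here is hypothetical: `struct fake_card`, the `hpd1_bits` field, the register indices, and the helper names. The status registers are assumed to be write-1-to-clear (which is what writing a register's own value back to it suggests), and a C11 `atomic_flag` spin loop stands in for the kernel spinlock, so the `spin_lock_irq()` versus `spin_lock()` distinction from the patch is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical register indices standing in for 0xe054 and 0xe074. */
enum { REG_HPD0_CTRL, REG_HPD1_CTRL, REG_COUNT };

struct fake_card {
	uint32_t regs[REG_COUNT]; /* assumed write-1-to-clear status bits */
	int chipset;
	struct {
		atomic_flag lock;   /* stand-in for spinlock_t */
		uint32_t hpd0_bits;
		uint32_t hpd1_bits; /* hypothetical field for chipset >= 0x90 */
	} hpd_state;
};

static void card_init(struct fake_card *c, int chipset)
{
	c->regs[REG_HPD0_CTRL] = 0;
	c->regs[REG_HPD1_CTRL] = 0;
	c->chipset = chipset;
	atomic_flag_clear(&c->hpd_state.lock);
	c->hpd_state.hpd0_bits = 0;
	c->hpd_state.hpd1_bits = 0;
}

static void hpd_lock(struct fake_card *c)
{
	while (atomic_flag_test_and_set_explicit(&c->hpd_state.lock,
						 memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void hpd_unlock(struct fake_card *c)
{
	atomic_flag_clear_explicit(&c->hpd_state.lock, memory_order_release);
}

/* Read a status register and acknowledge exactly the bits that were set. */
static uint32_t read_and_ack(struct fake_card *c, int reg)
{
	uint32_t bits = c->regs[reg];

	c->regs[reg] &= ~bits; /* models writing the bits back to clear them */
	return bits;
}

/* Top half: ack both registers at once, accumulate the bits for later. */
static void hpd_top_half(struct fake_card *c)
{
	uint32_t hpd0 = read_and_ack(c, REG_HPD0_CTRL);
	uint32_t hpd1 = 0;

	if (c->chipset >= 0x90)
		hpd1 = read_and_ack(c, REG_HPD1_CTRL);

	hpd_lock(c);
	c->hpd_state.hpd0_bits |= hpd0;
	c->hpd_state.hpd1_bits |= hpd1;
	hpd_unlock(c);
}

/* Bottom half: take everything accumulated so far and reset the state. */
static void hpd_bottom_half_take(struct fake_card *c,
				 uint32_t *hpd0, uint32_t *hpd1)
{
	assert(hpd0 && hpd1);

	hpd_lock(c);
	*hpd0 = c->hpd_state.hpd0_bits;
	*hpd1 = c->hpd_state.hpd1_bits;
	c->hpd_state.hpd0_bits = 0;
	c->hpd_state.hpd1_bits = 0;
	hpd_unlock(c);
}
```

Because the top half ORs new bits into the saved state, several interrupts that arrive before the bottom half runs are merged rather than lost, and a second call to `hpd_bottom_half_take()` with nothing new pending returns zero for both masks.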
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half 2010-11-10 22:25 ` Andrew Lutomirski @ 2010-11-10 22:35 ` Ben Skeggs 2010-11-10 22:51 ` Andrew Lutomirski 0 siblings, 1 reply; 14+ messages in thread From: Ben Skeggs @ 2010-11-10 22:35 UTC (permalink / raw) To: Andrew Lutomirski; +Cc: dri-devel, linux-kernel On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote: > On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <bskeggs@redhat.com> wrote: > > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote: > >> The old code generated an interrupt storm bad enough to completely > >> take down my system. > >> > >> This only fixes the bits that are defined nouveau_regs.h. Newer hardware > >> uses another register that isn't described, and I don't have that hardware > >> to test. > > Thanks for looking at this. I'll take a closer look at the problem > > today and see what I can come up with too, that'll work with the newer > > hardware too. > > It should be as simple as adding an hpd1 field to the hpd_state and > making exactly the same change. (It would be nice to put the register > definitions into nouveau_regs.h as well -- I didn't really want to > muck around with a bunch of magic numbers that I can't test.) Yes, it is. I can confirm the problem on another card, but it doesn't actually cause any crashes here. If you can rework the patch to support the newer chips too, that'd be great. As for magic numbers, the register names for those regs are wrong anyway. The joy of reverse-engineering the support. It doesn't really matter if you want to stick to them or go back to "magic" numbers. Ben. > > I tried writing 0xffffffff to the display IRQ control in the handler > to explicitly acknowledge the IRQ, but either I did it wrong or it had > no effect. > > I imagine that this explains the unreproducible crashes I had on F13 as well. > > --Andy > > > > > Ben. 
> >> > >> Signed-off-by: Andy Lutomirski <luto@mit.edu> > >> Cc: <stable@kernel.org> > >> --- > >> drivers/gpu/drm/nouveau/nouveau_drv.h | 5 +++++ > >> drivers/gpu/drm/nouveau/nouveau_irq.c | 1 + > >> drivers/gpu/drm/nouveau/nv50_display.c | 17 +++++++++++++---- > >> 3 files changed, 19 insertions(+), 4 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h > >> index b1be617..b6c62cc 100644 > >> --- a/drivers/gpu/drm/nouveau/nouveau_drv.h > >> +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h > >> @@ -531,6 +531,11 @@ struct drm_nouveau_private { > >> struct work_struct irq_work; > >> struct work_struct hpd_work; > >> > >> + struct { > >> + spinlock_t lock; > >> + uint32_t hpd0_bits; > >> + } hpd_state; > >> + > >> struct list_head vbl_waiting; > >> > >> struct { > >> diff --git a/drivers/gpu/drm/nouveau/nouveau_irq.c b/drivers/gpu/drm/nouveau/nouveau_irq.c > >> index 794b0ee..b62a601 100644 > >> --- a/drivers/gpu/drm/nouveau/nouveau_irq.c > >> +++ b/drivers/gpu/drm/nouveau/nouveau_irq.c > >> @@ -52,6 +52,7 @@ nouveau_irq_preinstall(struct drm_device *dev) > >> if (dev_priv->card_type >= NV_50) { > >> INIT_WORK(&dev_priv->irq_work, nv50_display_irq_handler_bh); > >> INIT_WORK(&dev_priv->hpd_work, nv50_display_irq_hotplug_bh); > >> + spin_lock_init(&dev_priv->hpd_state.lock); > >> INIT_LIST_HEAD(&dev_priv->vbl_waiting); > >> } > >> } > >> diff --git a/drivers/gpu/drm/nouveau/nv50_display.c b/drivers/gpu/drm/nouveau/nv50_display.c > >> index 83a7d27..0df08e3 100644 > >> --- a/drivers/gpu/drm/nouveau/nv50_display.c > >> +++ b/drivers/gpu/drm/nouveau/nv50_display.c > >> @@ -1014,7 +1014,12 @@ nv50_display_irq_hotplug_bh(struct work_struct *work) > >> uint32_t unplug_mask, plug_mask, change_mask; > >> uint32_t hpd0, hpd1 = 0; > >> > >> - hpd0 = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL) & nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR); > >> + spin_lock_irq(&dev_priv->hpd_state.lock); > >> + hpd0 = 
dev_priv->hpd_state.hpd0_bits; > >> + dev_priv->hpd_state.hpd0_bits = 0; > >> + spin_unlock_irq(&dev_priv->hpd_state.lock); > >> + > >> + hpd0 &= nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_INTR); > >> if (dev_priv->chipset >= 0x90) > >> hpd1 = nv_rd32(dev, 0xe074) & nv_rd32(dev, 0xe070); > >> > >> @@ -1058,7 +1063,6 @@ nv50_display_irq_hotplug_bh(struct work_struct *work) > >> helper->dpms(connector->encoder, DRM_MODE_DPMS_OFF); > >> } > >> > >> - nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL)); > >> if (dev_priv->chipset >= 0x90) > >> nv_wr32(dev, 0xe074, nv_rd32(dev, 0xe074)); > >> > >> @@ -1072,8 +1076,13 @@ nv50_display_irq_handler(struct drm_device *dev) > >> uint32_t delayed = 0; > >> > >> if (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_HOTPLUG) { > >> - if (!work_pending(&dev_priv->hpd_work)) > >> - queue_work(dev_priv->wq, &dev_priv->hpd_work); > >> + uint32_t hpd0_bits = nv_rd32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL); > >> + nv_wr32(dev, NV50_PCONNECTOR_HOTPLUG_CTRL, hpd0_bits); > >> + spin_lock(&dev_priv->hpd_state.lock); > >> + dev_priv->hpd_state.hpd0_bits |= hpd0_bits; > >> + spin_unlock(&dev_priv->hpd_state.lock); > >> + > >> + queue_work(dev_priv->wq, &dev_priv->hpd_work); > >> } > >> > >> while (nv_rd32(dev, NV50_PMC_INTR_0) & NV50_PMC_INTR_0_DISPLAY) { > > > > > > ^ permalink raw reply [flat|nested] 14+ messages in thread
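For anyone following the quoted patch, the pattern it moves to can be sketched in plain C outside the kernel. This is only an illustration under stated assumptions, not the driver code: reg_status, reg_enable, status_ack() and the write-1-to-clear behaviour of the status register are hypothetical stand-ins for the 0xe054/0xe050 accesses, and the spinlock from the patch is left out because the sketch is single-threaded.

```c
#include <stdint.h>

/* Simulated registers (hypothetical stand-ins for 0xe050 and 0xe054). */
uint32_t reg_enable = 0x3; /* which hotplug sources are enabled */
uint32_t reg_status;       /* pending bits, assumed write-1-to-clear */

/* State handed from the interrupt handler to the bottom half.
 * The patch guards this with a spinlock; omitted in this sketch. */
uint32_t saved_hpd0_bits;

uint32_t status_read(void)
{
    return reg_status;
}

/* Acknowledge: writing a 1 clears the corresponding pending bit. */
void status_ack(uint32_t bits)
{
    reg_status &= ~bits;
}

/* Top half: acknowledge right away so the interrupt stops asserting,
 * and remember which bits fired for the bottom half. */
void irq_handler(void)
{
    uint32_t bits = status_read();

    status_ack(bits);
    saved_hpd0_bits |= bits;
    /* the real handler would queue the hotplug work here */
}

/* Bottom half: consume the saved bits, limited to enabled sources. */
uint32_t hotplug_bh(void)
{
    uint32_t bits = saved_hpd0_bits;

    saved_hpd0_bits = 0;
    return bits & reg_enable;
}
```

With reg_enable at 0x3 and reg_status set to 0x5 before irq_handler() runs, the status register reads back 0 afterwards and hotplug_bh() returns 0x1. The point of the ordering is that nothing stays pending between the handler returning and the work item running, which is where the old code left room for the storm.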
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half 2010-11-10 22:35 ` Ben Skeggs @ 2010-11-10 22:51 ` Andrew Lutomirski 2010-11-10 22:55 ` Maarten Maathuis 2010-11-10 22:58 ` Ben Skeggs 0 siblings, 2 replies; 14+ messages in thread From: Andrew Lutomirski @ 2010-11-10 22:51 UTC (permalink / raw) To: bskeggs; +Cc: dri-devel, linux-kernel On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <bskeggs@redhat.com> wrote: > On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote: >> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <bskeggs@redhat.com> wrote: >> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote: >> >> The old code generated an interrupt storm bad enough to completely >> >> take down my system. >> >> >> >> This only fixes the bits that are defined nouveau_regs.h. Newer hardware >> >> uses another register that isn't described, and I don't have that hardware >> >> to test. >> > Thanks for looking at this. I'll take a closer look at the problem >> > today and see what I can come up with too, that'll work with the newer >> > hardware too. >> >> It should be as simple as adding an hpd1 field to the hpd_state and >> making exactly the same change. (It would be nice to put the register >> definitions into nouveau_regs.h as well -- I didn't really want to >> muck around with a bunch of magic numbers that I can't test.) > Yes, it is. I can confirm the problem on another card, but it doesn't > actually cause any crashes here. If you can rework the patch to support > the newer chips too, that'd be great. > > As for magic numbers, the register names for those regs are wrong > anyway. The joy of reverse-engineering the support. It doesn't really > matter if you want to stick to them or go back to "magic" numbers. That explains why INTR and CTRL seemed backwards :) I'll leave the magic numbers for the 0xe07? stuff. Also, I accidentally dropped the "& enabled_bits" part -- I'll put that back. Patch to follow after I boot and test it here. 
--Andy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half 2010-11-10 22:51 ` Andrew Lutomirski @ 2010-11-10 22:55 ` Maarten Maathuis 2010-11-10 23:01 ` Andrew Lutomirski 2010-11-10 22:58 ` Ben Skeggs 1 sibling, 1 reply; 14+ messages in thread From: Maarten Maathuis @ 2010-11-10 22:55 UTC (permalink / raw) To: Andrew Lutomirski; +Cc: bskeggs, linux-kernel, dri-devel On Wed, Nov 10, 2010 at 11:51 PM, Andrew Lutomirski <luto@mit.edu> wrote: > On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <bskeggs@redhat.com> wrote: >> On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote: >>> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <bskeggs@redhat.com> wrote: >>> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote: >>> >> The old code generated an interrupt storm bad enough to completely >>> >> take down my system. >>> >> >>> >> This only fixes the bits that are defined nouveau_regs.h. Newer hardware >>> >> uses another register that isn't described, and I don't have that hardware >>> >> to test. >>> > Thanks for looking at this. I'll take a closer look at the problem >>> > today and see what I can come up with too, that'll work with the newer >>> > hardware too. >>> >>> It should be as simple as adding an hpd1 field to the hpd_state and >>> making exactly the same change. (It would be nice to put the register >>> definitions into nouveau_regs.h as well -- I didn't really want to >>> muck around with a bunch of magic numbers that I can't test.) >> Yes, it is. I can confirm the problem on another card, but it doesn't >> actually cause any crashes here. If you can rework the patch to support >> the newer chips too, that'd be great. >> >> As for magic numbers, the register names for those regs are wrong >> anyway. The joy of reverse-engineering the support. It doesn't really >> matter if you want to stick to them or go back to "magic" numbers. > > That explains why INTR and CTRL seemed backwards :) I'll leave the > magic numbers for the 0xe07? stuff. 
Perhaps remove the bad definitions from the reg file, or rename them to UNKsomething? > > Also, I accidentally dropped the "& enabled_bits" part -- I'll put that back. > > Patch to follow after I boot and test it here. > > --Andy > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/dri-devel > -- Far away from the primal instinct, the song seems to fade away, the river get wider between your thoughts and the things we do and say. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half 2010-11-10 22:55 ` Maarten Maathuis @ 2010-11-10 23:01 ` Andrew Lutomirski 2010-11-10 23:12 ` Ben Skeggs 0 siblings, 1 reply; 14+ messages in thread From: Andrew Lutomirski @ 2010-11-10 23:01 UTC (permalink / raw) To: Maarten Maathuis; +Cc: bskeggs, linux-kernel, dri-devel On Wed, Nov 10, 2010 at 5:55 PM, Maarten Maathuis <madman2003@gmail.com> wrote: > On Wed, Nov 10, 2010 at 11:51 PM, Andrew Lutomirski <luto@mit.edu> wrote: >> On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <bskeggs@redhat.com> wrote: >>> On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote: >>>> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <bskeggs@redhat.com> wrote: >>>> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote: >>>> >> The old code generated an interrupt storm bad enough to completely >>>> >> take down my system. >>>> >> >>>> >> This only fixes the bits that are defined nouveau_regs.h. Newer hardware >>>> >> uses another register that isn't described, and I don't have that hardware >>>> >> to test. >>>> > Thanks for looking at this. I'll take a closer look at the problem >>>> > today and see what I can come up with too, that'll work with the newer >>>> > hardware too. >>>> >>>> It should be as simple as adding an hpd1 field to the hpd_state and >>>> making exactly the same change. (It would be nice to put the register >>>> definitions into nouveau_regs.h as well -- I didn't really want to >>>> muck around with a bunch of magic numbers that I can't test.) >>> Yes, it is. I can confirm the problem on another card, but it doesn't >>> actually cause any crashes here. If you can rework the patch to support >>> the newer chips too, that'd be great. >>> >>> As for magic numbers, the register names for those regs are wrong >>> anyway. The joy of reverse-engineering the support. It doesn't really >>> matter if you want to stick to them or go back to "magic" numbers. 
>> >> That explains why INTR and CTRL seemed backwards :) I'll leave the >> magic numbers for the 0xe07? stuff. > > Perhaps remove the bad definitions from the reg file, or rename them > to UNKsomething? Well, they're known. One is hotplug detect enable (unless the code is wrong) and the other is hotplug interrupt status. --Andy ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half 2010-11-10 23:01 ` Andrew Lutomirski @ 2010-11-10 23:12 ` Ben Skeggs 0 siblings, 0 replies; 14+ messages in thread From: Ben Skeggs @ 2010-11-10 23:12 UTC (permalink / raw) To: Andrew Lutomirski; +Cc: Maarten Maathuis, linux-kernel, dri-devel On Wed, 2010-11-10 at 18:01 -0500, Andrew Lutomirski wrote: > On Wed, Nov 10, 2010 at 5:55 PM, Maarten Maathuis <madman2003@gmail.com> wrote: > > On Wed, Nov 10, 2010 at 11:51 PM, Andrew Lutomirski <luto@mit.edu> wrote: > >> On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <bskeggs@redhat.com> wrote: > >>> On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote: > >>>> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <bskeggs@redhat.com> wrote: > >>>> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote: > >>>> >> The old code generated an interrupt storm bad enough to completely > >>>> >> take down my system. > >>>> >> > >>>> >> This only fixes the bits that are defined nouveau_regs.h. Newer hardware > >>>> >> uses another register that isn't described, and I don't have that hardware > >>>> >> to test. > >>>> > Thanks for looking at this. I'll take a closer look at the problem > >>>> > today and see what I can come up with too, that'll work with the newer > >>>> > hardware too. > >>>> > >>>> It should be as simple as adding an hpd1 field to the hpd_state and > >>>> making exactly the same change. (It would be nice to put the register > >>>> definitions into nouveau_regs.h as well -- I didn't really want to > >>>> muck around with a bunch of magic numbers that I can't test.) > >>> Yes, it is. I can confirm the problem on another card, but it doesn't > >>> actually cause any crashes here. If you can rework the patch to support > >>> the newer chips too, that'd be great. > >>> > >>> As for magic numbers, the register names for those regs are wrong > >>> anyway. The joy of reverse-engineering the support. 
It doesn't really > >>> matter if you want to stick to them or go back to "magic" numbers. > >> > >> That explains why INTR and CTRL seemed backwards :) I'll leave the > >> magic numbers for the 0xe07? stuff. > > > > Perhaps remove the bad definitions from the reg file, or rename them > > to UNKsomething? > > Well, they're known. One is hotplug detect enable (unless the code is > wrong) and the other is hotplug interrupt status.

That's also not correct. If anything, the most accurate names so far would probably be:

#define NV_PGPIO_INTR_EN_0 0xe050
#define NV_PGPIO_INTR_0 0xe054
#define NV_PGPIO_INTR_EN_1 0xe070
#define NV_PGPIO_INTR_1 0xe074

PGPIO is a guess, and there's other stuff in that range too, but it's definitely *not* PCONNECTOR.

Anyway, this doesn't matter. Whatever change in names can happen in nouveau git and make its way to Linus from there; the fix for nouveau git is already going to be different enough from what'll apply on Linus' tree right now.

My opinion is, let's just fix the bug in mainline (without register naming) and fix the naming etc. in nouveau git.

Ben.

> > > > --Andy
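To make the naming discussion concrete, here is a small sketch of how the pending-hotplug computation from the driver would read with the names suggested in this message. It is an assumption-laden illustration: reg_read_fn, struct hpd_pending, hpd_read_pending() and fake_rd32() are hypothetical helpers invented for the sketch (the driver itself uses nv_rd32(dev, reg)), and the PGPIO names are, as noted in the message, only a guess.

```c
#include <stdint.h>

/* Names suggested in the thread; "PGPIO" is described there as a guess. */
#define NV_PGPIO_INTR_EN_0 0xe050
#define NV_PGPIO_INTR_0    0xe054
#define NV_PGPIO_INTR_EN_1 0xe070
#define NV_PGPIO_INTR_1    0xe074

/* Hypothetical register reader, standing in for nv_rd32(dev, reg). */
typedef uint32_t (*reg_read_fn)(uint32_t reg);

struct hpd_pending {
    uint32_t hpd0;
    uint32_t hpd1;
};

/* Pending bits are "status AND enable"; the second register pair is
 * only read on chipsets >= 0x90, as in the existing bottom half. */
struct hpd_pending hpd_read_pending(reg_read_fn rd, int chipset)
{
    struct hpd_pending p = { 0, 0 };

    p.hpd0 = rd(NV_PGPIO_INTR_0) & rd(NV_PGPIO_INTR_EN_0);
    if (chipset >= 0x90)
        p.hpd1 = rd(NV_PGPIO_INTR_1) & rd(NV_PGPIO_INTR_EN_1);
    return p;
}

/* Fake register file with made-up values, for illustration only. */
uint32_t fake_rd32(uint32_t reg)
{
    switch (reg) {
    case NV_PGPIO_INTR_EN_0: return 0x3;
    case NV_PGPIO_INTR_0:    return 0x5;
    case NV_PGPIO_INTR_EN_1: return 0x2;
    case NV_PGPIO_INTR_1:    return 0x6;
    default:                 return 0;
    }
}
```

With the fake values above, a chipset below 0x90 yields hpd0 = 0x1 and hpd1 = 0, while a chipset at or above 0x90 also yields hpd1 = 0x2. Extending the handler-side fix to newer chips would mean saving and acknowledging the second status register the same way as the first.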
* Re: [PATCH 2/2] nouveau: Acknowledge HPD irq in handler, not bottom half 2010-11-10 22:51 ` Andrew Lutomirski 2010-11-10 22:55 ` Maarten Maathuis @ 2010-11-10 22:58 ` Ben Skeggs 1 sibling, 0 replies; 14+ messages in thread From: Ben Skeggs @ 2010-11-10 22:58 UTC (permalink / raw) To: Andrew Lutomirski; +Cc: dri-devel, linux-kernel On Wed, 2010-11-10 at 17:51 -0500, Andrew Lutomirski wrote: > On Wed, Nov 10, 2010 at 5:35 PM, Ben Skeggs <bskeggs@redhat.com> wrote: > > On Wed, 2010-11-10 at 17:25 -0500, Andrew Lutomirski wrote: > >> On Wed, Nov 10, 2010 at 5:10 PM, Ben Skeggs <bskeggs@redhat.com> wrote: > >> > On Wed, 2010-11-10 at 16:32 -0500, Andy Lutomirski wrote: > >> >> The old code generated an interrupt storm bad enough to completely > >> >> take down my system. > >> >> > >> >> This only fixes the bits that are defined nouveau_regs.h. Newer hardware > >> >> uses another register that isn't described, and I don't have that hardware > >> >> to test. > >> > Thanks for looking at this. I'll take a closer look at the problem > >> > today and see what I can come up with too, that'll work with the newer > >> > hardware too. > >> > >> It should be as simple as adding an hpd1 field to the hpd_state and > >> making exactly the same change. (It would be nice to put the register > >> definitions into nouveau_regs.h as well -- I didn't really want to > >> muck around with a bunch of magic numbers that I can't test.) > > Yes, it is. I can confirm the problem on another card, but it doesn't > > actually cause any crashes here. If you can rework the patch to support > > the newer chips too, that'd be great. > > > > As for magic numbers, the register names for those regs are wrong > > anyway. The joy of reverse-engineering the support. It doesn't really > > matter if you want to stick to them or go back to "magic" numbers. > > That explains why INTR and CTRL seemed backwards :) I'll leave the > magic numbers for the 0xe07? stuff. 
That sounds good. It'll all get cleaned up at some point and switched to "proper" names (well, our best guess; you'd have to ask NVIDIA about the real ones).

Ben.

> > Also, I accidentally dropped the "& enabled_bits" part -- I'll put that back. > > Patch to follow after I boot and test it here. > > --Andy