From: Linus Torvalds <torvalds@linux-foundation.org>
To: reinette chatre <reinette.chatre@intel.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Kernel Testers List <kernel-testers@vger.kernel.org>,
	Eric Anholt <eric@anholt.net>, "Ma, Ling" <ling.ma@intel.com>,
	"bugzilla-daemon@bugzilla.kernel.org"
	<bugzilla-daemon@bugzilla.kernel.org>
Subject: Re: [Bug #13819] system freeze when switching to console
Date: Tue, 8 Sep 2009 11:06:21 -0700 (PDT)
Message-ID: <alpine.LFD.2.01.0909081039300.7458@localhost.localdomain>
In-Reply-To: <1252431375.14735.139.camel@rc-desk>



On Tue, 8 Sep 2009, reinette chatre wrote:
> 
> As you can see from the kernel version it is not a build of a vanilla
> kernel. It only contains changes related to the wireless networking work
> I am doing.
> 
> Here is the output:

Thanks, this is great. It pinpoints the problem very effectively.

> [  352.803960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
> [  352.804006] IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]

The code here is

	  16:	48 8b 80 00 01 00 00 	mov    0x100(%rax),%rax
	  1d:	48 8b 50 08          	mov    0x8(%rax),%rdx
	  21:	48 85 d2             	test   %rdx,%rdx
	  24:	74 11                	je     0x37
	  26:	49 8b 44 24 78       	mov    0x78(%r12),%rax
	  2b:*	8b 80 84 00 00 00    	mov    0x84(%rax),%eax     <-- trapping instruction
	  31:	89 82 08 08 00 00    	mov    %eax,0x808(%rdx)
	  37:	f6 45 a0 02          	testb  $0x2,-0x60(%rbp)

and that "testb $0x2, -0x60(%rbp)" seems to be the

	if (iir & I915_USER_INTERRUPT) {

test if I'm reading things right. Although it could also be the

	if (eir & I915_ERROR_MEMORY_REFRESH) {

thing. The disassembly is totally impossible to read, because the stupid 
i915 driver is chock-full of crap like

	if (IS_G4X(dev)) {
		..

which expands to insane amounts of code that checks the PCI IDs one by 
one.
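
To spell out how the trapping instruction lines up with the oops: the
faulting address 0x84 is exactly the displacement in "mov 0x84(%rax),%eax",
which is what you would see if %rax (just loaded from 0x78(%r12)) were NULL.
A minimal C sketch of that arithmetic follows. The structure layout and the
names are made up for illustration; only the 0x84 offset comes from the oops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: only the 0x84 offset is taken from the oops. */
struct sketch_state {
	uint8_t  pad[0x84];	/* whatever precedes the field */
	uint32_t field;		/* the 32-bit value the handler loads */
};

/*
 * Address touched by "mov 0x84(%rax),%eax" for a given value of %rax:
 * base pointer plus the field's offset within the structure.
 */
uintptr_t sketch_fault_address(uintptr_t base)
{
	return base + offsetof(struct sketch_state, field);
}

/* With a NULL base, the access lands at the address the oops reports. */
void sketch_fault_demo(void)
{
	assert(sketch_fault_address(0) == 0x84);
}
```

So a small non-zero fault address like this usually means "NULL structure
pointer, field at that offset", not a wild pointer.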

Intel guys: could you _please_ stop doing that. Create a capability mask 
in the device or something, so that you can test for "is this a G4x" with 
a single bit test, rather than have code like this:

        mov    0x31c(%rsi),%eax
        cmp    $0x2982,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2972,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2992,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x29a2,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2a02,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2a12,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2a42,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2e02,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2e12,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2e22,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x2e32,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>
        cmp    $0x42,%eax
        je     0xffffffff8124b669 <i915_driver_irq_handler+177>

for that IS_G4X() thing (I'm not kidding - that's exactly a hundred bytes 
of code for that _stupid_ test, and it's inlined!)
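
As a rough illustration of the capability-mask idea, here is a sketch in
plain C. It is not i915 code: the struct, the CAP_* bits and the function
names are all hypothetical, and the ID table just reuses a few of the
values from the compare chain above, on the assumption that they all belong
to the same test. The point is only that the ID lookup happens once, at
probe time, and every later check is a single bit test.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits, not taken from the driver. */
#define CAP_G4X		(1u << 0)

struct sketch_id_caps {
	uint16_t pci_id;
	uint32_t caps;
};

/* A few IDs from the compare chain; the mapping is an assumption. */
static const struct sketch_id_caps sketch_id_table[] = {
	{ 0x2a42, CAP_G4X },
	{ 0x2e02, CAP_G4X },
	{ 0x2e12, CAP_G4X },
	{ 0x2e22, CAP_G4X },
	{ 0x2e32, CAP_G4X },
};

struct sketch_device {
	uint16_t pci_device;
	uint32_t caps;		/* filled in once, at probe time */
};

/* Walk the ID table once and cache the result in the device. */
void sketch_init_caps(struct sketch_device *dev)
{
	size_t i;

	dev->caps = 0;
	for (i = 0; i < sizeof(sketch_id_table) / sizeof(sketch_id_table[0]); i++) {
		if (sketch_id_table[i].pci_id == dev->pci_device) {
			dev->caps = sketch_id_table[i].caps;
			break;
		}
	}
}

/* The hot-path test: one load and one bit test, no compare chain. */
int sketch_is_g4x(const struct sketch_device *dev)
{
	return (dev->caps & CAP_G4X) != 0;
}

void sketch_caps_demo(void)
{
	struct sketch_device dev = { .pci_device = 0x2e22, .caps = 0 };

	sketch_init_caps(&dev);
	assert(sketch_is_g4x(&dev));
}
```

With something like this, an inlined check in the irq handler compiles to a
test against a field in the device structure instead of a dozen cmp/je
pairs.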

Anyway, we're getting that DRM irq, and it has a normal IRQ stack trace:

> [  352.804006] Process Xorg (pid: 4424, threadinfo ffff8800b6b1a000, task ffff880037373c00)
> [  352.804006] Call Trace:
> [  352.804006]  <IRQ> 
> [  352.804006]  [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
> [  352.804006]  [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
> [  352.804006]  [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
> [  352.804006]  [<ffffffff8100e76f>] handle_irq+0x1f/0x30
> [  352.804006]  [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
> [  352.804006]  [<ffffffff8100c793>] ret_from_intr+0x0/0xf

.. but it happened just as we're tearing down the DRM irq handling:

> [  352.804006]  <EOI> 
> [  352.804006]  [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
> [  352.804006]  [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [  352.804006]  [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
> [  352.804006]  [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [  352.804006]  [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
> [  352.804006]  [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
> [  352.804006]  [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [  352.804006]  [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
> [  352.804006]  [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
> [  352.804006]  [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
> [  352.804006]  [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]

so what is going on is that the i915 driver has obviously torn down some 
state before it uninstalls the irq, so the irq happens when the state has 
already been torn down, and the irq handler is not ready for that.

This patch *may* fix it - simply by getting rid of the irq early. However, 
I did not check whether maybe something in i915_gem_idle() actually needs 
the interrupt to be able to happen, so this is TOTALLY UNTESTED!
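
The ordering problem can be modeled in userspace C. This is a toy sketch
under stated assumptions, not driver code: all names are hypothetical,
"idle" stands in for whatever tears down the state, and the handler checks
for NULL only so the demo can count the bad case instead of crashing the
way the real handler did.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_hw_state { int status; };

struct sketch_dev {
	struct sketch_hw_state *state;	/* what the handler dereferences */
	int irq_installed;
	int handled;			/* interrupts handled with state intact */
	int saw_null;			/* interrupts that found state gone */
};

void sketch_irq_handler(struct sketch_dev *dev)
{
	if (!dev->state) {
		dev->saw_null++;	/* the real handler would oops here */
		return;
	}
	dev->handled++;
}

/* An interrupt is only delivered while the handler is installed. */
void sketch_raise_irq(struct sketch_dev *dev)
{
	if (dev->irq_installed)
		sketch_irq_handler(dev);
}

void sketch_irq_uninstall(struct sketch_dev *dev) { dev->irq_installed = 0; }
void sketch_idle(struct sketch_dev *dev) { dev->state = NULL; }

/* Returns how many times the handler ran after the state was torn down. */
int sketch_leave_vt(struct sketch_dev *dev, int uninstall_first)
{
	if (uninstall_first) {
		sketch_irq_uninstall(dev);	/* order used by the patch */
		sketch_idle(dev);
		sketch_raise_irq(dev);		/* late irq: not delivered */
	} else {
		sketch_idle(dev);		/* original order */
		sketch_raise_irq(dev);		/* irq lands in the window */
		sketch_irq_uninstall(dev);
	}
	return dev->saw_null;
}
```

In this model the original order lets one interrupt reach the handler after
the state is gone, and uninstalling first closes that window. It says
nothing about whether the idle path itself needs interrupts, which is the
open question above.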

		Linus
---
 drivers/gpu/drm/i915/i915_gem.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7edb5b9..80e5ba4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4232,15 +4232,11 @@ int
 i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv)
 {
-	int ret;
-
 	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		return 0;
 
-	ret = i915_gem_idle(dev);
 	drm_irq_uninstall(dev);
-
-	return ret;
+	return i915_gem_idle(dev);
 }
 
 void
