i915 X lockup

public inbox for linux-kernel@vger.kernel.org

* i915 X lockup
@ 2009-02-27  9:28 Jiri Slaby
  2009-02-27 10:01 ` Peter Zijlstra
  2009-02-27 10:32 ` Andrew Morton
  0 siblings, 2 replies; 15+ messages in thread
From: Jiri Slaby @ 2009-02-27  9:28 UTC (permalink / raw)
  To: airlied; +Cc: eric, keithp, dri-devel, Andrew Morton, Linux kernel mailing list

Hi,

Every time I run X, it gets stuck. I'm currently running mmotm
2009-02-26-16-58, but I think this is a wider problem. I had i915 disabled
for a long time (until I noticed today).

SysRq : Show Locks Held

Showing all locks held in the system:
3 locks held by events/0/10:
  #0:  (events){+.+.+.}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
  #1:  (&(&dev_priv->mm.retire_work)->work){+.+...}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
  #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90
1 lock held by mingetty/3899:
  #0:  (&tty->atomic_read_lock){+.+.+.}, at: [<ffffffff803cb5de>] n_tty_read+0x48e/0x8e0
1 lock held by mingetty/3900:
  #0:  (&tty->atomic_read_lock){+.+.+.}, at: [<ffffffff803cb5de>] n_tty_read+0x48e/0x8e0
1 lock held by mingetty/3901:
  #0:  (&tty->atomic_read_lock){+.+.+.}, at: [<ffffffff803cb5de>] n_tty_read+0x48e/0x8e0
1 lock held by X/4007:
  #0:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8040563c>] i915_gem_throttle_ioctl+0x2c/0x60
2 locks held by bash/4105:
  #0:  (sysrq_key_table_lock){......}, at: [<ffffffff803de366>] __handle_sysrq+0x26/0x190
  #1:  (tasklist_lock){.+.+..}, at: [<ffffffff80266c1f>] debug_show_all_locks+0x3f/0x1c0

=============================================

INFO: task events/0:10 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
events/0      D 0000000000000000     0    10      2
  ffff8801cb22fd60 0000000000000046 ffff8801cb22fcc0 ffffffff809d5cb0
  0000000000010400 ffffffff804057ba ffff8801cb20a6d0 ffff8801cb20a080
  ffff8801cb20a328 00000000802690a3 00000000ffff0ea1 0000000000000002
Call Trace:
  [<ffffffff804057ba>] ? i915_gem_retire_work_handler+0x3a/0x90
  [<ffffffff8026804d>] ? mark_held_locks+0x6d/0x90
  [<ffffffff80612fb5>] ? mutex_lock_nested+0x185/0x310
  [<ffffffff80612f46>] mutex_lock_nested+0x116/0x310
  [<ffffffff804057ba>] ? i915_gem_retire_work_handler+0x3a/0x90
  [<ffffffff802690a3>] ? __lock_acquire+0xab3/0x12c0
  [<ffffffff80405780>] ? i915_gem_retire_work_handler+0x0/0x90
  [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90
  [<ffffffff80252290>] worker_thread+0x1f0/0x340
  [<ffffffff8025223d>] ? worker_thread+0x19d/0x340
  [<ffffffff80614aff>] ? _spin_unlock_irqrestore+0x3f/0x60
  [<ffffffff80256de0>] ? autoremove_wake_function+0x0/0x40
  [<ffffffff8026838d>] ? trace_hardirqs_on+0xd/0x10
  [<ffffffff802520a0>] ? worker_thread+0x0/0x340
  [<ffffffff80256a2e>] kthread+0x9e/0xb0
  [<ffffffff8020d51a>] child_rip+0xa/0x20
  [<ffffffff8020cf3c>] ? restore_args+0x0/0x30
  [<ffffffff80256990>] ? kthread+0x0/0xb0
  [<ffffffff8020d510>] ? child_rip+0x0/0x20
3 locks held by events/0/10:
  #0:  (events){+.+.+.}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
  #1:  (&(&dev_priv->mm.retire_work)->work){+.+...}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
  #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90




Adapter is:
00:02.0 VGA compatible controller [0300]: Intel Corporation 82G33/G31 Express Integrated Graphics Controller [8086:29c2] (rev 02) (prog-if 00 [VGA controller])
         Subsystem: Intel Corporation 82G33/G31 Express Integrated Graphics Controller [8086:29c2]
         Flags: bus master, fast devsel, latency 0, IRQ 26
         Memory at ffa80000 (32-bit, non-prefetchable) [size=512K]
         I/O ports at ec00 [size=8]
         Memory at d0000000 (32-bit, prefetchable) [size=256M]
         Memory at ff900000 (32-bit, non-prefetchable) [size=1M]
         Capabilities: [90] Message Signalled Interrupts: Mask- 64bit- Count=1/1 Enable+
         Capabilities: [d0] Power Management version 2
         Kernel driver in use: i915



X server complains:
[mi] EQ overflowing. The server is probably stuck in an infinite loop.
[mi] mieqEnequeue: out-of-order valuator event; dropping.


* Re: i915 X lockup
  2009-02-27  9:28 i915 X lockup Jiri Slaby
@ 2009-02-27 10:01 ` Peter Zijlstra
  2009-02-27 10:12   ` Jiri Slaby
  2009-02-27 10:32 ` Andrew Morton
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Zijlstra @ 2009-02-27 10:01 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: airlied, eric, keithp, dri-devel, Andrew Morton,
	Linux kernel mailing list

On Fri, 2009-02-27 at 10:28 +0100, Jiri Slaby wrote:

> SysRq : Show Locks Held
> 
> Showing all locks held in the system:
> 3 locks held by events/0/10:
>   #0:  (events){+.+.+.}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
>   #1:  (&(&dev_priv->mm.retire_work)->work){+.+...}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
>   #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90

> 1 lock held by X/4007:
>   #0:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8040563c>] i915_gem_throttle_ioctl+0x2c/0x60

> =============================================
> 
> INFO: task events/0:10 blocked for more than 120 seconds.
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> events/0      D 0000000000000000     0    10      2
>   ffff8801cb22fd60 0000000000000046 ffff8801cb22fcc0 ffffffff809d5cb0
>   0000000000010400 ffffffff804057ba ffff8801cb20a6d0 ffff8801cb20a080
>   ffff8801cb20a328 00000000802690a3 00000000ffff0ea1 0000000000000002
> Call Trace:
>   [<ffffffff804057ba>] ? i915_gem_retire_work_handler+0x3a/0x90
>   [<ffffffff8026804d>] ? mark_held_locks+0x6d/0x90
>   [<ffffffff80612fb5>] ? mutex_lock_nested+0x185/0x310
>   [<ffffffff80612f46>] mutex_lock_nested+0x116/0x310
>   [<ffffffff804057ba>] ? i915_gem_retire_work_handler+0x3a/0x90
>   [<ffffffff802690a3>] ? __lock_acquire+0xab3/0x12c0
>   [<ffffffff80405780>] ? i915_gem_retire_work_handler+0x0/0x90
>   [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90
>   [<ffffffff80252290>] worker_thread+0x1f0/0x340
>   [<ffffffff8025223d>] ? worker_thread+0x19d/0x340
>   [<ffffffff80614aff>] ? _spin_unlock_irqrestore+0x3f/0x60
>   [<ffffffff80256de0>] ? autoremove_wake_function+0x0/0x40
>   [<ffffffff8026838d>] ? trace_hardirqs_on+0xd/0x10
>   [<ffffffff802520a0>] ? worker_thread+0x0/0x340
>   [<ffffffff80256a2e>] kthread+0x9e/0xb0
>   [<ffffffff8020d51a>] child_rip+0xa/0x20
>   [<ffffffff8020cf3c>] ? restore_args+0x0/0x30
>   [<ffffffff80256990>] ? kthread+0x0/0xb0
>   [<ffffffff8020d510>] ? child_rip+0x0/0x20
> 3 locks held by events/0/10:
>   #0:  (events){+.+.+.}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
>   #1:  (&(&dev_priv->mm.retire_work)->work){+.+...}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
>   #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90


Looks like keventd (events/0) is blocked on X. It would be good to have
sysrq-w output too, to see what X is up to (assuming it is blocked, and
not spinning like mad with a lock held).
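
The observation that events/0 and X both show up against
dev->struct_mutex can be checked mechanically. Below is a minimal Python
sketch, assuming the "Show Locks Held" format quoted in this message; the
helper name tasks_per_lock is made up for illustration. Note that lockdep
lists a lock a task is blocked on as well as the locks it already owns,
so a lock listed under two tasks points at contention, not at two owners.

```python
import re
from collections import defaultdict

# Excerpt of the "Show Locks Held" output from the report.
LOCK_DUMP = """\
3 locks held by events/0/10:
  #0:  (events){+.+.+.}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
  #1:  (&(&dev_priv->mm.retire_work)->work){+.+...}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
  #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90
1 lock held by X/4007:
  #0:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8040563c>] i915_gem_throttle_ioctl+0x2c/0x60
"""

TASK_RE = re.compile(r"^\d+ locks? held by (\S+):$")
LOCK_RE = re.compile(r"^\s*#\d+:\s+\((.+)\)\{[^}]*\}, at: ")


def tasks_per_lock(dump):
    """Map each lock name to the tasks it is listed under."""
    result = defaultdict(list)
    task = None
    for line in dump.splitlines():
        m = TASK_RE.match(line)
        if m:
            task = m.group(1)
            continue
        m = LOCK_RE.match(line)
        if m and task is not None:
            result[m.group(1)].append(task)
    return dict(result)


# Keep only the locks that appear under more than one task.
shared = {lock: tasks for lock, tasks in tasks_per_lock(LOCK_DUMP).items()
          if len(tasks) > 1}
print(shared)
```

Only struct_mutex is reported for this excerpt, which matches the reading
that one of the two tasks owns it while the other waits.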



* Re: i915 X lockup
  2009-02-27 10:01 ` Peter Zijlstra
@ 2009-02-27 10:12   ` Jiri Slaby
  2009-02-27 10:14     ` Jiri Slaby
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Slaby @ 2009-02-27 10:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jiri Slaby, airlied, eric, keithp, dri-devel, Andrew Morton,
	Linux kernel mailing list

On 27.2.2009 11:01, Peter Zijlstra wrote:
> would be good to have sysrq-w output

It showed nothing but the events thread. So is this rather a bug in the
intel userspace driver?

SysRq : Show Blocked State
   task                        PC stack   pid father
events/1      D 0000000000000000     0    11      2
  ffff8801cb231da0 0000000000000046 ffff8801cb231d00 ffffffff80231ef8
  0000003800000010 0000000000010180 ffff880028053840 ffff880028057180
  ffff8801cb20c790 ffff8801cb20ca38 00000001ca3f90e8 00000000ffff7a0e
Call Trace:
  [<ffffffff80231ef8>] ? dequeue_entity+0x18/0x1a0
  [<ffffffff80230b50>] ? dequeue_task+0xb0/0xf0
  [<ffffffff805fca2a>] __mutex_lock_slowpath+0xea/0x170
  [<ffffffff803f60a0>] ? i915_gem_retire_work_handler+0x0/0x90
  [<ffffffff805fc6f6>] mutex_lock+0x26/0x50
  [<ffffffff803f60d8>] i915_gem_retire_work_handler+0x38/0x90
  [<ffffffff80250792>] worker_thread+0x172/0x250
  [<ffffffff80254da0>] ? autoremove_wake_function+0x0/0x40
  [<ffffffff80250620>] ? worker_thread+0x0/0x250
  [<ffffffff802549be>] kthread+0x9e/0xb0
  [<ffffffff8020d3da>] child_rip+0xa/0x20
  [<ffffffff80254920>] ? kthread+0x0/0xb0
  [<ffffffff8020d3d0>] ? child_rip+0x0/0x20


* Re: i915 X lockup
  2009-02-27 10:12   ` Jiri Slaby
@ 2009-02-27 10:14     ` Jiri Slaby
  0 siblings, 0 replies; 15+ messages in thread
From: Jiri Slaby @ 2009-02-27 10:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: airlied, eric, keithp, dri-devel, Andrew Morton,
	Linux kernel mailing list

On 27.2.2009 11:12, Jiri Slaby wrote:
> So this is rather an intel driver userspace bug?

Bullshit, ignore me :).


* Re: i915 X lockup
  2009-02-27  9:28 i915 X lockup Jiri Slaby
  2009-02-27 10:01 ` Peter Zijlstra
@ 2009-02-27 10:32 ` Andrew Morton
  2009-02-27 13:04   ` Sitsofe Wheeler
  1 sibling, 1 reply; 15+ messages in thread
From: Andrew Morton @ 2009-02-27 10:32 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: airlied, eric, keithp, dri-devel, Linux kernel mailing list

On Fri, 27 Feb 2009 10:28:51 +0100 Jiri Slaby <jirislaby@gmail.com> wrote:

> everytime I run X, it gets stuck. Currently running on mmotm 
> 2009-02-26-16-58, but I think this is wider problem. I had i915 disabled 
> for a long time (until I noticed today).
> 
> SysRq : Show Locks Held
> 
> Showing all locks held in the system:
> 3 locks held by events/0/10:
>   #0:  (events){+.+.+.}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
>   #1:  (&(&dev_priv->mm.retire_work)->work){+.+...}, at: [<ffffffff8025223d>] worker_thread+0x19d/0x340
>   #2:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff804057ba>] i915_gem_retire_work_handler+0x3a/0x90
> 1 lock held by mingetty/3899:
>   #0:  (&tty->atomic_read_lock){+.+.+.}, at: [<ffffffff803cb5de>] n_tty_read+0x48e/0x8e0
> 1 lock held by mingetty/3900:
>   #0:  (&tty->atomic_read_lock){+.+.+.}, at: [<ffffffff803cb5de>] n_tty_read+0x48e/0x8e0
> 1 lock held by mingetty/3901:
>   #0:  (&tty->atomic_read_lock){+.+.+.}, at: [<ffffffff803cb5de>] n_tty_read+0x48e/0x8e0
> 1 lock held by X/4007:
>   #0:  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8040563c>] i915_gem_throttle_ioctl+0x2c/0x60
> 2 locks held by bash/4105:
>   #0:  (sysrq_key_table_lock){......}, at: [<ffffffff803de366>] __handle_sysrq+0x26/0x190
>   #1:  (tasklist_lock){.+.+..}, at: [<ffffffff80266c1f>] debug_show_all_locks+0x3f/0x1c0

I assume that i915_gem_throttle_ioctl->i915_gem_ring_throttle is stuck
in i915_wait_request(), holding struct_mutex.  That of course makes
keventd block.

Perhaps i915_wait_request() is waiting for keventd to do something,
which is the deadlock.  That "something" could be to simply finish its
current call to i915_gem_retire_work_handler().

But worse, it could be some completely other keventd handler which
isn't getting run, because that keventd instance is stuck over in
i915_gem_retire_work_handler().

IOW, the usual keventd problem.
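
The first scenario above can be written down as a wait-for graph. Below
is a minimal Python sketch, assuming the hypothesis in this message holds
(that i915_wait_request() really does depend on keventd making progress,
which the thread has not confirmed). The node names and the find_cycle
helper are illustrative only.

```python
def find_cycle(waits_for, start):
    """Follow single 'waits for' edges from start; return the cycle if any."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)
    if node is None:
        return None  # the chain ended at a task that is not waiting
    return path[path.index(node):] + [node]


# Each task waits for at most one other task in this simplified model.
waits_for = {
    # events/0 is in i915_gem_retire_work_handler(), blocked on struct_mutex,
    # which X took in i915_gem_throttle_ioctl().
    "events/0": "X",
    # Hypothetical edge: X sits in i915_wait_request() until keventd has
    # finished (or run) some work item.
    "X": "events/0",
}

print(find_cycle(waits_for, "events/0"))
```

Dropping the hypothetical second edge makes find_cycle return None, which
is the case where X is merely slow or stuck on the GPU and keventd is
only a victim.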


* Re: i915 X lockup
  2009-02-27 10:32 ` Andrew Morton
@ 2009-02-27 13:04   ` Sitsofe Wheeler
  2009-02-27 13:49     ` Jiri Slaby
  0 siblings, 1 reply; 15+ messages in thread
From: Sitsofe Wheeler @ 2009-02-27 13:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Slaby, airlied, eric, keithp, dri-devel,
	Linux kernel mailing list

On Fri, Feb 27, 2009 at 02:32:31AM -0800, Andrew Morton wrote:
> On Fri, 27 Feb 2009 10:28:51 +0100 Jiri Slaby <jirislaby@gmail.com> wrote:
> 
> > everytime I run X, it gets stuck. Currently running on mmotm 
> > 2009-02-26-16-58, but I think this is wider problem. I had i915 disabled 
> > for a long time (until I noticed today).

Which version of X are you using? Does it support kernel modesetting? If
not, did you disable kernel modesetting in the KConfig file for i915?

-- 
Sitsofe | http://sucs.org/~sits/


* Re: i915 X lockup
  2009-02-27 13:04   ` Sitsofe Wheeler
@ 2009-02-27 13:49     ` Jiri Slaby
  2009-02-27 23:12       ` Sitsofe Wheeler
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Slaby @ 2009-02-27 13:49 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Andrew Morton, airlied, eric, keithp, dri-devel,
	Linux kernel mailing list

On 27.2.2009 14:04, Sitsofe Wheeler wrote:
> On Fri, Feb 27, 2009 at 02:32:31AM -0800, Andrew Morton wrote:
>> On Fri, 27 Feb 2009 10:28:51 +0100 Jiri Slaby<jirislaby@gmail.com>  wrote:
>>
>>> everytime I run X, it gets stuck. Currently running on mmotm
>>> 2009-02-26-16-58, but I think this is wider problem. I had i915 disabled
>>> for a long time (until I noticed today).
>
> Which version of X are you using? Does it support kernel modesetting? If
> not, did you disable kernel modesetting in the KConfig file for i915?

xorg-x11-server-7.4-17.3
which is
X.Org X Server 1.5.2

modesetting enabled:
CONFIG_DRM_I915_KMS=y

Which X version is needed for that?


* Re: i915 X lockup
  2009-02-27 13:49     ` Jiri Slaby
@ 2009-02-27 23:12       ` Sitsofe Wheeler
  2009-02-28  0:20         ` Eric Anholt
  0 siblings, 1 reply; 15+ messages in thread
From: Sitsofe Wheeler @ 2009-02-27 23:12 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Andrew Morton, airlied, eric, keithp, dri-devel,
	Linux kernel mailing list

On Fri, Feb 27, 2009 at 02:49:06PM +0100, Jiri Slaby wrote:
> On 27.2.2009 14:04, Sitsofe Wheeler wrote:
> >On Fri, Feb 27, 2009 at 02:32:31AM -0800, Andrew Morton wrote:
> >>On Fri, 27 Feb 2009 10:28:51 +0100 Jiri Slaby<jirislaby@gmail.com>  wrote:
> >>
> >>>everytime I run X, it gets stuck. Currently running on mmotm
> >>>2009-02-26-16-58, but I think this is wider problem. I had i915 disabled
> >>>for a long time (until I noticed today).
> >
> >Which version of X are you using? Does it support kernel modesetting? If
> >not, did you disable kernel modesetting in the KConfig file for i915?
> 
> xorg-x11-server-7.4-17.3
> which is
> X.Org X Server 1.5.2
> 
> modesetting enabled:
> CONFIG_DRM_I915_KMS=y
> 
> Which X version is needed for that?

Good question. I can see that 7.4 supports GEM but I see nothing about
kernel modesetting (
http://www.phoronix.com/scan.php?page=article&item=xorg_74_final&num=1).
I know it's enabled in the Fedora (since Fedora 9) xorgs but I have no
idea about openSUSE (which I believe is what you are using based on
package numbers). Apparently kernel modesetting can be turned off on the
kernel command line by using nomodeset so that might be a quick
thing to try...

-- 
Sitsofe | http://sucs.org/~sits/


* Re: i915 X lockup
  2009-02-27 23:12       ` Sitsofe Wheeler
@ 2009-02-28  0:20         ` Eric Anholt
  2009-02-28  8:31           ` Jiri Slaby
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Anholt @ 2009-02-28  0:20 UTC (permalink / raw)
  To: Sitsofe Wheeler
  Cc: Jiri Slaby, Andrew Morton, airlied, keithp, dri-devel,
	Linux kernel mailing list


On Fri, 2009-02-27 at 23:12 +0000, Sitsofe Wheeler wrote:
> On Fri, Feb 27, 2009 at 02:49:06PM +0100, Jiri Slaby wrote:
> > On 27.2.2009 14:04, Sitsofe Wheeler wrote:
> > >On Fri, Feb 27, 2009 at 02:32:31AM -0800, Andrew Morton wrote:
> > >>On Fri, 27 Feb 2009 10:28:51 +0100 Jiri Slaby<jirislaby@gmail.com>  wrote:
> > >>
> > >>>everytime I run X, it gets stuck. Currently running on mmotm
> > >>>2009-02-26-16-58, but I think this is wider problem. I had i915 disabled
> > >>>for a long time (until I noticed today).
> > >
> > >Which version of X are you using? Does it support kernel modesetting? If
> > >not, did you disable kernel modesetting in the KConfig file for i915?
> > 
> > xorg-x11-server-7.4-17.3
> > which is
> > X.Org X Server 1.5.2
> > 
> > modesetting enabled:
> > CONFIG_DRM_I915_KMS=y
> > 
> > Which X version is needed for that?
> 
> Good question. I can see that 7.4 supports GEM but I see nothing about
> kernel modesetting (
> http://www.phoronix.com/scan.php?page=article&item=xorg_74_final&num=1).
> I know it's enabled in the Fedora (since Fedora 9) xorgs but I have no
> idea about openSUSE (which I believe is what you are using based on
> package numbers). Apparently kernel modesetting can be turned off on the
> kernel command line by using nomodesetting so that might be a quick
> thing to try...

KMS support is not a feature of the server but of your 2D driver.  You
want xf86-video-intel 2.6.2, or things will be bad.

-- 
Eric Anholt
eric@anholt.net                         eric.anholt@intel.com





* Re: i915 X lockup
  2009-02-28  0:20         ` Eric Anholt
@ 2009-02-28  8:31           ` Jiri Slaby
  2009-02-28  8:47             ` Andrew Morton
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Slaby @ 2009-02-28  8:31 UTC (permalink / raw)
  To: Eric Anholt
  Cc: Sitsofe Wheeler, Andrew Morton, airlied, keithp, dri-devel,
	Linux kernel mailing list

On 28.2.2009 01:20, Eric Anholt wrote:
> KMS support is not a feature of the server but of your 2D driver.  You
> want 2.6.2, or things will be bad.

I have 2.5.0. After turning KMS off, the problem seems to be solved.

Anyway, I would appreciate it if the Kconfig text named the required
version of the intel driver; otherwise it reads as: don't use this on
machines with an installation from the stone age. Someone running the
latest stable release of a distro won't even suspect that they lack a
"new enough userspace".

For reference, the text is:
Choose this option if you want kernel modesetting enabled by default,
and you have a new enough userspace to support this. Running old
userspaces with this enabled will cause pain.
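
The mismatch in this exchange is easy to state as a check: the intel 2D
driver here is 2.5.0, while 2.6.2 was named earlier in the thread as what
KMS wants. Below is a minimal Python sketch of such a comparison.
parse_version and kms_capable are hypothetical helper names, and nothing
like this exists in the kernel or in Kconfig.

```python
MIN_KMS_DRIVER = "2.6.2"  # intel 2D driver version named in the thread


def parse_version(text):
    """Turn '2.6.2' into (2, 6, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def kms_capable(driver_version, minimum=MIN_KMS_DRIVER):
    """True if the installed 2D driver is at least the minimum version."""
    return parse_version(driver_version) >= parse_version(minimum)


# 2.5.0 is the version reported in this message; the others are for contrast.
for installed in ("2.5.0", "2.6.1", "2.6.2"):
    print(installed, kms_capable(installed))
```

Comparing tuples of integers rather than strings avoids the usual trap
where "2.10.0" would sort below "2.6.2".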


* Re: i915 X lockup
  2009-02-28  8:31           ` Jiri Slaby
@ 2009-02-28  8:47             ` Andrew Morton
  2009-02-28  9:00               ` Eric Anholt
  2009-02-28 17:11               ` Keith Packard
  0 siblings, 2 replies; 15+ messages in thread
From: Andrew Morton @ 2009-02-28  8:47 UTC (permalink / raw)
  To: Jiri Slaby
  Cc: Eric Anholt, Sitsofe Wheeler, airlied, keithp, dri-devel,
	Linux kernel mailing list

On Sat, 28 Feb 2009 09:31:28 +0100 Jiri Slaby <jirislaby@gmail.com> wrote:

> On 28.2.2009 01:20, Eric Anholt wrote:
> > KMS support is not a feature of the server but of your 2D driver.  You
> > want 2.6.2, or things will be bad.
> 
> I have 2.5.0. After turning KMS off, problem seems to be solved.
> 
> Anyway, I would appreciate a version of the intel driver being in the 
> Kconfig text, otherwise it looks like: don't use this on machines with 
> installation from stone age. If one has latest stable release of a 
> distro, he doesn't even think he doesn't have "new enough userspace".
> 
> For reference, the text is:
> Choose this option if you want kernel modesetting enabled by default,
> and you have a new enough userspace to support this. Running old
> userspaces with this enabled will cause pain.

Hang on.

The kernel deadlocked on struct_mutex, did it not?  That's a kernel bug
regardless of what userspace you're running.

Do we know why this happened?


* Re: i915 X lockup
  2009-02-28  8:47             ` Andrew Morton
@ 2009-02-28  9:00               ` Eric Anholt
  2009-02-28 18:24                 ` Bruno Prémont
  2009-02-28 17:11               ` Keith Packard
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Anholt @ 2009-02-28  9:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jiri Slaby, Sitsofe Wheeler, airlied, keithp, dri-devel,
	Linux kernel mailing list


On Sat, 2009-02-28 at 00:47 -0800, Andrew Morton wrote:
> On Sat, 28 Feb 2009 09:31:28 +0100 Jiri Slaby <jirislaby@gmail.com> wrote:
> 
> > On 28.2.2009 01:20, Eric Anholt wrote:
> > > KMS support is not a feature of the server but of your 2D driver.  You
> > > want 2.6.2, or things will be bad.
> > 
> > I have 2.5.0. After turning KMS off, problem seems to be solved.
> > 
> > Anyway, I would appreciate a version of the intel driver being in the 
> > Kconfig text, otherwise it looks like: don't use this on machines with 
> > installation from stone age. If one has latest stable release of a 
> > distro, he doesn't even think he doesn't have "new enough userspace".
> > 
> > For reference, the text is:
> > Choose this option if you want kernel modesetting enabled by default,
> > and you have a new enough userspace to support this. Running old
> > userspaces with this enabled will cause pain.
> 
> Hang on.
> 
> The kernel deadlocked on struct_mutex, did it not?  That's a kernel bug
> regardless of what userspace you're running.
> 
> Do we know why this happened?

Userland went stomping all over the device state that the kernel thinks
it controls since you went and turned on the KMS option asserting "I'm
not going to run old userland", so the GPU got hung, and further
software using the GPU hung, and then somebody waiting for someone else
finishing using the GPU (struct_mutex) got hung.

The only proposal to prevent it is to use the "don't let userland map my
PCI device any more" support we now have available to us, which would
make X fail early on.  The unfortunate side-effect of that is that we
lose the ability to run incredibly useful userland debug tools that do
read-only access to registers.  We're moving bits of those into debugfs
for 2.6.30, but it's work that's not done even for the tools we have
today.

-- 
Eric Anholt
eric@anholt.net                         eric.anholt@intel.com





* Re: i915 X lockup
  2009-02-28  9:00               ` Eric Anholt
@ 2009-02-28 18:24                 ` Bruno Prémont
  2009-02-28 19:57                   ` Eric Anholt
  0 siblings, 1 reply; 15+ messages in thread
From: Bruno Prémont @ 2009-02-28 18:24 UTC (permalink / raw)
  To: Eric Anholt
  Cc: Andrew Morton, Jiri Slaby, Sitsofe Wheeler, airlied, keithp,
	dri-devel, Linux kernel mailing list

On Sat, 28 February 2009 Eric Anholt <eric@anholt.net> wrote:
> On Sat, 2009-02-28 at 00:47 -0800, Andrew Morton wrote:
> > The kernel deadlocked on struct_mutex, did it not?  That's a kernel
> > bug regardless of what userspace you're running.
> > 
> > Do we know why this happened?
> 
> Userland went stomping all over the device state that the kernel
> thinks it controls since you went and turned on the KMS option
> asserting "I'm not going to run old userland", so the GPU got hung,
> and further software using the GPU hung, and then somebody waiting
> for someone else finishing using the GPU (struct_mutex) got hung.
> 
> The only proposal to prevent it is to use the "don't let userland map
> my PCI device any more" support we now have available to us, which
> would make X fail early on.  The unfortunate side-effect of that is
> that we lose the ability to run incredibly useful userland debug
> tools that do read-only access to registers.  We're moving bits of
> those into debugfs for 2.6.30, but it's work that's not done even for
> the tools we have today.

I also saw/see Xorg lock up...

I'm running xf86-video-intel-2.6.1 (2.6.2 was released only very
recently) and a kernel with KMS disabled (KMS does not seem capable of
configuring the framebuffer properly; at least the display remains
black).

For me it's a deadlock between the intel driver dispatching GEM requests
and events (see log below).
Each time it happened I was interacting with a web browser, though it
does not happen very often.

Connecting to the notebook via ssh I can kill -KILL Xorg (kill -TERM
does not work).
Doing this leaves the GPU at least confused, though to get vesafb
working again it is sufficient to start Xorg with the vesa driver
(starting it with the intel driver before rebooting leads to a locked X
and the need to retry the killing).

Bruno


Graphics card:
00:02.0 VGA compatible controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02) (prog-if 00 [VGA controller])
        Subsystem: Acer Incorporated [ALI] Device 0035
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
        Latency: 0
        Interrupt: pin A routed to IRQ 11
        Region 0: Memory at e8000000 (32-bit, prefetchable) [size=128M]
        Region 1: Memory at e0000000 (32-bit, non-prefetchable) [size=512K]
        Region 2: I/O ports at 1800 [size=8]
        Capabilities: [d0] Power Management version 1
                Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

00:02.1 Display controller: Intel Corporation 82852/855GM Integrated Graphics Device (rev 02)
        Subsystem: Acer Incorporated [ALI] Device 0035
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Region 0: Memory at f0000000 (32-bit, prefetchable) [size=128M]
        Region 1: Memory at e0080000 (32-bit, non-prefetchable) [size=512K]
        Capabilities: [d0] Power Management version 1
                Flags: PMEClk- DSI+ D1+ D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 PME-Enable- DSel=0 DScale=0 PME-

drm debug output around lockup time + sysrq-t task traces:
Feb 28 18:12:16 [kernel] [27945.376131] [drm:drm_agp_bind_pages] 
Feb 28 18:12:16 [kernel] [27945.376162] [drm:i915_add_request] 3688298
Feb 28 18:12:16 [kernel] [27945.376169] [drm:i915_add_request] 3688299
Feb 28 18:12:16 [kernel] [27945.376181] [drm:drm_ioctl] pid=6673, cmd=0xc0086457, nr=0x57, dev 0xe200, auth=1
Feb 28 18:12:16 [kernel] [27945.376192] [drm:drm_ioctl] pid=6673, cmd=0x400c645f, nr=0x5f, dev 0xe200, auth=1
Feb 28 18:12:16 [kernel] [27945.376204] [drm:drm_ioctl] pid=6673, cmd=0x6458, nr=0x58, dev 0xe200, auth=1
Feb 28 18:12:16 [kernel] [27945.390764] [drm:drm_ioctl] ret = fffffe00
Feb 28 18:12:16 [kernel] [27945.390831] [drm:drm_ioctl] pid=6673, cmd=0x6458, nr=0x58, dev 0xe200, auth=1
Feb 28 18:12:16 [kernel] [27945.403653] [drm:drm_ioctl] ret = fffffe00
Feb 28 18:12:16 [kernel] [27945.403697] [drm:drm_ioctl] pid=6673, cmd=0x6458, nr=0x58, dev 0xe200, auth=1
Feb 28 18:12:16 [kernel] [27945.416451] [drm:drm_ioctl] ret = fffffe00
Feb 28 18:12:16 [kernel] [27945.416500] [drm:drm_ioctl] pid=6673, cmd=0x6458, nr=0x58, dev 0xe200, auth=1
Feb 28 18:12:16 [kernel] [27945.430630] [drm:drm_ioctl] ret = fffffe00
Feb 28 18:12:16 [kernel] [27945.430677] [drm:drm_ioctl] pid=6673, cmd=0x6458, nr=0x58, dev 0xe200, auth=1
... (same two messages repeating many times) ...
Feb 28 18:12:56 [kernel] [27986.140232] [drm:drm_ioctl] ret = fffffe00
Feb 28 18:12:56 [kernel] [27986.140274] [drm:drm_ioctl] pid=6673, cmd=0x6458, nr=0x58, dev 0xe200, auth=1
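
The drm_ioctl lines above can be unpacked by hand. Below is a minimal
Python sketch, assuming the generic Linux ioctl number layout used on x86
(8-bit nr, 8-bit type, 14-bit size, 2-bit direction) and reading ret as a
signed 32-bit value. The helper names are made up, and mapping the
driver-relative number to a specific i915 ioctl is left out because it
should be checked against the i915_drm.h in use.

```python
DRM_COMMAND_BASE = 0x40  # first driver-private DRM ioctl number


def decode_ioctl(cmd):
    """Split an ioctl command word into its generic Linux fields."""
    return {
        "dir": (cmd >> 30) & 0x3,      # 0 none, 1 write, 2 read, 3 read+write
        "size": (cmd >> 16) & 0x3FFF,  # argument size in bytes
        "type": chr((cmd >> 8) & 0xFF),
        "nr": cmd & 0xFF,
    }


def to_signed32(value):
    """Interpret a 32-bit hex value from the log as a signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# The three cmd values that appear in the log.
for cmd in (0xC0086457, 0x400C645F, 0x6458):
    fields = decode_ioctl(cmd)
    fields["driver_nr"] = hex(fields["nr"] - DRM_COMMAND_BASE)
    print(hex(cmd), fields)

print(to_signed32(0xFFFFFE00))
```

The repeated ret value decodes to -512, which matches the kernel-internal
ERESTARTSYS. That would be consistent with the same argument-less ioctl
(driver-relative number 0x18) being interrupted and restarted over and
over, though the log alone does not prove it.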
Feb 28 18:16:00 [kernel] [28169.605072] SysRq : Show State
Feb 28 18:16:00 [kernel] [28169.605085]   task                PC stack   pid father
Feb 28 18:16:00 [kernel] [28169.605092] init          S 00000000  1188     1      0
Feb 28 18:16:00 [kernel] [28169.605104]  dd814ae4 00000086 004c4b3e 00000000 00000296 dd82a000 00000286 00000000
Feb 28 18:16:00 [kernel] [28169.605120]  004c4b3e 004c4b3e 00000000 dd814b4c c036b4b7 004c4b3e 00000000 da6d3dfc
Feb 28 18:16:00 [kernel] [28169.605134]  dd2d1b94 da785b94 908c73bf 0000199f 90402881 0000199f c0133610 c046e72c
Feb 28 18:16:00 [kernel] [28169.605149] Call Trace:
Feb 28 18:16:00 [kernel] [28169.605170]  [<c036b4b7>] schedule_hrtimeout_range+0xb7/0x100
Feb 28 18:16:00 [kernel] [28169.605185]  [<c0133610>] ? hrtimer_wakeup+0x0/0x20
Feb 28 18:16:00 [kernel] [28169.605195]  [<c036b4a0>] ? schedule_hrtimeout_range+0xa0/0x100
Feb 28 18:16:00 [kernel] [28169.605207]  [<c0175b80>] poll_schedule_timeout+0x30/0x60
Feb 28 18:16:00 [kernel] [28169.605216]  [<c01762b8>] do_select+0x448/0x5b0
Feb 28 18:16:00 [kernel] [28169.605226]  [<c0175c50>] ? __pollwait+0x0/0xd0
Feb 28 18:16:00 [kernel] [28169.605234]  [<c0175d20>] ? pollwake+0x0/0x60
Feb 28 18:16:00 [kernel] [28169.605245]  [<c01e30e1>] ? cfq_insert_request+0x31/0x390
Feb 28 18:16:00 [kernel] [28169.605256]  [<c01d8f76>] ? elv_insert+0x116/0x180
Feb 28 18:16:00 [kernel] [28169.605266]  [<c01d9b4f>] ? part_round_stats+0x3f/0x50
Feb 28 18:16:00 [kernel] [28169.605274]  [<c01d9049>] ? __elv_add_request+0x69/0xb0
Feb 28 18:16:00 [kernel] [28169.605284]  [<c01db466>] ? __make_request+0xa6/0x300
Feb 28 18:16:00 [kernel] [28169.605294]  [<c01489fe>] ? mempool_alloc_slab+0xe/0x10
Feb 28 18:16:00 [kernel] [28169.605303]  [<c01489fe>] ? mempool_alloc_slab+0xe/0x10
Feb 28 18:16:00 [kernel] [28169.605311]  [<c0148adc>] ? mempool_alloc+0x2c/0xc0
Feb 28 18:16:00 [kernel] [28169.605323]  [<c028be95>] ? scsi_pool_alloc_command+0x45/0x70
Feb 28 18:16:00 [kernel] [28169.605334]  [<c0290467>] ? scsi_init_sgtable+0x47/0xa0
Feb 28 18:16:00 [kernel] [28169.605343]  [<c0291fc0>] ? scsi_sg_alloc+0x0/0x50
Feb 28 18:16:00 [kernel] [28169.605351]  [<c0290674>] ? scsi_init_io+0x14/0xa0
Feb 28 18:16:00 [kernel] [28169.605361]  [<c014a6ca>] ? __rmqueue+0x9a/0x1a0
Feb 28 18:16:00 [kernel] [28169.605371]  [<c0178b39>] ? __d_lookup+0xa9/0xe0
Feb 28 18:16:00 [kernel] [28169.605383]  [<c0171f40>] ? do_lookup+0x60/0x190
Feb 28 18:16:00 [kernel] [28169.605392]  [<c0163c8d>] ? shmem_permission+0xd/0x10
Feb 28 18:16:00 [kernel] [28169.605402]  [<c0176bca>] core_sys_select+0x1ba/0x2e0
Feb 28 18:16:00 [kernel] [28169.605412]  [<c0170cf0>] ? path_put+0x20/0x30
Feb 28 18:16:00 [kernel] [28169.605421]  [<c0172b4b>] ? path_walk+0x4b/0x90
Feb 28 18:16:00 [kernel] [28169.605431]  [<c01ea1e7>] ? __copy_to_user_ll+0x57/0x60
Feb 28 18:16:00 [kernel] [28169.605440]  [<c01ea61e>] ? copy_to_user+0x3e/0x60
Feb 28 18:16:00 [kernel] [28169.605449]  [<c016cc8b>] ? cp_new_stat64+0xeb/0x100
Feb 28 18:16:00 [kernel] [28169.605460]  [<c0115fed>] ? read_hpet+0xd/0x20
Feb 28 18:16:00 [kernel] [28169.605470]  [<c0136f9b>] ? getnstimeofday+0x4b/0x110
Feb 28 18:16:00 [kernel] [28169.605482]  [<c0123997>] ? timespec_add_safe+0x27/0x50
Feb 28 18:16:00 [kernel] [28169.605491]  [<c0176e9c>] sys_select+0x2c/0xb0
Feb 28 18:16:00 [kernel] [28169.605501]  [<c01030c5>] sysenter_do_call+0x12/0x25
Feb 28 18:16:00 [kernel] [28169.605508] kthreadd      S 00000286  3384     2      0
Feb 28 18:16:00 [kernel] [28169.605522]  dd81dfc0 00000046 da996bd8 00000286 000008b1 dd82a330 dd81dfc0 00000286
Feb 28 18:16:00 [kernel] [28169.605536]  c046e6d0 000008b1 da996bbc dd81dfe0 c0130a25 00000000 00000001 00000001
Feb 28 18:16:00 [kernel] [28169.605550]  c01309a0 00000000 00000000 00000000 c01037df 00000000 00000000 00000000
Feb 28 18:16:00 [kernel] [28169.605563] Call Trace:
Feb 28 18:16:00 [kernel] [28169.605573]  [<c0130a25>] kthreadd+0x85/0x120
Feb 28 18:16:00 [kernel] [28169.605582]  [<c01309a0>] ? kthreadd+0x0/0x120
Feb 28 18:16:00 [kernel] [28169.605590]  [<c01037df>] kernel_thread_helper+0x7/0x18
Feb 28 18:16:00 [kernel] [28169.605596] ksoftirqd/0   S 00000246  3764     3      2
Feb 28 18:16:00 [kernel] [28169.605611]  dd81ffc0 00000046 00000000 00000246 c0124040 dd82a660 00000246 dd81ffb8
Feb 28 18:16:00 [kernel] [28169.605625]  00000000 00000000 c0124040 dd81ffcc c01240bc fffffffc dd81ffe0 c0130972
Feb 28 18:16:00 [kernel] [28169.605639]  c0130930 00000000 00000000 00000000 c01037df dd814f00 00000000 00000000
Feb 28 18:16:00 [kernel] [28169.605652] Call Trace:
Feb 28 18:16:00 [kernel] [28169.605661]  [<c0124040>] ? ksoftirqd+0x0/0xb0
Feb 28 18:16:00 [kernel] [28169.605671]  [<c0124040>] ? ksoftirqd+0x0/0xb0
Feb 28 18:16:00 [kernel] [28169.605680]  [<c01240bc>] ksoftirqd+0x7c/0xb0
Feb 28 18:16:00 [kernel] [28169.605688]  [<c0130972>] kthread+0x42/0x70
Feb 28 18:16:00 [kernel] [28169.605697]  [<c0130930>] ? kthread+0x0/0x70
Feb 28 18:16:00 [kernel] [28169.605705]  [<c01037df>] kernel_thread_helper+0x7/0x18
Feb 28 18:16:00 [kernel] [28169.605711] events/0      D c011b912  2496     4      2
Feb 28 18:16:00 [kernel] [28169.605724]  dd830f34 00000046 dd830f34 c011b912 dd8c0cc0 dd82a990 2ad00b71 000007af
Feb 28 18:16:00 [kernel] [28169.605738]  dd914810 ffffffff dd914814 dd830f58 c036b22e dd82a990 dd914814 dd914814
Feb 28 18:16:00 [kernel] [28169.605752]  dd82a990 dd914810 dd822000 dd822ea8 dd830f68 c036b199 dd8e9828 dd914800
Feb 28 18:16:00 [kernel] [28169.605766] Call Trace:
Feb 28 18:16:00 [kernel] [28169.605797]  [<c036b199>] mutex_lock+0x19/0x20
Feb 28 18:16:00 [kernel] [28169.605808]  [<c0271998>] i915_gem_retire_work_handler+0x28/0x70
Feb 28 18:16:00 [kernel] [28169.605818]  [<c0271970>] ? i915_gem_retire_work_handler+0x0/0x70
Feb 28 18:16:00 [kernel] [28169.605827]  [<c012d937>] run_workqueue+0x67/0xe0
Feb 28 18:16:00 [kernel] [28169.605835]  [<c012dbf7>] worker_thread+0x97/0xf0
Feb 28 18:16:00 [kernel] [28169.605845]  [<c0130cf0>] ? autoremove_wake_function+0x0/0x50
Feb 28 18:16:00 [kernel] [28169.605854]  [<c012db60>] ? worker_thread+0x0/0xf0
Feb 28 18:16:00 [kernel] [28169.605862]  [<c0130972>] kthread+0x42/0x70
Feb 28 18:16:00 [kernel] [28169.605871]  [<c0130930>] ? kthread+0x0/0x70
Feb 28 18:16:00 [kernel] [28169.605879]  [<c01037df>] kernel_thread_helper+0x7/0x18
Feb 28 18:16:00 [kernel] [28169.605885] khelper       S c012d937  3300     5      2
Feb 28 18:16:00 [kernel] [28169.605899]  dd831fa4 00000046 dd831fa4 c012d937 dd845980 dd82acc0 c0464180 00000246
Feb 28 18:16:00 [kernel] [28169.605913]  dd808208 dd808200 dd831fac dd831fcc c012dc27 00000000 dd82acc0 c0130cf0
Feb 28 18:16:00 [kernel] [28169.605927]  dd808208 dd808208 fffffffc dd808200 c012db60 dd831fe0 c0130972 c0130930
Feb 28 18:16:00 [kernel] [28169.605941] Call Trace:
Feb 28 18:16:00 [kernel] [28169.605948]  [<c012d937>] ? run_workqueue+0x67/0xe0
Feb 28 18:16:00 [kernel] [28169.605957]  [<c012dc27>] worker_thread+0xc7/0xf0
Feb 28 18:16:00 [kernel] [28169.605966]  [<c0130cf0>] ? autoremove_wake_function+0x0/0x50
Feb 28 18:16:00 [kernel] [28169.605975]  [<c012db60>] ? worker_thread+0x0/0xf0
Feb 28 18:16:00 [kernel] [28169.605983]  [<c0130972>] kthread+0x42/0x70
Feb 28 18:16:00 [kernel] [28169.605992]  [<c0130930>] ? kthread+0x0/0x70
Feb 28 18:16:00 [kernel] [28169.606000]  [<c01037df>] kernel_thread_helper+0x7/0x18
  (... skipping over unrelated processes ...)
Feb 28 18:16:00 [kernel] [28169.614590] agetty        S c0489794  1552  6664      1
Feb 28 18:16:00 [kernel] [28169.614590]  da94aea4 00000082 c1029ba0 c0489794 00000292 dd0ddcb0 00000292 00000001
Feb 28 18:16:00 [kernel] [28169.614590]  7fffffff 00000000 da8a5800 da94aeec c036adb5 00000202 da94aeb8 00000046
Feb 28 18:16:00 [kernel] [28169.614590]  da8a5800 da94aedc 00000246 00000246 000091c3 00000000 da8a5800 da8a5800
Feb 28 18:16:00 [kernel] [28169.614590] Call Trace:
Feb 28 18:16:00 [kernel] [28169.614590]  [<c036adb5>] schedule_timeout+0x75/0xc0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c024264b>] n_tty_read+0x1ab/0x5a0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c011d490>] ? default_wake_function+0x0/0x10
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0240156>] tty_read+0x76/0xb0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c02424a0>] ? n_tty_read+0x0/0x5a0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c016a000>] vfs_read+0x90/0x110
Feb 28 18:16:00 [kernel] [28169.614590]  [<c02400e0>] ? tty_read+0x0/0xb0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c016a4fd>] sys_read+0x3d/0x70
Feb 28 18:16:00 [kernel] [28169.614590]  [<c01030c5>] sysenter_do_call+0x12/0x25
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0360000>] ? __inet6_lookup_established+0x2e0/0x4d0
Feb 28 18:16:00 [kernel] [28169.614590] Xorg          S 000223f2  1588  6673      1
Feb 28 18:16:00 [kernel] [28169.614590]  dd082e74 00003082 00006d52 000223f2 00000000 dd8c0cc0 00000007 00003246
Feb 28 18:16:00 [kernel] [28169.614590]  00000000 dd914800 0038476b dd082ebc c0271891 5d343732 ffff0020 00000000
Feb 28 18:16:00 [kernel] [28169.614590]  80000000 0000bffc 00000000 dd822084 dd822000 00000000 dd8c0cc0 c0130cf0
Feb 28 18:16:00 [kernel] [28169.614590] Call Trace:
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0271891>] i915_wait_request+0x141/0x1a0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0130cf0>] ? autoremove_wake_function+0x0/0x50
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0271925>] i915_gem_throttle_ioctl+0x35/0x50
Feb 28 18:16:00 [kernel] [28169.614590]  [<c025e0f0>] drm_ioctl+0xe0/0x2b0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c02718f0>] ? i915_gem_throttle_ioctl+0x0/0x50
Feb 28 18:16:00 [kernel] [28169.614590]  [<c025e010>] ? drm_ioctl+0x0/0x2b0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0174b27>] vfs_ioctl+0x67/0x70
Feb 28 18:16:00 [kernel] [28169.614590]  [<c01750fa>] do_vfs_ioctl+0x1fa/0x520
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0109751>] ? restore_i387_xstate+0xd1/0x1d0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0115fed>] ? read_hpet+0xd/0x20
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0136f9b>] ? getnstimeofday+0x4b/0x110
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0102518>] ? restore_sigcontext+0xf8/0x120
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0175459>] sys_ioctl+0x39/0x60
Feb 28 18:16:00 [kernel] [28169.614590]  [<c01030c5>] sysenter_do_call+0x12/0x25
Feb 28 18:16:00 [kernel] [28169.614590] enlightenment S 0001579d  1544  6681      1
Feb 28 18:16:00 [kernel] [28169.614590]  dd23eb84 00000082 c01763e8 0001579d dd23ee68 da5e9cb0 c4e2db00 00000000
Feb 28 18:16:00 [kernel] [28169.614590]  00000000 00000000 00000000 dd23ebec c036b4df 00000000 00000001 00000000
Feb 28 18:16:00 [kernel] [28169.614590]  dd23ee70 dd23ee74 dd23ee78 dd23ee68 dd23ee6c dd23ee70 50008ab0 00000246
Feb 28 18:16:00 [kernel] [28169.614590] Call Trace:
Feb 28 18:16:00 [kernel] [28169.614590]  [<c01763e8>] ? do_select+0x578/0x5b0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c036b4df>] schedule_hrtimeout_range+0xdf/0x100
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0175cbf>] ? __pollwait+0x6f/0xd0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0175b80>] poll_schedule_timeout+0x30/0x60
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0176662>] do_sys_poll+0x242/0x3e0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0175c50>] ? __pollwait+0x0/0xd0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0175d20>] ? pollwake+0x0/0x60
                - Last output repeated 2 times -
Feb 28 18:16:00 [kernel] [28169.614590]  [<c014a6ca>] ? __rmqueue+0x9a/0x1a0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c014ad2b>] ? get_page_from_freelist+0x3db/0x450
Feb 28 18:16:00 [kernel] [28169.614590]  [<c01489fe>] ? mempool_alloc_slab+0xe/0x10
Feb 28 18:16:00 [kernel] [28169.614590]  [<c014b194>] ? __alloc_pages_internal+0x94/0x400
Feb 28 18:16:00 [kernel] [28169.614590]  [<c014b51c>] ? __get_free_pages+0x1c/0x40
Feb 28 18:16:00 [kernel] [28169.614590]  [<c016741b>] ? __kmalloc_track_caller+0xbb/0xe0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c02daf8b>] ? __alloc_skb+0x4b/0x100
Feb 28 18:16:00 [kernel] [28169.614590]  [<c036b18e>] ? mutex_lock+0xe/0x20
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0334cea>] ? unix_stream_recvmsg+0x1ea/0x450
Feb 28 18:16:00 [kernel] [28169.614590]  [<c02d6b36>] ? sock_aio_read+0xe6/0x110
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0169e7c>] ? do_sync_read+0xcc/0x110
Feb 28 18:16:00 [kernel] [28169.614590]  [<c033464e>] ? unix_ioctl+0x9e/0xd0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0130cf0>] ? autoremove_wake_function+0x0/0x50
Feb 28 18:16:00 [kernel] [28169.614590]  [<c016a07b>] ? vfs_read+0x10b/0x110
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0176964>] sys_poll+0x54/0xb0
Feb 28 18:16:00 [kernel] [28169.614590]  [<c01030c5>] sysenter_do_call+0x12/0x25
Feb 28 18:16:00 [kernel] [28169.614590]  [<c0360000>] ? __inet6_lookup_established+0x2e0/0x4d0
  (... skipping over unrelated processes ...)
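
A minimal sketch, assuming the syslog layout used in the dump above (a
"[kernel] [timestamp]" prefix, then task name, state and PID on the header
line), for pulling out the tasks stuck in uninterruptible sleep (state D)
and the first frame of each trace that is not marked "?". The names
parse_tasks and blocked_tasks are made up for illustration; other kernel
versions or log daemons format these lines differently and would need
different patterns.

```python
import re

# Task header as it appears in this particular log, e.g.
#   "... [28169.605711] events/0      D c011b912  2496     4      2"
# Fields after the name: state, a hex word, a number, PID, parent PID.
TASK_RE = re.compile(
    r"\]\s+(?P<name>\S+)\s+(?P<state>[A-Z])\s+[0-9a-f]+\s+\d+"
    r"\s+(?P<pid>\d+)\s+\d+\s*$"
)

# Stack frame, e.g. "[<c036b199>] mutex_lock+0x19/0x20".
# A "?" before the symbol marks an entry the unwinder is not sure about.
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<guess>\?\s+)?(?P<sym>[^\s+]+)\+0x"
)


def parse_tasks(lines):
    """Group the frames of a task dump under the task header preceding them."""
    tasks = []
    for line in lines:
        header = TASK_RE.search(line)
        if header:
            tasks.append({
                "name": header.group("name"),
                "state": header.group("state"),
                "pid": int(header.group("pid")),
                "frames": [],  # list of (symbol, is_reliable)
            })
            continue
        frame = FRAME_RE.search(line)
        if frame and tasks:
            reliable = frame.group("guess") is None
            tasks[-1]["frames"].append((frame.group("sym"), reliable))
    return tasks


def blocked_tasks(tasks):
    """Return (name, pid, first reliable frame) for every task in state D."""
    result = []
    for task in tasks:
        if task["state"] != "D":
            continue
        reliable = [sym for sym, ok in task["frames"] if ok]
        result.append((task["name"], task["pid"],
                       reliable[0] if reliable else None))
    return result
```

Fed the events/0 and Xorg entries above, this would report only events/0
(PID 4) sitting in mutex_lock; Xorg is in state S and is skipped.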



* Re: i915 X lockup
  2009-02-28 18:24                 ` Bruno Prémont
@ 2009-02-28 19:57                   ` Eric Anholt
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Anholt @ 2009-02-28 19:57 UTC (permalink / raw)
  To: Bruno Prémont
  Cc: Andrew Morton, Jiri Slaby, Sitsofe Wheeler, airlied, keithp,
	dri-devel, Linux kernel mailing list


On Sat, 2009-02-28 at 19:24 +0100, Bruno Prémont wrote:
> On Sat, 28 February 2009 Eric Anholt <eric@anholt.net> wrote:
> > On Sat, 2009-02-28 at 00:47 -0800, Andrew Morton wrote:
> > > The kernel deadlocked on struct_mutex, did it not?  That's a kernel
> > > bug regardless of what userspace you're running.
> > > 
> > > Do we know why this happened?
> > 
> > Userland went stomping all over the device state that the kernel
> > thinks it controls since you went and turned on the KMS option
> > asserting "I'm not going to run old userland", so the GPU got hung,
> > and further software using the GPU hung, and then somebody waiting
> > > for someone else to finish using the GPU (struct_mutex) got hung.
> > 
> > The only proposal to prevent it is to use the "don't let userland map
> > my PCI device any more" support we now have available to us, which
> > would make X fail early on.  The unfortunate side-effect of that is
> > that we lose the ability to run incredibly useful userland debug
> > tools that do read-only access to registers.  We're moving bits of
> > those into debugfs for 2.6.30, but it's work that's not done even for
> > the tools we have today.
> 
> I also saw/see Xorg lockup...
> 
> I'm running xf86-video-intel-2.6.1 (2.6.2 released only very recently)
> and kernel with KMS disabled (KMS not capable of getting framebuffer
> properly configured it seems, at least display remains black)
> 
> For me it's a deadlock between intel driver dispatching GEM requests
> and events (see log below).
> For me, each time it happened was while interacting with a web browser,
> though it does not happen very often.
> 
> Connecting to the notebook via ssh, I can kill -KILL Xorg (kill -TERM
> does not work).
> Doing this leaves the GPU at least confused, though to get vesafb
> working again it's sufficient to start Xorg with the vesa driver
> (running with the intel driver again before rebooting leads to a
> locked X and the need to retry the kill).

You have a completely different problem.

Your GPU hung while doing something that should work.  8xx support is
notoriously bad right now, and we haven't managed to get the developer
time on fixing it that we need, though I keep pushing for it.  My only
855 has a dead disk, or I probably would have poked at it by now.

-- 
Eric Anholt
eric@anholt.net                         eric.anholt@intel.com





* Re: i915 X lockup
  2009-02-28  8:47             ` Andrew Morton
  2009-02-28  9:00               ` Eric Anholt
@ 2009-02-28 17:11               ` Keith Packard
  1 sibling, 0 replies; 15+ messages in thread
From: Keith Packard @ 2009-02-28 17:11 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Keith Packard, Jiri Slaby, Eric Anholt, Sitsofe Wheeler, airlied,
	dri-devel, Linux kernel mailing list


On Sat, 2009-02-28 at 00:47 -0800, Andrew Morton wrote:

> The kernel deadlocked on struct_mutex, did it not?  That's a kernel bug
> regardless of what userspace you're running.

No, it didn't deadlock on struct_mutex, it deadlocked because the
hardware got wedged, and we still don't know how to unwedge the hardware
and get it working again other than turning it off and back on again.

> Do we know why this happened?

Yes, the hardware will happily lock up when user space maps the PCI BAR
covering the device registers and the application pokes various internal
device registers directly.

That's the fundamental contract KMS requires -- if the kernel is going
to manage the device, then user space isn't supposed to manipulate it
directly anymore.

I suspect most any other device in the machine could be made to do 'bad
things' if userspace went and poked it directly.

-- 
keith.packard@intel.com



end of thread, other threads:[~2009-02-28 19:57 UTC | newest]

Thread overview: 15+ messages
2009-02-27  9:28 i915 X lockup Jiri Slaby
2009-02-27 10:01 ` Peter Zijlstra
2009-02-27 10:12   ` Jiri Slaby
2009-02-27 10:14     ` Jiri Slaby
2009-02-27 10:32 ` Andrew Morton
2009-02-27 13:04   ` Sitsofe Wheeler
2009-02-27 13:49     ` Jiri Slaby
2009-02-27 23:12       ` Sitsofe Wheeler
2009-02-28  0:20         ` Eric Anholt
2009-02-28  8:31           ` Jiri Slaby
2009-02-28  8:47             ` Andrew Morton
2009-02-28  9:00               ` Eric Anholt
2009-02-28 18:24                 ` Bruno Prémont
2009-02-28 19:57                   ` Eric Anholt
2009-02-28 17:11               ` Keith Packard
