All of lore.kernel.org
 help / color / mirror / Atom feed
From: Knut Petersen <Knut_Petersen@t-online.de>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: airlied@linux.ie, jesse.barnes@intel.com,
	linux-kernel@vger.kernel.org,
	intel-gfx <intel-gfx@lists.freedesktop.org>,
	Mike Galbraith <efault@gmx.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Ingo Molnar <mingo@elte.hu>
Subject: Re: [BUG] 2.6.38-rc1-git1: hard lockup related to i915 / automated cgroup scheduling
Date: Thu, 20 Jan 2011 23:57:56 +0100	[thread overview]
Message-ID: <4D38BDF4.6040509@t-online.de> (raw)
In-Reply-To: <AANLkTi=GjnyE1_RosS_L_sn=QCTDSgO7v9EL+1bpJTu7@mail.gmail.com>

There is an additional problem: The video signal on the framebuffer
console is switched off at random(?) intervals and then switched on
again. Most of the time it's only for fractions of a second. No entries
in the logs for that.

I changed the kernel configuration to select SLUB and enabled
SLUB debugging. Result: No change, 100% reproducible lockup

Then I edited i915_gem.c the way you suggested. No lockup,
but X is unusable. Switching to framebuffer console works, switching
back to X still shows the framebuffer image ... and the X mouse cursor.
Switching back to framebuffer console works.

I verified the problem on a 2nd PC with an identical motherboard.

Let's have look at the lockup logs:

/var/log/boot.msg

<6>[    2.050143] Linux agpgart interface v0.103
<6>[    2.050244] agpgart-intel 0000:00:00.0: Intel 915GM Chipset
<6>[    2.051099] agpgart-intel 0000:00:00.0: detected 7932K stolen memory
<6>[    2.053815] agpgart-intel 0000:00:00.0: AGP aperture is 256M @
0xc0000000
<6>[    2.053972] [drm] Initialized drm 1.1.0 20060810
<6>[    2.054073] i915 0000:00:02.0: PCI INT A -> GSI 16 (level, low) ->
IRQ 16
<7>[    2.054164] i915 0000:00:02.0: setting latency timer to 64
<6>[    2.766096] [drm] DAC-6: set mode 640x480 0
<6>[    3.240122] [drm] TV-12: set mode NTSC 480i 0
<3>[    3.473071] render error detected, EIR: 0x00000010
<3>[    3.473074] page table error
<3>[    3.473076]   PGTBL_ER: 0x00000010
<3>[    3.473079] [drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010,
masking
<3>[    3.473088] render error detected, EIR: 0x00000010
<3>[    3.473090] page table error
<3>[    3.473092]   PGTBL_ER: 0x00000010
<6>[    3.479417] [drm] TMDS-8: set mode 1280x1024 1b
<4>[    3.756503] Console: switching to colour frame buffer device 160x64
<6>[    3.762767] [drm] fb0: inteldrmfb frame buffer device
<6>[    3.762816] [drm] Initialized i915 1.6.0 20080730 for 0000:00:02.0
on minor 0

/var/log/messages

an 20 23:22:29 golem kernel: [   64.838521] ------------[ cut here
]------------
Jan 20 23:22:29 golem kernel: [   64.838535] WARNING: at
drivers/gpu/drm/i915/i915_gem.c:3256 i915_gem_object_pin+0x4f/0x16c()
Jan 20 23:22:29 golem kernel: [   64.838538] Hardware name:
Jan 20 23:22:29 golem kernel: [   64.838540] Modules linked in: pppoe
pppox ip6t_LOG ipt_MASQUERADE xt_pkttype xt_TCPMSS xt_tcpudp ipt_LOG
xt_limit iptable_nat nf_nat af_packet ppp_generic slhc ip6t_REJECT
nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT
xt_state iptable_raw iptable_filter ip6table_mangle
nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4
ip_tables ip6table_filter ip6_tables x_tables ipv6
Jan 20 23:22:29 golem kernel: [   64.838576] Pid: 2031, comm: Xorg Not
tainted 2.6.38-rc1-git1-kape #27
Jan 20 23:22:29 golem kernel: [   64.838578] Call Trace:
Jan 20 23:22:29 golem kernel: [   64.838586]  [<c0124dee>] ?
warn_slowpath_common+0x65/0x7a
Jan 20 23:22:29 golem kernel: [   64.838590]  [<c02f9264>] ?
i915_gem_object_pin+0x4f/0x16c
Jan 20 23:22:29 golem kernel: [   64.838594]  [<c0124e12>] ?
warn_slowpath_null+0xf/0x13
Jan 20 23:22:29 golem kernel: [   64.838598]  [<c02f9264>] ?
i915_gem_object_pin+0x4f/0x16c
Jan 20 23:22:29 golem kernel: [   64.838603]  [<c02fb1df>] ?
i915_gem_execbuffer_reserve+0x123/0x2ab
Jan 20 23:22:29 golem kernel: [   64.838607]  [<c02fb99a>] ?
i915_gem_do_execbuffer+0x3f4/0xea1
Jan 20 23:22:29 golem kernel: [   64.838614]  [<c018889a>] ?
check_object+0x147/0x19e
Jan 20 23:22:29 golem kernel: [   64.838619]  [<c0188e1e>] ?
alloc_debug_processing+0xd8/0x11b
Jan 20 23:22:29 golem kernel: [   64.838623]  [<c02fc864>] ?
i915_gem_execbuffer2+0x55/0x17d
Jan 20 23:22:29 golem kernel: [   64.838627]  [<c02fc907>] ?
i915_gem_execbuffer2+0xf8/0x17d
Jan 20 23:22:29 golem kernel: [   64.838633]  [<c02df12c>] ?
drm_ioctl+0x283/0x33d
Jan 20 23:22:29 golem kernel: [   64.838637]  [<c02fc80f>] ?
i915_gem_execbuffer2+0x0/0x17d
Jan 20 23:22:29 golem kernel: [   64.838642]  [<c018c009>] ?
do_sync_read+0x89/0xc4
Jan 20 23:22:29 golem kernel: [   64.838646]  [<c02deea9>] ?
drm_ioctl+0x0/0x33d
Jan 20 23:22:29 golem kernel: [   64.838651]  [<c0198c3b>] ?
do_vfs_ioctl+0x4a5/0x4d6
Jan 20 23:22:29 golem kernel: [   64.838657]  [<c023921c>] ?
security_file_permission+0x6f/0x7a
Jan 20 23:22:29 golem kernel: [   64.838662]  [<c013e3c4>] ?
ktime_get_ts+0xe2/0xec
Jan 20 23:22:29 golem kernel: [   64.838666]  [<c0198cad>] ?
sys_ioctl+0x41/0x64
Jan 20 23:22:29 golem kernel: [   64.838671]  [<c010270c>] ?
sysenter_do_call+0x12/0x22
Jan 20 23:22:29 golem kernel: [   64.838674] ---[ end trace
a26556fb5c34bd18 ]---

Xorg.0.log

[    53.085] (II) intel(0): EDID vendor "ENC", prod id 5769
[    53.085] (II) intel(0): Using hsync ranges from config file
[    53.085] (II) intel(0): Using vrefresh ranges from config file
[    53.085] (II) intel(0): Printing DDC gathered Modelines:
[    53.085] (II) intel(0): Modeline "1280x1024"x0.0  108.00  1280 1328
1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz)
[    53.085] (II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968
1056  600 601 605 628 +hsync +vsync (37.9 kHz)
[    53.085] (II) intel(0): Modeline "640x480"x0.0   25.18  640 656 752
800  480 490 492 525 -hsync -vsync (31.5 kHz)
[    53.085] (II) intel(0): Modeline "720x400"x0.0   28.32  720 738 846
900  400 412 414 449 -hsync +vsync (31.5 kHz)
[    53.085] (II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048
1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz)
[    54.013] (II) intel(0): EDID vendor "ENC", prod id 5769
[    54.013] (II) intel(0): Using hsync ranges from config file
[    54.013] (II) intel(0): Using vrefresh ranges from config file
[    54.013] (II) intel(0): Printing DDC gathered Modelines:
[    54.013] (II) intel(0): Modeline "1280x1024"x0.0  108.00  1280 1328
1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz)
[    54.013] (II) intel(0): Modeline "800x600"x0.0   40.00  800 840 968
1056  600 601 605 628 +hsync +vsync (37.9 kHz)
[    54.014] (II) intel(0): Modeline "640x480"x0.0   25.18  640 656 752
800  480 490 492 525 -hsync -vsync (31.5 kHz)
[    54.014] (II) intel(0): Modeline "720x400"x0.0   28.32  720 738 846
900  400 412 414 449 -hsync +vsync (31.5 kHz)
[    54.014] (II) intel(0): Modeline "1024x768"x0.0   65.00  1024 1048
1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz)
[    64.838] (WW) intel(0): flip queue failed: Cannot allocate memory
[    64.838] (WW) intel(0): Page flip failed: Cannot allocate memory
[    64.838] (EE) intel(0): Failed to submit batch buffer, expect
rendering corruption or even a frozen display: Cannot allocate memory.
[    64.843] (WW) intel(0): flip queue failed: Cannot allocate memory
[    64.843] (WW) intel(0): Page flip failed: Cannot allocate memory
[    64.843] (EE) intel(0): Failed to submit batch buffer, expect
rendering corruption or even a frozen display: Cannot allocate memory.
[    64.846] (WW) intel(0): flip queue failed: Cannot allocate memory
[    64.846] (WW) intel(0): Page flip failed: Cannot allocate memory
[    64.846] (EE) intel(0): Failed to submit batch buffer, expect
rendering corruption or even a frozen display: Cannot allocate memory.
[    64.853] (WW) intel(0): flip queue failed: Cannot allocate memory
[    64.853] (WW) intel(0): Page flip failed: Cannot allocate memory
[    64.853] (EE) intel(0): Failed to submit batch buffer, expect
rendering corruption or even a frozen display: Cannot allocate memory.
[    64.859] (WW) intel(0): flip queue failed: Cannot allocate memory
[    64.859] (WW) intel(0): Page flip failed: Cannot allocate memory
[    64.859] (EE) intel(0): Failed to submit batch buffer, expect
rendering corruption or even a frozen display: Cannot allocate memory.
[    64.866] (WW) intel(0): flip queue failed: Cannot allocate memory





> On Thu, Jan 20, 2011 at 9:29 AM, Knut Petersen
> <Knut_Petersen@t-online.de> wrote:
>   
>> Kernel 2.6.38-rc1 and -git1 will lock my AOpen i915GMm-HFS
>> at the end of  KDE startup if automatic process group scheduling
>> is actived in kernel config. A hard reset is necessary.
>> Without automatic process group scheduling everything is ok.
>>     
> Interesting. Most likely timing-related, but maybe there's some actual
> memory corruption. Adding the scheduler guys just in case.
>
> It might be interesting to see if enabling SLUB debugging makes any
> difference. Interesting for two reasons:
>
>  - it may just make the problem go away because it changes timings
> radically enough (which is the bad case, since that doesn't really
> help us very much)
>
>  - maybe it's not timing-related, and instead shows some slab misuse
> and corruption that explains the problem.
>
> I dunno.
>
>   
>> Reproducibility of bug: 100 %
>> System: AOpen i915GMm-Hfs, 2GB, Pentium M
>> Distribution: openSuSE 11.3
>>
>> cu,
>>  Knut
>>
>> Jan 20 17:57:07 golem kernel: [   58.087054] ------------[ cut here ]------------
>> Jan 20 17:57:07 golem kernel: [   58.087117] kernel BUG at drivers/gpu/drm/i915/i915_gem.c:3254!
>>     
> Grr. Hate people who do BUG_ON() calls that kill the machine and make
> things harder to debug.
>
> What happens if you replace that
>
>   BUG_ON(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT);
>
> with a
>
>   if (WARN_ON_ONCE(obj->pin_count == DRM_I915_GEM_OBJECT_MAX_PIN_COUNT))
>     return -ENOMEM;
>
> or similar? Does it limp along? I'm not suggesting that as a fix
> (obviously), but I do think that we have way too many BUG_ON's, and
> too few people thinking about "how can I make the machine possibly
> limp on so that the oops is easier to see and report"
>
>                      Linus
>
>   


      parent reply	other threads:[~2011-01-20 22:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-20 17:29 [BUG] 2.6.38-rc1-git1: hard lockup related to i915 / automated cgroup scheduling Knut Petersen
2011-01-20 17:58 ` Linus Torvalds
2011-01-20 17:58   ` Linus Torvalds
2011-01-20 19:17   ` Mike Galbraith
2011-01-20 22:57   ` Knut Petersen [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4D38BDF4.6040509@t-online.de \
    --to=knut_petersen@t-online.de \
    --cc=a.p.zijlstra@chello.nl \
    --cc=airlied@linux.ie \
    --cc=efault@gmx.de \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=jesse.barnes@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.