Re: I've got the RC6 bug

From: Daniel Vetter <daniel@ffwll.ch>
To: CC <ccomren@gmail.com>
Cc: intel-gfx@lists.freedesktop.org, Ben Widawsky <ben@bwidawsk.net>
Subject: Re: I've got the RC6 bug
Date: Fri, 20 Jan 2012 11:46:07 +0100
Message-ID: <20120120104607.GE4151@phenom.ffwll.local>
In-Reply-To: <20120120103024.GD4151@phenom.ffwll.local>

On Fri, Jan 20, 2012 at 11:30:24AM +0100, Daniel Vetter wrote:
> On Wed, Jan 18, 2012 at 01:24:26AM +0100, Daniel Vetter wrote:
> > On Wed, Jan 18, 2012 at 01:16:02AM +0100, CC wrote:
> > > On Mon, Jan 16, 2012 at 5:36 PM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > 
> > > > On Mon, Jan 16, 2012 at 05:18:17PM +0100, CC wrote:
> > > > > Hi,
> > > > >
> > > > > I've heard that you need users having the RC6 bug.
> > > > >
> > > > > I have the following setup:
> > > > > CPU: Intel Core i5-2500K
> > > > > Mainboard: ASRock Z68 Pro3-M
> > > > > Memory: Corsair Vengeance CMZ8GX3M2A1866C9
> > > > >
> > > > > Although the CPU doesn't support VT-d, I disabled all virtualization
> > > > > support in the UEFI setup.
> > > > >
> > > > > I use Arch Linux and Gnome 3 in the fallback mode. The problem is more
> > > > > drastic without fallback mode, however.
> > > > >
> > > > > > Whenever I enable RC6, I get a few of these errors in dmesg:
> > > > >
> > > > > [   48.900000] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387
> > > > > __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]()
> > > > > [   48.900002] Hardware name: To Be Filled By O.E.M.
> > > > > [   48.900002] Modules linked in: ipv6 fuse ext2 snd_hda_codec_hdmi
> > > > > snd_hda_codec_realtek mei(C) joydev r8169 shpchp pci_hotplug usbhid hid
> > > > > snd_hda_intel iTCO_wdt mii iTCO_vendor_support i2c_i801 snd_hda_codec
> > > > > > processor snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc psmouse
> > > > > > serio_raw pcspkr evdev ext4 mbcache jbd2 crc16 xhci_hcd ehci_hcd usbcore
> > > > > i915 drm_kms_helper drm intel_agp i2c_algo_bit button intel_gtt i2c_core
> > > > > video sd_mod ahci libahci libata scsi_mod
> > > > > [   48.900019] Pid: 623, comm: Xorg Tainted: G        WC  3.1.9-2-ARCH #1
> > > > > [   48.900020] Call Trace:
> > > > > [   48.900023]  [<ffffffff81061bef>] warn_slowpath_common+0x7f/0xc0
> > > > > [   48.900025]  [<ffffffff81061c4a>] warn_slowpath_null+0x1a/0x20
> > > > > [   48.900028]  [<ffffffffa00e0764>] __gen6_gt_wait_for_fifo+0x94/0xa0
> > > > > [i915]
> > > > > [   48.900032]  [<ffffffffa015d2d5>] ring_write_tail+0x65/0x120 [i915]
> > > > > [   48.900036]  [<ffffffffa01619bc>] render_ring_flush+0xbc/0xe0 [i915]
> > > > > [   48.900040]  [<ffffffffa010b803>] i915_gem_flush_ring+0x43/0x250
> > > > [i915]
> > > > > [   48.900044]  [<ffffffffa0112b50>]
> > > > > i915_gem_do_execbuffer.isra.7+0x1020/0x16d0 [i915]
> > > > > [   48.900048]  [<ffffffffa01136bb>] i915_gem_execbuffer2+0x8b/0x240
> > > > [i915]
> > > > > [   48.900051]  [<ffffffffa0098434>] drm_ioctl+0x3e4/0x4c0 [drm]
> > > > > [   48.900053]  [<ffffffff810746cb>] ? recalc_sigpending+0x1b/0x50
> > > > > [   48.900057]  [<ffffffffa0113630>] ? i915_gem_execbuffer+0x430/0x430
> > > > > [i915]
> > > > > [   48.900059]  [<ffffffff8101e9b1>] ? fpu_finit+0x21/0x40
> > > > > [   48.900061]  [<ffffffff8116fddf>] do_vfs_ioctl+0x8f/0x500
> > > > > [   48.900063]  [<ffffffff81014beb>] ? sys_rt_sigreturn+0x1eb/0x200
> > > > > [   48.900064]  [<ffffffff811702e1>] sys_ioctl+0x91/0xa0
> > > > > [   48.900066]  [<ffffffff8140c3c2>] system_call_fastpath+0x16/0x1b
> > > > > [   48.900067] ---[ end trace 9a23b8b32b16a424 ]---
> > > >
> > > > This is a known side-effect of a dying gpu. It essentially means that the
> > > > gpu refuses to wake up from deep-sleep states.
> > > >
> > > > > and then
> > > > >
> > > > > [   53.163526] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > > > elapsed... GPU hung
> > > > > [   53.165046] [drm] capturing error event; look for more information in
> > > > > /debug/dri/0/i915_error_state
> > > > > [   53.177356] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > > > -11 (awaiting 1593 at 1592, next 1594)
> > > > > [   53.181979] [drm:init_ring_common] *ERROR* render ring initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [   53.185522] [drm:init_ring_common] *ERROR* gen6 bsd ring
> > > > initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [   53.188558] [drm:init_ring_common] *ERROR* blt ring initialization
> > > > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > > > [   55.330146] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > > > elapsed... GPU hung
> > > > > [   55.332202] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > > > -11 (awaiting 1594 at 1591, next 1595)
> > > > > [   55.333258] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring
> > > > > wedged!
> > > > > [   55.333260] [drm:i915_reset] *ERROR* Failed to reset chip.
> > > > >
> > > > > > Of course, I'd be willing to test things out. I'd need a bit of
> > > > > > guidance, however.
> > > >
> > > > Can you please attach i915_error_state from debugfs (you need to retrigger
> > > > the issue)? It contains a gpu dump which is useful to diagnose the bug.
> > > >
> > > > Yours, Daniel
> > > > --
> > > > Daniel Vetter
> > > > Mail: daniel@ffwll.ch
> > > > Mobile: +41 (0)79 365 57 48
> > > >
> > > 
> > > I attached the error state.
> > 
> > Nice one, your gpu seems to have simply disappeared. And the ringbuffer
> > contains a rather peculiar cmd sequence. Putting Chris (maybe he
> > recognizes the pattern) and Ben (he's got a patch in the works to dump a
> > debug register that might be interesting here) on cc. It's too late atm
> > for me to think about this some more.
> 
> Chris and I looked some more at this one and it's a keeper. Can you
> please file a bug report on bugs.freedesktop.org against drm/i915 with the
> usual details and these 2 error_states attached?

Chris just had a new idea that would explain your error_state rather
neatly. Can you try the latest drm-intel-fixes branch from

https://git.kernel.org/?p=linux/kernel/git/keithp/linux.git;a=summary

That contains a forcewake locking fix, the lack of which would explain all
the 0s in the registers of your dump (assuming the gpu went to sleep for
whatever reason).
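
For anyone comparing their own logs against this symptom, here is a rough
sketch (not a tool from this thread) that scans dmesg text for the
"ring initialization failed" lines quoted above and reports the rings whose
ctl/head/tail/start registers all read back as 0. The function name
all_zero_rings and the regular expression are assumptions made for
illustration; the only thing taken from the thread is the message format.

```python
import re

# Assumed message shape, taken from the dmesg lines quoted in this thread:
#   [drm:init_ring_common] *ERROR* <name> ring initialization failed
#   ctl XXXXXXXX head XXXXXXXX tail XXXXXXXX start XXXXXXXX
RING_FAIL = re.compile(
    r"init_ring_common\] \*ERROR\* (?P<ring>[\w ]+?) ring initialization failed "
    r"ctl (?P<ctl>[0-9a-f]{8}) head (?P<head>[0-9a-f]{8}) "
    r"tail (?P<tail>[0-9a-f]{8}) start (?P<start>[0-9a-f]{8})",
    re.IGNORECASE,
)


def all_zero_rings(log_text):
    """Return the names of rings whose four reported registers are all zero."""
    # Logs pasted into mail are often re-wrapped, so collapse every run of
    # whitespace (including newlines) into a single space before matching.
    flat = " ".join(log_text.split())
    rings = []
    for match in RING_FAIL.finditer(flat):
        regs = [int(match.group(k), 16) for k in ("ctl", "head", "tail", "start")]
        if not any(regs):
            rings.append(match.group("ring"))
    return rings
```

Feeding it the output of dmesg (with any mail quote markers stripped first)
gives a quick check for whether every ring shows the all-zero pattern, which
is what you would expect if the registers were read while the gpu was asleep.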

Thanks, Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48

Thread overview: 9+ messages
2012-01-16 16:18 I've got the RC6 bug CC
2012-01-16 16:36 ` Daniel Vetter
2012-01-18  0:16   ` CC
2012-01-18  0:24     ` Daniel Vetter
2012-01-18 11:17       ` Chris Wilson
2012-01-18 17:51         ` Eric Anholt
2012-01-18 20:09           ` Daniel Vetter
2012-01-20 10:30       ` Daniel Vetter
2012-01-20 10:46         ` Daniel Vetter [this message]
