From: Daniel Vetter <daniel@ffwll.ch>
To: Chris Wilson <chris@chris-wilson.co.uk>,
	Daniel Vetter <daniel@ffwll.ch>,
	intel-gfx@lists.freedesktop.org,
	John Harrison <John.C.Harrison@Intel.com>,
	Daniel Vetter <daniel.vetter@ffwll.ch>
Subject: Re: [PATCH] drm/i915: Keep ring->active_list and ring->requests_list consistent
Date: Fri, 20 Mar 2015 15:32:52 +0100
Message-ID: <20150320143252.GO1349@phenom.ffwll.local>
In-Reply-To: <20150320133951.GF10812@nuc-i3427.alporthouse.com>

On Fri, Mar 20, 2015 at 01:39:51PM +0000, Chris Wilson wrote:
> On Fri, Mar 20, 2015 at 01:02:10PM +0000, Chris Wilson wrote:
> > On Fri, Mar 20, 2015 at 11:06:57AM +0100, Daniel Vetter wrote:
> > > On Thu, Mar 19, 2015 at 10:17:42PM +0000, Chris Wilson wrote:
> > > > On Thu, Mar 19, 2015 at 06:37:28PM +0100, Daniel Vetter wrote:
> > > > > On Wed, Mar 18, 2015 at 06:19:22PM +0000, Chris Wilson wrote:
> > > > > > 	WARNING: CPU: 0 PID: 1383 at drivers/gpu/drm/i915/i915_gem_evict.c:279 i915_gem_evict_vm+0x10c/0x140()
> > > > > > 	WARN_ON(!list_empty(&vm->active_list))
> > > > > 
> > > > > How does this come about - we call gpu_idle before this seems to blow up,
> > > > > so all requests should be completed?
> > > > 
> > > > Honestly, I couldn't figure it out either. I had an epiphany when I saw
> > > > that we could now have an empty request list but non-empty active list,
> > > > added a test to detect when that happens and shouted eureka when the
> > > > WARN fired. I could trigger the WARN in evict_vm pretty reliably, but
> > > > not since this patch. It could just be masking another bug.
> > > 
> > > Can you perhaps double-check the theory by putting a
> > > WARN_ON(list_empty(active_list) != list_empty(request_list)) into
> > > gpu_idle? Ofc with this patch reverted so that the bug surfaces again.
> > 
> > [ 5215.567573] [drm:i915_verify_lists] *ERROR* render ring: active list not empty, but no requests
> > [ 5215.567586] ------------[ cut here ]------------
> > [ 5215.567598] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem.c:3166 i915_gpu_idle+0x88/0x90()
> > [ 5215.567602] WARN_ON(i915_verify_lists(dev))
> > [ 5215.567606] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> > [ 5215.567708] CPU: 0 PID: 1304 Comm: Xorg Tainted: G        W  OE   4.0.0-rc4+ #108
> > [ 5215.567713] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> > [ 5215.567718]  00000000 00000000 f46e1b98 c16b3e19 f46e1bd8 f46e1bc8 c1047f17 c1937e78
> > [ 5215.567733]  f46e1bf4 00000518 c1937cec 00000c5e c14441e8 c14441e8 e733bdc8 00000000
> > [ 5215.567747]  f6346c00 f46e1be0 c1047f83 00000009 f46e1bd8 c1937e78 f46e1bf4 f46e1c00
> > [ 5215.567762] Call Trace:
> > [ 5215.567776]  [<c16b3e19>] dump_stack+0x41/0x52
> > [ 5215.567788]  [<c1047f17>] warn_slowpath_common+0x87/0xc0
> > [ 5215.567797]  [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> > [ 5215.567805]  [<c14441e8>] ? i915_gpu_idle+0x88/0x90
> > [ 5215.567815]  [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> > [ 5215.567823]  [<c14441e8>] i915_gpu_idle+0x88/0x90
> > [ 5215.567833]  [<c1439949>] i915_gem_evict_something+0x269/0x300
> > [ 5215.567843]  [<c144754f>] i915_gem_object_do_pin+0x6ef/0xb20
> > [ 5215.567854]  [<c14479c5>] i915_gem_object_pin+0x45/0x50
> > [ 5215.567864]  [<c1439f08>] i915_gem_execbuffer_reserve_vma.isra.13+0x78/0x180
> > [ 5215.567874]  [<c143a2e5>] i915_gem_execbuffer_reserve+0x2d5/0x320
> > [ 5215.567884]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.567894]  [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> > [ 5215.567906]  [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> > [ 5215.567915]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.567925]  [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> > [ 5215.567934]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.567944]  [<c1401d67>] drm_ioctl+0x1b7/0x510
> > [ 5215.567954]  [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> > [ 5215.567963]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.567975]  [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> > [ 5215.567984]  [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> > [ 5215.567994]  [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> > [ 5215.568005]  [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> > [ 5215.568095]  [<c1185f62>] ? __fget_light+0x22/0x60
> > [ 5215.568136]  [<c117dc50>] SyS_ioctl+0x60/0x90
> > [ 5215.568175]  [<c16b9bc8>] sysenter_do_call+0x12/0x12
> > [ 5215.568198] ---[ end trace ab3f7e4953cb9eb6 ]---
> > [ 5215.568272] ------------[ cut here ]------------
> > [ 5215.568288] WARNING: CPU: 0 PID: 1304 at drivers/gpu/drm/i915/i915_gem_evict.c:283 i915_gem_evict_vm+0x10c/0x140()
> > [ 5215.568292] WARN_ON(!list_empty(&vm->active_list))
> > [ 5215.568296] Modules linked in: ctr ccm arc4 ath9k ath9k_common ath9k_hw bnep ath mac80211 rfcomm snd_hda_codec_conexant snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec uvcvideo snd_hwdep snd_pcm gpio_ich videobuf2_vmalloc dell_wmi cfg80211 videobuf2_memops sparse_keymap videobuf2_core dell_laptop snd_seq_midi v4l2_common dcdbas snd_seq_midi_event btusb videodev i8k snd_rawmidi snd_seq hid_multitouch coretemp bluetooth microcode snd_seq_device joydev snd_timer serio_raw snd shpchp soundcore wmi lpc_ich usbhid hid psmouse ahci libahci
> > [ 5215.568383] CPU: 0 PID: 1304 Comm: Xorg Tainted: G        W  OE   4.0.0-rc4+ #108
> > [ 5215.568388] Hardware name: Dell Inc. Inspiron 1090/Inspiron 1090, BIOS A06 08/23/2011
> > [ 5215.568393]  00000000 00000000 f46e1cc0 c16b3e19 f46e1d00 f46e1cf0 c1047f17 c193712c
> > [ 5215.568407]  f46e1d1c 00000518 c19370d0 0000011b c1439c6c c1439c6c f3b225b0 e733c3ec
> > [ 5215.568421]  00000001 f46e1d08 c1047f83 00000009 f46e1d00 c193712c f46e1d1c f46e1d28
> > [ 5215.568435] Call Trace:
> > [ 5215.568445]  [<c16b3e19>] dump_stack+0x41/0x52
> > [ 5215.568455]  [<c1047f17>] warn_slowpath_common+0x87/0xc0
> > [ 5215.568465]  [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568474]  [<c1439c6c>] ? i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568483]  [<c1047f83>] warn_slowpath_fmt+0x33/0x40
> > [ 5215.568492]  [<c1439c6c>] i915_gem_evict_vm+0x10c/0x140
> > [ 5215.568502]  [<c143a236>] i915_gem_execbuffer_reserve+0x226/0x320
> > [ 5215.568511]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.568521]  [<c143b6d9>] i915_gem_do_execbuffer.isra.17+0x5c9/0xdd0
> > [ 5215.568532]  [<c112efdb>] ? vm_mmap_pgoff+0x7b/0xa0
> > [ 5215.568541]  [<c11594cd>] ? __kmalloc+0x14d/0x190
> > [ 5215.568550]  [<c143cfeb>] i915_gem_execbuffer2+0x8b/0x2c0
> > [ 5215.568560]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.568568]  [<c1401d67>] drm_ioctl+0x1b7/0x510
> > [ 5215.568577]  [<c1120a9a>] ? balance_dirty_pages_ratelimited+0x1a/0x6a0
> > [ 5215.568587]  [<c143cf60>] ? i915_gem_execbuffer+0x4e0/0x4e0
> > [ 5215.568599]  [<c113cef9>] ? handle_mm_fault+0x329/0x1250
> > [ 5215.568607]  [<c1401bb0>] ? drm_getmap+0xb0/0xb0
> > [ 5215.568616]  [<c117d9ca>] do_vfs_ioctl+0x30a/0x530
> > [ 5215.568626]  [<c10a9e92>] ? ktime_get_ts64+0x52/0x1a0
> > [ 5215.568635]  [<c1185f62>] ? __fget_light+0x22/0x60
> > [ 5215.568644]  [<c117dc50>] SyS_ioctl+0x60/0x90
> > [ 5215.568653]  [<c16b9bc8>] sysenter_do_call+0x12/0x12
> > [ 5215.568659] ---[ end trace ab3f7e4953cb9eb7 ]---

Ok, at least we have clear evidence now that the lists indeed seem to get
out of sync.

> Ah, so what it boils down to is that i915_gpu_idle() is a no-op here if
> list_empty(ring->request_list) [intel_ring_idle:2176].
> 
> Missing link discovered, I think the bug fixed by the patch is indeed
> the same one that triggered the first WARN.

But if we do that short-circuiting in ring_idle then all the requests
_should_ be completed. Which means retire_request_ring should move all
buffers to the inactive list, even when we do that before retiring
requests.
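
To make the puzzle concrete, here is a toy userspace sketch of the state
reported above. It is not the i915 code: struct toy_ring and the toy_*
helpers are hypothetical names, and plain counters stand in for
ring->request_list and ring->active_list. It only shows that once the
request list is empty while the active list is not, the short-circuit turns
idling into a no-op and a consistency check like the one suggested earlier
would fire. It does not show how the lists end up in that state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are NOT the real i915 structures. */
struct toy_ring {
	int nr_requests; /* stands in for ring->request_list */
	int nr_active;   /* stands in for ring->active_list */
};

/* Assumed behaviour: retiring completes all requests and moves every
 * buffer off the active list. */
void toy_retire(struct toy_ring *ring)
{
	assert(ring != NULL);
	ring->nr_requests = 0;
	ring->nr_active = 0;
}

/* Models the short-circuit: with no requests there is nothing to wait
 * for, so any buffers still on the active list are left untouched. */
void toy_ring_idle(struct toy_ring *ring)
{
	if (ring->nr_requests == 0)
		return;
	toy_retire(ring);
}

/* The invariant under discussion: both lists empty, or neither. */
bool toy_lists_consistent(const struct toy_ring *ring)
{
	return (ring->nr_requests == 0) == (ring->nr_active == 0);
}
```

Starting from { .nr_requests = 2, .nr_active = 3 }, toy_ring_idle() leaves
both counters at zero and the check passes. Starting from
{ .nr_requests = 0, .nr_active = 1 }, toy_ring_idle() returns early,
nr_active stays at 1, and toy_lists_consistent() returns false, which
matches the "active list not empty, but no requests" error in the log.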

I'm still baffled and don't really understand what's going on here ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
