dri-devel.lists.freedesktop.org archive mirror
From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 109692] deadlock occurs during GPU reset
Date: Wed, 20 Feb 2019 17:30:54 +0000
Message-ID: <bug-109692-502@http.bugs.freedesktop.org/>



https://bugs.freedesktop.org/show_bug.cgi?id=109692

            Bug ID: 109692
           Summary: deadlock occurs during GPU reset
           Product: DRI
           Version: XOrg git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: mikhail.v.gavrilov@gmail.com

Created attachment 143419
  --> https://bugs.freedesktop.org/attachment.cgi?id=143419&action=edit
dmesg

Steps to reproduce:
1. $ git clone git://people.freedesktop.org/~agd5f/linux -b amd-staging-drm-next
2. $ make bzImage && make modules
3. # make modules_install && make install
4. Launch "Shadow of the Tomb Raider"
--- A GPU hang occurs here ---
After some time:
--- The GPU reset starts here ---
--- The deadlock occurs here ---

[  291.746741] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:7 pasid:32774, for process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250)
[  291.746750] amdgpu 0000:0b:00.0:   in page starting at address 0x0000000000002000 from 27
[  291.746754] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070113C
[  297.135183] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[  302.255032] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[  302.265813] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=13292, emitted seq=13293
[  302.265950] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250
[  302.265974] amdgpu 0000:0b:00.0: GPU reset begin!

[  302.266337] ======================================================
[  302.266338] WARNING: possible circular locking dependency detected
[  302.266340] 5.0.0-rc1-drm-next-kernel+ #1 Tainted: G         C       
[  302.266341] ------------------------------------------------------
[  302.266343] kworker/5:2/871 is trying to acquire lock:
[  302.266345] 000000000abbb16a (&(&ring->fence_drv.lock)->rlock){-.-.}, at: dma_fence_remove_callback+0x1a/0x60
[  302.266352] 
               but task is already holding lock:
[  302.266353] 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266358] 
               which lock already depends on the new lock.

[  302.266360] 
               the existing dependency chain (in reverse order) is:
[  302.266361] 
               -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[  302.266366]        drm_sched_process_job+0x4d/0x180 [gpu_sched]
[  302.266368]        dma_fence_signal+0x111/0x1a0
[  302.266414]        amdgpu_fence_process+0xa3/0x100 [amdgpu]
[  302.266470]        sdma_v4_0_process_trap_irq+0x6e/0xa0 [amdgpu]
[  302.266523]        amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[  302.266576]        amdgpu_ih_process+0x84/0xf0 [amdgpu]
[  302.266628]        amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[  302.266632]        __handle_irq_event_percpu+0x3f/0x290
[  302.266635]        handle_irq_event_percpu+0x31/0x80
[  302.266637]        handle_irq_event+0x34/0x51
[  302.266639]        handle_edge_irq+0x7c/0x1a0
[  302.266643]        handle_irq+0xbf/0x100
[  302.266646]        do_IRQ+0x61/0x120
[  302.266648]        ret_from_intr+0x0/0x22
[  302.266651]        cpuidle_enter_state+0xbf/0x470
[  302.266654]        do_idle+0x1ec/0x280
[  302.266657]        cpu_startup_entry+0x19/0x20
[  302.266660]        start_secondary+0x1b3/0x200
[  302.266663]        secondary_startup_64+0xa4/0xb0
[  302.266664] 
               -> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[  302.266668]        _raw_spin_lock_irqsave+0x49/0x83
[  302.266670]        dma_fence_remove_callback+0x1a/0x60
[  302.266673]        drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.266717]        amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.266761]        amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.266822]        amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.266827]        drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.266831]        process_one_work+0x272/0x5d0
[  302.266834]        worker_thread+0x50/0x3b0
[  302.266836]        kthread+0x108/0x140
[  302.266839]        ret_from_fork+0x27/0x50
[  302.266840] 
               other info that might help us debug this:

[  302.266841]  Possible unsafe locking scenario:

[  302.266842]        CPU0                    CPU1
[  302.266843]        ----                    ----
[  302.266844]   lock(&(&sched->job_list_lock)->rlock);
[  302.266846]                                lock(&(&ring->fence_drv.lock)->rlock);
[  302.266847]                                lock(&(&sched->job_list_lock)->rlock);
[  302.266849]   lock(&(&ring->fence_drv.lock)->rlock);
[  302.266850] 
                *** DEADLOCK ***

[  302.266852] 5 locks held by kworker/5:2/871:
[  302.266853]  #0: 00000000d133fb6e ((wq_completion)"events"){+.+.}, at: process_one_work+0x1e9/0x5d0
[  302.266857]  #1: 000000008a5c3f7e ((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at: process_one_work+0x1e9/0x5d0
[  302.266862]  #2: 00000000b9b2c76f (&adev->lock_reset){+.+.}, at: amdgpu_device_lock_adev+0x17/0x40 [amdgpu]
[  302.266908]  #3: 00000000ac637728 (&dqm->lock_hidden){+.+.}, at: kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[  302.266965]  #4: 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266971] 
               stack backtrace:
[  302.266975] CPU: 5 PID: 871 Comm: kworker/5:2 Tainted: G         C        5.0.0-rc1-drm-next-kernel+ #1
[  302.266976] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
[  302.266980] Workqueue: events drm_sched_job_timedout [gpu_sched]
[  302.266982] Call Trace:
[  302.266987]  dump_stack+0x85/0xc0
[  302.266991]  print_circular_bug.isra.0.cold+0x15c/0x195
[  302.266994]  __lock_acquire+0x134c/0x1660
[  302.266998]  ? add_lock_to_list.isra.0+0x67/0xb0
[  302.267003]  lock_acquire+0xa2/0x1b0
[  302.267006]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267011]  _raw_spin_lock_irqsave+0x49/0x83
[  302.267013]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267016]  dma_fence_remove_callback+0x1a/0x60
[  302.267020]  drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.267065]  amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.267110]  amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.267173]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.267178]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.267183]  process_one_work+0x272/0x5d0
[  302.267188]  worker_thread+0x50/0x3b0
[  302.267191]  kthread+0x108/0x140
[  302.267194]  ? process_one_work+0x5d0/0x5d0
[  302.267196]  ? kthread_park+0x90/0x90
[  302.267199]  ret_from_fork+0x27/0x50
[  302.692194] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[  302.692234] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[  302.768931] amdgpu 0000:0b:00.0: GPU BACO reset
[  303.278874] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
[  303.279006] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[  303.279072] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  303.279234] [drm] PSP is resuming...
[  303.426601] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[  303.572227] [drm] UVD and UVD ENC initialized successfully.
[  303.687727] [drm] VCE initialized successfully.
[  303.689585] [drm] recover vram bo from shadow start
[  303.722757] [drm] recover vram bo from shadow done
[  303.722761] [drm] Skip scheduling IBs!
[  303.722791] amdgpu 0000:0b:00.0: GPU reset(2) succeeded!
[  303.722811] [drm] Skip scheduling IBs!
[  303.722838] [drm] Skip scheduling IBs!
[  303.722846] [drm] Skip scheduling IBs!
[  303.722854] [drm] Skip scheduling IBs!
[  303.722863] [drm] Skip scheduling IBs!
[  303.722871] [drm] Skip scheduling IBs!
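
The lockdep report above describes an AB-BA inversion between two spinlocks. Chain #1 records job_list_lock being taken in drm_sched_process_job, which runs from dma_fence_signal in the SDMA interrupt path while fence_drv.lock is held (the CPU1 column of the scenario). Chain #0 records drm_sched_stop, already holding job_list_lock, calling dma_fence_remove_callback, which takes fence_drv.lock (the CPU0 column). The Python snippet below is a toy model of that kind of check, not the kernel's lockdep implementation: every "held -> acquired" pair becomes an edge in a graph, and a new edge that closes a cycle is reported. The class name LockOrderGraph and the shortened lock labels are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical shortened labels for the two lock classes named in the report.
JOB_LIST = "sched->job_list_lock"
FENCE_DRV = "ring->fence_drv.lock"


class LockOrderGraph:
    """Hypothetical toy model of lock-order tracking (not the kernel's lockdep)."""

    def __init__(self):
        # edges[a] holds every lock that has been acquired while `a` was held.
        self.edges = defaultdict(set)

    def record(self, held, acquired):
        """Record that `acquired` was taken while `held` was held.

        Returns the cycle as a list of lock names if this edge closes one,
        otherwise None.
        """
        # A cycle exists if `held` is already reachable from `acquired`.
        path = self._find_path(acquired, held)
        self.edges[held].add(acquired)
        if path is None:
            return None
        return [held] + path

    def _find_path(self, src, dst):
        """Iterative depth-first search for src -> ... -> dst; returns the path or None."""
        stack = [(src, [src])]
        seen = set()
        while stack:
            node, path = stack.pop()
            if node == dst:
                return path
            if node in seen:
                continue
            seen.add(node)
            # .get() avoids inserting empty entries into the defaultdict.
            for nxt in self.edges.get(node, ()):
                stack.append((nxt, path + [nxt]))
        return None


if __name__ == "__main__":
    graph = LockOrderGraph()
    # Chain #1 / CPU1: dma_fence_signal holds fence_drv.lock, then
    # drm_sched_process_job takes job_list_lock.
    print(graph.record(FENCE_DRV, JOB_LIST))  # None: no cycle yet
    # Chain #0 / CPU0: drm_sched_stop holds job_list_lock, then
    # dma_fence_remove_callback takes fence_drv.lock.
    print(graph.record(JOB_LIST, FENCE_DRV))
    # ['sched->job_list_lock', 'ring->fence_drv.lock', 'sched->job_list_lock']
```

Real lockdep works on lock classes, also tracks hardirq/softirq usage (the {-.-.} annotations), and checks each acquisition as it happens; this sketch keeps only the ordering graph. It does show why taking the two locks in one consistent order, or not holding job_list_lock across dma_fence_remove_callback, would avoid the cycle, though that is a general remedy and not a statement of how the driver was eventually fixed.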

-- 
You are receiving this mail because:
You are the assignee for the bug.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
