From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 109692] deadlock occurs during GPU reset
Date: Wed, 20 Feb 2019 17:30:54 +0000
Message-ID: <bug-109692-502@http.bugs.freedesktop.org/>
https://bugs.freedesktop.org/show_bug.cgi?id=109692
Bug ID: 109692
Summary: deadlock occurs during GPU reset
Product: DRI
Version: XOrg git
Hardware: Other
OS: All
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: mikhail.v.gavrilov@gmail.com
Created attachment 143419
--> https://bugs.freedesktop.org/attachment.cgi?id=143419&action=edit
dmesg
Steps to reproduce:
1. $ git clone git://people.freedesktop.org/~agd5f/linux -b amd-staging-drm-next
2. $ make bzImage && make modules
3. # make modules_install && make install
4. Launch "Shadow of the Tomb Raider"
--- GPU hang occurs here ---
and after a short while
--- GPU reset starts here ---
--- Deadlock occurs here ---
[ 291.746741] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:7 pasid:32774, for process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250)
[ 291.746750] amdgpu 0000:0b:00.0: in page starting at address 0x0000000000002000 from 27
[ 291.746754] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070113C
[ 297.135183] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[ 302.255032] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[ 302.265813] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=13292, emitted seq=13293
[ 302.265950] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250
[ 302.265974] amdgpu 0000:0b:00.0: GPU reset begin!
[ 302.266337] ======================================================
[ 302.266338] WARNING: possible circular locking dependency detected
[ 302.266340] 5.0.0-rc1-drm-next-kernel+ #1 Tainted: G C
[ 302.266341] ------------------------------------------------------
[ 302.266343] kworker/5:2/871 is trying to acquire lock:
[ 302.266345] 000000000abbb16a (&(&ring->fence_drv.lock)->rlock){-.-.}, at: dma_fence_remove_callback+0x1a/0x60
[ 302.266352]
but task is already holding lock:
[ 302.266353] 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140 [gpu_sched]
[ 302.266358]
which lock already depends on the new lock.
[ 302.266360]
the existing dependency chain (in reverse order) is:
[ 302.266361]
-> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[ 302.266366] drm_sched_process_job+0x4d/0x180 [gpu_sched]
[ 302.266368] dma_fence_signal+0x111/0x1a0
[ 302.266414] amdgpu_fence_process+0xa3/0x100 [amdgpu]
[ 302.266470] sdma_v4_0_process_trap_irq+0x6e/0xa0 [amdgpu]
[ 302.266523] amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[ 302.266576] amdgpu_ih_process+0x84/0xf0 [amdgpu]
[ 302.266628] amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[ 302.266632] __handle_irq_event_percpu+0x3f/0x290
[ 302.266635] handle_irq_event_percpu+0x31/0x80
[ 302.266637] handle_irq_event+0x34/0x51
[ 302.266639] handle_edge_irq+0x7c/0x1a0
[ 302.266643] handle_irq+0xbf/0x100
[ 302.266646] do_IRQ+0x61/0x120
[ 302.266648] ret_from_intr+0x0/0x22
[ 302.266651] cpuidle_enter_state+0xbf/0x470
[ 302.266654] do_idle+0x1ec/0x280
[ 302.266657] cpu_startup_entry+0x19/0x20
[ 302.266660] start_secondary+0x1b3/0x200
[ 302.266663] secondary_startup_64+0xa4/0xb0
[ 302.266664]
-> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[ 302.266668] _raw_spin_lock_irqsave+0x49/0x83
[ 302.266670] dma_fence_remove_callback+0x1a/0x60
[ 302.266673] drm_sched_stop+0x59/0x140 [gpu_sched]
[ 302.266717] amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[ 302.266761] amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[ 302.266822] amdgpu_job_timedout+0x109/0x130 [amdgpu]
[ 302.266827] drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[ 302.266831] process_one_work+0x272/0x5d0
[ 302.266834] worker_thread+0x50/0x3b0
[ 302.266836] kthread+0x108/0x140
[ 302.266839] ret_from_fork+0x27/0x50
[ 302.266840]
other info that might help us debug this:
[ 302.266841]  Possible unsafe locking scenario:
[ 302.266842]        CPU0                    CPU1
[ 302.266843]        ----                    ----
[ 302.266844]   lock(&(&sched->job_list_lock)->rlock);
[ 302.266846]                                lock(&(&ring->fence_drv.lock)->rlock);
[ 302.266847]                                lock(&(&sched->job_list_lock)->rlock);
[ 302.266849]   lock(&(&ring->fence_drv.lock)->rlock);
[ 302.266850]
 *** DEADLOCK ***
[ 302.266852] 5 locks held by kworker/5:2/871:
[ 302.266853] #0: 00000000d133fb6e ((wq_completion)"events"){+.+.}, at: process_one_work+0x1e9/0x5d0
[ 302.266857] #1: 000000008a5c3f7e ((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at: process_one_work+0x1e9/0x5d0
[ 302.266862] #2: 00000000b9b2c76f (&adev->lock_reset){+.+.}, at: amdgpu_device_lock_adev+0x17/0x40 [amdgpu]
[ 302.266908] #3: 00000000ac637728 (&dqm->lock_hidden){+.+.}, at: kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[ 302.266965] #4: 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140 [gpu_sched]
[ 302.266971]
stack backtrace:
[ 302.266975] CPU: 5 PID: 871 Comm: kworker/5:2 Tainted: G C 5.0.0-rc1-drm-next-kernel+ #1
[ 302.266976] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
[ 302.266980] Workqueue: events drm_sched_job_timedout [gpu_sched]
[ 302.266982] Call Trace:
[ 302.266987] dump_stack+0x85/0xc0
[ 302.266991] print_circular_bug.isra.0.cold+0x15c/0x195
[ 302.266994] __lock_acquire+0x134c/0x1660
[ 302.266998] ? add_lock_to_list.isra.0+0x67/0xb0
[ 302.267003] lock_acquire+0xa2/0x1b0
[ 302.267006] ? dma_fence_remove_callback+0x1a/0x60
[ 302.267011] _raw_spin_lock_irqsave+0x49/0x83
[ 302.267013] ? dma_fence_remove_callback+0x1a/0x60
[ 302.267016] dma_fence_remove_callback+0x1a/0x60
[ 302.267020] drm_sched_stop+0x59/0x140 [gpu_sched]
[ 302.267065] amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[ 302.267110] amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[ 302.267173] amdgpu_job_timedout+0x109/0x130 [amdgpu]
[ 302.267178] drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[ 302.267183] process_one_work+0x272/0x5d0
[ 302.267188] worker_thread+0x50/0x3b0
[ 302.267191] kthread+0x108/0x140
[ 302.267194] ? process_one_work+0x5d0/0x5d0
[ 302.267196] ? kthread_park+0x90/0x90
[ 302.267199] ret_from_fork+0x27/0x50
[ 302.692194] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[ 302.692234] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[ 302.768931] amdgpu 0000:0b:00.0: GPU BACO reset
[ 303.278874] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
[ 303.279006] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[ 303.279072] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[ 303.279234] [drm] PSP is resuming...
[ 303.426601] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[ 303.572227] [drm] UVD and UVD ENC initialized successfully.
[ 303.687727] [drm] VCE initialized successfully.
[ 303.689585] [drm] recover vram bo from shadow start
[ 303.722757] [drm] recover vram bo from shadow done
[ 303.722761] [drm] Skip scheduling IBs!
[ 303.722791] amdgpu 0000:0b:00.0: GPU reset(2) succeeded!
[ 303.722811] [drm] Skip scheduling IBs!
[ 303.722838] [drm] Skip scheduling IBs!
[ 303.722846] [drm] Skip scheduling IBs!
[ 303.722854] [drm] Skip scheduling IBs!
[ 303.722863] [drm] Skip scheduling IBs!
[ 303.722871] [drm] Skip scheduling IBs!
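
For anyone skimming the trace, the inversion lockdep reports above can be modelled as a cycle in a "held, then acquired" graph. The following is only a minimal sketch of that idea, not kernel code and not how lockdep is implemented: the `find_cycle` helper and the shortened lock names are illustrative assumptions, and the two edges are my reading of dependency chains #1 and #0 in the log.

```python
from collections import defaultdict


def find_cycle(edges):
    """Return a list of locks forming a cycle in the lock-order graph, or None.

    Each edge is (lock already held, lock acquired while holding it).
    """
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    def visit(node, path):
        # Reaching a node already on the current path closes a cycle.
        if node in path:
            return path[path.index(node):] + [node]
        for nxt in graph.get(node, []):
            found = visit(nxt, path + [node])
            if found:
                return found
        return None

    for start in list(graph):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


# Edges as read from the report (shortened, illustrative lock names).
edges = [
    # Chain #1, IRQ path: dma_fence_signal() -> drm_sched_process_job()
    ("ring->fence_drv.lock", "sched->job_list_lock"),
    # Chain #0, reset path: drm_sched_stop() -> dma_fence_remove_callback()
    ("sched->job_list_lock", "ring->fence_drv.lock"),
]

cycle = find_cycle(edges)
print(" -> ".join(cycle))
```

With only one of the two edges present the helper returns None, which matches the usual expectation that a single consistent acquisition order cannot produce this kind of deadlock.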
--
You are receiving this mail because:
You are the assignee for the bug.