From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 109692] deadlock occurs during GPU reset
Date: Wed, 20 Feb 2019 17:30:54 +0000
Message-ID: <bug-109692-502@http.bugs.freedesktop.org/>


https://bugs.freedesktop.org/show_bug.cgi?id=109692

            Bug ID: 109692
           Summary: deadlock occurs during GPU reset
           Product: DRI
           Version: XOrg git
          Hardware: Other
                OS: All
            Status: NEW
          Severity: normal
          Priority: medium
         Component: DRM/AMDgpu
          Assignee: dri-devel@lists.freedesktop.org
          Reporter: mikhail.v.gavrilov@gmail.com

Created attachment 143419
  --> https://bugs.freedesktop.org/attachment.cgi?id=143419&action=edit
dmesg

Steps to reproduce:
1. $ git clone git://people.freedesktop.org/~agd5f/linux -b amd-staging-drm-next
2. $ make bzImage && make modules
3. # make modules_install && make install
4. Launch "Shadow of the Tomb Raider"
--- A GPU hang occurs here ---
and a short time later
--- The GPU reset starts here ---
--- The deadlock occurs here ---

[  291.746741] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0 ring:158 vmid:7 pasid:32774, for process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250)
[  291.746750] amdgpu 0000:0b:00.0:   in page starting at address 0x0000000000002000 from 27
[  291.746754] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0070113C
[  297.135183] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[  302.255032] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
[  302.265813] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=13292, emitted seq=13293
[  302.265950] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process SOTTR.exe pid 5250 thread SOTTR.exe pid 5250
[  302.265974] amdgpu 0000:0b:00.0: GPU reset begin!

[  302.266337] ======================================================
[  302.266338] WARNING: possible circular locking dependency detected
[  302.266340] 5.0.0-rc1-drm-next-kernel+ #1 Tainted: G         C       
[  302.266341] ------------------------------------------------------
[  302.266343] kworker/5:2/871 is trying to acquire lock:
[  302.266345] 000000000abbb16a (&(&ring->fence_drv.lock)->rlock){-.-.}, at: dma_fence_remove_callback+0x1a/0x60
[  302.266352] 
               but task is already holding lock:
[  302.266353] 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266358] 
               which lock already depends on the new lock.

[  302.266360] 
               the existing dependency chain (in reverse order) is:
[  302.266361] 
               -> #1 (&(&sched->job_list_lock)->rlock){-.-.}:
[  302.266366]        drm_sched_process_job+0x4d/0x180 [gpu_sched]
[  302.266368]        dma_fence_signal+0x111/0x1a0
[  302.266414]        amdgpu_fence_process+0xa3/0x100 [amdgpu]
[  302.266470]        sdma_v4_0_process_trap_irq+0x6e/0xa0 [amdgpu]
[  302.266523]        amdgpu_irq_dispatch+0xc0/0x250 [amdgpu]
[  302.266576]        amdgpu_ih_process+0x84/0xf0 [amdgpu]
[  302.266628]        amdgpu_irq_handler+0x1b/0x50 [amdgpu]
[  302.266632]        __handle_irq_event_percpu+0x3f/0x290
[  302.266635]        handle_irq_event_percpu+0x31/0x80
[  302.266637]        handle_irq_event+0x34/0x51
[  302.266639]        handle_edge_irq+0x7c/0x1a0
[  302.266643]        handle_irq+0xbf/0x100
[  302.266646]        do_IRQ+0x61/0x120
[  302.266648]        ret_from_intr+0x0/0x22
[  302.266651]        cpuidle_enter_state+0xbf/0x470
[  302.266654]        do_idle+0x1ec/0x280
[  302.266657]        cpu_startup_entry+0x19/0x20
[  302.266660]        start_secondary+0x1b3/0x200
[  302.266663]        secondary_startup_64+0xa4/0xb0
[  302.266664] 
               -> #0 (&(&ring->fence_drv.lock)->rlock){-.-.}:
[  302.266668]        _raw_spin_lock_irqsave+0x49/0x83
[  302.266670]        dma_fence_remove_callback+0x1a/0x60
[  302.266673]        drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.266717]        amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.266761]        amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.266822]        amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.266827]        drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.266831]        process_one_work+0x272/0x5d0
[  302.266834]        worker_thread+0x50/0x3b0
[  302.266836]        kthread+0x108/0x140
[  302.266839]        ret_from_fork+0x27/0x50
[  302.266840] 
               other info that might help us debug this:

[  302.266841]  Possible unsafe locking scenario:

[  302.266842]        CPU0                    CPU1
[  302.266843]        ----                    ----
[  302.266844]   lock(&(&sched->job_list_lock)->rlock);
[  302.266846]                                lock(&(&ring->fence_drv.lock)->rlock);
[  302.266847]                                lock(&(&sched->job_list_lock)->rlock);
[  302.266849]   lock(&(&ring->fence_drv.lock)->rlock);
[  302.266850] 
                *** DEADLOCK ***

[  302.266852] 5 locks held by kworker/5:2/871:
[  302.266853]  #0: 00000000d133fb6e ((wq_completion)"events"){+.+.}, at: process_one_work+0x1e9/0x5d0
[  302.266857]  #1: 000000008a5c3f7e ((work_completion)(&(&sched->work_tdr)->work)){+.+.}, at: process_one_work+0x1e9/0x5d0
[  302.266862]  #2: 00000000b9b2c76f (&adev->lock_reset){+.+.}, at: amdgpu_device_lock_adev+0x17/0x40 [amdgpu]
[  302.266908]  #3: 00000000ac637728 (&dqm->lock_hidden){+.+.}, at: kgd2kfd_pre_reset+0x30/0x60 [amdgpu]
[  302.266965]  #4: 000000006e32ba38 (&(&sched->job_list_lock)->rlock){-.-.}, at: drm_sched_stop+0x34/0x140 [gpu_sched]
[  302.266971] 
               stack backtrace:
[  302.266975] CPU: 5 PID: 871 Comm: kworker/5:2 Tainted: G         C        5.0.0-rc1-drm-next-kernel+ #1
[  302.266976] Hardware name: System manufacturer System Product Name/ROG STRIX X470-I GAMING, BIOS 1103 11/16/2018
[  302.266980] Workqueue: events drm_sched_job_timedout [gpu_sched]
[  302.266982] Call Trace:
[  302.266987]  dump_stack+0x85/0xc0
[  302.266991]  print_circular_bug.isra.0.cold+0x15c/0x195
[  302.266994]  __lock_acquire+0x134c/0x1660
[  302.266998]  ? add_lock_to_list.isra.0+0x67/0xb0
[  302.267003]  lock_acquire+0xa2/0x1b0
[  302.267006]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267011]  _raw_spin_lock_irqsave+0x49/0x83
[  302.267013]  ? dma_fence_remove_callback+0x1a/0x60
[  302.267016]  dma_fence_remove_callback+0x1a/0x60
[  302.267020]  drm_sched_stop+0x59/0x140 [gpu_sched]
[  302.267065]  amdgpu_device_pre_asic_reset+0x4f/0x240 [amdgpu]
[  302.267110]  amdgpu_device_gpu_recover+0x88/0x7d0 [amdgpu]
[  302.267173]  amdgpu_job_timedout+0x109/0x130 [amdgpu]
[  302.267178]  drm_sched_job_timedout+0x40/0x70 [gpu_sched]
[  302.267183]  process_one_work+0x272/0x5d0
[  302.267188]  worker_thread+0x50/0x3b0
[  302.267191]  kthread+0x108/0x140
[  302.267194]  ? process_one_work+0x5d0/0x5d0
[  302.267196]  ? kthread_park+0x90/0x90
[  302.267199]  ret_from_fork+0x27/0x50
[  302.692194] amdgpu 0000:0b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
[  302.692234] [drm:gfx_v9_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
[  302.768931] amdgpu 0000:0b:00.0: GPU BACO reset
[  303.278874] amdgpu 0000:0b:00.0: GPU reset succeeded, trying to resume
[  303.279006] [drm] PCIE GART of 512M enabled (table at 0x000000F400900000).
[  303.279072] [drm:amdgpu_device_gpu_recover [amdgpu]] *ERROR* VRAM is lost!
[  303.279234] [drm] PSP is resuming...
[  303.426601] [drm] reserve 0x400000 from 0xf400d00000 for PSP TMR SIZE
[  303.572227] [drm] UVD and UVD ENC initialized successfully.
[  303.687727] [drm] VCE initialized successfully.
[  303.689585] [drm] recover vram bo from shadow start
[  303.722757] [drm] recover vram bo from shadow done
[  303.722761] [drm] Skip scheduling IBs!
[  303.722791] amdgpu 0000:0b:00.0: GPU reset(2) succeeded!
[  303.722811] [drm] Skip scheduling IBs!
[  303.722838] [drm] Skip scheduling IBs!
[  303.722846] [drm] Skip scheduling IBs!
[  303.722854] [drm] Skip scheduling IBs!
[  303.722863] [drm] Skip scheduling IBs!
[  303.722871] [drm] Skip scheduling IBs!
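
To make the lockdep report above easier to read, here is a minimal sketch that models the two reported dependency chains as "held lock -> acquired lock" edges and looks for a cycle. It is only an illustration of the lock-order inversion, not how lockdep is implemented. The function name find_lock_cycle and the REPORTED_EDGES list are hypothetical names introduced for this example; the lock names are taken from the log.

```python
from collections import defaultdict

# Edges read off the lockdep report, as (lock already held, lock then acquired).
REPORTED_EDGES = [
    # Chain #1 (IRQ path): dma_fence_signal() runs under the fence lock,
    # then drm_sched_process_job() takes job_list_lock.
    ("ring->fence_drv.lock", "sched->job_list_lock"),
    # Chain #0 (reset path): drm_sched_stop() holds job_list_lock,
    # then dma_fence_remove_callback() takes the fence lock.
    ("sched->job_list_lock", "ring->fence_drv.lock"),
]


def find_lock_cycle(edges):
    """Return one cycle in the lock-order graph as a list of names, or None."""
    graph = defaultdict(list)
    for held, acquired in edges:
        graph[held].append(acquired)

    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = defaultdict(int)       # every node starts as WHITE (0)
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in graph[node]:
            if color[nxt] == GREY:
                # Back edge: nxt is already on the current path, so the
                # acquisition order loops back on itself.
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    # Snapshot the keys: visit() may add empty entries for leaf nodes.
    for node in list(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    cycle = find_lock_cycle(REPORTED_EDGES)
    print(" -> ".join(cycle) if cycle else "no lock-order cycle")
```

With the two edges from the report, the sketch finds a cycle between ring->fence_drv.lock and sched->job_list_lock, which matches the CPU0/CPU1 scenario lockdep prints. If either path took the two locks in the other order, the graph would have no cycle.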
