From: bugzilla-daemon@kernel.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 216200] New: AMDGPU hung after enabling HIP for gpu acceleration in Blender Cycles 3.2
Date: Sun, 03 Jul 2022 22:50:07 +0000
Message-ID: <bug-216200-2300@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=216200

            Bug ID: 216200
           Summary: AMDGPU hung after enabling HIP for gpu acceleration in
                    Blender Cycles 3.2
           Product: Drivers
           Version: 2.5
    Kernel Version: 5.18.9
          Hardware: AMD
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: Video(DRI - non Intel)
          Assignee: drivers_video-dri@kernel-bugs.osdl.org
          Reporter: toadron@yandex.ru
        Regression: No

Created attachment 301326
  --> https://bugzilla.kernel.org/attachment.cgi?id=301326&action=edit
Full journal since system boot

Description:

Enabling HIP for GPU acceleration in Blender 3.2 and rendering with Cycles
causes the screen to freeze.

Video showing the problem (YouTube):
https://www.youtube.com/watch?v=tZzTuvRn3cw

Hardware:

CPU: AMD Ryzen™ 5 3600
MOTHERBOARD: MSI X470 GAMING PLUS MAX
GPU: SAPPHIRE Radeon RX 6600 8192 MB PULSE (11310-01-20G)

Software version:

Arch Linux x86-64
linux 5.18.9.arch1-1
xf86-video-amdgpu 22.0.0-1
mesa 22.1.3-1
rocm-llvm 5.2.0-1
hip-runtime-amd 5.2.0-3
blender 3.2.0-4

Partial log with the problem (see attachment for full log):

Jul 04 01:01:55 sanka kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]]
*ERROR* Waiting for fences timed out!
Jul 04 01:01:55 sanka kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx_0.0.0 timeout, signaled seq=6213, emitted seq=6215
Jul 04 01:01:55 sanka kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process blender pid 2776 thread blender:cs0 pid 2798
Jul 04 01:01:55 sanka kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
Jul 04 01:01:55 sanka kernel: amdgpu: Failed to suspend process 0x800c
Jul 04 01:01:55 sanka /usr/lib/gdm-x-session[1604]: [2022-07-04 01:01:55.072]
[1649] (device_info_linux.cc:45): NumberOfDevices
Jul 04 01:01:55 sanka /usr/lib/gdm-x-session[1604]: [2022-07-04 01:01:55.189]
[1649] (device_info_linux.cc:45): NumberOfDevices
Jul 04 01:01:55 sanka /usr/lib/gdm-x-session[1604]: [2022-07-04 01:01:55.189]
[1649] (device_info_linux.cc:78): GetDeviceName
Jul 04 01:01:55 sanka kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]]
*ERROR* Waiting for fences timed out!
Jul 04 01:01:55 sanka kernel: amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul 04 01:01:55 sanka kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ
disable failed
Jul 04 01:01:55 sanka kernel: amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul 04 01:01:55 sanka kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ
disable failed
Jul 04 01:01:55 sanka kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed
to halt cp gfx
Jul 04 01:01:55 sanka kernel: [drm] free PSP TMR buffer
Jul 04 01:01:55 sanka kernel: CPU: 5 PID: 158 Comm: kworker/u64:7 Tainted: G   
       OE     5.18.9-arch1-1 #1 137f0035b2ece06cb65382579db27e9de66af504
Jul 04 01:01:55 sanka kernel: Hardware name: Micro-Star International Co., Ltd.
MS-7B79/X470 GAMING PLUS MAX (MS-7B79), BIOS H.F1 05/24/2022
Jul 04 01:01:55 sanka kernel: Workqueue: amdgpu-reset-dev
drm_sched_job_timedout [gpu_sched]
Jul 04 01:01:55 sanka kernel: Call Trace:
Jul 04 01:01:55 sanka kernel:  <TASK>
Jul 04 01:01:55 sanka kernel:  dump_stack_lvl+0x48/0x5d
Jul 04 01:01:55 sanka kernel:  amdgpu_do_asic_reset+0x2a/0x470 [amdgpu
c3399060640045ce33894f35f697ceceab8d3be0]
Jul 04 01:01:55 sanka kernel:  amdgpu_device_gpu_recover_imp.cold+0x537/0x8cc
[amdgpu c3399060640045ce33894f35f697ceceab8d3be0]
Jul 04 01:01:55 sanka kernel:  amdgpu_job_timedout+0x18c/0x1c0 [amdgpu
c3399060640045ce33894f35f697ceceab8d3be0]
Jul 04 01:01:55 sanka kernel:  drm_sched_job_timedout+0x76/0x100 [gpu_sched
b54a976254cd79f6332eedc913d0037b3c33b883]
Jul 04 01:01:55 sanka kernel:  process_one_work+0x1c7/0x380
Jul 04 01:01:55 sanka kernel:  worker_thread+0x51/0x380
Jul 04 01:01:55 sanka kernel:  ? rescuer_thread+0x3a0/0x3a0
Jul 04 01:01:55 sanka kernel:  kthread+0xde/0x110
Jul 04 01:01:55 sanka kernel:  ? kthread_complete_and_exit+0x20/0x20
Jul 04 01:01:55 sanka kernel:  ret_from_fork+0x22/0x30
Jul 04 01:01:55 sanka kernel:  </TASK>
Jul 04 01:01:55 sanka kernel: amdgpu 0000:29:00.0: amdgpu: MODE1 reset
Jul 04 01:01:55 sanka kernel: amdgpu 0000:29:00.0: amdgpu: GPU mode1 reset
Jul 04 01:01:55 sanka kernel: amdgpu 0000:29:00.0: amdgpu: GPU smu mode1 reset
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset succeeded,
trying to resume
Jul 04 01:01:56 sanka kernel: [drm] PCIE GART of 512M enabled (table at
0x0000008000300000).
Jul 04 01:01:56 sanka kernel: [drm] VRAM is lost due to GPU reset!
Jul 04 01:01:56 sanka kernel: [drm] PSP is resuming...
Jul 04 01:01:56 sanka kernel: [drm] reserve 0xa00000 from 0x81fe000000 for PSP
TMR
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta
ucode is not available
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: SECUREDISPLAY:
securedisplay ta ucode is not available
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: SMU is resuming...
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: smu driver if
version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0,
version = 0x003b2900 (59.41.0)
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: SMU driver if
version not matched
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: SMU is resumed
successfully!
Jul 04 01:01:56 sanka kernel: [drm] DMUB hardware initialized:
version=0x0202000F
Jul 04 01:01:56 sanka kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jul 04 01:01:56 sanka kernel: [drm] VCN decode and encode initialized
successfully(under DPG Mode).
Jul 04 01:01:56 sanka kernel: [drm] JPEG decode initialized successfully.
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring gfx_0.0.0 uses
VM inv eng 0 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.0 uses
VM inv eng 1 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.0 uses
VM inv eng 4 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.0 uses
VM inv eng 5 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.0 uses
VM inv eng 6 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.0.1 uses
VM inv eng 7 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.1.1 uses
VM inv eng 8 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.2.1 uses
VM inv eng 9 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring comp_1.3.1 uses
VM inv eng 10 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring kiq_2.1.0 uses
VM inv eng 11 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring sdma0 uses VM
inv eng 12 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring sdma1 uses VM
inv eng 13 on hub 0
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring vcn_dec_0 uses
VM inv eng 0 on hub 1
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring vcn_enc_0.0
uses VM inv eng 1 on hub 1
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring vcn_enc_0.1
uses VM inv eng 4 on hub 1
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: ring jpeg_dec uses
VM inv eng 5 on hub 1
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: recover vram bo from
shadow start
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: recover vram bo from
shadow done
Jul 04 01:01:56 sanka kernel: [drm] Skip scheduling IBs!
Jul 04 01:01:56 sanka kernel: [drm] Skip scheduling IBs!
Jul 04 01:01:56 sanka kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset(2)
succeeded!
Jul 04 01:01:56 sanka kernel: [drm] Skip scheduling IBs!
Jul 04 01:01:56 sanka kernel: [drm] Skip scheduling IBs!
Jul 04 01:01:56 sanka kernel: [drm] Skip scheduling IBs!
... (repeated lines skipped) ...
Jul 04 01:01:56 sanka kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!
Jul 04 01:01:56 sanka kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!
Jul 04 01:01:56 sanka kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[2747]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[1000]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[1000]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[1000]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[1000]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[1000]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[1000]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka /usr/lib/gdm-x-session[1000]: amdgpu: The CS has been
cancelled because the context is lost.
Jul 04 01:01:56 sanka kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!
Jul 04 01:01:56 sanka kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!
Jul 04 01:01:56 sanka kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!
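
For anyone triaging the excerpt above, here is a minimal Python sketch (not
part of the original report) that pulls out the ring timeout and names the
negative error codes. The helper names (find_ring_timeouts, decode_codes) are
hypothetical, and the errno table assumes the usual Linux numbering, where
110 is ETIMEDOUT and 125 is ECANCELED. The patterns tolerate the line
wrapping this email applied to the journal output.

```python
import re
from typing import NamedTuple

# Assumption: Linux (asm-generic) errno numbering for the codes in the excerpt.
LINUX_ERRNO = {110: "ETIMEDOUT", 125: "ECANCELED"}

# \s+ between tokens lets a match span a wrapped line.
RING_TIMEOUT = re.compile(
    r"ring\s+(?P<ring>\S+)\s+timeout,\s+signaled\s+seq=(?P<signaled>\d+),"
    r"\s+emitted\s+seq=(?P<emitted>\d+)"
)
NEG_CODE = re.compile(r"(?:test\s+failed\s+\(|initialize\s+parser\s+)(-\d+)")

# Lines copied from the report, wrapped as they appear in the email.
SAMPLE = """\
Jul 04 01:01:55 sanka kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx_0.0.0 timeout, signaled seq=6213, emitted seq=6215
Jul 04 01:01:55 sanka kernel: amdgpu 0000:29:00.0: [drm:amdgpu_ring_test_helper
[amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jul 04 01:01:56 sanka kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to
initialize parser -125!
"""


class RingTimeout(NamedTuple):
    ring: str
    signaled: int
    emitted: int

    @property
    def pending(self) -> int:
        # Fences emitted to the ring but not yet signaled at timeout.
        return self.emitted - self.signaled


def find_ring_timeouts(text: str) -> list:
    """Return every 'ring ... timeout' record found in the journal text."""
    return [
        RingTimeout(m["ring"], int(m["signaled"]), int(m["emitted"]))
        for m in RING_TIMEOUT.finditer(text)
    ]


def decode_codes(text: str) -> dict:
    """Map each negative code in the text to its errno name, if known."""
    codes = {int(m.group(1)) for m in NEG_CODE.finditer(text)}
    return {c: LINUX_ERRNO.get(-c, "unknown") for c in sorted(codes)}


if __name__ == "__main__":
    for t in find_ring_timeouts(SAMPLE):
        print(f"{t.ring}: signaled={t.signaled} emitted={t.emitted} "
              f"pending={t.pending}")
    for code, name in decode_codes(SAMPLE).items():
        print(f"{code}: {name}")
```

Run against the full attached journal instead of SAMPLE, the same helpers
would list every ring that timed out; codes missing from the small table are
reported as "unknown" rather than guessed.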

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.
