From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 109403] amdgpu randomly hangs while streaming or when CPU is busy on X399 with TR 1950X
Date: Mon, 21 Jan 2019 10:36:52 +0000
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0620610802=="
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org [131.252.210.165]) by gabe.freedesktop.org (Postfix) with ESMTP id 034516E7CA for ; Mon, 21 Jan 2019 10:36:54 +0000 (UTC)
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

--===============0620610802==
Content-Type: multipart/alternative; boundary="15480670130.C6eB3e3.5595"
Content-Transfer-Encoding: 7bit

--15480670130.C6eB3e3.5595
Date: Mon, 21 Jan 2019 10:36:53 +0000
MIME-Version: 1.0
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated

https://bugs.freedesktop.org/show_bug.cgi?id=109403

Bug ID 109403
Summary amdgpu randomly hangs while streaming or when CPU is busy on X399 with TR 1950X
Product DRI
Version unspecified
Hardware x86-64 (AMD64)
OS Linux (All)
Status NEW
Severity normal
Priority medium
Component DRM/AMDgpu
Assignee dri-devel@lists.freedesktop.org
Reporter 1@provod.gl

I've been experiencing random GPU hangs since I upgraded to Threadripper about
a year ago.

Specs:
- Motherboard: ASUS Prime X399-A, all bios versions from stock until current
0808
- CPU: Threadripper 1950X, 32 threads
- GPU: MSI Radeon RX Vega 64 Air Boost 8G OC (was also happening on ASUS R9
Fury X on the same machine; this GPU was generally stable on previous box)
- Displays:
   - 2x DELL U2412M 1920x1200x60 (DP)
   - 1x ASUS MG279Q 2560x1440x144 (DP)
- Kernel versions: 4.20, 5.0-rc2 (has been happening since at least 4.14;
earlier versions weren't tried).
- linux-firmware: 20181218
- Mesa: 18.3.1
- X: 1.20.3
- libdrm: 2.4.96
- Possibly relevant kernel options: amd_iommu=on
vfio-pci.ids=10de:1005,10de:0e1a,1912:0014,1106:3483 iommu=pt
vfio-pci.disable_vga=1 hpet=disable nohpet amdgpu.ppfeaturemask=0xfffd7fff
amdgpu.gpu_recovery=1 pcie_aspm=off
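
The amdgpu.ppfeaturemask value above is a bit field, so it may help to see which bits it actually switches off. A minimal Python sketch, assuming a 32-bit mask; the helper name `cleared_bits` is made up for illustration, and which powerplay feature each bit position maps to depends on the kernel version and is not decoded here:

```python
def cleared_bits(mask, width=32):
    """Return the positions of the bits that are 0 in a `width`-bit mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


# Value taken from the kernel command line above.
PPFEATUREMASK = 0xfffd7fff

# Bit positions that this mask turns off (assumption: 32-bit field).
print(cleared_bits(PPFEATUREMASK))

# The same bits expressed as a mask, handy for comparing against kernel headers.
print(hex(~PPFEATUREMASK & 0xffffffff))
```

The bit positions can then be looked up in the feature-mask definitions of the kernel version in use.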

The problem manifests itself usually like this:
1. Screen suddenly freezes (sometimes it is possible to move mouse cursor for a
few seconds, but it will freeze eventually too)
2. GPU fan speeds up and remains high
3. Every process that talks to GPU freezes and becomes impossible to kill.
4. Can SSH into the machine and everything else besides the GPU works ok.
5. dmesg contains a message like this:
                [Jan21 00:03] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, signaled seq=17188686, emitted seq=17188689
                [  +0.000032] [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process X pid 9315 thread X:cs0 pid 9335
        or with a bit more stuff happening before:
                [Jan18 19:43] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000003] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010607000 from 27
                [  +0.000002] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x0060153D
                [  +0.000005] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000002] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010609000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000001] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010607000 from 27
                [  +0.000002] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000001] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010609000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000002] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010607000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000001] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010609000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000001] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010607000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000002] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010609000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000001] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010607000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [  +0.000004] amdgpu 0000:44:00.0: [gfxhub] VMC page fault
(src_id:0 ring:158 vmid:6 pasid:32771, for process superposition pid 11225
thread superposit:cs0 pid 11308)
                [  +0.000001] amdgpu 0000:44:00.0:   in page starting at
address 0x0000800010609000 from 27
                [  +0.000001] amdgpu 0000:44:00.0:
VM_L2_PROTECTION_FAULT_STATUS:0x00000000
                [Jan18 19:44] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring
gfx timeout, signaled seq=40554, emitted seq=40556
                [  +0.000047] [drm:amdgpu_job_timedout [amdgpu]] *ERROR*
Process information: process superposition pid 11225 thread superposit:cs0 pid
11308
6. amdgpu reports near 100% cpu usage and high power draw, even if it was
completely idle before the freeze.
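
The two sequence numbers in the timeout messages can be compared mechanically. A minimal Python sketch, assuming the dmesg output is unwrapped (one message per line) and that the message format matches the excerpts above; the helper name `pending_fences` is hypothetical. As far as I understand it, the difference is the number of submissions that had been emitted to the ring but had not yet signaled when the timeout fired:

```python
import re

# Matches the "ring <name> timeout" message format shown in the excerpts above.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def pending_fences(dmesg_text):
    """Return (ring, emitted - signaled) for every timeout message found."""
    results = []
    for match in TIMEOUT_RE.finditer(dmesg_text):
        gap = int(match.group("emitted")) - int(match.group("signaled"))
        results.append((match.group("ring"), gap))
    return results


# The two timeout messages from this report, joined back onto single lines.
SAMPLE = (
    "[Jan21 00:03] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
    "timeout, signaled seq=17188686, emitted seq=17188689\n"
    "[Jan18 19:44] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
    "timeout, signaled seq=40554, emitted seq=40556\n"
)

for ring, gap in pending_fences(SAMPLE):
    print(f"ring {ring}: {gap} submission(s) not signaled")
```

In both logged hangs only a handful of submissions were outstanding on the gfx ring.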

If I enable amdgpu.gpu_recovery, then it tries to reset the GPU but fails most
of the time:
                [  +0.000005] amdgpu 0000:44:00.0: GPU reset begin!
                [ +10.230091] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR*
[CRTC:51:crtc-2] hw_done or flip_done timed out
                (there are no further logs)
        (I've seen it successfully reset the GPU only *once*, and that obviously
required an X restart)

These freezes happen pretty much randomly:
- Sometimes the GPU remains stable for weeks
- It will generally remain stable while just playing games or running
benchmarks like Unigine Superposition for many hours
- There have been a couple of freezes when just watching YouTube using Firefox
and not doing anything else
- It will sometimes freeze with GPU being completely idle (but outputs on),
while CPU is at 100%
- It will sometimes freeze when opening shadertoy shaders. Not specific ones,
just randomly.
- It will likely freeze within 1-2 hours of streaming using OBS:
                - XSHM is used to grab 2560x1440 screen at 60fps
                - image downscaled to 1080p60 using whatever OBS does
                - a bunch of minor stuff added to the frame
                - software encoding using x264 medium preset resulting in
10-30% CPU load
        - It can freeze both when doing live shader programming (and GPU is at
100% with heavy pathtracing compute), and when just editing text in vim.
        - It is still pretty random: sometimes it remains stable for a week of
almost daily 2-4 hour streams, but on some days it will freeze 2-3
times within one evening.

This would suggest a hardware issue, but strangely enough I have never
experienced this problem on Windows using the same PC. This also prevents me
from RMA because there's no plausible way to reproduce the issue.

Other hardware is stable:
- CPU being 100% busy compiling some huge C++ codebases for hours remains
stable
- many-hours memtest doesn't show any errors
- there's also an NVidia GPU installed in this machine that is being passed
through to Windows running under qemu. This GPU is also stable under any load.
        - although it was throwing PCI AER errors into dmesg (without any other
symptoms). This is believed to be a benign X399 issue, and is suppressed using
the pcie_aspm=off kernel parameter
- Loading the entire system to 100% (simultaneously running GPU benchmarks on
host and vm, and also compiling something on CPU) generally doesn't trigger the
issue. Adding OBS to that likely does.
- Three different PSUs were used on this system, no behaviour difference.

Other things:
- Power management on Linux is significantly different from that on Windows.
        - on Windows idle means idle: all clocks and voltages are as low as pp
allows, power draw is ~20W
        - on Linux even idle (nothing is feeding GPU with any work) will have
sclk at 3 (1138MHz 1000mV) and mclk at 3 (max, 945MHz 1100mV), power draw is
40W
- I am unable to dump BIOS of this card properly on Linux:
        - Both /sys/kernel/debug/dri/0/amdgpu_vbios and
/sys/class/drm/card0/device/rom are truncated at 60928
        - Contents are different from what I could dump on Windows, e.g:
                @@ -1,6 +1,6 @@
                -00000000: 55aa 77e9 eb02 0000 0000 0000 0000 0000  U.w.............
                -00000010: 0000 0000 0000 0000 9c02 0000 0000 4942  ..............IB
                -00000020: 4d9d ac8a 0000 0000 0000 0000 0000 0004  M...............
                +00000000: 55aa 77e9 eb02 0000 00c0 0000 0000 0000  U.w.............
                +00000010: 0000 0000 0044 0000 9c02 0000 0000 4942  .....D........IB
                +00000020: 4d43 ac8a 0000 0000 0000 0000 0000 0004  MC..............
                 00000030: 2037 3631 3239 3535 3230 0000 0000 0000   761295520......
                 00000040: 0000 0000 0000 0000 7402 0000 0000 0000  ........t.......
                 00000050: 3132 2f31 322f 3137 2030 313a 3237 0000  12/12/17 01:27..
                @@ -38,13 +38,13 @@
                 00000250: 315f 4d42 415f 4131 5f48 424d 5f38 4742  1_MBA_A1_HBM_8GB
                 00000260: 5f56 3336 3831 305c 636f 6e66 6967 2e68  _V36810\config.h
                 00000270: 0000 0090 2800 0202 4154 4f4d 00c0 eb03  ....(...ATOM....
                -00000280: 1802 c102 6c01 1e04 0000 0000 6214 8036  ....l.......b..6
                +00000280: 1802 c102 6c01 1e04 0000 0030 6214 8036  ....l......0b..6
- Under/over-volting doesn't work: any change to any of the default voltages,
however insignificant, results in severe throttling, see
https://github.com/RadeonOpenCompute/ROCm/issues/681
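
The vbios comparison earlier in this list can also be done byte by byte instead of diffing hexdumps. A minimal Python sketch; the helper name `diff_offsets` is illustrative, and the sample bytes are the first 0x30 bytes of the "-" and "+" sides of the hexdump diff in this report:

```python
def diff_offsets(a, b):
    """Return (offset, byte_a, byte_b) for each position where the dumps differ.

    Only the common prefix is compared, so a length mismatch (for example a
    truncated dump) has to be checked separately with len().
    """
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# First 0x30 bytes of the "-" side of the hexdump diff in this report.
DUMP_MINUS = bytes.fromhex(
    "55aa77e9eb020000" "0000000000000000"
    "0000000000000000" "9c02000000004942"
    "4d9dac8a00000000" "0000000000000004"
)

# First 0x30 bytes of the "+" side of the same diff.
DUMP_PLUS = bytes.fromhex(
    "55aa77e9eb020000" "00c0000000000000"
    "0000000000440000" "9c02000000004942"
    "4d43ac8a00000000" "0000000000000004"
)

for offset, x, y in diff_offsets(DUMP_MINUS, DUMP_PLUS):
    print(f"0x{offset:04x}: {x:02x} -> {y:02x}")
```

For real dumps the two byte strings would come from `open(path, "rb").read()` on each file; the file names are whatever the dumps were saved as.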

Is there anything else I could try?
Is there a way to collect more info?

Links to (probably, superficially) similar problems:
- https://bugs.freedesktop.org/show_bug.cgi?id=105733
- https://bugs.freedesktop.org/show_bug.cgi?id=105819
- https://bugs.freedesktop.org/show_bug.cgi?id=109022
- https://bugs.freedesktop.org/show_bug.cgi?id=105251
- https://bugs.freedesktop.org/show_bug.cgi?id=108493
- https://github.com/RadeonOpenCompute/ROCm/issues/348


You are receiving this mail because:
  • You are the assignee for the bug.
--15480670130.C6eB3e3.5595--

--===============0620610802==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KZHJpLWRldmVs
IG1haWxpbmcgbGlzdApkcmktZGV2ZWxAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vZHJpLWRldmVsCg==

--===============0620610802==--