From mboxrd@z Thu Jan 1 00:00:00 1970
From: bugzilla-daemon@freedesktop.org
Subject: [Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor
moves sometimes but does nothing. Keyboard stops working.
Date: Tue, 05 Feb 2019 16:28:03 +0000
Message-ID:
References:
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============1678610462=="
Return-path:
Received: from culpepper.freedesktop.org (culpepper.freedesktop.org
[131.252.210.165])
by gabe.freedesktop.org (Postfix) with ESMTP id A12A76E73B
for ; Tue, 5 Feb 2019 16:28:04 +0000 (UTC)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org
--===============1678610462==
Content-Type: multipart/alternative; boundary="15493840846.cd3F71aF9.2574"
Content-Transfer-Encoding: 7bit
--15493840846.cd3F71aF9.2574
Date: Tue, 5 Feb 2019 16:28:04 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://bugs.freedesktop.org/
Auto-Submitted: auto-generated
https://bugs.freedesktop.org/show_bug.cgi?id=105733
--- Comment #71 from Garry Hurley Jr ---
What I want to know is what is calling your machine ‘localhorst’?
Sent from my iPhone
> On Nov 20, 2018, at 9:15 AM, bugzilla-daemon@freedesktop.org wrote:
>
> Comment # 47 on bug 105733 from Allan
> I have really bad news.
>
> It took me a long time to answer because I literally sent in for warranty
> or replaced ALL of the components in my PC.
>
> The CPU (R7 1800X) from batch 21 was replaced by AMD itself with a new one
> from batch 35.
>
> But OK, let's talk about amdgpu:
>
> (In reply to Andrey Grodzovsky from comment #25)
> > (In reply to Allan from comment #12)
> > Can you build latest kernel (4.18) and grab again latest firmware and try
> > again?
> > Links to kernel and firmware:
> > https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
>
> For reasons already explained here, I couldn't compile or test it before,
> so please don't be mad at me:
> - Sold my old PC.
> - My notebook was completely filled with files.
> - Components on warranty. Testing everything else.
>
> So I managed to borrow a PC to test the video cards. So far I have tested
> only the nvidia one, to prove to AMD that the GPU is working and that it is
> the PCI controller of the CPU/chipset (a guess of mine) that is broken. I'm
> going to test the RX480 on this PC as soon as possible. My warranties are
> expiring and I had to set priorities.
>
> I already said this here, but with the 1800X I couldn't even clone the git
> repository (the checksum always failed; I tried many times).
>
> Then I managed to free some space on my notebook and started to build
> yesterday.
> - Included amd-ucode firmware.
> - Included polaris10 firmware (for RX480).
> - Made some optimizations for Ryzen as described on Gentoo's dedicated
> page.
>
> Compiled version 4.20-rc1, as present in the branch. No errors reported.
>
> There are 2 main applications that make it easy to reproduce the problems
> right now:
> - Metro 2033 Redux through Steam.
> - Left 4 Dead 2 through Steam.
>
> Started Metro 2033; it worked for some minutes with no issue, but for some
> reason without any sound. Closed it. Turned off the HDMI audio in
> pavucontrol to use only the default output. Restarted Steam.
>
> Started Left 4 Dead 2 this time. Was able to change the graphics settings
> to max, without AA and vsync. Played for 15 seconds and got a screen freeze.
> Waited for a script to properly record the logs and temps. Hard rebooted.
> This time even my BIOS/EFI screen had a green background, but it was still
> operational. Everything was green except the text. Rebooted again, and the
> colors were back to normal.
>
> And here are the logs:
>
> kern.log about Firefox usage:
> > Nov 14 05:26:50 desk kernel: [ 324.714998] Chrome_~dThread[1788]: segfault at 0 ip 00007fbfee5e3181 sp 00007fbfec2d1ad0 error 6 in libxul.so[7fbfee5cf000+3a2c000]
>
> This suggests that the CPU either still has problematic microcode or is
> defective.
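As a side note, the segfault line quoted above can be decoded by hand. The following is a minimal Python sketch, assuming the usual x86 message layout ("segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]", all values in hex) and the usual x86 page-fault error-code bits (bit 0: page was present, bit 1: write access, bit 2: user mode). The function name decode_segfault is made up for illustration.

```python
import re

# Assumed layout of the kernel's x86 user-space segfault message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def decode_segfault(line):
    """Return the decoded fields of a segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    return {
        "fault_addr": int(m["addr"], 16),
        "lib": m["lib"],
        # Offset of the faulting instruction inside the mapped library.
        "offset_in_lib": int(m["ip"], 16) - int(m["base"], 16),
        "page_present": bool(err & 1),  # bit 0 (assumed meaning)
        "write": bool(err & 2),         # bit 1 (assumed meaning)
        "user_mode": bool(err & 4),     # bit 2 (assumed meaning)
    }

line = ("Nov 14 05:26:50 desk kernel: [ 324.714998] Chrome_~dThread[1788]: "
        "segfault at 0 ip 00007fbfee5e3181 sp 00007fbfec2d1ad0 error 6 "
        "in libxul.so[7fbfee5cf000+3a2c000]")
info = decode_segfault(line)
print(hex(info["offset_in_lib"]))  # 0x14181
```

Under those assumptions the line describes a user-mode write to address 0 at offset 0x14181 inside libxul.so. On its own that does not distinguish a software bug from faulty hardware.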
>
> dmesg about amdgpu screen freeze:
> > [ 3323.920795] amdgpu 0000:09:00.0: GPU fault detected: 146 0x0000080c for process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653
> > [ 3323.920799] amdgpu 0000:09:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
> > [ 3323.920801] amdgpu 0000:09:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0200800C
> > [ 3323.920804] amdgpu 0000:09:00.0: VM fault (0x0c, vmid 1, pasid 32774) at page 0, read from 'TC0' (0x54433000) (8)
> > [ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=274140, emitted seq=274142
> > [ 3334.103239] amdgpu 0000:09:00.0: GPU reset begin!
> > [ 3344.332607] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:46:crtc-0] hw_done or flip_done timed out
> > [ 3504.834097] INFO: task kworker/u32:2:3872 blocked for more than 120 seconds.
> > [ 3504.834103] Not tainted 4.20.0-rc1-amd #2
> > [ 3504.834105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 3504.834107] kworker/u32:2 D 0 3872 2 0x80000000
> > [ 3504.834123] Workqueue: events_unbound commit_work [drm_kms_helper]
> > [ 3504.834126] Call Trace:
> > [ 3504.834133] ? __schedule+0x2a0/0x880
> > [ 3504.834136] schedule+0x28/0x80
> > [ 3504.834139] schedule_timeout+0x25d/0x380
> > [ 3504.834217] ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu]
> > [ 3504.834292] ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu]
> > [ 3504.834297] dma_fence_default_wait+0x23b/0x2a0
> > [ 3504.834301] ? dma_fence_release+0x90/0x90
> > [ 3504.834304] dma_fence_wait_timeout+0xdd/0x100
> > [ 3504.834308] reservation_object_wait_timeout_rcu+0x161/0x270
> > [ 3504.834387] amdgpu_dm_do_flip+0x112/0x370 [amdgpu]
> > [ 3504.834468] amdgpu_dm_atomic_commit_tail+0x68b/0xcd0 [amdgpu]
> > [ 3504.834472] ? __switch_to_asm+0x40/0x70
> > [ 3504.834475] ? wait_for_completion_timeout+0x3b/0x1a0
> > [ 3504.834477] ? __switch_to_asm+0x34/0x70
> > [ 3504.834480] ? __switch_to_asm+0x40/0x70
> > [ 3504.834483] ? __switch_to+0x1ba/0x450
> > [ 3504.834492] commit_tail+0x3d/0x70 [drm_kms_helper]
> > [ 3504.834497] process_one_work+0x1aa/0x3a0
> > [ 3504.834500] worker_thread+0x30/0x3a0
> > [ 3504.834503] ? drain_workqueue+0x130/0x130
> > [ 3504.834506] kthread+0x11d/0x140
> > [ 3504.834509] ? kthread_park+0x80/0x80
> > [ 3504.834512] ret_from_fork+0x22/0x40
> > [ 3516.645267] WARNING: CPU: 14 PID: 14694 at kernel/kthread.c:501 kthread_park+0x6c/0x80
> > [ 3516.645271] Modules linked in: fuse edac_mce_amd kvm_amd nls_ascii nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec joydev amdgpu snd_hda_core snd_hwdep chash gpu_sched snd_pcm snd_timer ttm drm_kms_helper snd drm i2c_algo_bit sp5100_tco soundcore kvm efi_pstore efivars sg irqbypass evdev wmi_bmof serio_raw pcspkr k10temp ccp tpm_crb pcc_cpufreq tpm_tis tpm_tis_core tpm rng_core acpi_cpufreq button parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod sd_mod hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ahci xhci_pci aes_x86_64 libahci crypto_simd xhci_hcd cryptd glue_helper libata r8169 i2c_piix4 libphy usbcore scsi_mod thermal wmi gpio_amdpt gpio_generic
> > [ 3516.645324] CPU: 14 PID: 14694 Comm: TaskSchedulerFo Not tainted 4.20.0-rc1-amd #2
> > [ 3516.645327] Hardware name: BIOSTAR Group X370GT7/X370GT7, BIOS 5.13 08/07/2018
> > [ 3516.645330] RIP: 0010:kthread_park+0x6c/0x80
> > [ 3516.645333] Code: 18 e8 88 6c 67 00 be 40 00 00 00 48 89 df e8 8b c3 00 00 48 85 c0 74 1b 31 c0 5b 5d c3 0f 0b eb ae 0f 0b b8 da ff ff ff eb f0 <0f> 0b b8 f0 ff ff ff eb e7 0f 0b eb e3 0f 1f 80 00 00 00 00 0f 1f
> > [ 3516.645335] RSP: 0018:ffffbafdc3fcfb60 EFLAGS: 00010202
> > [ 3516.645338] RAX: 0000000000000004 RBX: ffff9dcd93f140c0 RCX: dead000000000200
> > [ 3516.645339] RDX: ffff9dcd92ba7430 RSI: ffff9dcd93f140c0 RDI: ffff9dcd8a9049c0
> > [ 3516.645341] RBP: ffff9dcd940a5360 R08: ffff9dcd96da25a8 R09: 0000000000000000
> > [ 3516.645343] R10: 0000000000000000 R11: 000000000000019c R12: ffff9dcd92ba27a0
> > [ 3516.645344] R13: ffff9dcd76d34200 R14: 0000000000000206 R15: dead000000000100
> > [ 3516.645347] FS: 00007efea483e700(0000) GS:ffff9dcd96d80000(0000) knlGS:0000000000000000
> > [ 3516.645349] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3516.645351] CR2: 00005654fe725e10 CR3: 0000000200d40000 CR4: 00000000003406e0
> > [ 3516.645352] Call Trace:
> > [ 3516.645362] drm_sched_entity_fini+0x37/0x190 [gpu_sched]
> > [ 3516.645423] amdgpu_vm_fini+0xad/0x530 [amdgpu]
> > [ 3516.645429] ? idr_destroy+0x78/0xc0
> > [ 3516.645481] amdgpu_driver_postclose_kms+0x151/0x270 [amdgpu]
> > [ 3516.645496] drm_file_free.part.5+0x21f/0x300 [drm]
> > [ 3516.645510] drm_release+0xaa/0x120 [drm]
> > [ 3516.645514] __fput+0xac/0x1e0
> > [ 3516.645518] task_work_run+0x8f/0xb0
> > [ 3516.645522] do_exit+0x2e6/0xb30
> > [ 3516.645525] do_group_exit+0x3a/0xb0
> > [ 3516.645528] get_signal+0x27a/0x5f0
> > [ 3516.645532] do_signal+0x30/0x6d0
> > [ 3516.645537] exit_to_usermode_loop+0x89/0xf0
> > [ 3516.645540] do_syscall_64+0xda/0xe0
> > [ 3516.645544] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 3516.645547] RIP: 0033:0x7efeb6b9d19a
> > [ 3516.645553] Code: Bad RIP value.
> > [ 3516.645555] RSP: 002b:00007efea483d810 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> > [ 3516.645557] RAX: fffffffffffffdfc RBX: 00007efea483d958 RCX: 00007efeb6b9d19a
> > [ 3516.645559] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007efea483d980
> > [ 3516.645560] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffe661d7080
> > [ 3516.645562] R10: 00007efea483d860 R11: 0000000000000246 R12: 0000000000000000
> > [ 3516.645564] R13: 00007efea483d980 R14: 00007efea483d990 R15: 00007efea483d930
> > [ 3516.645566] ---[ end trace 7da35ac4aa65c90d ]---
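For readers trying to relate the raw register value to the "VM fault (...)" summary line in the log above, here is a small Python sketch. The bit positions are an assumption based on the GMC v8 (Polaris-class) layout of VM_CONTEXT1_PROTECTION_FAULT_STATUS and should be checked against the kernel headers. The function name decode_vm_fault is hypothetical.

```python
def decode_vm_fault(status, mc_client):
    """Split a fault status word into fields (bit layout is an assumption)."""
    return {
        "protections": status & 0xFF,               # bits 0-7
        "memory_client_id": (status >> 12) & 0xFF,  # bits 12-19
        "is_write": bool((status >> 24) & 0x1),     # bit 24: 0 = read, 1 = write
        "vmid": (status >> 25) & 0xF,               # bits 25-28
        # The client word is four ASCII characters packed big-endian.
        "client_name": mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii"),
    }

# Values taken from the dmesg excerpt above.
fault = decode_vm_fault(0x0200800C, 0x54433000)
print(fault)
```

With the values from the log, this sketch gives protections 0x0c, vmid 1, a read access, client id 8 and client name 'TC0', which matches what the driver itself printed.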
>
> Note that the most common code that appears when using generic kernels is
> 147, rather than the 146 shown here.
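To compare how often each fault code shows up across kernels, the codes can be tallied from saved dmesg output. This is only a sketch: the two regular expressions are assumptions derived from the log lines quoted in this mail, and the function name summarize is made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the log lines quoted in this mail (assumed format).
FAULT_RE = re.compile(r"GPU fault detected: (\d+) 0x[0-9a-fA-F]+")
TIMEOUT_RE = re.compile(
    r"ring (\w+) timeout, signaled seq=(\d+), emitted seq=(\d+)"
)

def summarize(log_text):
    """Count fault codes and list (ring, emitted - signaled) for each timeout."""
    codes = Counter(int(code) for code in FAULT_RE.findall(log_text))
    timeouts = [
        (ring, int(emitted) - int(signaled))
        for ring, signaled, emitted in TIMEOUT_RE.findall(log_text)
    ]
    return codes, timeouts

sample = (
    "[ 3323.920795] amdgpu 0000:09:00.0: GPU fault detected: 146 0x0000080c "
    "for process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653\n"
    "[ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
    "timeout, signaled seq=274140, emitted seq=274142\n"
)
codes, timeouts = summarize(sample)
print(codes, timeouts)
```

Run over full logs from both a generic kernel and the 4.20-rc1 build, the counter would show directly whether 147 or 146 dominates on each.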
>
> Xorg.0.log reports nothing.
>
> I called this bad news because it seems to me that both the CPU and the
> amdgpu driver are defective.
>
> I noticed that with kernel 4.18 the GPU is kept at 100% (mclk and sclk)
> all the time, while with this new kernel the GPU tries to scale its
> performance.
>
> Also note that the nvidia GTX 1070 throws a lot of Xid error codes (see
> https://devtalk.nvidia.com/default/topic/1043483/linux/xid-errors-on-gtx-1070-linux/post/5293440
> ). This is why I think the 1800X has a defective PCI controller, and it is
> also the second part of the "really bad news". Maybe this happens mostly
> with Ryzen processors? I'll test the RX480 in the other computer ASAP; I
> need to send information about the CPU to AMD to proceed with the warranty
> process.
>
> The GTX 1070 works without a single problem outside of this PC. The other
> cards that I tested before follow the same pattern (2 RX480, 1 RX 580,
> 1 GTX 970, 1 GTX 1070).
>
> Currently I have only 1 RX480 and 1 GTX 1070. Now that I know that the
> cards don't have any problem, I'm selling them, and soon I'll have only one
> or none. The seller told me off for requesting warranty service for the
> RX 480 when I thought it was defective; he sent me a different one, and
> according to him the one I sent back was working without any issues.
>
> I'm already at the stage of sending the CPU back to AMD again, and praying
> that this ends my endless torment. I think they'll have to refund me (and
> then I'll take a loss on the motherboard).
>
> Please tell me about any other steps you would like me to take.
>
> I can also provide a full description of the kernel build (parameters)
> and even a link to the generated .deb packages.
> You are receiving this mail because:
> You are the assignee for the bug.
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
-- 
You are receiving this mail because:
You are the assignee for the bug.
--15493840846.cd3F71aF9.2574--
--===============1678610462==--