From mboxrd@z Thu Jan 1 00:00:00 1970
From: Garry Hurley
Subject: Re: [Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.
Date: Tue, 5 Feb 2019 11:27:58 -0500
Message-ID:
References:
Mime-Version: 1.0 (1.0)
Content-Type: multipart/mixed; boundary="===============0515352702=="
Return-path:
Received: from mail-qt1-x843.google.com (mail-qt1-x843.google.com [IPv6:2607:f8b0:4864:20::843]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7BA5F6E732 for ; Tue, 5 Feb 2019 16:28:01 +0000 (UTC)
Received: by mail-qt1-x843.google.com with SMTP id r9so4503470qtt.3 for ; Tue, 05 Feb 2019 08:28:01 -0800 (PST)
In-Reply-To:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel"
To: bugzilla-daemon@freedesktop.org
Cc: dri-devel@lists.freedesktop.org
List-Id: dri-devel@lists.freedesktop.org

--===============0515352702==
Content-Type: multipart/alternative; boundary=Apple-Mail-18BA1764-0536-4ED8-BC46-7B20ABC85084
Content-Transfer-Encoding: 7bit

--Apple-Mail-18BA1764-0536-4ED8-BC46-7B20ABC85084
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: quoted-printable

What I want to know is what is calling your machine ‘localhorst’?

Sent from my iPhone

On Nov 20, 2018, at 9:15 AM, bugzilla-daemon@freedesktop.org wrote:


Comment # 47 on bug 105733 from Allan
I have really bad news.

It has taken me a long time to answer because I literally sent in for warranty
or replaced ALL of the components in my PC.

AMD itself replaced the CPU (R7 1800X), swapping a batch 21 unit for a new one
from batch 35.

But OK, let's talk about the amdgpu :

(In reply to Andrey Grodzovsky from comment #25)
> (In reply to Allan from comment #12)
> Can you build latest kernel (4.18) and grab again latest firmware and try
> again ?
> Links to kernel and firmware:
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/

For reasons already explained here, I could neither compile nor test it before,
so please don't be mad at me:
- Sold my old PC.
- My notebook was completely filled with files.
- Components on warranty. Testing everything else.

So I managed to borrow a PC to test the video cards. So far I have tested only
the NVIDIA one, to prove to AMD that the GPU is working and that it is the PCI
controller of the CPU/chipset (a guess of mine) that is broken. I am going to
test the RX480 on this PC as soon as possible. My warranties are expiring, so I
had to prioritize.

I already said it here but, with the 1800X I couldn't even clone the git
repository (the checksum always fails, tried many times).
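
A rough way to check whether a machine produces inconsistent checksums for the same data is sketched below. This is only an illustrative sanity check under stated assumptions: the function name `count_hash_mismatches`, the payload size, and the number of rounds are hypothetical, and it is not how git itself verifies a clone.

```python
import hashlib
import os


def count_hash_mismatches(data: bytes, rounds: int = 50) -> int:
    """Hash the same buffer repeatedly and count results that differ.

    On healthy hardware every round yields the same digest, so the
    count is 0. A non-zero count hints at memory or CPU corruption.
    """
    reference = hashlib.sha256(data).hexdigest()
    mismatches = 0
    for _ in range(rounds):
        if hashlib.sha256(data).hexdigest() != reference:
            mismatches += 1
    return mismatches


# Hypothetical payload: 1 MiB of random bytes held in memory.
payload = os.urandom(1 << 20)
mismatches = count_hash_mismatches(payload)
print(f"{mismatches} mismatching digests")
```

A clean run does not prove the hardware is fine; it only fails to reproduce the corruption with this particular workload.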

Then I managed to free some space on my notebook and started to build
yesterday.
- Included amd-ucode firmware.
- Included polaris10 firmware (for RX480).
- Made some optimizations for Ryzen as described on Gentoo's dedicated
page.

Compiled, version 4.20-rc1 as present in the branch. No errors reported.

There are 2 main applications that are easier to test right now to find the
problems :
- Metro 2033 Redux through steam.
- Left 4 Dead 2 through steam.

Started Metro 2033. It worked for some minutes with no issue, but for some
reason without any sound. Closed it. Turned off the HDMI audio in pavucontrol
to use only the default output. Restarted Steam.

Started Left 4 Dead 2 this time. Was able to change graphics settings to max
without AA and vsync. Played for 15 seconds and got a screen freeze. Waited for
a script to properly record the logs and temps. Hard rebooted. This time even
my BIOS/EFI screen had a green background, but it was still operational.
Everything was green except the text. Rebooted again, got back to normal colors.

And here are the logs :

kern.log about Firefox usage :
> Nov 14 05:26:50 desk kernel: [  324.714998] Chrome_~dThread[1788]: segfault at 0 ip 00007fbfee5e3181 sp 00007fbfec2d1ad0 error 6 in libxul.so[7fbfee5cf000+3a2c000]

This suggests that the CPU either still has problematic microcode or is
defective.

dmesg about amdgpu screen freeze :
> [ 3323.920795] amdgpu 0000:09:00.0: GPU fault detected: 146 0x0000080c for process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653
> [ 3323.920799] amdgpu 0000:09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
> [ 3323.920801] amdgpu 0000:09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0200800C
> [ 3323.920804] amdgpu 0000:09:00.0: VM fault (0x0c, vmid 1, pasid 32774) at page 0, read from 'TC0' (0x54433000) (8)
> [ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=274140, emitted seq=274142
> [ 3334.103239] amdgpu 0000:09:00.0: GPU reset begin!
> [ 3344.332607] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:46:crtc-0] hw_done or flip_done timed out
> [ 3504.834097] INFO: task kworker/u32:2:3872 blocked for more than 120 seconds.
> [ 3504.834103]       Not tainted 4.20.0-rc1-amd #2
> [ 3504.834105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 3504.834107] kworker/u32:2   D    0  3872      2 0x80000000
> [ 3504.834123] Workqueue: events_unbound commit_work [drm_kms_helper]
> [ 3504.834126] Call Trace:
> [ 3504.834133]  ? __schedule+0x2a0/0x880
> [ 3504.834136]  schedule+0x28/0x80
> [ 3504.834139]  schedule_timeout+0x25d/0x380
> [ 3504.834217]  ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu]
> [ 3504.834292]  ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu]
> [ 3504.834297]  dma_fence_default_wait+0x23b/0x2a0
> [ 3504.834301]  ? dma_fence_release+0x90/0x90
> [ 3504.834304]  dma_fence_wait_timeout+0xdd/0x100
> [ 3504.834308]  reservation_object_wait_timeout_rcu+0x161/0x270
> [ 3504.834387]  amdgpu_dm_do_flip+0x112/0x370 [amdgpu]
> [ 3504.834468]  amdgpu_dm_atomic_commit_tail+0x68b/0xcd0 [amdgpu]
> [ 3504.834472]  ? __switch_to_asm+0x40/0x70
> [ 3504.834475]  ? wait_for_completion_timeout+0x3b/0x1a0
> [ 3504.834477]  ? __switch_to_asm+0x34/0x70
> [ 3504.834480]  ? __switch_to_asm+0x40/0x70
> [ 3504.834483]  ? __switch_to+0x1ba/0x450
> [ 3504.834492]  commit_tail+0x3d/0x70 [drm_kms_helper]
> [ 3504.834497]  process_one_work+0x1aa/0x3a0
> [ 3504.834500]  worker_thread+0x30/0x3a0
> [ 3504.834503]  ? drain_workqueue+0x130/0x130
> [ 3504.834506]  kthread+0x11d/0x140
> [ 3504.834509]  ? kthread_park+0x80/0x80
> [ 3504.834512]  ret_from_fork+0x22/0x40
> [ 3516.645267] WARNING: CPU: 14 PID: 14694 at kernel/kthread.c:501 kthread_park+0x6c/0x80
> [ 3516.645271] Modules linked in: fuse edac_mce_amd kvm_amd nls_ascii nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec joydev amdgpu snd_hda_core snd_hwdep chash gpu_sched snd_pcm snd_timer ttm drm_kms_helper snd drm i2c_algo_bit sp5100_tco soundcore kvm efi_pstore efivars sg irqbypass evdev wmi_bmof serio_raw pcspkr k10temp ccp tpm_crb pcc_cpufreq tpm_tis tpm_tis_core tpm rng_core acpi_cpufreq button parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod sd_mod hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ahci xhci_pci aes_x86_64 libahci crypto_simd xhci_hcd cryptd glue_helper libata r8169 i2c_piix4 libphy usbcore scsi_mod thermal wmi gpio_amdpt gpio_generic
> [ 3516.645324] CPU: 14 PID: 14694 Comm: TaskSchedulerFo Not tainted 4.20.0-rc1-amd #2
> [ 3516.645327] Hardware name: BIOSTAR Group X370GT7/X370GT7, BIOS 5.13 08/07/2018
> [ 3516.645330] RIP: 0010:kthread_park+0x6c/0x80
> [ 3516.645333] Code: 18 e8 88 6c 67 00 be 40 00 00 00 48 89 df e8 8b c3 00 00 48 85 c0 74 1b 31 c0 5b 5d c3 0f 0b eb ae 0f 0b b8 da ff ff ff eb f0 <0f> 0b b8 f0 ff ff ff eb e7 0f 0b eb e3 0f 1f 80 00 00 00 00 0f 1f
> [ 3516.645335] RSP: 0018:ffffbafdc3fcfb60 EFLAGS: 00010202
> [ 3516.645338] RAX: 0000000000000004 RBX: ffff9dcd93f140c0 RCX: dead000000000200
> [ 3516.645339] RDX: ffff9dcd92ba7430 RSI: ffff9dcd93f140c0 RDI: ffff9dcd8a9049c0
> [ 3516.645341] RBP: ffff9dcd940a5360 R08: ffff9dcd96da25a8 R09: 0000000000000000
> [ 3516.645343] R10: 0000000000000000 R11: 000000000000019c R12: ffff9dcd92ba27a0
> [ 3516.645344] R13: ffff9dcd76d34200 R14: 0000000000000206 R15: dead000000000100
> [ 3516.645347] FS:  00007efea483e700(0000) GS:ffff9dcd96d80000(0000) knlGS:0000000000000000
> [ 3516.645349] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3516.645351] CR2: 00005654fe725e10 CR3: 0000000200d40000 CR4: 00000000003406e0
> [ 3516.645352] Call Trace:
> [ 3516.645362]  drm_sched_entity_fini+0x37/0x190 [gpu_sched]
> [ 3516.645423]  amdgpu_vm_fini+0xad/0x530 [amdgpu]
> [ 3516.645429]  ? idr_destroy+0x78/0xc0
> [ 3516.645481]  amdgpu_driver_postclose_kms+0x151/0x270 [amdgpu]
> [ 3516.645496]  drm_file_free.part.5+0x21f/0x300 [drm]
> [ 3516.645510]  drm_release+0xaa/0x120 [drm]
> [ 3516.645514]  __fput+0xac/0x1e0
> [ 3516.645518]  task_work_run+0x8f/0xb0
> [ 3516.645522]  do_exit+0x2e6/0xb30
> [ 3516.645525]  do_group_exit+0x3a/0xb0
> [ 3516.645528]  get_signal+0x27a/0x5f0
> [ 3516.645532]  do_signal+0x30/0x6d0
> [ 3516.645537]  exit_to_usermode_loop+0x89/0xf0
> [ 3516.645540]  do_syscall_64+0xda/0xe0
> [ 3516.645544]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 3516.645547] RIP: 0033:0x7efeb6b9d19a
> [ 3516.645553] Code: Bad RIP value.
> [ 3516.645555] RSP: 002b:00007efea483d810 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> [ 3516.645557] RAX: fffffffffffffdfc RBX: 00007efea483d958 RCX: 00007efeb6b9d19a
> [ 3516.645559] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007efea483d980
> [ 3516.645560] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffe661d7080
> [ 3516.645562] R10: 00007efea483d860 R11: 0000000000000246 R12: 0000000000000000
> [ 3516.645564] R13: 00007efea483d980 R14: 00007efea483d990 R15: 00007efea483d930
> [ 3516.645566] ---[ end trace 7da35ac4aa65c90d ]---
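
For anyone reading the fault lines in the log above, here is a minimal Python sketch of how the raw `VM_CONTEXT1_PROTECTION_FAULT_STATUS` value relates to the decoded "VM fault" line. The bit positions are an assumption, chosen because they reproduce the values the driver printed (0x0c, vmid 1, read, client id 8); the function name `decode_vm_fault` is hypothetical and not part of any kernel or library API.

```python
def decode_vm_fault(status: int, mc_client: int) -> dict:
    """Unpack a fault status word and a memory client tag.

    Assumed layout (matches the log line, not taken from a spec):
      bits 0-7   protections
      bits 12-19 memory client id
      bit  24    0 = read, 1 = write
      bits 25-28 vmid
    """
    protections = status & 0xFF
    mc_id = (status >> 12) & 0xFF
    is_write = bool((status >> 24) & 0x1)
    vmid = (status >> 25) & 0xF
    # The client tag is four ASCII bytes, most significant byte first,
    # padded with NUL: 0x54 0x43 0x30 0x00 spells "TC0".
    name = mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")
    return {
        "protections": protections,
        "mc_id": mc_id,
        "access": "write" if is_write else "read",
        "vmid": vmid,
        "client": name,
    }


# Values copied from the dmesg excerpt above.
fault = decode_vm_fault(0x0200800C, 0x54433000)
print(fault)
```

With the two values from the log, the fields come out as protections 0x0c, client id 8, a read access, vmid 1, and client `TC0`, which agrees with the line the kernel printed.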

It is important to note that the most common code that appears while using
generic kernels is 147, rather than the 146 shown here.
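
To compare how often each code shows up across boots, the fault lines can be tallied from saved dmesg output. The sketch below is an illustration only: the helper name `count_fault_codes` is hypothetical, and the sample text is just two lines copied from the log above.

```python
import re
from collections import Counter

# Matches the "GPU fault detected: <code> <hex>" lines seen in the log.
FAULT_RE = re.compile(r"GPU fault detected: (\d+) (0x[0-9a-fA-F]+)")


def count_fault_codes(log_text: str) -> Counter:
    """Return how many times each fault code appears in the text."""
    return Counter(int(m.group(1)) for m in FAULT_RE.finditer(log_text))


sample = (
    "[ 3323.920795] amdgpu 0000:09:00.0: GPU fault detected: 146 0x0000080c "
    "for process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653\n"
    "[ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx "
    "timeout, signaled seq=274140, emitted seq=274142\n"
)
counts = count_fault_codes(sample)
for code, n in sorted(counts.items()):
    print(f"code {code}: {n} occurrence(s)")
```

Feeding it the full kern.log from a generic kernel and from the 4.20-rc1 build would show whether 147 really dominates on one and 146 on the other.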

Xorg.0.log reports nothing.

I said that this was bad news because it seems to me that both the CPU and the
amdgpu driver are defective.

I noticed that while running kernel 4.18 the gpu is kept at 100% (mclk and
sclk) all the time while with this new kernel the GPU tries to scale the
performance.
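
One way to record this clock behaviour is to read the amdgpu `pp_dpm_sclk` and `pp_dpm_mclk` sysfs files, where the active level is marked with `*`. The sketch below parses that text format; the path in the comment assumes the card is `card0`, the helper names are hypothetical, and the sample levels are made-up values rather than readings from this machine.

```python
import re

# One level per line, e.g. "3: 1266Mhz *" (the "*" marks the active level).
LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_levels(text: str) -> list:
    """Return (index, mhz, active) tuples for each level line."""
    levels = []
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m:
            levels.append((int(m.group(1)), int(m.group(2)), m.group(3) is not None))
    return levels


def pinned_at_top(levels: list) -> bool:
    """True when the active level is the highest one listed."""
    if not levels:
        return False
    top = max(index for index, _, _ in levels)
    return any(active and index == top for index, _, active in levels)


# Hypothetical content, as it might appear in
# /sys/class/drm/card0/device/pp_dpm_sclk (card index is an assumption).
sample = "0: 300Mhz\n1: 608Mhz\n2: 910Mhz\n3: 1266Mhz *\n"
levels = parse_dpm_levels(sample)
print(levels, pinned_at_top(levels))
```

Sampling the files once a second while idle would show whether a given kernel stays at the top level or scales down.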

Also, it is important to note that the NVIDIA GTX 1070 throws a lot of Xid
error codes (see
https://devtalk.nvidia.com/default/topic/1043483/linux/xid-errors-on-gtx-1070-linux/post/5293440
). This is why I think that the 1800X has a defective PCI controller, and it is
the second part of the "really bad news". Maybe it is happening mostly with
Ryzen processors? I'll test the RX480 with the other computer ASAP; I need to
send information about the CPU to AMD to proceed with the warranty process.

The GTX 1070 works without a single problem outside of this PC. The other cards
that I had tested before follow the same pattern (2 RX480, 1 RX 580, 1 GTX 970,
1 GTX 1070).

Currently I have only 1 RX480 and 1 GTX 1070. Now that I know that the cards
don't have any problem I'm selling the cards and soon I'll have only one or
none. The seller told me off for requesting warranty for the RX 480 when I
thought it was defective; he sent me a different one, and according to him the
one that I sent was working without any issues.

I'm already at a new stage of re-sending the CPU to AMD, and praying for an end
to my endless torment. I think that they'll have to refund me (and then I'll
take a loss on the motherboard).

Please tell me any other step that you may want to be done.

I can also provide a full description of the kernel compilation (parameters)
and even provide a link to the generated .deb packages.


You are receiving this mail because:
  • You are the assignee for the bug.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
--Apple-Mail-18BA1764-0536-4ED8-BC46-7B20ABC85084--
--===============0515352702==--