From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 105733] Amdgpu randomly hangs and only ssh works. Mouse cursor moves sometimes but does nothing. Keyboard stops working.
Date: Tue, 05 Feb 2019 16:28:03 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=105733

Comment # 71 on bug 105733 from Garry Hurley Jr
What I want to know is what is calling your machine ‘localhorst’?

Sent from my iPhone

> On Nov 20, 2018, at 9:15 AM, bugzilla-daemon@freedesktop.org wrote:
>
> Comment # 47 on bug 105733 from Allan
> I have really bad news.
>
> I've taken a long time to answer because I literally sent ALL of the
> components in my PC back for warranty service or replaced them.
>
> AMD itself replaced the CPU (R7 1800X), swapping my batch 21 unit for a new
> one from batch 35.
>
> But OK, let's talk about amdgpu:
>
> (In reply to Andrey Grodzovsky from comment #25)
> > (In reply to Allan from comment #12)
> > Can you build the latest kernel (4.18), grab the latest firmware again,
> > and try again?
> > Links to kernel and firmware:
> > https://cgit.freedesktop.org/~agd5f/linux/log/?h=amd-staging-drm-next
> > https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/
>
> For reasons already explained here, I couldn't compile or test it before,
> so please don't be mad at me:
> - Sold my old PC.
> - My notebook was completely filled with files.
> - Components on warranty. Testing everything else.
>
> So I managed to borrow a PC to test the video cards. I have tested only the
> NVIDIA one, to prove to AMD that the GPU is working and that it is the PCI
> controller of the CPU/chipset (a guess of mine) that is broken. Going to test
> the RX480 on this PC as soon as possible. My warranties are expiring and I
> had to prioritize.
>
> I already said it here, but with the 1800X I couldn't even clone the git
> repository (the checksum always fails; I tried many times).
>
> Then I managed to free some space on my notebook and started the build
> yesterday.
> - Included amd-ucode firmware.
> - Included polaris10 firmware (for RX480).
> - Made some optimizations for Ryzen as described on Gentoo's dedicated
> page.
>
> Compiled version 4.20-rc1, as present in the branch. No errors reported.
>
> There are 2 main applications that are easiest to test right now to find
> the problems:
> - Metro 2033 Redux through Steam.
> - Left 4 Dead 2 through Steam.
>
> Started Metro 2033; it worked for some minutes with no issue, but for some
> reason without any sound. Closed it. Turned off the HDMI audio in
> pavucontrol to use only the default output. Restarted Steam.
>
> Started Left 4 Dead 2 this time. Was able to change the graphics settings to
> max, without AA and vsync. Played for 15 seconds and got a screen freeze.
> Waited for a script to properly record the logs and temps. Hard rebooted.
> This time even my BIOS/EFI screen had a green background, but it was still
> operational. Everything was green except the text. Rebooted again and got
> back to normal colors.
>
> And here are the logs:
>
> kern.log about Firefox usage:
> > Nov 14 05:26:50 desk kernel: [  324.714998] Chrome_~dThread[1788]: segfault at 0 ip 00007fbfee5e3181 sp 00007fbfec2d1ad0 error 6 in libxul.so[7fbfee5cf000+3a2c000]
>
> This suggests that the CPU either still has problematic microcode or is
> defective.
>
> dmesg about amdgpu screen freeze:
> > [ 3323.920795] amdgpu 0000:09:00.0: GPU fault detected: 146 0x0000080c for process hl2_linux pid 14648 thread amdgpu_cs:0 pid 14653
> > [ 3323.920799] amdgpu 0000:09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
> > [ 3323.920801] amdgpu 0000:09:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0200800C
> > [ 3323.920804] amdgpu 0000:09:00.0: VM fault (0x0c, vmid 1, pasid 32774) at page 0, read from 'TC0' (0x54433000) (8)
> > [ 3334.103233] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=274140, emitted seq=274142
> > [ 3334.103239] amdgpu 0000:09:00.0: GPU reset begin!
> > [ 3344.332607] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:46:crtc-0] hw_done or flip_done timed out
> > [ 3504.834097] INFO: task kworker/u32:2:3872 blocked for more than 120 seconds.
> > [ 3504.834103]       Not tainted 4.20.0-rc1-amd #2
> > [ 3504.834105] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [ 3504.834107] kworker/u32:2   D    0  3872      2 0x80000000
> > [ 3504.834123] Workqueue: events_unbound commit_work [drm_kms_helper]
> > [ 3504.834126] Call Trace:
> > [ 3504.834133]  ? __schedule+0x2a0/0x880
> > [ 3504.834136]  schedule+0x28/0x80
> > [ 3504.834139]  schedule_timeout+0x25d/0x380
> > [ 3504.834217]  ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu]
> > [ 3504.834292]  ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu]
> > [ 3504.834297]  dma_fence_default_wait+0x23b/0x2a0
> > [ 3504.834301]  ? dma_fence_release+0x90/0x90
> > [ 3504.834304]  dma_fence_wait_timeout+0xdd/0x100
> > [ 3504.834308]  reservation_object_wait_timeout_rcu+0x161/0x270
> > [ 3504.834387]  amdgpu_dm_do_flip+0x112/0x370 [amdgpu]
> > [ 3504.834468]  amdgpu_dm_atomic_commit_tail+0x68b/0xcd0 [amdgpu]
> > [ 3504.834472]  ? __switch_to_asm+0x40/0x70
> > [ 3504.834475]  ? wait_for_completion_timeout+0x3b/0x1a0
> > [ 3504.834477]  ? __switch_to_asm+0x34/0x70
> > [ 3504.834480]  ? __switch_to_asm+0x40/0x70
> > [ 3504.834483]  ? __switch_to+0x1ba/0x450
> > [ 3504.834492]  commit_tail+0x3d/0x70 [drm_kms_helper]
> > [ 3504.834497]  process_one_work+0x1aa/0x3a0
> > [ 3504.834500]  worker_thread+0x30/0x3a0
> > [ 3504.834503]  ? drain_workqueue+0x130/0x130
> > [ 3504.834506]  kthread+0x11d/0x140
> > [ 3504.834509]  ? kthread_park+0x80/0x80
> > [ 3504.834512]  ret_from_fork+0x22/0x40
> > [ 3516.645267] WARNING: CPU: 14 PID: 14694 at kernel/kthread.c:501 kthread_park+0x6c/0x80
> > [ 3516.645271] Modules linked in: fuse edac_mce_amd kvm_amd nls_ascii nls_cp437 vfat fat snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec joydev amdgpu snd_hda_core snd_hwdep chash gpu_sched snd_pcm snd_timer ttm drm_kms_helper snd drm i2c_algo_bit sp5100_tco soundcore kvm efi_pstore efivars sg irqbypass evdev wmi_bmof serio_raw pcspkr k10temp ccp tpm_crb pcc_cpufreq tpm_tis tpm_tis_core tpm rng_core acpi_cpufreq button parport_pc ppdev lp parport efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 fscrypto btrfs xor zstd_decompress zstd_compress xxhash raid6_pq libcrc32c crc32c_generic algif_skcipher af_alg dm_crypt dm_mod sd_mod hid_generic usbhid hid uas usb_storage crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ahci xhci_pci aes_x86_64 libahci crypto_simd xhci_hcd cryptd glue_helper libata r8169 i2c_piix4 libphy usbcore scsi_mod thermal wmi gpio_amdpt gpio_generic
> > [ 3516.645324] CPU: 14 PID: 14694 Comm: TaskSchedulerFo Not tainted 4.20.0-rc1-amd #2
> > [ 3516.645327] Hardware name: BIOSTAR Group X370GT7/X370GT7, BIOS 5.13 08/07/2018
> > [ 3516.645330] RIP: 0010:kthread_park+0x6c/0x80
> > [ 3516.645333] Code: 18 e8 88 6c 67 00 be 40 00 00 00 48 89 df e8 8b c3 00 00 48 85 c0 74 1b 31 c0 5b 5d c3 0f 0b eb ae 0f 0b b8 da ff ff ff eb f0 <0f> 0b b8 f0 ff ff ff eb e7 0f 0b eb e3 0f 1f 80 00 00 00 00 0f 1f
> > [ 3516.645335] RSP: 0018:ffffbafdc3fcfb60 EFLAGS: 00010202
> > [ 3516.645338] RAX: 0000000000000004 RBX: ffff9dcd93f140c0 RCX: dead000000000200
> > [ 3516.645339] RDX: ffff9dcd92ba7430 RSI: ffff9dcd93f140c0 RDI: ffff9dcd8a9049c0
> > [ 3516.645341] RBP: ffff9dcd940a5360 R08: ffff9dcd96da25a8 R09: 0000000000000000
> > [ 3516.645343] R10: 0000000000000000 R11: 000000000000019c R12: ffff9dcd92ba27a0
> > [ 3516.645344] R13: ffff9dcd76d34200 R14: 0000000000000206 R15: dead000000000100
> > [ 3516.645347] FS:  00007efea483e700(0000) GS:ffff9dcd96d80000(0000) knlGS:0000000000000000
> > [ 3516.645349] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3516.645351] CR2: 00005654fe725e10 CR3: 0000000200d40000 CR4: 00000000003406e0
> > [ 3516.645352] Call Trace:
> > [ 3516.645362]  drm_sched_entity_fini+0x37/0x190 [gpu_sched]
> > [ 3516.645423]  amdgpu_vm_fini+0xad/0x530 [amdgpu]
> > [ 3516.645429]  ? idr_destroy+0x78/0xc0
> > [ 3516.645481]  amdgpu_driver_postclose_kms+0x151/0x270 [amdgpu]
> > [ 3516.645496]  drm_file_free.part.5+0x21f/0x300 [drm]
> > [ 3516.645510]  drm_release+0xaa/0x120 [drm]
> > [ 3516.645514]  __fput+0xac/0x1e0
> > [ 3516.645518]  task_work_run+0x8f/0xb0
> > [ 3516.645522]  do_exit+0x2e6/0xb30
> > [ 3516.645525]  do_group_exit+0x3a/0xb0
> > [ 3516.645528]  get_signal+0x27a/0x5f0
> > [ 3516.645532]  do_signal+0x30/0x6d0
> > [ 3516.645537]  exit_to_usermode_loop+0x89/0xf0
> > [ 3516.645540]  do_syscall_64+0xda/0xe0
> > [ 3516.645544]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [ 3516.645547] RIP: 0033:0x7efeb6b9d19a
> > [ 3516.645553] Code: Bad RIP value.
> > [ 3516.645555] RSP: 002b:00007efea483d810 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
> > [ 3516.645557] RAX: fffffffffffffdfc RBX: 00007efea483d958 RCX: 00007efeb6b9d19a
> > [ 3516.645559] RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00007efea483d980
> > [ 3516.645560] RBP: 0000000000000000 R08: 0000000000000000 R09: 00007ffe661d7080
> > [ 3516.645562] R10: 00007efea483d860 R11: 0000000000000246 R12: 0000000000000000
> > [ 3516.645564] R13: 00007efea483d980 R14: 00007efea483d990 R15: 00007efea483d930
> > [ 3516.645566] ---[ end trace 7da35ac4aa65c90d ]---
>
> It is important to note that the code that appears most often when using
> generic kernels is 147, rather than the 146 shown here.
>
> Xorg.0.log reports nothing.
>
> I said this was bad news because it seems to me that both the CPU and the
> amdgpu driver are defective.
>
> I noticed that while running kernel 4.18 the GPU is kept at 100% (mclk and
> sclk) all the time, while with this new kernel the GPU tries to scale its
> performance.
>
> Also, it is important to note that the NVIDIA GTX 1070 throws a lot of Xid
> error codes (see
> https://devtalk.nvidia.com/default/topic/1043483/linux/xid-errors-on-gtx-1070-linux/post/5293440
> ). This is why I think the 1800X has a defective PCI controller, and it is
> also the second part of the "really bad news". Maybe this happens mostly
> with Ryzen processors? I'll test the RX480 on the other computer ASAP; I need
> to send information about the CPU to AMD so they can proceed with the
> warranty process.
>
> The GTX 1070 works without a single problem outside of this PC. The other
> cards that I had tested before follow the same pattern (2 RX480, 1 RX 580,
> 1 GTX 970, 1 GTX 1070).
>
> Currently I have only 1 RX480 and 1 GTX 1070. Now that I know the cards
> don't have any problem, I'm selling them, and soon I'll have only one or
> none. The seller told me off for requesting warranty service on the RX 480
> when I thought it was defective; he sent me a different card, and according
> to him the one I sent back was working without any issues.
>
> I'm already at the stage of re-sending the CPU to AMD, and praying this
> ends my endless torment. I think they'll have to refund me (and then I'll
> take a loss on the motherboard).
>
> Please tell me any other steps you want me to take.
>
> I can also provide a full description of the kernel compilation (parameters)
> and even a link to the generated .deb packages.
> You are receiving this mail because:
> You are the assignee for the bug.
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

