From: bugzilla-daemon@freedesktop.org
To: dri-devel@lists.freedesktop.org
Subject: [Bug 100399] Kernel invalid opcode on unbinding amdgpu
Date: Sun, 26 Mar 2017 03:03:51 +0000

https://bugs.freedesktop.org/show_bug.cgi?id=100399

Bug ID: 100399
Summary: Kernel invalid opcode on unbinding amdgpu
Product: DRI
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: nospam@kota.moe

I'm not sure where the best place to post this report is, so let me know if
there is a better place than here.

I have an RX 480 GPU that I use with amdgpu on Linux 4.11.0-rc3+ (compiled
with the Ubuntu 4.8.0 lowlatency config), and everything seemingly works fine
until I try to unbind amdgpu from the device. This also happened with Linux
4.10.0-rc3+.

I've reproduced this regardless of whether the amdgpu device is the primary or
secondary display device, and whether X is active or not.

Observe:
$ lspci | grep AMD
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
Ellesmere [Radeon RX 470/480] (rev c7)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aaf0
$ echo 01:00.0 | sudo tee /sys/bus/pci/devices/01:00.0/driver/unbind
Segmentation Fault
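
For anyone trying to reproduce this, a slightly more defensive version of the unbind step could look like the sketch below. It is not part of the original report: the function names and the SYSFS_ROOT override are hypothetical, and it assumes the device is addressed in the domain-qualified form that sysfs normally uses (for example 0000:01:00.0). It only checks which driver is bound before writing the address to that driver's unbind file, so it will not avoid the crash itself.

```shell
#!/bin/sh
# Sketch only: inspect and unbind a PCI device through sysfs.
# SYSFS_ROOT is a hypothetical override so the functions can be tried
# against a scratch directory; on a real system it stays at /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# Print the name of the driver bound to PCI device $1, or return 1 if none.
bound_driver() {
    link="$SYSFS_ROOT/bus/pci/devices/$1/driver"
    [ -L "$link" ] || return 1
    basename "$(readlink "$link")"
}

# Write the device address to the bound driver's unbind file.
# Needs root on a real system.
unbind_device() {
    drv=$(bound_driver "$1") || {
        echo "no driver bound to $1" >&2
        return 1
    }
    echo "unbinding $1 from $drv"
    printf '%s\n' "$1" > "$SYSFS_ROOT/bus/pci/devices/$1/driver/unbind"
}

# Example (as root): unbind_device 0000:01:00.0
```

Running `dmesg -w` in a second terminal before calling unbind_device makes it easier to capture the trace, since the shell that performs the write may not return.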

At this point, the system becomes unstable and some system calls seem to just
hang (I'm not sure which exactly, but sudo and ps a break). Trying to shut down
the system also hangs.

dmesg output:
[   86.993436] ------------[ cut here ]------------
[   86.993439] kernel BUG at drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:6930!
[   86.993442] invalid opcode: 0000 [#1] PREEMPT SMP
[   86.993443] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4
xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp iptable_filter nf_nat_h323
nf_conntrack_h323 nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp
nf_conntrack_proto_gre nf_nat_tftp nf_conntrack_tftp nf_nat_sip
nf_conntrack_sip nf_nat_irc nf_conntrack_irc nf_nat_ftp nf_conntrack_ftp
iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack
libcrc32c ip_tables x_tables bnep bridge stp llc binfmt_misc dm_snapshot
dm_bufio nls_iso8859_1 intel_rapl x86_pkg_temp_thermal intel_powerclamp
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel
aes_x86_64 crypto_simd glue_helper cryptd intel_cstate intel_rapl_perf
input_leds serio_raw joydev snd_hda_codec_realtek snd_hda_codec_generic
snd_hda_codec_hdmi
[   86.993488]  mei_me snd_hda_intel mei snd_hda_codec snd_hda_core
intel_pch_thermal snd_hwdep snd_pcm snd_timer snd soundcore hci_uart btbcm
btqca btintel bluetooth intel_lpss_acpi intel_lpss shpchp acpi_als acpi_pad
mac_hid kfifo_buf tpm_infineon industrialio kvm_intel kvm irqbypass it87
hwmon_vid parport_pc ppdev lp parport autofs4 btrfs xor raid6_pq hid_generic
usbhid mxm_wmi amdkfd amd_iommu_v2 i915 amdgpu ttm drm_kms_helper igb e1000e
syscopyarea sysfillrect dca psmouse nvme sysimgblt ptp fb_sys_fops nvme_core
firewire_ohci pps_core i2c_algo_bit drm ahci firewire_core crc_itu_t libahci
wmi video pinctrl_sunrisepoint i2c_hid pinctrl_intel hid fjes
[   86.993519] CPU: 5 PID: 2955 Comm: tee Not tainted 4.11.0-rc3+ #1
[   86.993521] Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F4 10/21/2015
[   86.993523] task: ffff8ee839f4d880 task.stack: ffffacb00624c000
[   86.993539] RIP: 0010:gfx_v8_0_kiq_set_interrupt_state+0xce/0xe0 [amdgpu]
[   86.993541] RSP: 0018:ffffacb00624fb68 EFLAGS: 00010046
[   86.993543] RAX: 0000000000000000 RBX: ffff8ee855f6b2d8 RCX: 0000000000000000
[   86.993545] RDX: 0000000000000000 RSI: ffff8ee855f6c750 RDI: ffff8ee855f68000
[   86.993546] RBP: ffffacb00624fba8 R08: 000000000001e640 R09: ffffffffc039bcb9
[   86.993548] R10: fffff1f06155f200 R11: 0000000000000000 R12: ffff8ee855f68000
[   86.993550] R13: ffff8ee855f6b548 R14: ffff8ee855f6c750 R15: 0000000000000000
[   86.993552] FS:  00007f1260269700(0000) GS:ffff8ee881d40000(0000) knlGS:0000000000000000
[   86.993555] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   86.993556] CR2: 000055651d517908 CR3: 0000000831b78000 CR4: 00000000003406e0
[   86.993558] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   86.993560] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[   86.993562] Call Trace:
[   86.993572]  ? amdgpu_irq_disable_all+0x89/0xe0 [amdgpu]
[   86.993582]  amdgpu_irq_uninstall+0x17/0x20 [amdgpu]
[   86.993589]  drm_irq_uninstall+0x8e/0x170 [drm]
[   86.993598]  amdgpu_irq_fini+0x83/0xc0 [amdgpu]
[   86.993606]  tonga_ih_sw_fini+0x12/0x30 [amdgpu]
[   86.993613]  amdgpu_fini+0x2c5/0x490 [amdgpu]
[   86.993620]  amdgpu_device_fini+0x53/0x160 [amdgpu]
[   86.993626]  amdgpu_driver_unload_kms+0x4f/0xa0 [amdgpu]
[   86.993632]  drm_dev_unregister+0x3c/0xe0 [drm]
[   86.993637]  drm_put_dev+0x36/0x70 [drm]
[   86.993643]  amdgpu_pci_remove+0x15/0x20 [amdgpu]
[   86.993646]  pci_device_remove+0x39/0xc0
[   86.993649]  device_release_driver_internal+0x155/0x210
[   86.993651]  device_release_driver+0x12/0x20
[   86.993653]  unbind_store+0x10d/0x160
[   86.993655]  drv_attr_store+0x25/0x30
[   86.993657]  sysfs_kf_write+0x37/0x40
[   86.993659]  kernfs_fop_write+0x120/0x1a0
[   86.993662]  __vfs_write+0x37/0x160
[   86.993665]  ? apparmor_file_permission+0x1a/0x20
[   86.993667]  ? security_file_permission+0x3b/0xc0
[   86.993669]  vfs_write+0xb8/0x1b0
[   86.993672]  SyS_write+0x55/0xc0
[   86.993674]  entry_SYSCALL_64_fastpath+0x1e/0xad
[   86.993676] RIP: 0033:0x7f125fd9f6e0
[   86.993678] RSP: 002b:00007ffe60a95358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[   86.993681] RAX: ffffffffffffffda RBX: 000000000126e090 RCX: 00007f125fd9f6e0
[   86.993682] RDX: 000000000000000d RSI: 00007ffe60a95400 RDI: 0000000000000003
[   86.993684] RBP: 0000000000000000 R08: 000000000126e520 R09: 0000000000000000
[   86.993686] R10: 0000000000000837 R11: 0000000000000246 R12: 0000000000000000
[   86.993688] R13: 000000000000002d R14: 000000000126f590 R15: 000000000126e090
[   86.993690] Code: ff 25 ff ff ff df 31 c9 be b4 30 00 00 89 c2 48 89 df e8
86 9a fb ff 31 d2 44 89 e6 48 89 df e8 e9 96 fb ff 25 ff ff ff df eb b6 <0f> 0b
0f 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
[   86.993716] RIP: gfx_v8_0_kiq_set_interrupt_state+0xce/0xe0 [amdgpu] RSP: ffffacb00624fb68
[   86.993719] ---[ end trace 36bcf8facd6b3d68 ]---
[   86.993722] note: tee[2955] exited with preempt_count 1
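
As a reading aid for the trace above, the three most useful pieces are the BUG location, the faulting symbol, and the bytes at the fault. The sketch below pulls them out of dmesg output with sed; the function names are hypothetical and the patterns are only matched against the line formats shown in this report, with each log entry on a single line as dmesg prints it. The marked bytes 0f 0b are the x86 ud2 instruction, which is what BUG() compiles to on x86, so the "invalid opcode" trap here is the BUG() at gfx_v8_0.c:6930 firing and not a corrupted instruction stream.

```shell
#!/bin/sh
# Sketch only: extract key facts from an x86 kernel oops read on stdin.

# Print "file:line" from the first "kernel BUG at ...!" line.
bug_location() {
    sed -n 's/.*kernel BUG at \([^!]*\)!.*/\1/p' | head -n 1
}

# Print "symbol+offset/size" from the first kernel-mode RIP line.
# The user-mode RIP line (RIP: 0033:0x...) does not match, because the
# captured text must start with a letter or underscore.
fault_symbol() {
    sed -n 's|.*RIP: [0-9a-f]*:\([A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*\).*|\1|p' | head -n 1
}

# Print the byte marked <..> on the Code: line plus the byte after it.
fault_bytes() {
    sed -n 's/.*<\([0-9a-f][0-9a-f]\)> \([0-9a-f][0-9a-f]\).*/\1 \2/p' | head -n 1
}

# Example: dmesg | bug_location
```

Given an amdgpu module built with debug info, the kernel tree's scripts/faddr2line should be able to map the symbol+offset printed by fault_symbol back to a source line.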

