* [Bug 97500] Cannot unbind GPU from AMDGPU
@ 2016-08-26 15:48 bugzilla-daemon
2016-09-01 0:17 ` bugzilla-daemon
` (9 more replies)
0 siblings, 10 replies; 11+ messages in thread
From: bugzilla-daemon @ 2016-08-26 15:48 UTC (permalink / raw)
To: dri-devel
[-- Attachment #1.1: Type: text/plain, Size: 1229 bytes --]
https://bugs.freedesktop.org/show_bug.cgi?id=97500
Bug ID: 97500
Summary: Cannot unbind GPU from AMDGPU
Product: DRI
Version: XOrg git
Hardware: x86-64 (AMD64)
OS: Linux (All)
Status: NEW
Severity: normal
Priority: medium
Component: DRM/AMDgpu
Assignee: dri-devel@lists.freedesktop.org
Reporter: commendsarnex@gmail.com
Created attachment 126056
--> https://bugs.freedesktop.org/attachment.cgi?id=126056&action=edit
dmesg
Hi guys. With AMDGPU, if I try to unbind my GPU from amdgpu, I get a hard
kernel lockup. My GPU is the RX 480. I'm currently on 4.7.2, but I've tried
drm-next-4.9 also. I've tried both echo $CARD_PCI_LOCATION > unbind, and
unbinding the vtcon and then modprobe -r amdgpu. When I had my HD 7950, the
echo $CARD_PCI_LOCATION > unbind method worked every time. I'm not sure how to
get any debug info, since even an ssh session locks up. I tried using pstore,
but no logs were saved.
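
For reference, the `echo $CARD_PCI_LOCATION > unbind` step above is a plain write to the driver's sysfs `unbind` attribute, so it can also be driven from a small C helper. The following is only a sketch under stated assumptions: `pci_unbind_path`, `sysfs_write_attr` and `pci_unbind` are hypothetical names invented for illustration, and the sysfs root is passed in as a parameter so that the helper can be pointed at a scratch directory instead of the real `/sys`. On an affected kernel, the real write would presumably trigger the same lockup as the shell command.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Build "<sysfs_root>/bus/pci/drivers/<driver>/unbind" into buf.
 * Returns 0 on success, -1 if the path does not fit. */
int pci_unbind_path(char *buf, size_t len, const char *sysfs_root,
                    const char *driver)
{
    assert(buf != NULL && sysfs_root != NULL && driver != NULL);
    int n = snprintf(buf, len, "%s/bus/pci/drivers/%s/unbind",
                     sysfs_root, driver);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a sysfs-style attribute file.
 * Returns 0 on success or a negative errno value. */
int sysfs_write_attr(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -errno;

    int rc = 0;
    size_t len = strlen(value);
    if (fwrite(value, 1, len, f) != len)
        rc = -EIO;
    /* stdio buffers the write, so a rejected write may only
     * show up as an error when the stream is closed. */
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;
    return rc;
}

/* Ask <driver> to release the device at pci_addr (e.g. "0000:01:00.0"). */
int pci_unbind(const char *sysfs_root, const char *driver,
               const char *pci_addr)
{
    char path[256];
    if (pci_unbind_path(path, sizeof(path), sysfs_root, driver) != 0)
        return -ENAMETOOLONG;
    return sysfs_write_attr(path, pci_addr);
}
```

A call such as `pci_unbind("/sys", "amdgpu", "0000:01:00.0")` would then correspond to the shell command, with the PCI address here being an example value rather than the reporter's actual device.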
Please let me know if you have any ideas or need any more info.
Thanks,
Sarnex
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 2614 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-09-01 0:17 UTC (permalink / raw)
To: dri-devel
Grazvydas Ignotas <notasas@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |alexdeucher@gmail.com,
| |deathsimple@vodafone.de,
| |michel@daenzer.net,
| |notasas@gmail.com
--- Comment #1 from Grazvydas Ignotas <notasas@gmail.com> ---
Same problem here with an RX 470. This happens both when rmmod'ing amdgpu and
when attempting to unbind, even when using the drm-next-4.9-wip branch.
It would be great if this worked, so that I could switch to GPU passthrough
without a reboot. The Windows driver already seems to behave well: I can
start and shut down the VM multiple times and then hand the card over to amdgpu
without problems. Only taking it away from amdgpu locks up the machine.
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-09-01 0:47 UTC (permalink / raw)
To: dri-devel
Michel Dänzer <michel@daenzer.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC|michel@daenzer.net |
--- Comment #2 from Michel Dänzer <michel@daenzer.net> ---
I get bug updates via the mailing list.
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-09-13 1:03 UTC (permalink / raw)
To: dri-devel
--- Comment #3 from Micael Bergeron <micaelbergeron@gmail.com> ---
I also see this behavior on Linux 4.7.2.
I tried both unbinding via /sys/bus/pci/drivers/amdgpu/unbind and removing the
device and triggering a rescan using /sys/bus/pci/devices/.../remove and
/sys/bus/pci/rescan.
I get kernel panics and/or system hangs either way.
It would be awesome to be able to yield the GPU to a VM then claim it back when
finished.
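
The remove-and-rescan sequence described above amounts to two sysfs writes in a fixed order. A rough C sketch follows, with several assumptions: `pci_hotplug_paths` and `pci_remove_and_rescan` are hypothetical names, the elided path component is taken to be the device's PCI address (the usual sysfs layout), and the sysfs root is a parameter so that the code can be exercised against a scratch directory. It illustrates the sequence only and is not a workaround for the hang.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Fill in the per-device "remove" path and the bus-wide "rescan" path.
 * Returns 0, -EINVAL for a malformed address, or -ENAMETOOLONG. */
int pci_hotplug_paths(const char *sysfs_root, const char *pci_addr,
                      char *remove_path, size_t remove_len,
                      char *rescan_path, size_t rescan_len)
{
    assert(sysfs_root != NULL && pci_addr != NULL);
    assert(remove_path != NULL && rescan_path != NULL);

    /* A PCI address such as "0000:01:00.0" never contains a slash. */
    if (pci_addr[0] == '\0' || strchr(pci_addr, '/') != NULL)
        return -EINVAL;

    int n = snprintf(remove_path, remove_len,
                     "%s/bus/pci/devices/%s/remove", sysfs_root, pci_addr);
    if (n < 0 || (size_t)n >= remove_len)
        return -ENAMETOOLONG;

    n = snprintf(rescan_path, rescan_len, "%s/bus/pci/rescan", sysfs_root);
    if (n < 0 || (size_t)n >= rescan_len)
        return -ENAMETOOLONG;
    return 0;
}

/* Write "1" to an attribute file; returns 0 or a negative errno value. */
static int write_one(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -errno;
    int rc = (fputs("1", f) == EOF) ? -EIO : 0;
    if (fclose(f) == EOF && rc == 0)
        rc = -errno;
    return rc;
}

/* Remove the device from the bus, then ask the bus to rediscover it. */
int pci_remove_and_rescan(const char *sysfs_root, const char *pci_addr)
{
    char remove_path[256], rescan_path[256];
    int rc = pci_hotplug_paths(sysfs_root, pci_addr,
                               remove_path, sizeof(remove_path),
                               rescan_path, sizeof(rescan_path));
    if (rc != 0)
        return rc;
    rc = write_one(remove_path);
    if (rc != 0)
        return rc; /* do not rescan if the removal itself failed */
    return write_one(rescan_path);
}
```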
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-09-24 21:23 UTC (permalink / raw)
To: dri-devel
--- Comment #4 from Grazvydas Ignotas <notasas@gmail.com> ---
Created attachment 126768
--> https://bugs.freedesktop.org/attachment.cgi?id=126768&action=edit
modprobe then rmmod
I've been trying today's drm-next-4.9-wip (merged with 4.8.0-rc7) and the
situation has improved somewhat: doing rmmod just after modprobe succeeds with
a WARN from TTM, but attempts to modprobe it again fail. If an X session is
started and stopped before rmmod, the consequences are more severe; it looks
like some sort of corruption.
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-09-24 21:25 UTC (permalink / raw)
To: dri-devel
--- Comment #5 from Grazvydas Ignotas <notasas@gmail.com> ---
Created attachment 126769
--> https://bugs.freedesktop.org/attachment.cgi?id=126769&action=edit
modprobe then X then rmmod
dmesg when an X session was used before rmmod
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-09-25 21:06 UTC (permalink / raw)
To: dri-devel
--- Comment #6 from Grazvydas Ignotas <notasas@gmail.com> ---
Created attachment 126782
--> https://bugs.freedesktop.org/attachment.cgi?id=126782&action=edit
dmesg of powerplay crash
I've sent some patches with fixes, but there seem to be multiple other issues.
One of the problems is that struct amdgpu_i2c_chan contains a struct drm_dp_aux,
and when amdgpu_i2c_fini() is called and frees the amdgpu_i2c_chan, the
drm_dp_aux is still in use. This causes memory corruption. I don't know how to
solve this; perhaps somebody knows this code better?
A hack can be used to trade this corruption for a leak:
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
index 34bab61..8beaee0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_i2c.c
@@ -221,6 +221,8 @@ void amdgpu_i2c_destroy(struct amdgpu_i2c_chan *i2c)
if (!i2c)
return;
i2c_del_adapter(&i2c->adapter);
+ if (i2c->has_aux)
+ return;
kfree(i2c);
}
---
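
To make the lifetime problem described above concrete, here is a small userspace model in C. It is only an illustration and not the fix that was eventually applied: `aux_model` and `chan_model` are hypothetical stand-ins for struct drm_dp_aux and struct amdgpu_i2c_chan, and the plain integer reference count is an assumption chosen for brevity (kernel code would more likely use something like `struct kref`). The idea is that the container is only freed once the last user of the embedded member has dropped its reference, which avoids both the corruption and the leak.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the embedded object that outlives its container. */
struct aux_model {
    int transfers;
};

/* Stand-in for the container that embeds it. */
struct chan_model {
    int refcount;           /* single-threaded model, no atomics */
    struct aux_model aux;
};

/* Allocate a channel holding one reference (the owner's). */
struct chan_model *chan_create(void)
{
    struct chan_model *c = calloc(1, sizeof(*c));
    if (c != NULL)
        c->refcount = 1;
    return c;
}

/* Hand out the embedded member and pin the container while it is used. */
struct aux_model *chan_get_aux(struct chan_model *c)
{
    assert(c != NULL && c->refcount > 0);
    c->refcount++;
    return &c->aux;
}

/* Recover the container from a pointer to its embedded member. */
struct chan_model *chan_from_aux(struct aux_model *a)
{
    assert(a != NULL);
    return (struct chan_model *)((char *)a - offsetof(struct chan_model, aux));
}

/* Drop one reference. Returns 1 if this call freed the container. */
int chan_put(struct chan_model *c)
{
    assert(c != NULL && c->refcount > 0);
    if (--c->refcount > 0)
        return 0;
    free(c);
    return 1;
}
```

In this model the teardown path calls `chan_put()` instead of freeing unconditionally, so an early teardown merely drops the owner's reference and the memory stays valid until the user of `aux` is done.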
Another one is a TTM leak, which can also be seen in this attachment.
CONFIG_DMA_API_DEBUG reports:
WARNING: CPU: 3 PID: 1666 at lib/dma-debug.c:976 dma_debug_device_change+0x1ca/0x240
pci 0000:01:00.0: DMA-API: device driver has pending DMA allocations while released from device [count=202]
One of leaked entries details: [device address=0x00000003dcfe9000] [size=4096 bytes] [mapped with DMA_BIDIRECTIONAL] [mapped as coherent]
Mapped at:
[<ffffffff8163d941>] debug_dma_alloc_coherent+0x41/0x110
[<ffffffffa0728d84>] ttm_dma_populate+0xb64/0x1150 [ttm]
[<ffffffffa0b770ac>] amdgpu_ttm_tt_populate+0x35c/0x510 [amdgpu]
[<ffffffffa0719141>] ttm_tt_bind+0x71/0xd0 [ttm]
[<ffffffffa071c9d8>] ttm_bo_handle_move_mem+0xa08/0xaa0 [ttm]
---
The next one is a powerplay crash at
drivers/gpu/drm/amd/powerplay/hwmgr/smu7_hwmgr.c:3336:
dpm_table->sclk_table.count is 0, so the array access ends up badly. It could be
related to the "DPM is already running right now, no need to enable DPM!"
message; the full dmesg is attached.
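
As an illustration of why a zero count is dangerous, the sketch below guards a table lookup in C. It is a hypothetical model, not the code at smu7_hwmgr.c:3336: `dpm_table_model`, `dpm_level_model`, `MAX_LEVELS_MODEL` and `dpm_table_top_value` are invented names, and the assumption that the failing access has the form `levels[count - 1]` is a guess. With an unsigned count of 0, `count - 1` would wrap around to a very large index, so the lookup rejects an empty table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVELS_MODEL 8

struct dpm_level_model {
    uint32_t value;
};

struct dpm_table_model {
    uint32_t count;                                  /* valid entries */
    struct dpm_level_model levels[MAX_LEVELS_MODEL];
};

/* Store the value of the highest populated level in *out.
 * Returns 0 on success, -1 if the table is empty or count is out of range. */
int dpm_table_top_value(const struct dpm_table_model *t, uint32_t *out)
{
    assert(t != NULL && out != NULL);

    /* Without this check, count == 0 would index levels[UINT32_MAX]. */
    if (t->count == 0 || t->count > MAX_LEVELS_MODEL)
        return -1;

    *out = t->levels[t->count - 1].value;
    return 0;
}
```

A guard like this would only turn the crash into a reported error. It would not explain why the table is empty after the driver is reloaded.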
I won't have time to work on this for a while, but maybe somebody else does.
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-10-04 15:42 UTC (permalink / raw)
To: dri-devel
Andreas Grosse <m-bugs-freedesktop@andig.net> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |m-bugs-freedesktop@andig.net
--- Comment #7 from Andreas Grosse <m-bugs-freedesktop@andig.net> ---
Created attachment 126997
--> https://bugs.freedesktop.org/attachment.cgi?id=126997&action=edit
dmesg: boot then unbind
I am getting a kernel panic with Linux 4.8.0 when I unbind my RX480 (XFX Radeon
RX 480 GTR Black 8GB, if that helps) from amdgpu. The system freezes
immediately and only pushes this to the serial console (which is why it is not
in the attached dmesg):
[ 80.266963] {1}[Hardware Error]: event severity: fatal
[ 80.266964] {1}[Hardware Error]: Error 0, type: fatal
[ 80.266964] {1}[Hardware Error]: section_type: PCIe error
[ 80.266964] {1}[Hardware Error]: port_type: 4, root port
[ 80.266965] {1}[Hardware Error]: version: 1.16
[ 80.266965] {1}[Hardware Error]: command: 0x4010, status: 0x0547
[ 80.266965] {1}[Hardware Error]: device_id: 0000:00:01.0
[ 80.266966] {1}[Hardware Error]: slot: 0
[ 80.266966] {1}[Hardware Error]: secondary_bus: 0x01
[ 80.266966] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x0c01
[ 80.266967] {1}[Hardware Error]: class_code: 000406
[ 80.266967] {1}[Hardware Error]: bridge: secondary_status: 0x2000, control: 0x0003
[ 80.266967] Kernel panic - not syncing: Fatal hardware error!
[ 80.705884] Kernel Offset: disabled
Is this the same issue?
I have attached the kernel output from boot until I got the panic after
unbinding with this command:
echo 0000:01:00.0 > /sys/bus/pci/drivers/amdgpu/unbind
Is there any further information that I can provide to address this issue?
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-10-29 15:06 UTC (permalink / raw)
To: dri-devel
Grazvydas Ignotas <notasas@gmail.com> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |commendsarnex@gmail.com,
| |micaelbergeron@gmail.com
--- Comment #8 from Grazvydas Ignotas <notasas@gmail.com> ---
Finally it's behaving properly for me when using today's drm-next-4.10-wip
branch with this patch on top:
https://lists.freedesktop.org/archives/amd-gfx/2016-October/003141.html
--
You are receiving this mail because:
You are the assignee for the bug.
[-- Attachment #1.2: Type: text/html, Size: 2023 bytes --]
[-- Attachment #2: Type: text/plain, Size: 160 bytes --]
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
* [Bug 97500] Cannot unbind GPU from AMDGPU
2016-08-26 15:48 [Bug 97500] Cannot unbind GPU from AMDGPU bugzilla-daemon
` (7 preceding siblings ...)
2016-10-29 15:06 ` bugzilla-daemon
@ 2016-10-29 16:37 ` bugzilla-daemon
2016-10-31 7:44 ` bugzilla-daemon
9 siblings, 0 replies; 11+ messages in thread
From: bugzilla-daemon @ 2016-10-29 16:37 UTC (permalink / raw)
To: dri-devel
--- Comment #9 from Nick Sarnie <commendsarnex@gmail.com> ---
(In reply to Grazvydas Ignotas from comment #8)
> Finally it's behaving properly for me when using today's drm-next-4.10-wip
> branch with this patch on top:
> https://lists.freedesktop.org/archives/amd-gfx/2016-October/003141.html
I can confirm that GPU unbinding works as expected with this setup. I'm getting
constant GPU hangs and weird behavior when using my Intel GPU, but I can't
imagine it's related to these changes.
Great work,
Sarnex
* [Bug 97500] Cannot unbind GPU from AMDGPU
From: bugzilla-daemon @ 2016-10-31 7:44 UTC (permalink / raw)
To: dri-devel
Christian König <deathsimple@vodafone.de> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution|--- |FIXED
--- Comment #10 from Christian König <deathsimple@vodafone.de> ---
I hoped that this fix might help with this bug as well, but I couldn't find the
bug report again offhand.
Good to see that fixed as well. Please close the bug report as soon as you can
confirm that it works on an upstream kernel.