* [PATCH] drm/amdgpu: fix fence slab teardown
@ 2016-10-23 18:31 Grazvydas Ignotas
From: Grazvydas Ignotas @ 2016-10-23 18:31 UTC
To: dri-devel, amd-gfx
To free fences, call_rcu() is used, which calls amdgpu_fence_free()
after a grace period. During teardown, there is no guarantee all
callbacks have finished, so amdgpu_fence_slab may be destroyed before
all fences have been freed. If we are lucky, this results in some slab
warnings; if not, we get a crash in one of the RCU threads because the
callback is called after amdgpu has already been unloaded.
Fix it with an rcu_barrier().
Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 3a2e42f..77b34ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -68,6 +68,7 @@ int amdgpu_fence_slab_init(void)
void amdgpu_fence_slab_fini(void)
{
+ rcu_barrier();
kmem_cache_destroy(amdgpu_fence_slab);
}
/*
--
2.7.4
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
* Re: [PATCH] drm/amdgpu: fix fence slab teardown
From: zhoucm1 @ 2016-10-24 2:34 UTC
To: Grazvydas Ignotas, dri-devel, amd-gfx
Acked-by: Chunming Zhou <david1.zhou@amd.com>
On 2016-10-24 02:31, Grazvydas Ignotas wrote:
> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
> after a grace period.
> [...]
* Re: [PATCH] drm/amdgpu: fix fence slab teardown
From: Qu, Jim @ 2016-10-24 3:35 UTC
To: Zhou, David(ChunMing), Grazvydas Ignotas, dri-devel, amd-gfx
I observed this issue when replacing the kernel module via DKMS; the system may then fail at reboot with this call trace:
[ 3529.525360] =============================================================================
[ 3529.525361] BUG amd_sched_fence (Tainted: G B OE ------------ ): Objects remaining in amd_sched_fence on kmem_cache_close()
[ 3529.525361] -----------------------------------------------------------------------------
[ 3529.525361]
[ 3529.525361] INFO: Slab 0xffffea000094b200 objects=25 used=2 fp=0xffff8800252c9180 flags=0x1fffff00004080
[ 3529.525362] CPU: 0 PID: 18523 Comm: reboot Tainted: G B OE ------------ 3.10.0-512.el7.x86_64 #1
[ 3529.525362] Hardware name: ASUS All Series/Z87-PLUS, BIOS 1802 01/28/2014
[ 3529.525363] ffffea000094b200 00000000b3b19dcf ffff880160827b50 ffffffff81685e8c
[ 3529.525363] ffff880160827c28 ffffffff811d9e34 ffff880000000020 ffff880160827c38
[ 3529.525364] ffff880160827be8 656a624f818de5f0 616d657220737463 6e6920676e696e69
[ 3529.525364] Call Trace:
[ 3529.525365] [<ffffffff81685e8c>] dump_stack+0x19/0x1b
[ 3529.525366] [<ffffffff811d9e34>] slab_err+0xb4/0xe0
[ 3529.525367] [<ffffffff81088c29>] ? vprintk_default+0x29/0x40
[ 3529.525368] [<ffffffff8167f434>] ? printk+0x5e/0x75
[ 3529.525369] [<ffffffff811dd133>] ? __kmalloc+0x1f3/0x240
[ 3529.525370] [<ffffffff811df80b>] ? kmem_cache_close+0x12b/0x2f0
[ 3529.525370] [<ffffffff811df82c>] kmem_cache_close+0x14c/0x2f0
[ 3529.525371] [<ffffffff811df9e4>] __kmem_cache_shutdown+0x14/0x80
[ 3529.525372] [<ffffffff811a5704>] kmem_cache_destroy+0x44/0xf0
[ 3529.525387] [<ffffffffa02bfb0c>] amd_sched_fini+0x3c/0x40 [amdgpu]
[ 3529.525395] [<ffffffffa0231bfa>] amdgpu_fence_driver_fini+0x7a/0x110 [amdgpu]
[ 3529.525403] [<ffffffffa02230dd>] amdgpu_device_fini+0x3d/0x1f0 [amdgpu]
[ 3529.525411] [<ffffffffa0225673>] amdgpu_driver_unload_kms+0x43/0x80 [amdgpu]
[ 3529.525416] [<ffffffffa005fb89>] drm_dev_unregister+0x29/0xb0 [drm]
[ 3529.525422] [<ffffffffa0060273>] drm_put_dev+0x23/0x70 [drm]
[ 3529.525429] [<ffffffffa021f3fd>] amdgpu_pci_shutdown+0x1d/0x20 [amdgpu]
[ 3529.525430] [<ffffffff81359b56>] pci_device_shutdown+0x36/0x70
[ 3529.525431] [<ffffffff8142a388>] device_shutdown+0xc8/0x180
[ 3529.525432] [<ffffffff810a1536>] kernel_restart_prepare+0x36/0x40
[ 3529.525433] [<ffffffff810a1552>] kernel_restart+0x12/0x60
[ 3529.525433] [<ffffffff810a17c9>] SYSC_reboot+0x229/0x260
[ 3529.525435] [<ffffffff81691971>] ? __do_page_fault+0x171/0x450
[ 3529.525436] [<ffffffff810a186e>] SyS_reboot+0xe/0x10
[ 3529.525437] [<ffffffff81696489>] system_call_fastpath+0x16/0x1b
[ 3529.525438] INFO: Object 0xffff8800252c8a00 @offset=2560
[ 3529.525438] INFO: Object 0xffff8800252c9540 @offset=5440
Does this patch series fix this issue?
Thanks
JimQu
________________________________________
From: amd-gfx <amd-gfx-bounces@lists.freedesktop.org> on behalf of zhoucm1 <david1.zhou@amd.com>
Sent: 2016-10-24 10:34
To: Grazvydas Ignotas; dri-devel@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: fix fence slab teardown
Acked-by: Chunming Zhou <david1.zhou@amd.com>
On 2016-10-24 02:31, Grazvydas Ignotas wrote:
> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
> after a grace period.
> [...]
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
* Re: Re: [PATCH] drm/amdgpu: fix fence slab teardown
From: Grazvydas Ignotas @ 2016-10-24 9:32 UTC
To: Qu, Jim; +Cc: amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org
On Mon, Oct 24, 2016 at 6:35 AM, Qu, Jim <Jim.Qu@amd.com> wrote:
> I did observed the issue when replace kernel module use DKMS, and it maybe get error at reboot, got calltrace:
> [...]
> Do these series patches fix this issue?
Yes, but only partially: there are still some leaked objects left. When CONFIG_SLUB_DEBUG is enabled, you can also set CONFIG_SLUB_DEBUG_ON or add "slub_debug" to the kernel command line to see the leak backtraces.
Gražvydas
* Re: [PATCH] drm/amdgpu: fix fence slab teardown
From: Christian König @ 2016-10-24 9:05 UTC
To: zhoucm1, Grazvydas Ignotas, dri-devel, amd-gfx
Interesting catch, patch is Reviewed-by: Christian König <christian.koenig@amd.com>.
On 24.10.2016 at 04:34, zhoucm1 wrote:
> Acked-by: Chunming Zhou <david1.zhou@amd.com>
>
> On 2016-10-24 02:31, Grazvydas Ignotas wrote:
>> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
>> after a grace period.
>> [...]
* Re: [PATCH] drm/amdgpu: fix fence slab teardown
From: Alex Deucher @ 2016-10-24 16:31 UTC
To: Grazvydas Ignotas; +Cc: amd-gfx list, Mailing list - DRI developers
On Sun, Oct 23, 2016 at 2:31 PM, Grazvydas Ignotas <notasas@gmail.com> wrote:
> To free fences, call_rcu() is used, which calls amdgpu_fence_free()
> after a grace period.
> [...]
> Fixes: b44135351a3a ("drm/amdgpu: RCU protected amdgpu_fence_release")
> Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Applied. Thanks!
Alex
Thread overview: 6+ messages
2016-10-23 18:31 [PATCH] drm/amdgpu: fix fence slab teardown Grazvydas Ignotas
2016-10-24 2:34 ` zhoucm1
2016-10-24 3:35 ` Re: " Qu, Jim
2016-10-24 9:32 ` Grazvydas Ignotas
2016-10-24 9:05 ` Christian König
2016-10-24 16:31 ` Alex Deucher