public inbox for intel-gfx@lists.freedesktop.org
From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Nicolas Frattaroli <frattaroli.nicolas@gmail.com>,
	linaro-mm-sig@lists.linaro.org, dri-devel@lists.freedesktop.org,
	linux-media@vger.kernel.org, intel-gfx@lists.freedesktop.org,
	lima@lists.freedesktop.org
Cc: daniel@ffwll.ch, tvrtko.ursulin@linux.intel.com
Subject: Re: [Intel-gfx] [PATCH 16/28] drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2
Date: Sun, 17 Oct 2021 17:26:42 +0200
Message-ID: <da9e02d6-5000-daa9-6ba1-a77c8e589ca1@gmail.com>
In-Reply-To: <2023306.UmlnhvANQh@archbook>

Thanks for the notice. I'm going to take a closer look at this tomorrow.

It basically looks like we messed up the fence reference count somehow.

Thanks,
Christian.

On 17.10.21 at 16:40, Nicolas Frattaroli wrote:
> On Tuesday, 5 October 2021 13:37:30 CEST Christian König wrote:
>> Simplifying the code a bit.
>>
>> v2: use dma_resv_for_each_fence
>>
>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 26 ++++++--------------------
>>   1 file changed, 6 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 042c16b5d54a..5bc5f775abe1 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -699,30 +699,16 @@ int drm_sched_job_add_implicit_dependencies(struct drm_sched_job *job,
>>   					    struct drm_gem_object *obj,
>>   					    bool write)
>>   {
>> +	struct dma_resv_iter cursor;
>> +	struct dma_fence *fence;
>>   	int ret;
>> -	struct dma_fence **fences;
>> -	unsigned int i, fence_count;
>> -
>> -	if (!write) {
>> -		struct dma_fence *fence = dma_resv_get_excl_unlocked(obj->resv);
>> -
>> -		return drm_sched_job_add_dependency(job, fence);
>> -	}
>> -
>> -	ret = dma_resv_get_fences(obj->resv, NULL, &fence_count, &fences);
>> -	if (ret || !fence_count)
>> -		return ret;
>>
>> -	for (i = 0; i < fence_count; i++) {
>> -		ret = drm_sched_job_add_dependency(job, fences[i]);
>> +	dma_resv_for_each_fence(&cursor, obj->resv, write, fence) {
>> +		ret = drm_sched_job_add_dependency(job, fence);
>>   		if (ret)
>> -			break;
>> +			return ret;
>>   	}
>> -
>> -	for (; i < fence_count; i++)
>> -		dma_fence_put(fences[i]);
>> -	kfree(fences);
>> -	return ret;
>> +	return 0;
>>   }
>>   EXPORT_SYMBOL(drm_sched_job_add_implicit_dependencies);
> Hi Christian,
>
> unfortunately, this breaks lima on the rk3328 quite badly. Running
> glmark2-es2-drm just locks up the device with the following traces:
>
> [   39.624100] ------------[ cut here ]------------
> [   39.624555] refcount_t: addition on 0; use-after-free.
> [   39.625058] WARNING: CPU: 0 PID: 123 at lib/refcount.c:25 refcount_warn_saturate+0xa4/0x150
> [   39.625825] Modules linked in: 8021q garp stp mrp llc crct10dif_ce hantro_vpu(C) fuse ip_tables x_tables ipv6
> [   39.626753] CPU: 0 PID: 123 Comm: pp Tainted: G         C        5.15.0-rc1fratti-00251-g9c2ba265352a #158
> [   39.627614] Hardware name: Pine64 Rock64 (DT)
> [   39.628004] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   39.628628] pc : refcount_warn_saturate+0xa4/0x150
> [   39.629062] lr : refcount_warn_saturate+0xa4/0x150
> [   39.629495] sp : ffffffc0124d3d90
> [   39.629794] x29: ffffffc0124d3d90 x28: 0000000000000000 x27: 0000000000000000
> [   39.630441] x26: 0000000000000000 x25: ffffffc0117fe000 x24: ffffff8001ad73f8
> [   39.631087] x23: ffffffc0107fc3e0 x22: ffffffc0117fe000 x21: ffffff8010660000
> [   39.631731] x20: ffffff8001ad73c0 x19: ffffff807db094c8 x18: ffffffffffffffff
> [   39.632377] x17: 0000000000000001 x16: 0000000000000001 x15: 0765076507720766
> [   39.633022] x14: 072d077207650774 x13: 0765076507720766 x12: 072d077207650774
> [   39.633668] x11: 0720072007200720 x10: ffffffc011c4b1b0 x9 : ffffffc01010ac54
> [   39.634314] x8 : 00000000ffffdfff x7 : ffffffc011cfb1b0 x6 : 0000000000000001
> [   39.634960] x5 : ffffff807fb4d980 x4 : 0000000000000000 x3 : 0000000000000027
> [   39.635605] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff8000e1f000
> [   39.636250] Call trace:
> [   39.636475]  refcount_warn_saturate+0xa4/0x150
> [   39.636879]  drm_sched_entity_pop_job+0x414/0x4a0
> [   39.637307]  drm_sched_main+0xe4/0x450
> [   39.637651]  kthread+0x12c/0x140
> [   39.637949]  ret_from_fork+0x10/0x20
> [   39.638279] ---[ end trace 47528e09b2512330 ]---
> [   39.638783] ------------[ cut here ]------------
> [   39.639214] refcount_t: underflow; use-after-free.
> [   39.639687] WARNING: CPU: 0 PID: 123 at lib/refcount.c:28 refcount_warn_saturate+0xf8/0x150
> [   39.640447] Modules linked in: 8021q garp stp mrp llc crct10dif_ce hantro_vpu(C) fuse ip_tables x_tables ipv6
> [   39.641373] CPU: 0 PID: 123 Comm: pp Tainted: G        WC        5.15.0-rc1fratti-00251-g9c2ba265352a #158
> [   39.642237] Hardware name: Pine64 Rock64 (DT)
> [   39.642632] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   39.643257] pc : refcount_warn_saturate+0xf8/0x150
> [   39.643693] lr : refcount_warn_saturate+0xf8/0x150
> [   39.644128] sp : ffffffc0124d3d90
> [   39.644430] x29: ffffffc0124d3d90 x28: 0000000000000000 x27: 0000000000000000
> [   39.645077] x26: 0000000000000000 x25: ffffffc0117fe000 x24: ffffff8001ad73f8
> [   39.645724] x23: ffffffc0107fc3e0 x22: ffffffc0117fe000 x21: ffffff8010660000
> [   39.646372] x20: ffffff8001ad73c0 x19: ffffff807db094c8 x18: ffffffffffffffff
> [   39.647020] x17: 0000000000000001 x16: 0000000000000001 x15: 072007200720072e
> [   39.647666] x14: 0765076507720766 x13: 072007200720072e x12: 0765076507720766
> [   39.648312] x11: 0720072007200720 x10: ffffffc011c4b1b0 x9 : ffffffc01010ac54
> [   39.648957] x8 : 00000000ffffdfff x7 : ffffffc011cfb1b0 x6 : 0000000000000001
> [   39.649602] x5 : ffffff807fb4d980 x4 : 0000000000000000 x3 : 0000000000000027
> [   39.650247] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff8000e1f000
> [   39.650894] Call trace:
> [   39.651119]  refcount_warn_saturate+0xf8/0x150
> [   39.651526]  drm_sched_entity_pop_job+0x420/0x4a0
> [   39.651953]  drm_sched_main+0xe4/0x450
> [   39.652296]  kthread+0x12c/0x140
> [   39.652595]  ret_from_fork+0x10/0x20
> [   39.652924] ---[ end trace 47528e09b2512331 ]---
> [   39.717053] ------------[ cut here ]------------
> [   39.717543] refcount_t: saturated; leaking memory.
> [   39.718030] WARNING: CPU: 1 PID: 375 at lib/refcount.c:22 refcount_warn_saturate+0x78/0x150
> [   39.718800] Modules linked in: 8021q garp stp mrp llc crct10dif_ce hantro_vpu(C) fuse ip_tables x_tables ipv6
> [   39.719744] CPU: 1 PID: 375 Comm: glmark2-es2-drm Tainted: G        WC        5.15.0-rc1fratti-00251-g9c2ba265352a #158
> [   39.720712] Hardware name: Pine64 Rock64 (DT)
> [   39.721109] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   39.721739] pc : refcount_warn_saturate+0x78/0x150
> [   39.722178] lr : refcount_warn_saturate+0x78/0x150
> [   39.722617] sp : ffffffc012913a90
> [   39.722921] x29: ffffffc012913a90 x28: ffffff8010630000 x27: ffffff8005219e00
> [   39.723576] x26: ffffff80103da500 x25: 0000000000000000 x24: ffffff8000cb24c0
> [   39.724230] x23: ffffff800ac045b0 x22: ffffff8005212100 x21: 0000000000000000
> [   39.724884] x20: ffffff8000cb24c0 x19: 0000000000000000 x18: ffffffffffffffff
> [   39.725538] x17: 0000000000000000 x16: 0000000000000000 x15: 072007200720072e
> [   39.726192] x14: 07790772076f076d x13: 072007200720072e x12: 07790772076f076d
> [   39.726846] x11: 0720072007200720 x10: ffffffc011c4b1b0 x9 : ffffffc01010ac54
> [   39.727501] x8 : 00000000ffffdfff x7 : ffffffc011cfb1b0 x6 : 0000000000000001
> [   39.728155] x5 : ffffff807fb68980 x4 : 0000000000000000 x3 : 0000000000000027
> [   39.728808] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffffff8004d7b800
> [   39.729464] Call trace:
> [   39.729691]  refcount_warn_saturate+0x78/0x150
> [   39.730101]  dma_resv_add_shared_fence+0x1ac/0x1cc
> [   39.730543]  lima_gem_submit+0x300/0x580
> [   39.730909]  lima_ioctl_gem_submit+0x284/0x340
> [   39.731318]  drm_ioctl_kernel+0xd0/0x180
> [   39.731685]  drm_ioctl+0x220/0x450
> [   39.732005]  __arm64_sys_ioctl+0x568/0xe9c
> [   39.732386]  invoke_syscall.constprop.0+0x58/0xf0
> [   39.732824]  do_el0_svc+0x138/0x170
> [   39.733152]  el0_svc+0x28/0xc0
> [   39.733441]  el0t_64_sync_handler+0xa8/0x130
> [   39.733837]  el0t_64_sync+0x1a0/0x1a4
> [   39.734178] ---[ end trace 47528e09b2512332 ]---
> [   39.734926] Unable to handle kernel write to read-only memory at virtual address ffffffc0107fbc70
> [   39.735763] Mem abort info:
> [   39.736029]   ESR = 0x9600004e
> [   39.736313]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   39.736796]   SET = 0, FnV = 0
> [   39.737080]   EA = 0, S1PTW = 0
> [   39.737368]   FSC = 0x0e: level 2 permission fault
> [   39.737804] Data abort info:
> [   39.738068]   ISV = 0, ISS = 0x0000004e
> [   39.738419]   CM = 0, WnR = 1
> [   39.738693] swapper pgtable: 4k pages, 39-bit VAs, pgdp=0000000003893000
> [   39.739297] [ffffffc0107fbc70] pgd=100000007ffff003, p4d=100000007ffff003, pud=100000007ffff003, pmd=0040000002800781
> [   39.740270] Internal error: Oops: 9600004e [#1] SMP
> [   39.740719] Modules linked in: 8021q garp stp mrp llc crct10dif_ce hantro_vpu(C) fuse ip_tables x_tables ipv6
> [   39.741665] CPU: 0 PID: 123 Comm: pp Tainted: G        WC        5.15.0-rc1fratti-00251-g9c2ba265352a #158
> [   39.742537] Hardware name: Pine64 Rock64 (DT)
> [   39.742934] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [   39.743570] pc : dma_fence_add_callback+0xb0/0xf0
> [   39.744017] lr : dma_fence_add_callback+0x5c/0xf0
> [   39.744457] sp : ffffffc0124d3d60
> [   39.744764] x29: ffffffc0124d3d60 x28: 0000000000000000 x27: 0000000000000000
> [   39.745423] x26: 0000000000000000 x25: ffffffc0117fe000 x24: ffffff800536b6e0
> [   39.746080] x23: 0000000000000000 x22: 0000000000000000 x21: ffffffc0107fc3e0
> [   39.746736] x20: ffffff807db09528 x19: ffffff8000cb24c0 x18: 0000000000000001
> [   39.747390] x17: 000000040044ffff x16: 000000000000000c x15: 000000000000000d
> [   39.748044] x14: 0000000000000000 x13: 000000000000072b x12: 071c71c71c71c71c
> [   39.748697] x11: 000000000000072b x10: 0000000000000002 x9 : ffffffc01087d5ac
> [   39.749350] x8 : 0000000000000238 x7 : 0000000000000000 x6 : 0000000000000000
> [   39.750002] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffff8000cb24f0
> [   39.750654] x2 : 0000000000000000 x1 : ffffffc0107fbc70 x0 : ffffff8000cb24d0
> [   39.751309] Call trace:
> [   39.751539]  dma_fence_add_callback+0xb0/0xf0
> [   39.751944]  drm_sched_entity_pop_job+0xac/0x4a0
> [   39.752371]  drm_sched_main+0xe4/0x450
> [   39.752720]  kthread+0x12c/0x140
> [   39.753024]  ret_from_fork+0x10/0x20
> [   39.753367] Code: 91004260 f9400e61 f9000e74 a9000680 (f9000034)
> [   39.753920] ---[ end trace 47528e09b2512333 ]---
> [   40.253374] [drm:lima_sched_timedout_job] *ERROR* lima job timeout
>
> I've bisected the problem to this commit, and confirmed that reverting it gets
> glmark2's 3d horse back to spinning.
>
> It's possible this patch just uncovers a bug in lima, so I've added the lima
> list as a recipient to this reply as well.
>
> Since I doubt AMD has many Rockchip SoCs lying around, I'll gladly test any
> prospective fixes for this.
>
> Regards,
> Nicolas Frattaroli
>
>

