public inbox for linux-media@vger.kernel.org
From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: "Steven Rostedt" <rostedt@goodmis.org>,
	"Christian König" <christian.koenig@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>,
	intel-gfx@lists.freedesktop.org,
	Masami Hiramatsu <mhiramat@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Felix Kuehling <Felix.Kuehling@amd.com>,
	linux-media@vger.kernel.org
Subject: Re: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
Date: Wed, 15 Mar 2023 16:25:11 +0100
Message-ID: <07597f3e-0b35-c22b-91ec-fa3875d6fe22@gmail.com>
In-Reply-To: <20230315110949.1e11b3aa@gandalf.local.home>

On 15.03.23 at 16:09, Steven Rostedt wrote:
> On Wed, 8 Mar 2023 07:17:38 +0100
> Christian König <christian.koenig@amd.com> wrote:
>
>> On 08.03.23 at 03:26, Steven Rostedt wrote:
>>> On Tue, 7 Mar 2023 21:22:23 -0500
>>> Steven Rostedt <rostedt@goodmis.org> wrote:
>>>   
>>>> Looks like there was a lock possibly used after free. But as commit
>>>> 9bff18d13473a9fdf81d5158248472a9d8ecf2bd ("drm/ttm: use per BO cleanup
>>>> workers") changed a lot of this code, I figured it may be the culprit.
>>> If I bothered to look at the second warning after this one (I usually stop
>>> after the first), it appears to state there was a use after free issue.
>> Yeah, that looks like the reference count was somehow messed up.
>>
>> What test case/environment do you run to trigger this?
>>
>> Thanks for the notice,
> I'm still getting this on Linus's latest tree.

This looks like a reference counting issue that only shows up in your
particular use case. We have tested this code quite extensively and have
not been able to reproduce it so far.

Can you apply the debugging patch below and check whether you get any
warning in the system logs?

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 459f1b4440da..efc390bfd69c 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -314,6 +314,7 @@ static void ttm_bo_delayed_delete(struct work_struct *work)
         dma_resv_lock(bo->base.resv, NULL);
         ttm_bo_cleanup_memtype_use(bo);
         dma_resv_unlock(bo->base.resv);
+       bo->delayed_delete.func = NULL;
         ttm_bo_put(bo);
  }

@@ -327,6 +328,8 @@ static void ttm_bo_release(struct kref *kref)
         WARN_ON_ONCE(bo->pin_count);
         WARN_ON_ONCE(bo->bulk_move);

+       WARN_ON(bo->delayed_delete.func != NULL);
+
         if (!bo->deleted) {
                 ret = ttm_bo_individualize_resv(bo);
                 if (ret) {
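
The idea behind the two added lines can be sketched in plain userspace C.
This is a simplified model under stated assumptions, not the TTM code: the
type `fake_bo` and all `fake_*` functions are hypothetical stand-ins, the
reference count is a plain int instead of a kref, and the delayed work is
called by hand instead of being queued on a workqueue. It only shows why a
non-NULL work function pointer at release time points to an unbalanced put.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; these are not kernel types. */
struct fake_bo;
typedef void (*fake_work_fn)(struct fake_bo *bo);

struct fake_bo {
	int refcount;              /* plays the role of the kref */
	fake_work_fn delayed_func; /* plays the role of bo->delayed_delete.func */
	int released;              /* set when release ran (a real BO is freed) */
	int warnings;              /* number of times the WARN_ON-style check hit */
};

static void fake_bo_release(struct fake_bo *bo)
{
	/* Mirrors the added WARN_ON(): the work must not still be pending. */
	if (bo->delayed_func != NULL) {
		bo->warnings++;
		fprintf(stderr, "WARN: released while delayed delete is pending\n");
	}
	bo->released = 1;
}

static void fake_bo_put(struct fake_bo *bo)
{
	assert(bo->refcount > 0);
	if (--bo->refcount == 0)
		fake_bo_release(bo);
}

static void fake_delayed_delete(struct fake_bo *bo)
{
	/* Cleanup under the reservation lock would happen here. */
	bo->delayed_func = NULL; /* mirrors the line added before ttm_bo_put() */
	fake_bo_put(bo);         /* drop the reference held by the work item */
}

static void fake_queue_delayed_delete(struct fake_bo *bo)
{
	bo->refcount++;          /* in this model the work holds a reference */
	bo->delayed_func = fake_delayed_delete;
}
```

With balanced puts, the work clears the pointer before the last reference
is dropped and the check stays silent. If some path drops one reference too
many while the work is still pending, release runs with the pointer still
set and the check fires, which is the situation the patch tries to catch.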


Thanks,
Christian.

>
> [  230.530222] ------------[ cut here ]------------
> [  230.569795] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
> [  230.569957] WARNING: CPU: 0 PID: 212 at kernel/locking/mutex.c:582 __ww_mutex_lock.constprop.0+0x62a/0x1300
> [  230.612599] Modules linked in:
> [  230.632144] CPU: 0 PID: 212 Comm: kworker/0:8H Not tainted 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #992
> [  230.654939] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> [  230.678866] Workqueue: ttm ttm_bo_delayed_delete
> [  230.699452] EIP: __ww_mutex_lock.constprop.0+0x62a/0x1300
> [  230.720582] Code: e8 3b 9a 95 ff 85 c0 0f 84 61 fa ff ff 8b 0d 58 bc 3a c4 85 c9 0f 85 53 fa ff ff 68 54 98 06 c4 68 b7 b6 04 c4 e8 46 af 40 ff <0f> 0b 58 5a e9 3b fa ff ff 8d 74 26 00 90 a1 ec 47 b0 c4 85 c0 75
> [  230.768336] EAX: 00000028 EBX: 00000000 ECX: c51abdd8 EDX: 00000002
> [  230.792001] ESI: 00000000 EDI: c53856bc EBP: c51abf00 ESP: c51abeac
> [  230.815944] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010246
> [  230.840033] CR0: 80050033 CR2: ff9ff000 CR3: 04506000 CR4: 00150ef0
> [  230.864059] Call Trace:
> [  230.886369]  ? ttm_bo_delayed_delete+0x30/0x94
> [  230.909902]  ww_mutex_lock+0x32/0x94
> [  230.932550]  ttm_bo_delayed_delete+0x30/0x94
> [  230.955798]  process_one_work+0x21a/0x484
> [  230.979335]  worker_thread+0x14a/0x39c
> [  231.002258]  kthread+0xea/0x10c
> [  231.024769]  ? process_one_work+0x484/0x484
> [  231.047870]  ? kthread_complete_and_exit+0x1c/0x1c
> [  231.071498]  ret_from_fork+0x1c/0x28
> [  231.094701] irq event stamp: 4023
> [  231.117272] hardirqs last  enabled at (4023): [<c3d1df99>] _raw_spin_unlock_irqrestore+0x2d/0x58
> [  231.143217] hardirqs last disabled at (4022): [<c31d5a55>] kvfree_call_rcu+0x155/0x2ec
> [  231.166058] softirqs last  enabled at (3460): [<c3d1f403>] __do_softirq+0x2c3/0x3bb
> [  231.183104] softirqs last disabled at (3455): [<c30c96a9>] call_on_stack+0x45/0x4c
> [  231.200336] ---[ end trace 0000000000000000 ]---
> [  231.216572] ------------[ cut here ]------------
>
>
> This is preventing me from adding any of my own patches on v6.3-rcX due to
> this bug failing my tests. Which means I can't add anything to linux-next
> until this is fixed!
>
> -- Steve

