public inbox for linux-media@vger.kernel.org
From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: "Christian König" <christian.koenig@amd.com>,
	"Arunpravin Paneer Selvam" <Arunpravin.PaneerSelvam@amd.com>,
	intel-gfx@lists.freedesktop.org,
	"Masami Hiramatsu" <mhiramat@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	linux-media@vger.kernel.org
Subject: Re: [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
Date: Wed, 15 Mar 2023 19:34:37 +0100
Message-ID: <b6bf19b1-1265-a5cc-7a82-300fe7bdd15b@gmail.com>
In-Reply-To: <20230315133146.3a48206e@gandalf.local.home>

On 15.03.23 at 18:31, Steven Rostedt wrote:
> On Wed, 15 Mar 2023 11:57:12 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> So I'm looking at the backtraces.
>
>> The WARN_ON triggered:
>>
>> [   21.481449] mpls_gso: MPLS GSO support
>> [   21.488795] IPI shorthand broadcast: enabled
>> [   21.488873] ------------[ cut here ]------------
>> [   21.490101] ------------[ cut here ]------------
>>
>> [   21.491693] WARNING: CPU: 1 PID: 38 at drivers/gpu/drm/ttm/ttm_bo.c:332 ttm_bo_release+0x2ac/0x2fc  <<<---- Line of the added WARN_ON()
> This happened on CPU 1.
>
>> [   21.492940] refcount_t: underflow; use-after-free.
>> [   21.492965] WARNING: CPU: 0 PID: 84 at lib/refcount.c:28 refcount_warn_saturate+0xb6/0xfc
> This happened on CPU 0.
>
>> [   21.496116] Modules linked in:
>> [   21.497197] Modules linked in:
>> [   21.500105] CPU: 1 PID: 38 Comm: kworker/1:1 Not tainted 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
>> [   21.500789] CPU: 0 PID: 84 Comm: kworker/0:1H Not tainted 6.3.0-rc2-test-00047-g6015b1aca1a2-dirty #993
>> [   21.501882] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
>> [   21.503533] sched_clock: Marking stable (20788024762, 714243692)->(22140778105, -638509651)
>> [   21.504080] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
>> [   21.504089] Workqueue: ttm ttm_bo_delayed_delete
>> [   21.507196] Workqueue: events drm_fb_helper_damage_work
>> [   21.509235]
>> [   21.510291] registered taskstats version 1
>> [   21.510302] Running ring buffer tests...
>> [   21.511792]
>> [   21.513870] EIP: refcount_warn_saturate+0xb6/0xfc
>> [   21.515261] EIP: ttm_bo_release+0x2ac/0x2fc
>> [   21.516566] Code: 68 00 27 0c d8 e8 36 3b aa ff 0f 0b 58 c9 c3 90 80 3d 41 c2 37 d8 00 75 8a c6 05 41 c2 37 d8 01 68 2c 27 0c d8 e8 16 3b aa ff <0f> 0b 59 c9 c3 80 3d 3f c2 37 d8 00 0f 85 67 ff ff ff c6 05 3f c2
>> [   21.516998] Code: ff 8d b4 26 00 00 00 00 66 90 0f 0b 8b 43 10 85 c0 0f 84 a1 fd ff ff 8d 76 00 0f 0b 8b 43 28 85 c0 0f 84 9c fd ff ff 8d 76 00 <0f> 0b e9 92 fd ff ff 8d b4 26 00 00 00 00 66 90 c7 43 18 00 00 00
>> [   21.517905] EAX: 00000026 EBX: c129d150 ECX: 00000040 EDX: 00000002
>> [   21.518987] EAX: d78c8550 EBX: c129d134 ECX: c129d134 EDX: 00000001
>> [   21.519337] ESI: c129d0bc EDI: f6f91200 EBP: c2b8bf18 ESP: c2b8bf14
>> [   21.520617] ESI: c129d000 EDI: c126a7a0 EBP: c1839c24 ESP: c1839bec
>> [   21.521546] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
>> [   21.526154] DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 EFLAGS: 00010286
>> [   21.526162] CR0: 80050033 CR2: 00000000 CR3: 18506000 CR4: 00150ef0
>> [   21.526166] Call Trace:
>> [   21.526189]  ? ww_mutex_unlock+0x3a/0x94
>> [   21.530300] CR0: 80050033 CR2: ff9ff000 CR3: 18506000 CR4: 00150ef0
>> [   21.531722]  ? ttm_bo_cleanup_refs+0xc4/0x1e0
>> [   21.533114] Call Trace:
>> [   21.534516]  ttm_mem_evict_first+0x3d3/0x568
>> [   21.535901]  ttm_bo_delayed_delete+0x9c/0xa4
>> [   21.537391]  ? kfree+0x6b/0xdc
>> [   21.538901]  process_one_work+0x21a/0x484
>> [   21.540279]  ? ttm_range_man_alloc+0xe0/0xec
>> [   21.540854]  worker_thread+0x14a/0x39c
>> [   21.541714]  ? ttm_range_man_fini_nocheck+0xe8/0xe8
>> [   21.543332]  kthread+0xea/0x10c
>> [   21.544301]  ttm_bo_mem_space+0x1d0/0x1e4
>> [   21.544942]  ? process_one_work+0x484/0x484
>> [   21.545887]  ttm_bo_validate+0xc5/0x19c
>> [   21.546986]  ? kthread_complete_and_exit+0x1c/0x1c
>> [   21.547680]  ttm_bo_init_reserved+0x15e/0x1fc
>> [   21.548716]  ret_from_fork+0x1c/0x28
>> [   21.549650]  qxl_bo_create+0x145/0x20c
> The qxl_bo_create() calls ttm_bo_init_reserved() as the object in question
> is about to be freed.
>
> I'm guessing what is happening here is that an object was to be freed by
> the delayed_delete, and in the meantime something else picked it up.
>
> What's protecting this from being used again?

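The reference count. This is pretty clearly an unbalanced reference 
counting issue: somewhere a reference is dropped more often than it is 
taken, which is exactly what the "refcount_t: underflow; use-after-free" 
warning above is reporting.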
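To make "unbalanced" concrete, here is a minimal userspace sketch. It is 
not TTM or kernel code: struct demo_bo and the demo_bo_* helpers are 
hypothetical and only mimic the idea behind kernel reference counting (a 
get that refuses to revive an object whose count already reached zero, 
and a put that reports an underflow instead of going negative, loosely 
like the refcount_t warning in the log).

```c
#include <assert.h>    /* assert() for the usage checks */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical object: only the reference count matters for this sketch. */
struct demo_bo {
	atomic_int refcount;
	int underflows;   /* unbalanced puts detected (single-threaded use only) */
};

static void demo_bo_init(struct demo_bo *bo)
{
	atomic_init(&bo->refcount, 1);   /* the creator owns the first reference */
	bo->underflows = 0;
}

/* Take a reference only while the object is still alive (count > 0). */
static bool demo_bo_get_unless_zero(struct demo_bo *bo)
{
	int old = atomic_load(&bo->refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&bo->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool demo_bo_put(struct demo_bo *bo)
{
	int old = atomic_load(&bo->refcount);

	do {
		if (old == 0) {
			/* More puts than gets: refuse to go negative and report it. */
			bo->underflows++;
			fprintf(stderr, "demo: refcount underflow (unbalanced put)\n");
			return false;
		}
	} while (!atomic_compare_exchange_weak(&bo->refcount, &old, old - 1));

	return old == 1;
}
```

A balanced sequence is init (1), get_unless_zero (2), put (1), put (0, 
the object would be released here). After that, get_unless_zero fails, 
and any further put is the kind of extra drop that shows up as an 
underflow.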

Previously you just wouldn't have noticed it much, because we silently 
removed the BO from the LRU list without checking whether it had already 
been removed (and so just corrupted a bit of memory).
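A rough userspace illustration of how an unchecked removal from a doubly 
linked list can damage memory without crashing. The demo_list_* helpers 
are a hypothetical reimplementation for this sketch, not <linux/list.h>, 
and the scenario is made up rather than taken from TTM: node x is removed, 
a new node y is linked where x used to be, and a second unchecked removal 
of x then unlinks y through x's stale pointers.

```c
#include <assert.h>    /* assert() for the usage checks */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list node (not <linux/list.h>). */
struct demo_node {
	struct demo_node *prev, *next;
};

static void demo_list_init(struct demo_node *n)
{
	n->prev = n;
	n->next = n;
}

/* Link n directly after pos. */
static void demo_list_add_after(struct demo_node *n, struct demo_node *pos)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

/* Unlink n but leave its (now stale) prev/next pointers untouched. */
static void demo_list_del_unchecked(struct demo_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

/* Unlink n and point it at itself, so it reads as "not on a list". */
static void demo_list_del_init(struct demo_node *n)
{
	demo_list_del_unchecked(n);
	demo_list_init(n);
}

static bool demo_list_is_linked(const struct demo_node *n)
{
	return n->next != n;
}

static size_t demo_list_count(const struct demo_node *head)
{
	size_t count = 0;

	for (const struct demo_node *p = head->next; p != head; p = p->next)
		count++;
	return count;
}

/* Builds the list head: a x b. */
static void demo_build(struct demo_node *head, struct demo_node *a,
		       struct demo_node *x, struct demo_node *b)
{
	demo_list_init(head);
	demo_list_add_after(a, head);
	demo_list_add_after(x, a);
	demo_list_add_after(b, x);
}

/* Second removal of x without a check: y silently drops off the list. */
static size_t demo_double_remove_unchecked(void)
{
	struct demo_node head, a, x, b, y;

	demo_build(&head, &a, &x, &b);   /* head: a x b */
	demo_list_del_unchecked(&x);     /* head: a b, x still points at a and b */
	demo_list_add_after(&y, &a);     /* head: a y b */
	demo_list_del_unchecked(&x);     /* rewrites a.next and b.prev: y is lost */
	return demo_list_count(&head);   /* 2: only a and b are reachable */
}

/* Same sequence, but x is re-initialised and checked before removal. */
static size_t demo_double_remove_checked(void)
{
	struct demo_node head, a, x, b, y;

	demo_build(&head, &a, &x, &b);
	demo_list_del_init(&x);
	demo_list_add_after(&y, &a);
	if (demo_list_is_linked(&x))     /* false: x points at itself */
		demo_list_del_init(&x);
	return demo_list_count(&head);   /* 3: a, y and b */
}
```

In the unchecked case nothing faults; the damage only shows up later as 
an entry that is missing from the list, which is why this kind of bug 
can go unnoticed for a long time.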

Now, however, we get tons of errors because the delayed worker runs 
regardless of whether the BO has already been freed.
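A single-threaded sketch of that failure mode, assuming the queued work 
owns one reference to the object. The demo_obj, demo_queue and 
demo_run_worker names are hypothetical, and "released" is modelled with a 
state flag instead of real freeing so the example stays well defined. The 
worker runs every queued item unconditionally, so one extra put made 
before it runs leaves it looking at an object that is already gone.

```c
#include <assert.h>
#include <stddef.h>

enum demo_state { DEMO_ALIVE, DEMO_RELEASED };

/* Hypothetical object; real code would free memory on release. */
struct demo_obj {
	int refcount;
	enum demo_state state;
};

static void demo_put(struct demo_obj *obj)
{
	if (--obj->refcount == 0)
		obj->state = DEMO_RELEASED;   /* stands in for kfree() */
}

/* A tiny stand-in for a workqueue: items run later, unconditionally. */
#define DEMO_MAX_WORK 8
struct demo_queue {
	struct demo_obj *items[DEMO_MAX_WORK];
	size_t len;
};

/* The pending work relies on obj staying valid until the worker runs. */
static void demo_queue_delayed_delete(struct demo_queue *q, struct demo_obj *obj)
{
	assert(q->len < DEMO_MAX_WORK);
	q->items[q->len++] = obj;
}

/* Runs all queued items; returns how many hit an already released object. */
static size_t demo_run_worker(struct demo_queue *q)
{
	size_t stale = 0;

	for (size_t i = 0; i < q->len; i++) {
		struct demo_obj *obj = q->items[i];

		if (obj->state == DEMO_RELEASED)
			stale++;          /* in the kernel: a use-after-free */
		else
			demo_put(obj);    /* drop the reference the work owned */
	}
	q->len = 0;
	return stale;
}
```

With balanced counting the worker's put is the final one and releases 
the object; with one extra put elsewhere, the object is released first 
and the worker still runs on it.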

Christian.

>
> -- Steve
>



Thread overview: 18+ messages
2023-03-08  2:22 [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete() Steven Rostedt
2023-03-08  2:26 ` Steven Rostedt
2023-03-08  6:17   ` Christian König
2023-03-08 12:43     ` Steven Rostedt
2023-03-15 18:41       ` Christian König
2023-03-15 19:15         ` [Intel-gfx] " Matthew Auld
2023-03-15 19:51           ` Christian König
2023-03-15 20:20             ` Steven Rostedt
2023-03-16  0:21               ` Steven Rostedt
2023-03-16  0:22                 ` Steven Rostedt
2023-03-17 17:42                   ` Linus Torvalds
2023-03-15 15:09     ` Steven Rostedt
2023-03-15 15:25       ` Christian König
2023-03-15 15:57         ` Steven Rostedt
2023-03-15 17:31           ` Steven Rostedt
2023-03-15 18:34             ` Christian König [this message]
2023-03-15 17:54           ` Steven Rostedt
2023-03-15 18:25             ` Christian König