Intel-XE Archive on lore.kernel.org
From: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
To: "Christian König" <christian.koenig@amd.com>,
	intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org
Cc: Maxime Ripard <mripard@kernel.org>
Subject: Re: [Intel-xe] [PATCH v3 0/2] drm/tests: Fix for UAF and a test for drm_exec lock alloc tracking warning
Date: Fri, 8 Sep 2023 16:31:36 +0200
Message-ID: <e848c4f8-431b-1b7d-6300-d7f1fcd2c948@linux.intel.com>
In-Reply-To: <de7a7309-9c5b-09a7-7557-2d6050838215@linux.intel.com>


On 9/8/23 13:13, Thomas Hellström wrote:
>
> On 9/8/23 11:14, Christian König wrote:
>> Am 08.09.23 um 11:04 schrieb Thomas Hellström:
>>>
>>> On 9/8/23 10:52, Christian König wrote:
>>>> Am 08.09.23 um 09:37 schrieb Thomas Hellström:
>>>>> Hi,
>>>>>
>>>>> On 9/7/23 16:49, Christian König wrote:
>>>>>> Am 07.09.23 um 16:47 schrieb Thomas Hellström:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 9/7/23 16:37, Christian König wrote:
>>>>>>>> Am 07.09.23 um 15:53 schrieb Thomas Hellström:
>>>>>>>>> While trying to replicate a weird drm_exec lock alloc tracking 
>>>>>>>>> warning
>>>>>>>>> using the drm_exec kunit test, the warning was shadowed by a 
>>>>>>>>> UAF warning
>>>>>>>>> from KASAN due to a bug in the drm kunit helpers.
>>>>>>>>>
>>>>>>>>> Patch 1 fixes that drm kunit UAF.
>>>>>>>>> Patch 2 introduces a drm_exec kunit subtest that fails if the 
>>>>>>>>> conditions
>>>>>>>>>        for the weird warning are met.
>>>>>>>>>
>>>>>>>>> The series previously also had a patch with a drm_exec 
>>>>>>>>> workaround for the
>>>>>>>>> warning but that patch has already been committed to 
>>>>>>>>> drm_misc_next_fixes.
>>>>>>>>
>>>>>>>> Thinking more about this what happens when somebody calls 
>>>>>>>> drm_exec_unlock_obj() on the first locked object?
>>>>>>>>
>>>>>>> Essentially the same thing. I've been thinking of the best way 
>>>>>>> to handle that, but not sure what's the best one.
>>>>>>
>>>>>> Well what does lockdep store in that object in the first place? 
>>>>>> Could we fix that somehow?
>>>>>
>>>>> Lockdep maintains an array of held locks (lock classes) for each 
>>>>> task. Upon freeing, that list is traversed to see if the address 
>>>>> matches the stored memory address. This also has the interesting 
>>>>> side effect that IIRC dma_resv_assert_held() checks if *any* 
>>>>> dma_resv is held....
>>>>>
>>>>> Ideally each object would have its own class instance, but I think 
>>>>> some applications would then exhaust the array size.
>>>>
>>>> IIRC Daniel once explained to me that he designed lockdep for 
>>>> ww_mutexes like this for some reason, but I don't remember the 
>>>> details any more.
>>>>
>>>> Maybe lockdep wouldn't otherwise be able to deal with the fact that 
>>>> you could lock them in any order or something like that.
>>>
>>> Oh, that's well handled with the mutex_lock_nest_lock()  type of 
>>> annotation that's used for WW mutexes. IIRC the problem is that 
>>> lockdep can't really deal with either that vast number of locks 
>>> overall or the vast number of held locks per process.
>>
>> Could we somehow teach lockdep that multiple locks of a lock class 
>> can be held at the same time? E.g. like a reference count in the 
>> lockclass or something like that?
>>
>>>
>>>>
>>>>>
>>>>> I'll dig a bit deeper into this.
>>>>>
>>>>>
>>>>> Meanwhile for the unlock problem, looking at how the unlocks are 
>>>>> used in i915 it's typically locks that are grabbed during eviction 
>>>>> and released again once validation of a single object succeeded. 
>>>>> The risk of them ending up at the first lock is small, unless they 
>>>>> are prelocked as the contended lock. But for these "temporary" 
>>>>> objects, the prelocked lock is immediately dropped after locking 
>>>>> and is only used to find something suitable to wait for to relax 
>>>>> the ww transaction.
>>>>
>>>> Yeah, I don't see this as a use case in reality. It's more of a 
>>>> "what if?" thing.
>>>
>>> Oh, it's a real use-case. As soon as you start having sleeping locks 
>>> for eviction you hit it, in particular with WW mutex slowpath 
>>> debugging. And we will need to work on improving TTM support for 
>>> that for xe.
>>
>> Oh, good point! When we have contention on a lock, roll back, and then
>> take that lock first, it can happen that this lock needs to be unlocked
>> again. Unlikely, but certainly possible.
>>
>> Sounds like we really need to fix this in lockdep then.
>
> So it seems lockdep *does* reference counting in this case, but stores 
> the address of the first locked lockdep map, and then subsequently 
> uses it for various things. In short freeing the first lock isn't 
> something lockdep thinks you should do. Ever.
>
> The good thing about this is that this refcounting appears to be done 
> only on nest locks, that is, when we have a ww context AFAICT. That means 
> we can probably store a fake ww_mutex lockdep map with the ww acquire 
> context and lock it when we initialize the context and unlock it on 
> ww_acquire_fini().
>
> Should take care of the problem I think, although the problem of 
> lockdep_assert() and lock freeing granularity will remain. It looks 
> like there is a comparison function one can optionally set to make 
> different objects look separate to lockdep. Probably something to 
> think of for enhanced debugging with a limited set of locked objects.
>
> Need to also check what happens if we do a sequence of successful 
> trylocks.

OK, nested trylocks indeed seem to store one instance per lock, so not 
prone to the problem.

For locks under a ww_acquire_ctx, the solution outlined above appears to 
work, and it's restricted to lockdep code only.
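
To make the idea concrete, here is a toy userspace model of the behaviour
discussed in this thread. It is only a sketch under stated assumptions, not
lockdep code: every name in it (toy_map, toy_held, toy_task, toy_ctx and the
toy_* functions) is hypothetical, and it only mimics the two properties
described above, namely that nested same-class locks share one refcounted
held-lock entry, and that the entry keeps the address of the first lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy userspace model only; none of these names exist in the kernel. */
#define TOY_MAX_HELD 8

struct toy_map { int class_id; };       /* stands in for a lockdep map */

struct toy_held {
	const struct toy_map *instance; /* address of the first lock folded in */
	int class_id;
	unsigned int references;        /* number of locks sharing this entry */
};

struct toy_task {
	struct toy_held held[TOY_MAX_HELD];
	int depth;
};

/* Stand-in for an acquire context that carries a dummy map (the proposal). */
struct toy_ctx { struct toy_map dummy; };

/* Nested acquire: a same-class lock is folded into the top entry. */
static void toy_acquire_nested(struct toy_task *t, const struct toy_map *m)
{
	if (t->depth > 0 && t->held[t->depth - 1].class_id == m->class_id) {
		t->held[t->depth - 1].references++; /* instance is unchanged */
		return;
	}
	assert(t->depth < TOY_MAX_HELD);
	t->held[t->depth++] = (struct toy_held){ m, m->class_id, 1 };
}

/* Release: any lock of the same class matches the folded entry. */
static void toy_release(struct toy_task *t, const struct toy_map *m)
{
	for (int i = t->depth - 1; i >= 0; i--) {
		struct toy_held *h = &t->held[i];

		if (h->instance != m && h->class_id != m->class_id)
			continue;
		if (--h->references == 0) {
			for (int j = i; j < t->depth - 1; j++)
				t->held[j] = t->held[j + 1];
			t->depth--;
		}
		return;
	}
}

/* Free check: does a stored instance address fall in the freed range? */
static bool toy_free_hits_held_lock(const struct toy_task *t,
				    const void *mem, size_t len)
{
	uintptr_t from = (uintptr_t)mem;

	for (int i = 0; i < t->depth; i++) {
		uintptr_t p = (uintptr_t)t->held[i].instance;

		if (p >= from && p < from + len)
			return true;
	}
	return false;
}

/* Lock a and b, unlock a, then "free" a while b is still held. */
static bool toy_scenario_without_dummy(void)
{
	struct toy_task t = { .depth = 0 };
	struct toy_map a = { 1 }, b = { 1 };

	toy_acquire_nested(&t, &a);
	toy_acquire_nested(&t, &b);
	toy_release(&t, &a);
	return toy_free_hits_held_lock(&t, &a, sizeof(a)); /* stale address */
}

/* Same sequence, but the context's dummy map is locked first. */
static bool toy_scenario_with_dummy(void)
{
	struct toy_task t = { .depth = 0 };
	struct toy_ctx ctx = { .dummy = { 1 } };
	struct toy_map a = { 1 }, b = { 1 };
	bool warned;

	toy_acquire_nested(&t, &ctx.dummy);  /* at context init */
	toy_acquire_nested(&t, &a);
	toy_acquire_nested(&t, &b);
	toy_release(&t, &a);
	warned = toy_free_hits_held_lock(&t, &a, sizeof(a));
	toy_release(&t, &b);
	toy_release(&t, &ctx.dummy);         /* at context fini */
	return warned || t.depth != 0;
}
```

In this model toy_scenario_without_dummy() reports a hit because the shared
entry still points at the already unlocked first object, while
toy_scenario_with_dummy() does not, since the stored address belongs to the
context, which outlives every object locked under it.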

/Thomas


>
> /Thomas
>
>>
>> Christian.
>>
>>>
>>>>
>>>>>
>>>>> If we were to implement something similar in drm_exec, we'd need 
>>>>> an interface to mark an object as "temporary" when locking, and 
>>>>> make sure we drop those objects if they end up as "prelocked". 
>>>>> Personally I think this solution works well and would be my 
>>>>> preferred choice.
>>>>>
>>>>> Yet another alternative would be to keep a reference even of the 
>>>>> unlocked objects...
>>>>>
>>>>> But these workarounds ofc only push the problem out of drm_exec. 
>>>>> Users of raw dma-resv or ww mutexes would still wonder what's 
>>>>> going on.
>>>>
>>>> Agree, completely. This is really a bug in lockdep, or rather in how
>>>> we chose to implement ww_mutexes in lockdep, and should therefore be
>>>> fixed there I think.
>>>
>>>
>>>>
>>>> Christian.
>>>>
>>>>>
>>>>> /Thomas
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> Christian.
>>>>>>
>>>>>>>
>>>>>>> /Thomas
>>>>>>>
>>>>>>>
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> v2:
>>>>>>>>> - Rewording of commit messages
>>>>>>>>> - Add some commit message tags
>>>>>>>>> v3:
>>>>>>>>> - Remove an already committed patch
>>>>>>>>> - Rework the test to not require dmesg inspection (Maxime Ripard)
>>>>>>>>> - Condition the test on CONFIG_LOCK_ALLOC
>>>>>>>>> - Update code comments and commit messages (Maxime Ripard)
>>>>>>>>>
>>>>>>>>> Cc: Maxime Ripard <mripard@kernel.org>
>>>>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>>>>>
>>>>>>>>> Thomas Hellström (2):
>>>>>>>>>    drm/tests: helpers: Avoid a driver uaf
>>>>>>>>>    drm/tests/drm_exec: Add a test for object freeing within
>>>>>>>>>      drm_exec_fini()
>>>>>>>>>
>>>>>>>>>   drivers/gpu/drm/tests/drm_exec_test.c | 82 +++++++++++++++++++++++++++
>>>>>>>>>   include/drm/drm_kunit_helpers.h       |  4 +-
>>>>>>>>>   2 files changed, 85 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>
