From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 32778EE8011 for ; Fri, 8 Sep 2023 14:42:03 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EBEAB10E8DA; Fri, 8 Sep 2023 14:42:02 +0000 (UTC) Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.65]) by gabe.freedesktop.org (Postfix) with ESMTPS id 870A010E8DA; Fri, 8 Sep 2023 14:42:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1694184120; x=1725720120; h=message-id:date:mime-version:subject:from:to:cc: references:in-reply-to:content-transfer-encoding; bh=Yqgc7+IYNH9wMTUddUHYFymjVShi7i8GajUcX3pwhKE=; b=CyLn6IJ/HdFko3M47gAvEsaDWr//s1dQWu6Lxc/P745GGGNDM1OZSyr2 aEQDIjBh6NO+/nXYHT70xBre+mFHCxOJiBCtjk62sPsQGmUNBcmjUB/41 aayh6iXyi/w2x+L28opA6qe/5/WRDi3GB1zlC3eT5xlAsPBvMUZEvkMeq grGlu29ryy/v2TC2uS/4mSdt0UpNbzpsKiznhk2NXgSCk78ieIWnKCFgj F03dT/ddhbg/0GHZQxiomtz2zJ642W6Og5rz5FhIPDs/DYP8npnS+XFX1 5knDSMsVhu/t6DaQTNygN0X+ZrQhO2OpU2LV3CTwbGfG0SknRjefh+Pvl g==; X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="381479656" X-IronPort-AV: E=Sophos;i="6.02,237,1688454000"; d="scan'208";a="381479656" Received: from orsmga006.jf.intel.com ([10.7.209.51]) by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Sep 2023 07:31:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10827"; a="719175267" X-IronPort-AV: E=Sophos;i="6.02,237,1688454000"; d="scan'208";a="719175267" Received: from binm223x-mobl2.gar.corp.intel.com (HELO [10.249.254.172]) ([10.249.254.172]) by orsmga006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 08 Sep 2023 07:31:39 -0700 Message-ID: Date: Fri, 8 Sep 2023 16:31:36 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Content-Language: en-US From: =?UTF-8?Q?Thomas_Hellstr=c3=b6m?= To: =?UTF-8?Q?Christian_K=c3=b6nig?= , intel-xe@lists.freedesktop.org, dri-devel@lists.freedesktop.org References: <20230907135339.7971-1-thomas.hellstrom@linux.intel.com> <2a9310b5-cf08-d4fe-c08e-c3fc9d25653c@linux.intel.com> <95610a20-4364-7ba8-88be-3e303018ea79@amd.com> <84b2736f-c225-1421-f245-2de042345dea@linux.intel.com> <59839d43-c19c-e27a-51e6-fec44ac09936@amd.com> In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit Subject: Re: [Intel-xe] [PATCH v3 0/2] drm/tests: Fix for UAF and a test for drm_exec lock alloc tracking warning X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Maxime Ripard Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On 9/8/23 13:13, Thomas Hellström wrote: > > On 9/8/23 11:14, Christian König wrote: >> Am 08.09.23 um 11:04 schrieb Thomas Hellström: >>> >>> On 9/8/23 10:52, Christian König wrote: >>>> Am 08.09.23 um 09:37 schrieb Thomas Hellström: >>>>> Hi, >>>>> >>>>> On 9/7/23 16:49, Christian König wrote: >>>>>> Am 07.09.23 um 16:47 schrieb Thomas Hellström: >>>>>>> Hi, >>>>>>> >>>>>>> On 9/7/23 16:37, Christian König wrote: >>>>>>>> Am 07.09.23 um 15:53 schrieb Thomas Hellström: >>>>>>>>> While trying to replicate a weird drm_exec lock alloc tracking >>>>>>>>> warning >>>>>>>>> using the drm_exec kunit test, the warning was shadowed by a >>>>>>>>> UAF warning >>>>>>>>> from KASAN due to a bug in the drm kunit helpers. >>>>>>>>> >>>>>>>>> Patch 1 fixes that drm kunit UAF. >>>>>>>>> Patch 2 introduces a drm_exec kunit subtest that fails if the >>>>>>>>> conditions >>>>>>>>>        for the weird warning are met. >>>>>>>>> >>>>>>>>> The series previously also had a patch with a drm_exec >>>>>>>>> workaround for the >>>>>>>>> warning but that patch has already been commited to >>>>>>>>> drm_misc_next_fixes. >>>>>>>> >>>>>>>> Thinking more about this what happens when somebody calls >>>>>>>> drm_exec_unlock_obj() on the first locked object? >>>>>>>> >>>>>>> Essentially the same thing. I've been thinking of the best way >>>>>>> to handle that, but not sure what's the best one. >>>>>> >>>>>> Well what does lockdep store in that object in the first place? >>>>>> Could we fix that somehow? >>>>> >>>>> Lockdep maintains an array of held locks (lock classes) for each >>>>> task. Upon freeing, that list is traversed to see if the address >>>>> matches the stored memory address. This also has the interesting >>>>> side effect that IICR dma_resv_assert_held() checks if *any* >>>>> dma_resv is held.... >>>>> >>>>> Ideally each object would have its own class instance, but I think >>>>> some applications would then exhaust the array size. >>>> >>>> IIRC Daniel once explained to me that he designed lockdep for >>>> ww_mutexes like this for some reason, but I don't remember the >>>> details any more. >>>> >>>> Maybe lockdep wouldn't otherwise be able to deal with the fact that >>>> you could lock them in any order or something like that. >>> >>> Oh, that's well handled with the mutex_lock_nest_lock()  type of >>> annotation that's used for WW mutexes. IIRC the problem is that >>> lockdep can't really deal with either that vast number of locks >>> overall or the vast number of held locks per process. >> >> Could we somehow teach lockdep that multiple locks of a lock class >> can be held at the same time? E.g. like a reference count in the >> lockclass or something like that? >> >>> >>>> >>>>> >>>>> I'll dig a bit deeper into this. >>>>> >>>>> >>>>> Meanwhile for the unlock problem, looking at how the unlocks are >>>>> used in i915 it's typically locks that are grabbed during eviction >>>>> and released again once validation of a single object succeeded. >>>>> The risk of them ending up at the first lock is small, unless they >>>>> are prelocked as the contended lock. But for these "temporary" >>>>> objects, the prelocked lock is immediately dropped after locking >>>>> and are only used to find something suitable to wait for to relax >>>>> the ww transaction. >>>> >>>> Yeah, I don't see this as an use case in reality. It's more of a >>>> "what if?" thing. >>> >>> Oh, it's a real use-case. As soon as you start having sleeping locks >>> for eviction you hit it, in particular with WW mutex slowpath >>> debugging. And we will need to work on improving TTM support for >>> that for xe. >> >> Oh, good point! When we have contention on a lock, rollback and take >> that lock then first it can be that this lock then needs to be >> unlocked again. Unlikely, but certainly possible. >> >> Sounds like we really need to fix this in lockdep then. > > So it seems lockdep *does* reference counting in this case, but stores > the address of the first locked lockdep map, and then subsequently > uses it for various things. In short freeing the first lock isn't > something lockdep thinks you should do. Ever. > > The good thing about this is that this refcounting appears only done > on nest locks, that is, when we have a ww context AFAICT. That means > we can probably store a fake ww_mutex lockdep map with the ww acquire > context and lock it when we initialize the context and unlock it on > ww_acquire_fini(). > > Should take care of the problem I think, although the problem of > lockdep_assert() and lock freeing granularity will remain. It looks > like there is a comparison function one can optionally set to make > different objects look separate to lockdep. Probably something to > think of for enhanced debugging with a limited set of locked objects. > > Need to also check what happens if we do a sequence of successful > trylocks. OK, nested trylocks indeed seem to store one instance per lock, so not prone to the problem. For locks under a ww_acquire_ctx, the solution outlined above appears to work, and it's restricted to lockdep code only. /Thomas > > /Thomas > >> >> Christian. >> >>> >>>> >>>>> >>>>> If we were to implement something similar in drm_exec, we'd need >>>>> an interface to mark an object as "temporary" when locking, and >>>>> make sure we drop those objects if they end up as "prelocked". >>>>> Personally I think this solution works well and would be my >>>>> preferred choice. >>>>> >>>>> Yet another alternative would be to keep a reference even of the >>>>> unlocked objects... >>>>> >>>>> But these workarounds ofc only push the problem out of drm_exec. >>>>> Users of raw dma-resv or ww mutexes would still wonder what's >>>>> going on. >>>> >>>> Agree, completely. This is really a bug in lockdep or rather how we >>>> designed to implement ww_mutexes in lockdep and should therefore be >>>> fixed there I think. >>> >>> >>>> >>>> Christian. >>>> >>>>> >>>>> /Thomas >>>>> >>>>> >>>>> >>>>>> >>>>>> Christian. >>>>>> >>>>>>> >>>>>>> /Thomas >>>>>>> >>>>>>> >>>>>>>> Christian. >>>>>>>> >>>>>>>>> >>>>>>>>> v2: >>>>>>>>> - Rewording of commit messages >>>>>>>>> - Add some commit message tags >>>>>>>>> v3: >>>>>>>>> - Remove an already committed patch >>>>>>>>> - Rework the test to not require dmesg inspection (Maxime Ripard) >>>>>>>>> - Condition the test on CONFIG_LOCK_ALLOC >>>>>>>>> - Update code comments and commit messages (Maxime Ripard) >>>>>>>>> >>>>>>>>> Cc: Maxime Ripard >>>>>>>>> Cc: Christian König >>>>>>>>> >>>>>>>>> Thomas Hellström (2): >>>>>>>>>    drm/tests: helpers: Avoid a driver uaf >>>>>>>>>    drm/tests/drm_exec: Add a test for object freeing within >>>>>>>>>      drm_exec_fini() >>>>>>>>> >>>>>>>>>   drivers/gpu/drm/tests/drm_exec_test.c | 82 >>>>>>>>> +++++++++++++++++++++++++++ >>>>>>>>>   include/drm/drm_kunit_helpers.h       |  4 +- >>>>>>>>>   2 files changed, 85 insertions(+), 1 deletion(-) >>>>>>>>> >>>>>>>> >>>>>> >>>> >>