From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id 35627F8A14D for ; Thu, 16 Apr 2026 09:56:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EA7BB10E023; Thu, 16 Apr 2026 09:56:34 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=intel.com header.i=@intel.com header.b="ZzHoKpAc"; dkim-atps=neutral Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.8]) by gabe.freedesktop.org (Postfix) with ESMTPS id E467A10E023 for ; Thu, 16 Apr 2026 09:56:33 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1776333392; x=1807869392; h=message-id:date:mime-version:subject:to:cc:references: from:in-reply-to:content-transfer-encoding; bh=DtqUqKJgaqyo5y4Psk+0d/jXI7MiVxP857k8ZQ8jqnQ=; b=ZzHoKpAcskh661pGJhKMQcYFy4yQrjRFOBXOJm9yzFQB1lhFQgGP6GZV mL2JPWpk4HtpOVgoxzOayugUz2N46C9g2Cj2cXN/duErZLclZmyeBvc3A 7TLBIhceeG9UxRPqiubaCD0f3x82aohrlC1QgnI3JjROXaEgyzJeWSYyt h/UybBV4GQI4/b/4AjUb1sJ3Cc7uG3VWp4r0AmofXmqKGn+wrSSqSN4+s 9uuroeZGguViC3y1sF21G+fey4hlzUn0tgZOmVEpA5zLHzxxqqQ7YokwB P6UfNr/8hFVKl+Z/SlGxP2TwhD0HgRnU221dtpH2o75rz7BAkek3HQP0k g==; X-CSE-ConnectionGUID: yXqOpmM5RD6jC1PeLoVy1A== X-CSE-MsgGUID: zXzTMMN7TA2dOhiprZjNeA== X-IronPort-AV: E=McAfee;i="6800,10657,11760"; a="94895864" X-IronPort-AV: E=Sophos;i="6.23,181,1770624000"; d="scan'208";a="94895864" Received: from fmviesa007.fm.intel.com ([10.60.135.147]) by fmvoesa102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2026 02:56:32 -0700 X-CSE-ConnectionGUID: t8ei5GNwTeGgAnxAoh5X5w== X-CSE-MsgGUID: TyfT61m8QBeDGvCpDzIDIg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,181,1770624000"; d="scan'208";a="227541164" Received: from fpallare-mobl4.ger.corp.intel.com (HELO [10.245.244.235]) ([10.245.244.235]) by fmviesa007-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 16 Apr 2026 02:56:32 -0700 Message-ID: <5a8d11bf-d5ff-4c89-835d-0592f30a03c3@intel.com> Date: Thu, 16 Apr 2026 10:56:30 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH V7 02/10] gpu/buddy: Integrate lockdep for gpu buddy manager To: "Upadhyay, Tejas" , "intel-xe@lists.freedesktop.org" Cc: "Brost, Matthew" , "thomas.hellstrom@linux.intel.com" , "Ghimiray, Himal Prasad" References: <20260416074958.3722666-12-tejas.upadhyay@intel.com> <20260416074958.3722666-14-tejas.upadhyay@intel.com> Content-Language: en-GB From: Matthew Auld In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: intel-xe@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel Xe graphics driver List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-xe-bounces@lists.freedesktop.org Sender: "Intel-xe" On 16/04/2026 10:43, Upadhyay, Tejas wrote: > > >> -----Original Message----- >> From: Auld, Matthew >> Sent: 16 April 2026 14:25 >> To: Upadhyay, Tejas ; intel- >> xe@lists.freedesktop.org >> Cc: Brost, Matthew ; >> thomas.hellstrom@linux.intel.com; Ghimiray, Himal Prasad >> >> Subject: Re: [RFC PATCH V7 02/10] gpu/buddy: Integrate lockdep for gpu >> buddy manager >> >> On 16/04/2026 08:49, Tejas Upadhyay wrote: >>> Integrating lockdep into the gpu_buddy manager as standard practice >>> for verifying that internal resources are correctly protected by their >>> associated locks. >>> >>> Signed-off-by: Tejas Upadhyay >>> --- >>> drivers/gpu/buddy.c | 18 ++++++++++-- >>> drivers/gpu/drm/drm_buddy.c | 7 +++-- >>> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 3 ++ >>> include/linux/gpu_buddy.h | 41 ++++++++++++++++++++++++++++ >>> 4 files changed, 65 insertions(+), 4 deletions(-) >>> >>> diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c index >>> 52686672e99f..53ff85ac2105 100644 >>> --- a/drivers/gpu/buddy.c >>> +++ b/drivers/gpu/buddy.c >>> @@ -437,6 +437,9 @@ int gpu_buddy_init(struct gpu_buddy *mm, u64 >> size, u64 chunk_size) >>> root_count++; >>> } while (size); >>> >>> +#ifdef CONFIG_LOCKDEP >>> + mm->lock_dep_map = NULL; >>> +#endif >>> return 0; >>> >>> out_free_roots: >>> @@ -464,6 +467,7 @@ void gpu_buddy_fini(struct gpu_buddy *mm) >>> unsigned int order; >>> int i; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> size = mm->size; >>> >>> for (i = 0; i < mm->n_roots; ++i) { @@ -538,6 +542,7 @@ void >>> gpu_buddy_reset_clear(struct gpu_buddy *mm, bool is_clear) >>> unsigned int order; >>> int i; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> size = mm->size; >>> for (i = 0; i < mm->n_roots; ++i) { >>> order = ilog2(size) - ilog2(mm->chunk_size); @@ -580,6 >> +585,7 @@ >>> EXPORT_SYMBOL(gpu_buddy_reset_clear); >>> void gpu_buddy_free_block(struct gpu_buddy *mm, >>> struct gpu_buddy_block *block) >>> { >>> + gpu_buddy_driver_lock_held(mm); >>> BUG_ON(!gpu_buddy_block_is_allocated(block)); >>> mm->avail += gpu_buddy_block_size(mm, block); >>> if (gpu_buddy_block_is_clear(block)) @@ -633,6 +639,7 @@ void >>> gpu_buddy_free_list(struct gpu_buddy *mm, >>> { >>> bool mark_clear = flags & GPU_BUDDY_CLEARED; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> __gpu_buddy_free_list(mm, objects, mark_clear, !mark_clear); >>> } >>> EXPORT_SYMBOL(gpu_buddy_free_list); >>> @@ -1172,6 +1179,8 @@ int gpu_buddy_block_trim(struct gpu_buddy >> *mm, >>> u64 new_start; >>> int err; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> + >>> if (!list_is_singular(blocks)) >>> return -EINVAL; >>> >>> @@ -1287,6 +1296,8 @@ int gpu_buddy_alloc_blocks(struct gpu_buddy >> *mm, >>> unsigned long pages; >>> int err; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> + >>> if (size < mm->chunk_size) >>> return -EINVAL; >>> >>> @@ -1458,9 +1469,11 @@ EXPORT_SYMBOL(gpu_buddy_alloc_blocks); >>> void gpu_buddy_block_print(struct gpu_buddy *mm, >>> struct gpu_buddy_block *block) >>> { >>> - u64 start = gpu_buddy_block_offset(block); >>> - u64 size = gpu_buddy_block_size(mm, block); >>> + u64 start, size; >>> >>> + gpu_buddy_driver_lock_held(mm); >> >> I don't think we want this one. The mm interaction is just for immutable state, >> and the block itself is essentially owned by the caller. Same reason why we >> don't want annotations for stuff like >> gpu_buddy_block_offset() etc. > > Yes, makes sense. I will change it. > >> >>> + start = gpu_buddy_block_offset(block); >>> + size = gpu_buddy_block_size(mm, block); >>> pr_info("%#018llx-%#018llx: %llu\n", start, start + size, size); >>> } >>> EXPORT_SYMBOL(gpu_buddy_block_print); >>> @@ -1475,6 +1488,7 @@ void gpu_buddy_print(struct gpu_buddy *mm) >>> { >>> int order; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> pr_info("chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, clear_free: >> %lluMiB\n", >>> mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, >>> mm->clear_avail >> 20); >>> >>> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >>> index 841f3de5f307..f4ad09b8a36e 100644 >>> --- a/drivers/gpu/drm/drm_buddy.c >>> +++ b/drivers/gpu/drm/drm_buddy.c >>> @@ -25,9 +25,11 @@ void drm_buddy_block_print(struct gpu_buddy >> *mm, >>> struct gpu_buddy_block *block, >>> struct drm_printer *p) >>> { >>> - u64 start = gpu_buddy_block_offset(block); >>> - u64 size = gpu_buddy_block_size(mm, block); >>> + u64 start, size; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> + start = gpu_buddy_block_offset(block); >>> + size = gpu_buddy_block_size(mm, block); >> >> Same here. > > Yes > >> >>> drm_printf(p, "%#018llx-%#018llx: %llu\n", start, start + size, size); >>> } >>> EXPORT_SYMBOL(drm_buddy_block_print); >>> @@ -42,6 +44,7 @@ void drm_buddy_print(struct gpu_buddy *mm, struct >> drm_printer *p) >>> { >>> int order; >>> >>> + gpu_buddy_driver_lock_held(mm); >>> drm_printf(p, "chunk_size: %lluKiB, total: %lluMiB, free: %lluMiB, >> clear_free: %lluMiB\n", >>> mm->chunk_size >> 10, mm->size >> 20, mm->avail >> 20, >>> mm->clear_avail >> 20); >>> >>> diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c >>> b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c >>> index 01a9b92772f8..935e589dd4b0 100644 >>> --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c >>> +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c >>> @@ -293,7 +293,9 @@ static void xe_ttm_vram_mgr_fini(struct >> drm_device >>> *dev, void *arg) >>> >>> WARN_ON_ONCE(mgr->visible_avail != mgr->visible_size); >>> >>> + mutex_lock(&mgr->lock); >>> gpu_buddy_fini(&mgr->mm); >>> + mutex_unlock(&mgr->lock); >> >> This shouldn't need a lock. Annotation for this one should also be dropped. > > Thought was that while we call buddy_fini, there could be in flight access of buddy manager, so some new allocation in progress and thus it will provide protection against it. Please let me know what you think! The mm stuff is about to get nuked, so if something is still messing with this then we are in big trouble anyway, and grabbing the lock does't really help much. The fini() here should be triggered as we start tearing down the driver. > > Tejas >> >>> >>> ttm_resource_manager_cleanup(&mgr->manager); >>> >>> @@ -328,6 +330,7 @@ int __xe_ttm_vram_mgr_init(struct xe_device *xe, >> struct xe_ttm_vram_mgr *mgr, >>> if (err) >>> return err; >>> >>> + gpu_buddy_driver_set_lock(&mgr->mm, &mgr->lock); >>> ttm_set_driver_manager(&xe->ttm, mem_type, &mgr->manager); >>> ttm_resource_manager_set_used(&mgr->manager, true); >>> >>> diff --git a/include/linux/gpu_buddy.h b/include/linux/gpu_buddy.h >>> index 5fa917ba5450..c174de80ad72 100644 >>> --- a/include/linux/gpu_buddy.h >>> +++ b/include/linux/gpu_buddy.h >>> @@ -154,6 +154,7 @@ struct gpu_buddy_block { >>> * @avail: Total free space currently available for allocation in bytes. >>> * @clear_avail: Free space available in the clear tree (zeroed memory) in >> bytes. >>> * This is a subset of @avail. >>> + * @lock_dep_map: Annotates gpu_buddy API with a driver provided lock. >>> */ >>> struct gpu_buddy { >>> /* private: */ >>> @@ -179,8 +180,48 @@ struct gpu_buddy { >>> u64 size; >>> u64 avail; >>> u64 clear_avail; >>> +#ifdef CONFIG_LOCKDEP >>> + struct lockdep_map *lock_dep_map; >>> +#endif >>> }; >>> >>> +#ifdef CONFIG_LOCKDEP >>> +/** >>> + * gpu_buddy_driver_set_lock() - Set the lock protecting accesses to >>> +GPU BUDDY >>> + * @mm: Pointer to GPU buddy structure. >>> + * @lock: the lock used to protect the gpu buddy. The locking >>> +primitive >>> + * must contain a dep_map field. >>> + * >>> + * Call this to annotate gpu_buddy APIs which access/modify gpu_buddy >>> +manager */ #define gpu_buddy_driver_set_lock(mm, lock) \ >>> + do { \ >>> + struct gpu_buddy *__mm = (mm); \ >>> + if (!WARN(__mm->lock_dep_map, "GPU BUDDY MM lock >> should be set only once.")) \ >>> + __mm->lock_dep_map = &(lock)->dep_map; \ >>> + } while (0) >>> +#else >>> +#define gpu_buddy_driver_set_lock(mm, lock) do { (void)(mm); >>> +(void)(lock); } while (0) #endif >>> + >>> +#ifdef CONFIG_LOCKDEP >>> +/** >>> + * gpu_buddy_driver_lock_held() - Assert GPU BUDDY manager lock is >>> +held >>> + * @mm: Pointer to the GPU BUDDY structure. >>> + * >>> + * Ensure driver lock is held. >>> + */ >>> +static inline void gpu_buddy_driver_lock_held(struct gpu_buddy *mm) { >>> + if ((mm)->lock_dep_map) >>> + lockdep_assert(lock_is_held_type((mm)->lock_dep_map, 0)); >> } #else >>> +static inline gpu_buddy_driver_lock_held(struct gpu_buddy *mm) { } >>> +#endif >>> + >>> static inline u64 >>> gpu_buddy_block_offset(const struct gpu_buddy_block *block) >>> { >