From: Thomas Hellström
To: intel-xe@lists.freedesktop.org
Cc: Thomas Hellström, Jason Gunthorpe, Andrew Morton, Simona Vetter,
    Dave Airlie, Alistair Popple, dri-devel@lists.freedesktop.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Matthew Brost,
    Christian König
Subject: [PATCH 1/6] mm/mmu_notifier: Allow two-pass struct mmu_interval_notifiers
Date: Thu, 21 Aug 2025 13:46:21 +0200
Message-ID: <20250821114626.89818-2-thomas.hellstrom@linux.intel.com>
In-Reply-To: <20250821114626.89818-1-thomas.hellstrom@linux.intel.com>
References: <20250821114626.89818-1-thomas.hellstrom@linux.intel.com>

GPU use-cases for mmu_interval_notifiers with HMM often involve starting a
GPU operation and then waiting for it to complete.
These operations are typically context preemption or TLB flushing. With
single-pass notifiers per GPU this doesn't scale in multi-GPU scenarios. In
those scenarios we'd want to first start preemption or TLB flushing on all
GPUs, and as a second pass wait for them to complete.

One can do this on a per-driver basis, multiplexing per-driver notifiers,
but that would mean sharing the notifier "user" lock across all GPUs, and
that doesn't scale well either, so adding support for multiple passes in
the core appears to be the right choice.

Implement two-pass capability in the mmu_interval_notifier. Use a linked
list for the final passes to minimize the impact for use-cases that don't
need the multi-pass functionality, by avoiding a second interval tree walk,
and to be able to easily pass data between the two passes.

v1:
- Restrict to two passes (Jason Gunthorpe)
- Improve on documentation (Jason Gunthorpe)
- Improve on function naming (Alistair Popple)

Cc: Jason Gunthorpe
Cc: Andrew Morton
Cc: Simona Vetter
Cc: Dave Airlie
Cc: Alistair Popple
Cc: dri-devel@lists.freedesktop.org
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Thomas Hellström
---
 include/linux/mmu_notifier.h | 42 ++++++++++++++++++++++++
 mm/mmu_notifier.c            | 63 ++++++++++++++++++++++++++++++------
 2 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index d1094c2d5fb6..14cfb3735699 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -233,16 +233,58 @@ struct mmu_notifier {
 	unsigned int users;
 };
 
+/**
+ * struct mmu_interval_notifier_finish - mmu_interval_notifier two-pass abstraction
+ * @link: List link for the notifier's pending pass list
+ *
+ * Allocate, typically using GFP_NOWAIT, in the interval notifier's first pass.
+ * If allocation fails (which is not unlikely under memory pressure), fall back
+ * to single-pass operation. Note that with a large number of notifiers
+ * implementing two passes, allocation with GFP_NOWAIT will become increasingly
+ * likely to fail, so consider implementing a small pool instead of using
+ * kmalloc() allocations.
+ *
+ * If the implementation needs to pass data between the two passes,
+ * the recommended way is to embed struct mmu_interval_notifier_finish into a larger
+ * structure that also contains the data needed to be shared. Keep in mind that
+ * a notifier callback can be invoked in parallel, and each invocation needs its
+ * own struct mmu_interval_notifier_finish.
+ */
+struct mmu_interval_notifier_finish {
+	struct list_head link;
+	/**
+	 * @finish: Driver callback for the finish pass.
+	 * @final: Pointer to the mmu_interval_notifier_finish structure.
+	 * @range: The mmu_notifier_range.
+	 * @cur_seq: The current sequence set by the first pass.
+	 *
+	 * Note that there is no error reporting for additional passes.
+	 */
+	void (*finish)(struct mmu_interval_notifier_finish *final,
+		       const struct mmu_notifier_range *range,
+		       unsigned long cur_seq);
+};
+
 /**
  * struct mmu_interval_notifier_ops
  * @invalidate: Upon return the caller must stop using any SPTEs within this
  *              range. This function can sleep. Return false only if sleeping
  *              was required but mmu_notifier_range_blockable(range) is false.
+ * @invalidate_start: Similar to @invalidate, but intended for two-pass notifier
+ *                    callbacks where the call to @invalidate_start is the first
+ *                    pass and any struct mmu_interval_notifier_finish pointer
+ *                    returned in the @final parameter describes the final pass.
+ *                    If @final is %NULL on return, then no final pass will be
+ *                    called.
  */
 struct mmu_interval_notifier_ops {
 	bool (*invalidate)(struct mmu_interval_notifier *interval_sub,
 			   const struct mmu_notifier_range *range,
 			   unsigned long cur_seq);
+	bool (*invalidate_start)(struct mmu_interval_notifier *interval_sub,
+				 const struct mmu_notifier_range *range,
+				 unsigned long cur_seq,
+				 struct mmu_interval_notifier_finish **final);
 };
 
 struct mmu_interval_notifier {
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 8e0125dc0522..fceadcd8ca24 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -260,6 +260,18 @@ mmu_interval_read_begin(struct mmu_interval_notifier *interval_sub)
 }
 EXPORT_SYMBOL_GPL(mmu_interval_read_begin);
 
+static void mn_itree_final_pass(struct list_head *final_passes,
+				const struct mmu_notifier_range *range,
+				unsigned long cur_seq)
+{
+	struct mmu_interval_notifier_finish *f, *next;
+
+	list_for_each_entry_safe(f, next, final_passes, link) {
+		list_del(&f->link);
+		f->finish(f, range, cur_seq);
+	}
+}
+
 static void mn_itree_release(struct mmu_notifier_subscriptions *subscriptions,
 			     struct mm_struct *mm)
 {
@@ -271,6 +283,7 @@ static void mn_itree_release(struct mmu_notifier_subscriptions *subscriptions,
 		.end = ULONG_MAX,
 	};
 	struct mmu_interval_notifier *interval_sub;
+	LIST_HEAD(final_passes);
 	unsigned long cur_seq;
 	bool ret;
 
@@ -278,11 +291,25 @@ static void mn_itree_release(struct mmu_notifier_subscriptions *subscriptions,
 		     mn_itree_inv_start_range(subscriptions, &range, &cur_seq);
 	     interval_sub;
 	     interval_sub = mn_itree_inv_next(interval_sub, &range)) {
-		ret = interval_sub->ops->invalidate(interval_sub, &range,
-						    cur_seq);
+		if (interval_sub->ops->invalidate_start) {
+			struct mmu_interval_notifier_finish *final = NULL;
+
+			ret = interval_sub->ops->invalidate_start(interval_sub,
+								  &range,
+								  cur_seq,
+								  &final);
+			if (ret && final)
+				list_add_tail(&final->link, &final_passes);
+
+		} else {
+			ret = interval_sub->ops->invalidate(interval_sub,
+							    &range,
+							    cur_seq);
+		}
 		WARN_ON(!ret);
 	}
 
+	mn_itree_final_pass(&final_passes, &range, cur_seq);
 	mn_itree_inv_end(subscriptions);
 }
 
@@ -430,7 +457,9 @@ static int mn_itree_invalidate(struct mmu_notifier_subscriptions *subscriptions,
 			       const struct mmu_notifier_range *range)
 {
 	struct mmu_interval_notifier *interval_sub;
+	LIST_HEAD(final_passes);
 	unsigned long cur_seq;
+	int err = 0;
 
 	for (interval_sub =
 		     mn_itree_inv_start_range(subscriptions, range, &cur_seq);
@@ -438,23 +467,39 @@ static int mn_itree_invalidate(struct mmu_notifier_subscriptions *subscriptions,
 	     interval_sub = mn_itree_inv_next(interval_sub, range)) {
 		bool ret;
 
-		ret = interval_sub->ops->invalidate(interval_sub, range,
-						    cur_seq);
+		if (interval_sub->ops->invalidate_start) {
+			struct mmu_interval_notifier_finish *final = NULL;
+
+			ret = interval_sub->ops->invalidate_start(interval_sub,
+								  range,
+								  cur_seq,
+								  &final);
+			if (ret && final)
+				list_add_tail(&final->link, &final_passes);
+
+		} else {
+			ret = interval_sub->ops->invalidate(interval_sub,
+							    range,
+							    cur_seq);
+		}
 		if (!ret) {
 			if (WARN_ON(mmu_notifier_range_blockable(range)))
 				continue;
-			goto out_would_block;
+			err = -EAGAIN;
+			break;
 		}
 	}
-	return 0;
 
-out_would_block:
+	mn_itree_final_pass(&final_passes, range, cur_seq);
+
 	/*
 	 * On -EAGAIN the non-blocking caller is not allowed to call
 	 * invalidate_range_end()
 	 */
-	mn_itree_inv_end(subscriptions);
-	return -EAGAIN;
+	if (err)
+		mn_itree_inv_end(subscriptions);
+
+	return err;
 }
 
 static int mn_hlist_invalidate_range_start(
-- 
2.50.1
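
For illustration only, here is a minimal, hypothetical driver-side sketch of
how the two-pass API introduced by this patch might be used. All "my_*"
structs and helpers below are invented for the example and are not part of
the patch:

/*
 * Hypothetical user of the two-pass mmu_interval_notifier API.
 * All "my_*" names are illustrative only.
 */
#include <linux/mmu_notifier.h>
#include <linux/slab.h>

struct my_gpu {
	struct mmu_interval_notifier notifier;
	/* ... device state ... */
};

/* Hypothetical device helpers, assumed to be defined elsewhere in the driver. */
void my_gpu_start_tlb_inval(struct my_gpu *gpu, unsigned long start, unsigned long end);
void my_gpu_wait_tlb_inval(struct my_gpu *gpu, unsigned long start, unsigned long end);

struct my_finish {
	struct mmu_interval_notifier_finish base;
	struct my_gpu *gpu;	/* data carried from the first to the second pass */
};

static void my_finish_pass(struct mmu_interval_notifier_finish *final,
			   const struct mmu_notifier_range *range,
			   unsigned long cur_seq)
{
	struct my_finish *mf = container_of(final, struct my_finish, base);

	/* Second pass: wait for the invalidation kicked off in the first pass. */
	my_gpu_wait_tlb_inval(mf->gpu, range->start, range->end);
	kfree(mf);
}

static bool my_invalidate_start(struct mmu_interval_notifier *interval_sub,
				const struct mmu_notifier_range *range,
				unsigned long cur_seq,
				struct mmu_interval_notifier_finish **final)
{
	struct my_gpu *gpu = container_of(interval_sub, struct my_gpu, notifier);
	struct my_finish *mf;

	if (!mmu_notifier_range_blockable(range))
		return false;

	mmu_interval_set_seq(interval_sub, cur_seq);

	/* First pass: start the TLB invalidation on this GPU, don't wait yet. */
	my_gpu_start_tlb_inval(gpu, range->start, range->end);

	/*
	 * GFP_NOWAIT as recommended in the kernel-doc above; on allocation
	 * failure fall back to single-pass behaviour by waiting here.
	 */
	mf = kzalloc(sizeof(*mf), GFP_NOWAIT);
	if (!mf) {
		my_gpu_wait_tlb_inval(gpu, range->start, range->end);
		return true;
	}

	mf->base.finish = my_finish_pass;
	mf->gpu = gpu;
	/* The core calls mf->base.finish() after walking all notifiers. */
	*final = &mf->base;
	return true;
}

static const struct mmu_interval_notifier_ops my_notifier_ops = {
	.invalidate_start = my_invalidate_start,
};

The point of the split is that waiting moves to the finish pass, so with
several GPUs subscribed to the same range the core starts all invalidations
before waiting on any of them.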