From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <7685842d544a4e9673dc4ec2645044a8ded58848.camel@linux.intel.com>
Subject: Re: [PATCH 2/2] drm/xe: Userptr invalidation race with binds fixes
From: Thomas Hellström
To: Matthew Brost, intel-xe@lists.freedesktop.org
Cc: matthew.auld@intel.com
Date: Fri, 21 Feb 2025 12:34:12 +0100
In-Reply-To: <20250220234557.2613010-3-matthew.brost@intel.com>
References: <20250220234557.2613010-1-matthew.brost@intel.com>
	 <20250220234557.2613010-3-matthew.brost@intel.com>
Organization: Intel Sweden AB, Registration Number: 556189-6027

On Thu, 2025-02-20 at 15:45 -0800, Matthew Brost wrote:
> Squash a bind operation after a userptr invalidation into a clearing
> of PTEs. This prevents valid GPU page tables from pointing to stale
> CPU pages.
> 
> Fix up initial bind handling: always add VMAs to the invalidation
> list and clear PTEs.
> 
> Remove the unused rebind variable in xe_pt.
> 
> Always hold the notifier lock across TLB invalidation in the notifier
> to prevent a UAF if an unbind races.
> 
> All of the above changes are included in a single Fixes patch in
> hopes of an easier backport.
> 
> Cc: Thomas Hellström
> Cc:
> Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single
> job")
> Signed-off-by: Matthew Brost
> ---
>  drivers/gpu/drm/xe/regs/xe_gtt_defs.h |  1 +
>  drivers/gpu/drm/xe/xe_pt.c            | 86 +++++++++++++++++++--------
>  drivers/gpu/drm/xe/xe_pt_types.h      |  1 -
>  drivers/gpu/drm/xe/xe_vm.c            |  4 +-
>  4 files changed, 64 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/regs/xe_gtt_defs.h b/drivers/gpu/drm/xe/regs/xe_gtt_defs.h
> index 4389e5a76f89..ab281e721d94 100644
> --- a/drivers/gpu/drm/xe/regs/xe_gtt_defs.h
> +++ b/drivers/gpu/drm/xe/regs/xe_gtt_defs.h
> @@ -13,6 +13,7 @@
>  
>  #define GUC_GGTT_TOP	0xFEE00000
>  
> +#define XE_PPGTT_IS_LEAF	BIT_ULL(63)
>  #define XELPG_PPGTT_PTE_PAT3	BIT_ULL(62)
>  #define XE2_PPGTT_PTE_PAT4	BIT_ULL(61)
>  #define XE_PPGTT_PDE_PDPE_PAT2	BIT_ULL(12)
> diff --git a/drivers/gpu/drm/xe/xe_pt.c b/drivers/gpu/drm/xe/xe_pt.c
> index 1ddcc7e79a93..26b513a54113 100644
> --- a/drivers/gpu/drm/xe/xe_pt.c
> +++ b/drivers/gpu/drm/xe/xe_pt.c
> @@ -389,6 +389,8 @@ xe_pt_insert_entry(struct xe_pt_stage_bind_walk *xe_walk, struct xe_pt *parent,
>  		idx = offset - entry->ofs;
>  		entry->pt_entries[idx].pt = xe_child;
>  		entry->pt_entries[idx].pte = pte;
> +		if (!xe_child)
> +			entry->pt_entries[idx].pte |= XE_PPGTT_IS_LEAF;

I don't think this is needed.
>  		entry->qwords++;
>  	}
>  
> @@ -792,6 +794,24 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
>  	.pt_entry = xe_pt_zap_ptes_entry,
>  };
>  
> +static bool __xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma)
> +{
> +	struct xe_pt_zap_ptes_walk xe_walk = {
> +		.base = {
> +			.ops = &xe_pt_zap_ptes_ops,
> +			.shifts = xe_normal_pt_shifts,
> +			.max_level = XE_PT_HIGHEST_LEVEL,
> +		},
> +		.tile = tile,
> +	};
> +	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
> +
> +	(void)xe_pt_walk_shared(&pt->base, pt->level, xe_vma_start(vma),
> +				xe_vma_end(vma), &xe_walk.base);
> +
> +	return xe_walk.needs_invalidate;
> +}
> +
>  /**
>   * xe_pt_zap_ptes() - Zap (zero) gpu ptes of an address range
>   * @tile: The tile we're zapping for.
> @@ -810,24 +830,12 @@ static const struct xe_pt_walk_ops xe_pt_zap_ptes_ops = {
>   */
>  bool xe_pt_zap_ptes(struct xe_tile *tile, struct xe_vma *vma)
>  {
> -	struct xe_pt_zap_ptes_walk xe_walk = {
> -		.base = {
> -			.ops = &xe_pt_zap_ptes_ops,
> -			.shifts = xe_normal_pt_shifts,
> -			.max_level = XE_PT_HIGHEST_LEVEL,
> -		},
> -		.tile = tile,
> -	};
> -	struct xe_pt *pt = xe_vma_vm(vma)->pt_root[tile->id];
>  	u8 pt_mask = (vma->tile_present & ~vma->tile_invalidated);
>  
>  	if (!(pt_mask & BIT(tile->id)))
>  		return false;
>  
> -	(void)xe_pt_walk_shared(&pt->base, pt->level, xe_vma_start(vma),
> -				xe_vma_end(vma), &xe_walk.base);
> -
> -	return xe_walk.needs_invalidate;
> +	return __xe_pt_zap_ptes(tile, vma);
>  }
>  
>  static void
> @@ -843,9 +851,10 @@ xe_vm_populate_pgtable(struct xe_migrate_pt_update *pt_update, struct xe_tile *t
>  	for (i = 0; i < num_qwords; i++) {
>  		if (map)
>  			xe_map_wr(tile_to_xe(tile), map, (qword_ofs + i) *
> -				  sizeof(u64), u64, ptes[i].pte);
> +				  sizeof(u64), u64, ptes[i].pte &
> +				  ~XE_PPGTT_IS_LEAF);
>  		else
> -			ptr[i] = ptes[i].pte;
> +			ptr[i] = ptes[i].pte & ~XE_PPGTT_IS_LEAF;
>  	}
>  }
>  
> @@ -1201,7 +1210,30 @@ static bool xe_pt_userptr_inject_eagain(struct xe_userptr_vma *uvma)
>  
>  #endif
>  
> -static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
> +static void
> +vma_convert_to_invalidation(struct xe_tile *tile, struct xe_vma *vma,
> +			    struct xe_vm_pgtable_update_ops *pt_update)
> +{
> +	int i, j, k;
> +
> +	for (i = 0; i < pt_update->current_op; ++i) {
> +		struct xe_vm_pgtable_update_op *op = &pt_update->ops[i];
> +
> +		if (vma == op->vma && (op->bind || op->rebind)) {
> +			for (j = 0; j < op->num_entries; ++j) {
> +				for (k = 0; k < op->entries[j].qwords; ++k)
> +					if (op->entries[j].pt_entries[k].pte &
> +					    XE_PPGTT_IS_LEAF)

IIRC it's a leaf iff

	op->entries[j].pt->level == 0 || xe_pte_is_huge(pte)

(xe_pte_is_huge(pte) needs to be coded.)

> +						op->entries[j].pt_entries[k].pte = 0;

This should be a scratch PTE unless it's a faulting VM?

> +			}
> +		}
> +	}
> +
> +	__xe_pt_zap_ptes(tile, vma);

Umm, a zap here would race completely with other pipelined updates,
right? Why is it needed?
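To illustrate the predicate I mean, here is a minimal userspace sketch
(not driver code: the helper names and the PS-bit position are my
assumptions, the real helper would test the actual xe PDE/PDPE
huge-page bits):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the huge-page (PS) bit; illustrative only. */
#define SKETCH_PTE_PS (1ULL << 7)

/* Hypothetical helper: does this entry map a huge page directly? */
static bool xe_pte_is_huge(uint64_t pte)
{
	return pte & SKETCH_PTE_PS;
}

/*
 * An entry is a leaf iff it sits at level 0 (a 4K PTE) or it is a
 * huge-page entry at a higher level, making the extra IS_LEAF
 * software bit unnecessary.
 */
static bool xe_pte_is_leaf(unsigned int level, uint64_t pte)
{
	return level == 0 || xe_pte_is_huge(pte);
}
```

With something like this, vma_convert_to_invalidation() could test the
entry's level and PTE directly instead of stealing bit 63.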
> +}
> +
> +static int vma_check_userptr(struct xe_tile *tile, struct xe_vm *vm,
> +			     struct xe_vma *vma,
>  			     struct xe_vm_pgtable_update_ops *pt_update)
>  {
>  	struct xe_userptr_vma *uvma;
> @@ -1215,9 +1247,6 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
>  	uvma = to_userptr_vma(vma);
>  	notifier_seq = uvma->userptr.notifier_seq;
>  
> -	if (uvma->userptr.initial_bind && !xe_vm_in_fault_mode(vm))
> -		return 0;
> -
>  	if (!mmu_interval_read_retry(&uvma->userptr.notifier,
>  				     notifier_seq) &&
>  	    !xe_pt_userptr_inject_eagain(uvma))
> @@ -1231,6 +1260,8 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
>  			       &vm->userptr.invalidated);
>  	spin_unlock(&vm->userptr.invalidated_lock);
>  
> +	vma_convert_to_invalidation(tile, vma, pt_update);
> +
>  	if (xe_vm_in_preempt_fence_mode(vm)) {
>  		struct dma_resv_iter cursor;
>  		struct dma_fence *fence;

And I really think we should remove the open-coded and undocumented
invalidation here. (It took me like 15 minutes to understand what the
code did, and I'm already up-to-date with how invalidation works.)

If just returning -EAGAIN doesn't really work here for the
preempt-fence case, then I suggest that the error injection instead
triggers a real invalidation outside of the notifier lock, perhaps
using mmu_notifier_invalidate_range_[start|end]()? But for now, since
this is a fix, restrict the -EAGAIN injection to cases where it works
and that we want to test.
Thanks,
/Thomas

> @@ -1252,7 +1283,8 @@ static int vma_check_userptr(struct xe_vm *vm, struct xe_vma *vma,
>  	return 0;
>  }
>  
> -static int op_check_userptr(struct xe_vm *vm, struct xe_vma_op *op,
> +static int op_check_userptr(struct xe_tile *tile, struct xe_vm *vm,
> +			    struct xe_vma_op *op,
>  			    struct xe_vm_pgtable_update_ops *pt_update)
>  {
>  	int err = 0;
> @@ -1264,18 +1296,21 @@ static int op_check_userptr(struct xe_vm *vm, struct xe_vma_op *op,
>  		if (!op->map.immediate && xe_vm_in_fault_mode(vm))
>  			break;
>  
> -		err = vma_check_userptr(vm, op->map.vma, pt_update);
> +		err = vma_check_userptr(tile, vm, op->map.vma, pt_update);
>  		break;
>  	case DRM_GPUVA_OP_REMAP:
>  		if (op->remap.prev)
> -			err = vma_check_userptr(vm, op->remap.prev, pt_update);
> +			err = vma_check_userptr(tile, vm, op->remap.prev,
> +						pt_update);
>  		if (!err && op->remap.next)
> -			err = vma_check_userptr(vm, op->remap.next, pt_update);
> +			err = vma_check_userptr(tile, vm, op->remap.next,
> +						pt_update);
>  		break;
>  	case DRM_GPUVA_OP_UNMAP:
>  		break;
>  	case DRM_GPUVA_OP_PREFETCH:
> -		err = vma_check_userptr(vm, gpuva_to_vma(op->base.prefetch.va),
> +		err = vma_check_userptr(tile, vm,
> +					gpuva_to_vma(op->base.prefetch.va),
>  					pt_update);
>  		break;
>  	default:
> @@ -1301,7 +1336,8 @@ static int xe_pt_userptr_pre_commit(struct xe_migrate_pt_update *pt_update)
>  	down_read(&vm->userptr.notifier_lock);
>  
>  	list_for_each_entry(op, &vops->list, link) {
> -		err = op_check_userptr(vm, op, pt_update_ops);
> +		err = op_check_userptr(&vm->xe->tiles[pt_update->tile_id],
> +				       vm, op, pt_update_ops);
>  		if (err) {
>  			up_read(&vm->userptr.notifier_lock);
>  			break;
> diff --git a/drivers/gpu/drm/xe/xe_pt_types.h b/drivers/gpu/drm/xe/xe_pt_types.h
> index 384cc04de719..0f9b7650cd6f 100644
> --- a/drivers/gpu/drm/xe/xe_pt_types.h
> +++ b/drivers/gpu/drm/xe/xe_pt_types.h
> @@ -29,7 +29,6 @@ struct xe_pt {
>  	struct xe_bo *bo;
>  	unsigned int level;
>  	unsigned int num_live;
> -	bool rebind;
>  	bool is_compact;
>  #if IS_ENABLED(CONFIG_DRM_XE_DEBUG_VM)
>  	/** addr: Virtual address start address of the PT. */
> diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
> index ea2e287e6526..f90e5c92010c 100644
> --- a/drivers/gpu/drm/xe/xe_vm.c
> +++ b/drivers/gpu/drm/xe/xe_vm.c
> @@ -623,8 +623,6 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
>  		spin_unlock(&vm->userptr.invalidated_lock);
>  	}
>  
> -	up_write(&vm->userptr.notifier_lock);
> -
>  	/*
>  	 * Preempt fences turn into schedule disables, pipeline these.
>  	 * Note that even in fault mode, we need to wait for binds and
> @@ -647,6 +645,8 @@ static bool vma_userptr_invalidate(struct mmu_interval_notifier *mni,
>  		XE_WARN_ON(err);
>  	}
>  
> +	up_write(&vm->userptr.notifier_lock);
> +
>  	trace_xe_vma_userptr_invalidate_complete(vma);
>  
>  	return true;