From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id EF64AC43458 for ; Sat, 27 Jun 2026 04:10:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5860C10E448; Sat, 27 Jun 2026 04:10:37 +0000 (UTC) Authentication-Results: gabe.freedesktop.org; dkim=pass (2048-bit key; unprotected) header.d=kernel.org header.i=@kernel.org header.b="YCIVVcqe"; dkim-atps=neutral Received: from sea.source.kernel.org (sea.source.kernel.org [172.234.252.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 893D310E448 for ; Sat, 27 Jun 2026 04:10:36 +0000 (UTC) Received: from smtp.kernel.org (quasi.space.kernel.org [100.103.45.18]) by sea.source.kernel.org (Postfix) with ESMTP id 30EFC407AF; Sat, 27 Jun 2026 04:10:36 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id DF0091F000E9; Sat, 27 Jun 2026 04:10:35 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1782533436; bh=600H/TcEXzMbT+uYVvnEzo/XNkjluiZ8R0QHfXOkHNw=; h=From:Subject:Reply-To:To:Cc:In-Reply-To:References:Date; b=YCIVVcqejnWc7hhB4FXOiWhlC80ommInpC1TwfuuMoxvpmTR+wdBmiDpJRDZnatHi j3NLjunCqQIlCSAgmVLYsjmvfGz9rjUzZDTrIGiLqSpu2DJcvnuBpExmkp2Ci97Mo8 pQJlu9h/PDEP9JRsAKTryREFWmvRy0R803ZR282S2bOP15/5g4C0chGSlG/bBQMasU wvlNQNQgnCt6tzpi0rtqA16/SOdyKHdWpCL70TYaIyJeTQXIsH4T8Qt9Zw+K90EsLJ eri9z+0cTXFDSwbQcSnkpeiTljsyboNXGBvLhBruJadU97gGXhH0pNkLPF5i0Vc116 UYWTudbsBiT6w== From: sashiko-bot@kernel.org Subject: Re: [PATCH v5 1/5] drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags To: "Honglei Huang" Cc: dri-devel@lists.freedesktop.org In-Reply-To: <20260627035418.3796481-1-honglei1.huang@amd.com> References: <20260627035418.3796481-1-honglei1.huang@amd.com> Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Date: Sat, 27 Jun 2026 04:10:35 +0000 Message-Id: <20260627041035.DF0091F000E9@smtp.kernel.org> X-BeenThere: dri-devel@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Direct Rendering Infrastructure - Development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Reply-To: sashiko-reviews@lists.linux.dev Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" Thank you for your contribution! Sashiko AI review found 2 potential issue(= s) to consider: - [High] The page fault handler silently drops the `-EACCES` error and inco= rrectly returns 0 (success) when migration to device memory is not permitte= d, leading to an infinite page fault loop. - [Medium] Concurrent lockless read of `migrate_devmem` races with bitfield= update of `unmapped`, causing undefined behavior (KCSAN warning). -- commit d82186ac8e065d21c32eea3f706b65fa120ad8c5 Author: Honglei Huang drm/gpusvm: split MM state flags out of drm_gpusvm_pages_flags This commit splits the MM state flags (like unmapped and partial_unmap) out of drm_gpusvm_pages_flags into a new drm_gpusvm_range_flags struct. It also updates drivers to check these new range-level flags to avoid handling unmapped pages. Link: https://gitlab.freedesktop.org/mbrost/xe-kernel-driver-svn-perf-6-15-= 2025/-/commit/623f6a50c037d9e44f6c9fbe6859a0ba7ad50177 > diff --git a/drivers/gpu/drm/drm_gpusvm.c b/drivers/gpu/drm/drm_gpusvm.c > index 958cb605aedd5..df900553f21a4 100644 > --- a/drivers/gpu/drm/drm_gpusvm.c > +++ b/drivers/gpu/drm/drm_gpusvm.c [ ... ] > @@ -1784,20 +1784,35 @@ EXPORT_SYMBOL_GPL(drm_gpusvm_has_mapping); > /** > * drm_gpusvm_range_set_unmapped() - Mark a GPU SVM range as unmapped > * @range: Pointer to the GPU SVM range structure. > + * @pages: Pointer to the GPU SVM pages structure(s). > + * @pages_count: Number of GPU SVM pages structure(s) passed in. > * @mmu_range: Pointer to the MMU notifier range structure. > * > * This function marks a GPU SVM range as unmapped and sets the partial_= unmap flag > * if the range partially falls within the provided MMU notifier range. > */ > void drm_gpusvm_range_set_unmapped(struct drm_gpusvm_range *range, > + struct drm_gpusvm_pages *pages, > + unsigned int pages_count, > const struct mmu_notifier_range *mmu_range) > { > + unsigned int i; > + > lockdep_assert_held_write(&range->gpusvm->notifier_lock); > =20 > - range->pages.flags.unmapped =3D true; > + range->flags.unmapped =3D true; [Severity: Medium] Will this direct bitfield assignment cause a data race? This assignment generates a read-modify-write on the shared u16 word. If a concurrent GPU page fault triggers __xe_svm_handle_pagefault() and does a lockless read of range->base.flags.migrate_devmem, it can race with this update and trigger undefined behavior. The patch correctly uses a WRITE_ONCE on a union for pages[i].flags just below this to prevent this exact issue. Is the same union and WRITE_ONCE pattern needed here for range->flags? > + for (i =3D 0; i < pages_count; ++i) { > + struct drm_gpusvm_pages_flags flags =3D { > + .__flags =3D pages[i].flags.__flags, > + }; > + > + flags.unmapped =3D true; > + /* WRITE_ONCE pairs with READ_ONCE for opportunistic checks */ > + WRITE_ONCE(pages[i].flags.__flags, flags.__flags); > + } [ ... ] > diff --git a/drivers/gpu/drm/xe/xe_svm.c b/drivers/gpu/drm/xe/xe_svm.c > index b1e1ac26c66d9..63da149f3b7fd 100644 > --- a/drivers/gpu/drm/xe/xe_svm.c > +++ b/drivers/gpu/drm/xe/xe_svm.c [ ... ] > @@ -1248,8 +1249,10 @@ static int __xe_svm_handle_pagefault(struct xe_vm = *vm, struct xe_vma *vma, > =20 > xe_svm_range_fault_count_stats_incr(gt, range); > =20 > - if (ctx.devmem_only && !range->base.pages.flags.migrate_devmem) > - return -EACCES; > + if (ctx.devmem_only && !range->base.flags.migrate_devmem) { > + err =3D -EACCES; > + goto out; > + } [Severity: High] Does this silently drop the -EACCES error? Looking at the rest of __xe_svm_handle_pagefault(), the out label unconditionally returns 0: out: xe_svm_range_fault_us_stats_incr(gt, range, start); return 0; err_out: if (err =3D=3D -EAGAIN) { Jumping to out instead of returning the error (or jumping to err_out) signa= ls incorrect success for the unhandled GPU page fault. Could this lead to an infinite page fault loop where the hardware continually retries and faults again on the same address? --=20 Sashiko AI review =C2=B7 https://sashiko.dev/#/patchset/20260627035418.3796= 481-1-honglei1.huang@amd.com?part=3D1