public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault()
@ 2025-01-24 18:15 David Hildenbrand
  2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-01-24 18:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: dri-devel, nouveau, David Hildenbrand, Karol Herbst, Lyude Paul,
	Danilo Krummrich, David Airlie, Simona Vetter, Alistair Popple

One fix and a minor cleanup.

Only compile-tested due to lack of HW, so I'd be happy if someone with
access to HW could test. But not sure how easy this is to trigger. Likely
some concurrent MADV_DONTNEED on the PTE we just converted might be able
to trigger it.

Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Alistair Popple <apopple@nvidia.com>

David Hildenbrand (2):
  nouveau/svm: fix missing folio unlock + put after
    make_device_exclusive_range()
  nouveau/svm: don't initialize ret in nouveau_atomic_range_fault()

 drivers/gpu/drm/nouveau/nouveau_svm.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

-- 
2.47.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range()
  2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand
@ 2025-01-24 18:15 ` David Hildenbrand
  2025-02-12  1:27   ` Alistair Popple
  2025-02-14 15:17   ` Danilo Krummrich
  2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand
  2025-01-28  0:13 ` [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() Alistair Popple
  2 siblings, 2 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-01-24 18:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: dri-devel, nouveau, David Hildenbrand, Karol Herbst, Lyude Paul,
	Danilo Krummrich, David Airlie, Simona Vetter, Alistair Popple

In case we have to retry the loop, we are missing to unlock+put the
folio. In that case, we will keep failing make_device_exclusive_range()
because we cannot grab the folio lock, and even return from the function
with the folio locked and referenced, effectively never succeeding the
make_device_exclusive_range().

While at it, convert the other unlock+put to use a folio as well.

This was found by code inspection.

Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access")
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index b4da82ddbb6b2..8ea98f06d39af 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -590,6 +590,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
 	unsigned long timeout =
 		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
 	struct mm_struct *mm = svmm->notifier.mm;
+	struct folio *folio;
 	struct page *page;
 	unsigned long start = args->p.addr;
 	unsigned long notifier_seq;
@@ -616,12 +617,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
 			ret = -EINVAL;
 			goto out;
 		}
+		folio = page_folio(page);
 
 		mutex_lock(&svmm->mutex);
 		if (!mmu_interval_read_retry(&notifier->notifier,
 					     notifier_seq))
 			break;
 		mutex_unlock(&svmm->mutex);
+
+		folio_unlock(folio);
+		folio_put(folio);
 	}
 
 	/* Map the page on the GPU. */
@@ -637,8 +642,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
 	ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL);
 	mutex_unlock(&svmm->mutex);
 
-	unlock_page(page);
-	put_page(page);
+	folio_unlock(folio);
+	folio_put(folio);
 
 out:
 	mmu_interval_notifier_remove(&notifier->notifier);
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault()
  2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand
  2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand
@ 2025-01-24 18:15 ` David Hildenbrand
  2025-02-12  1:28   ` Alistair Popple
  2025-01-28  0:13 ` [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() Alistair Popple
  2 siblings, 1 reply; 7+ messages in thread
From: David Hildenbrand @ 2025-01-24 18:15 UTC (permalink / raw)
  To: linux-kernel
  Cc: dri-devel, nouveau, David Hildenbrand, Karol Herbst, Lyude Paul,
	Danilo Krummrich, David Airlie, Simona Vetter, Alistair Popple

ret will be modified immediately afterwards.

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index 8ea98f06d39af..2f8b8b978fc08 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -594,7 +594,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
 	struct page *page;
 	unsigned long start = args->p.addr;
 	unsigned long notifier_seq;
-	int ret = 0;
+	int ret;
 
 	ret = mmu_interval_notifier_insert(&notifier->notifier, mm,
 					args->p.addr, args->p.size,
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault()
  2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand
  2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand
  2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand
@ 2025-01-28  0:13 ` Alistair Popple
  2 siblings, 0 replies; 7+ messages in thread
From: Alistair Popple @ 2025-01-28  0:13 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul,
	Danilo Krummrich, David Airlie, Simona Vetter

On Fri, Jan 24, 2025 at 07:15:22PM +0100, David Hildenbrand wrote:
> One fix and a minor cleanup.
> 
> Only compile-tested due to lack of HW, so I'd be happy if someone with
> access to HW could test. But not sure how easy this is to trigger. Likely
> some concurrent MADV_DONTNEED on the PTE we just converted might be able
> to trigger it.

I have this setup so will certainly test it and see if I can trigger the bug
as well. Thanks for the fixes.
 
> Cc: Karol Herbst <kherbst@redhat.com>
> Cc: Lyude Paul <lyude@redhat.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: David Airlie <airlied@gmail.com>
> Cc: Simona Vetter <simona@ffwll.ch>
> Cc: Alistair Popple <apopple@nvidia.com>
> 
> David Hildenbrand (2):
>   nouveau/svm: fix missing folio unlock + put after
>     make_device_exclusive_range()
>   nouveau/svm: don't initialize ret in nouveau_atomic_range_fault()
> 
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 11 ++++++++---
>  1 file changed, 8 insertions(+), 3 deletions(-)
> 
> -- 
> 2.47.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range()
  2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand
@ 2025-02-12  1:27   ` Alistair Popple
  2025-02-14 15:17   ` Danilo Krummrich
  1 sibling, 0 replies; 7+ messages in thread
From: Alistair Popple @ 2025-02-12  1:27 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul,
	Danilo Krummrich, David Airlie, Simona Vetter

On Fri, Jan 24, 2025 at 07:15:23PM +0100, David Hildenbrand wrote:
> In case we have to retry the loop, we are missing to unlock+put the
> folio. In that case, we will keep failing make_device_exclusive_range()
> because we cannot grab the folio lock, and even return from the function
> with the folio locked and referenced, effectively never succeeding the
> make_device_exclusive_range().
> 
> While at it, convert the other unlock+put to use a folio as well.
> 
> This was found by code inspection.

I couldn't (easily at least) trigger this condition from userspace, probably
because the window is pretty tight between mmu_interval_read_begin() and
mmu_interval_read_retry(). But I tested it by manually forcing at least one
retry and it fixes the issue observed via inspection. There's no reason to think
it would be entirely impossible to hit either, and the change looks good so
please add:

Reviewed-by: Alistair Popple <apopple@nvidia.com>
Tested-by: Alistair Popple <apopple@nvidia.com>

> Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access")
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index b4da82ddbb6b2..8ea98f06d39af 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -590,6 +590,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
>  	unsigned long timeout =
>  		jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
>  	struct mm_struct *mm = svmm->notifier.mm;
> +	struct folio *folio;
>  	struct page *page;
>  	unsigned long start = args->p.addr;
>  	unsigned long notifier_seq;
> @@ -616,12 +617,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
>  			ret = -EINVAL;
>  			goto out;
>  		}
> +		folio = page_folio(page);
>  
>  		mutex_lock(&svmm->mutex);
>  		if (!mmu_interval_read_retry(&notifier->notifier,
>  					     notifier_seq))
>  			break;
>  		mutex_unlock(&svmm->mutex);
> +
> +		folio_unlock(folio);
> +		folio_put(folio);
>  	}
>  
>  	/* Map the page on the GPU. */
> @@ -637,8 +642,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
>  	ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL);
>  	mutex_unlock(&svmm->mutex);
>  
> -	unlock_page(page);
> -	put_page(page);
> +	folio_unlock(folio);
> +	folio_put(folio);
>  
>  out:
>  	mmu_interval_notifier_remove(&notifier->notifier);
> -- 
> 2.47.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault()
  2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand
@ 2025-02-12  1:28   ` Alistair Popple
  0 siblings, 0 replies; 7+ messages in thread
From: Alistair Popple @ 2025-02-12  1:28 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul,
	Danilo Krummrich, David Airlie, Simona Vetter

On Fri, Jan 24, 2025 at 07:15:24PM +0100, David Hildenbrand wrote:
> ret will be modified immediately afterwards.

Yep. Thanks for fixing.

Reviewed-by: Alistair Popple <apopple@nvidia.com>

> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
> index 8ea98f06d39af..2f8b8b978fc08 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_svm.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
> @@ -594,7 +594,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
>  	struct page *page;
>  	unsigned long start = args->p.addr;
>  	unsigned long notifier_seq;
> -	int ret = 0;
> +	int ret;
>  
>  	ret = mmu_interval_notifier_insert(&notifier->notifier, mm,
>  					args->p.addr, args->p.size,
> -- 
> 2.47.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range()
  2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand
  2025-02-12  1:27   ` Alistair Popple
@ 2025-02-14 15:17   ` Danilo Krummrich
  1 sibling, 0 replies; 7+ messages in thread
From: Danilo Krummrich @ 2025-02-14 15:17 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul,
	David Airlie, Simona Vetter, Alistair Popple

On Fri, Jan 24, 2025 at 07:15:23PM +0100, David Hildenbrand wrote:
> In case we have to retry the loop, we are missing to unlock+put the
> folio. In that case, we will keep failing make_device_exclusive_range()
> because we cannot grab the folio lock, and even return from the function
> with the folio locked and referenced, effectively never succeeding the
> make_device_exclusive_range().
> 
> While at it, convert the other unlock+put to use a folio as well.
> 
> This was found by code inspection.
> 
> Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access")
> Signed-off-by: David Hildenbrand <david@redhat.com>

Applied to drm-misc-fixes, thanks!

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-02-14 15:17 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand
2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand
2025-02-12  1:27   ` Alistair Popple
2025-02-14 15:17   ` Danilo Krummrich
2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand
2025-02-12  1:28   ` Alistair Popple
2025-01-28  0:13 ` [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() Alistair Popple

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox