* [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault()
@ 2025-01-24 18:15 David Hildenbrand
2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand
` (2 more replies)
0 siblings, 3 replies; 7+ messages in thread
From: David Hildenbrand @ 2025-01-24 18:15 UTC (permalink / raw)
To: linux-kernel
Cc: dri-devel, nouveau, David Hildenbrand, Karol Herbst, Lyude Paul,
Danilo Krummrich, David Airlie, Simona Vetter, Alistair Popple
One fix and a minor cleanup.
Only compile-tested due to lack of HW, so I'd be happy if someone with
access to HW could test. But not sure how easy this is to trigger. Likely
some concurrent MADV_DONTNEED on the PTE we just converted might be able
to trigger it.
Cc: Karol Herbst <kherbst@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Alistair Popple <apopple@nvidia.com>
David Hildenbrand (2):
nouveau/svm: fix missing folio unlock + put after
make_device_exclusive_range()
nouveau/svm: don't initialize ret in nouveau_atomic_range_fault()
drivers/gpu/drm/nouveau/nouveau_svm.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--
2.47.1
^ permalink raw reply [flat|nested] 7+ messages in thread* [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() 2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand @ 2025-01-24 18:15 ` David Hildenbrand 2025-02-12 1:27 ` Alistair Popple 2025-02-14 15:17 ` Danilo Krummrich 2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand 2025-01-28 0:13 ` [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() Alistair Popple 2 siblings, 2 replies; 7+ messages in thread From: David Hildenbrand @ 2025-01-24 18:15 UTC (permalink / raw) To: linux-kernel Cc: dri-devel, nouveau, David Hildenbrand, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter, Alistair Popple In case we have to retry the loop, we are missing to unlock+put the folio. In that case, we will keep failing make_device_exclusive_range() because we cannot grab the folio lock, and even return from the function with the folio locked and referenced, effectively never succeeding the make_device_exclusive_range(). While at it, convert the other unlock+put to use a folio as well. This was found by code inspection. Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access") Signed-off-by: David Hildenbrand <david@redhat.com> --- drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index b4da82ddbb6b2..8ea98f06d39af 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -590,6 +590,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, unsigned long timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); struct mm_struct *mm = svmm->notifier.mm; + struct folio *folio; struct page *page; unsigned long start = args->p.addr; unsigned long notifier_seq; @@ -616,12 +617,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = -EINVAL; goto out; } + folio = page_folio(page); mutex_lock(&svmm->mutex); if (!mmu_interval_read_retry(¬ifier->notifier, notifier_seq)) break; mutex_unlock(&svmm->mutex); + + folio_unlock(folio); + folio_put(folio); } /* Map the page on the GPU. */ @@ -637,8 +642,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); mutex_unlock(&svmm->mutex); - unlock_page(page); - put_page(page); + folio_unlock(folio); + folio_put(folio); out: mmu_interval_notifier_remove(¬ifier->notifier); -- 2.47.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() 2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand @ 2025-02-12 1:27 ` Alistair Popple 2025-02-14 15:17 ` Danilo Krummrich 1 sibling, 0 replies; 7+ messages in thread From: Alistair Popple @ 2025-02-12 1:27 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter On Fri, Jan 24, 2025 at 07:15:23PM +0100, David Hildenbrand wrote: > In case we have to retry the loop, we are missing to unlock+put the > folio. In that case, we will keep failing make_device_exclusive_range() > because we cannot grab the folio lock, and even return from the function > with the folio locked and referenced, effectively never succeeding the > make_device_exclusive_range(). > > While at it, convert the other unlock+put to use a folio as well. > > This was found by code inspection. I couldn't (easily at least) trigger this condition from userspace, probably because the window is pretty tight between mmu_interval_read_begin() and mmu_interval_read_retry(). But I tested it by manually forcing at least one retry and it fixes the issue observed via inspection. There's no reason to think it would be entirely impossible to hit either, and the change looks good so please add: Reviewed-by: Alistair Popple <apopple@nvidia.com> Tested-by: Alistair Popple <apopple@nvidia.com> > Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access") > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > drivers/gpu/drm/nouveau/nouveau_svm.c | 9 +++++++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c > index b4da82ddbb6b2..8ea98f06d39af 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_svm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c > @@ -590,6 +590,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, > unsigned long timeout = > jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT); > struct mm_struct *mm = svmm->notifier.mm; > + struct folio *folio; > struct page *page; > unsigned long start = args->p.addr; > unsigned long notifier_seq; > @@ -616,12 +617,16 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, > ret = -EINVAL; > goto out; > } > + folio = page_folio(page); > > mutex_lock(&svmm->mutex); > if (!mmu_interval_read_retry(¬ifier->notifier, > notifier_seq)) > break; > mutex_unlock(&svmm->mutex); > + > + folio_unlock(folio); > + folio_put(folio); > } > > /* Map the page on the GPU. */ > @@ -637,8 +642,8 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, > ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL); > mutex_unlock(&svmm->mutex); > > - unlock_page(page); > - put_page(page); > + folio_unlock(folio); > + folio_put(folio); > > out: > mmu_interval_notifier_remove(¬ifier->notifier); > -- > 2.47.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() 2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand 2025-02-12 1:27 ` Alistair Popple @ 2025-02-14 15:17 ` Danilo Krummrich 1 sibling, 0 replies; 7+ messages in thread From: Danilo Krummrich @ 2025-02-14 15:17 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul, David Airlie, Simona Vetter, Alistair Popple On Fri, Jan 24, 2025 at 07:15:23PM +0100, David Hildenbrand wrote: > In case we have to retry the loop, we are missing to unlock+put the > folio. In that case, we will keep failing make_device_exclusive_range() > because we cannot grab the folio lock, and even return from the function > with the folio locked and referenced, effectively never succeeding the > make_device_exclusive_range(). > > While at it, convert the other unlock+put to use a folio as well. > > This was found by code inspection. > > Fixes: 8f187163eb89 ("nouveau/svm: implement atomic SVM access") > Signed-off-by: David Hildenbrand <david@redhat.com> Applied to drm-misc-fixes, thanks! ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() 2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand 2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand @ 2025-01-24 18:15 ` David Hildenbrand 2025-02-12 1:28 ` Alistair Popple 2025-01-28 0:13 ` [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() Alistair Popple 2 siblings, 1 reply; 7+ messages in thread From: David Hildenbrand @ 2025-01-24 18:15 UTC (permalink / raw) To: linux-kernel Cc: dri-devel, nouveau, David Hildenbrand, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter, Alistair Popple ret will be modified immediately afterwards. Signed-off-by: David Hildenbrand <david@redhat.com> --- drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c index 8ea98f06d39af..2f8b8b978fc08 100644 --- a/drivers/gpu/drm/nouveau/nouveau_svm.c +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c @@ -594,7 +594,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, struct page *page; unsigned long start = args->p.addr; unsigned long notifier_seq; - int ret = 0; + int ret; ret = mmu_interval_notifier_insert(¬ifier->notifier, mm, args->p.addr, args->p.size, -- 2.47.1 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() 2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand @ 2025-02-12 1:28 ` Alistair Popple 0 siblings, 0 replies; 7+ messages in thread From: Alistair Popple @ 2025-02-12 1:28 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter On Fri, Jan 24, 2025 at 07:15:24PM +0100, David Hildenbrand wrote: > ret will be modified immediately afterwards. Yep. Thanks for fixing. Reviewed-by: Alistair Popple <apopple@nvidia.com> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > drivers/gpu/drm/nouveau/nouveau_svm.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c > index 8ea98f06d39af..2f8b8b978fc08 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_svm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_svm.c > @@ -594,7 +594,7 @@ static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm, > struct page *page; > unsigned long start = args->p.addr; > unsigned long notifier_seq; > - int ret = 0; > + int ret; > > ret = mmu_interval_notifier_insert(¬ifier->notifier, mm, > args->p.addr, args->p.size, > -- > 2.47.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() 2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand 2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand 2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand @ 2025-01-28 0:13 ` Alistair Popple 2 siblings, 0 replies; 7+ messages in thread From: Alistair Popple @ 2025-01-28 0:13 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, dri-devel, nouveau, Karol Herbst, Lyude Paul, Danilo Krummrich, David Airlie, Simona Vetter On Fri, Jan 24, 2025 at 07:15:22PM +0100, David Hildenbrand wrote: > One fix and a minor cleanup. > > Only compile-tested due to lack of HW, so I'd be happy if someone with > access to HW could test. But not sure how easy this is to trigger. Likely > some concurrent MADV_DONTNEED on the PTE we just converted might be able > to trigger it. I have this setup so will certainly test it and see if I can trigger the bug as well. Thanks for the fixes. > Cc: Karol Herbst <kherbst@redhat.com> > Cc: Lyude Paul <lyude@redhat.com> > Cc: Danilo Krummrich <dakr@kernel.org> > Cc: David Airlie <airlied@gmail.com> > Cc: Simona Vetter <simona@ffwll.ch> > Cc: Alistair Popple <apopple@nvidia.com> > > David Hildenbrand (2): > nouveau/svm: fix missing folio unlock + put after > make_device_exclusive_range() > nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() > > drivers/gpu/drm/nouveau/nouveau_svm.c | 11 ++++++++--- > 1 file changed, 8 insertions(+), 3 deletions(-) > > -- > 2.47.1 > ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-02-14 15:17 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-01-24 18:15 [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() David Hildenbrand 2025-01-24 18:15 ` [PATCH v1 1/2] nouveau/svm: fix missing folio unlock + put after make_device_exclusive_range() David Hildenbrand 2025-02-12 1:27 ` Alistair Popple 2025-02-14 15:17 ` Danilo Krummrich 2025-01-24 18:15 ` [PATCH v1 2/2] nouveau/svm: don't initialize ret in nouveau_atomic_range_fault() David Hildenbrand 2025-02-12 1:28 ` Alistair Popple 2025-01-28 0:13 ` [PATCH v1 0/2] nouveau/svm: fix + cleanup for nouveau_atomic_range_fault() Alistair Popple
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox