* [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings
2015-08-10 8:51 [PATCH v3 1/3] drm/i915: Only update the current userptr worker Chris Wilson
@ 2015-08-10 8:51 ` Chris Wilson
2015-09-09 13:56 ` [Intel-gfx] " Tvrtko Ursulin
2015-08-10 8:51 ` [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range Chris Wilson
2015-09-09 10:39 ` [PATCH v3 1/3] drm/i915: Only update the current userptr worker Tvrtko Ursulin
2 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2015-08-10 8:51 UTC (permalink / raw)
To: intel-gfx; +Cc: Chris Wilson, Michał Winiarski, Tvrtko Ursulin, stable
Michał Winiarski found a really evil way to trigger a struct_mutex
deadlock with userptr. He found that if he allocated a userptr bo and
then GTT mmaped another bo, or even itself, at the same address as the
userptr using MAP_FIXED, he could then cause a deadlock any time we then
had to invalidate the GTT mmappings (so at will). Tvrtko then found that by
repeatedly allocating GTT mmappings he could alias with an old userptr
mmap and also trigger the deadlock.
To counteract the deadlock, we make the observation that we only need
to take the struct_mutex if the object has any pages to revoke, and that
before userspace can alias with the userptr address space, it must have
invalidated the userptr->pages. Thus if we can check for those pages
outside of the struct_mutex, we can avoid the deadlock. To do so we
introduce a separate flag for userptr objects that we can inspect from
the mmu-notifier underneath its spinlock.
The patch makes one eye-catching change. That is the removal of serial=0
after detecting a to-be-freed object inside the invalidate walker. I
felt setting serial=0 was a questionable pessimisation: it denies us the
chance to reuse the current iterator for the next loop (before it is
freed) and being explicit makes the reader question the validity of the
locking (since the object-free race could occur elsewhere). The
serialisation of the iterator is through the spinlock; if the object is
freed before the next loop then the notifier.serial will be incremented
and we start the walk from the beginning as we detect the invalid cache.
To try and tame the error paths and interactions with the userptr->active
flag, we have to do a fair amount of rearranging of get_pages_userptr().
v2: Grammar fixes
v3: Reorder set-active so that it is only set when obj->pages is set
(and so needs cancellation). Only the order of setting obj->pages and
the active-flag is crucial. Calling gup after invalidate-range begin
means the userptr sees the new set of backing storage (and so will not
need to invalidate its new pages), but we have to be careful not to set
the active-flag prior to successfully establishing obj->pages.
v4: Take the active-flag early so we know in the mmu-notifier when we
have to cancel a pending gup-worker.
Reported-by: Michał Winiarski <michal.winiarski@intel.com>
Testcase: igt/gem_userptr_blits/map-fixed*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 111 ++++++++++++++++++++++----------
1 file changed, 78 insertions(+), 33 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index 800a5394aa1e..e21f885db87b 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -59,6 +59,7 @@ struct i915_mmu_object {
struct interval_tree_node it;
struct list_head link;
struct drm_i915_gem_object *obj;
+ bool active;
bool is_linear;
};
@@ -114,7 +115,8 @@ restart:
obj = mo->obj;
- if (!kref_get_unless_zero(&obj->base.refcount))
+ if (!mo->active ||
+ !kref_get_unless_zero(&obj->base.refcount))
continue;
spin_unlock(&mn->lock);
@@ -151,7 +153,8 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
else
it = interval_tree_iter_first(&mn->objects, start, end);
if (it != NULL) {
- obj = container_of(it, struct i915_mmu_object, it)->obj;
+ struct i915_mmu_object *mo =
+ container_of(it, struct i915_mmu_object, it);
/* The mmu_object is released late when destroying the
* GEM object so it is entirely possible to gain a
@@ -160,11 +163,9 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
* the struct_mutex - and consequently use it after it
* is freed and then double free it.
*/
- if (!kref_get_unless_zero(&obj->base.refcount)) {
- spin_unlock(&mn->lock);
- serial = 0;
- continue;
- }
+ if (mo->active &&
+ kref_get_unless_zero(&mo->obj->base.refcount))
+ obj = mo->obj;
serial = mn->serial;
}
@@ -566,6 +567,30 @@ __i915_gem_userptr_set_pages(struct drm_i915_gem_object *obj,
}
static void
+__i915_gem_userptr_set_active(struct drm_i915_gem_object *obj,
+ bool value)
+{
+ /* During mm_invalidate_range we need to cancel any userptr that
+ * overlaps the range being invalidated. Doing so requires the
+ * struct_mutex, and that risks recursion. In order to cause
+ * recursion, the user must alias the userptr address space with
+ * a GTT mmapping (possible with a MAP_FIXED) - then when we have
+ * to invalidate that mmaping, mm_invalidate_range is called with
+ * the userptr address *and* the struct_mutex held. To prevent that
+ * we set a flag under the i915_mmu_notifier spinlock to indicate
+ * whether this object is valid.
+ */
+#if defined(CONFIG_MMU_NOTIFIER)
+ if (obj->userptr.mmu_object == NULL)
+ return;
+
+ spin_lock(&obj->userptr.mmu_object->mn->lock);
+ obj->userptr.mmu_object->active = value;
+ spin_unlock(&obj->userptr.mmu_object->mn->lock);
+#endif
+}
+
+static void
__i915_gem_userptr_get_pages_worker(struct work_struct *_work)
{
struct get_pages_work *work = container_of(_work, typeof(*work), work);
@@ -613,6 +638,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
}
}
obj->userptr.work = ERR_PTR(ret);
+ if (ret)
+ __i915_gem_userptr_set_active(obj, false);
}
obj->userptr.workers--;
@@ -649,6 +676,18 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
* to the vma (discard or cloning) which should prevent the more
* egregious cases from causing harm.
*/
+ if (IS_ERR(obj->userptr.work)) {
+ /* active flag will have been dropped already by the worker */
+ ret = PTR_ERR(obj->userptr.work);
+ obj->userptr.work = NULL;
+ return ret;
+ }
+ if (obj->userptr.work)
+ /* active flag should still be held for the pending work */
+ return -EAGAIN;
+
+ /* Let the mmu-notifier know that we have begun and need cancellation */
+ __i915_gem_userptr_set_active(obj, true);
pvec = NULL;
pinned = 0;
@@ -657,8 +696,10 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
if (pvec == NULL) {
pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
- if (pvec == NULL)
- return -ENOMEM;
+ if (pvec == NULL) {
+ ret = -ENOMEM;
+ goto err;
+ }
}
pinned = __get_user_pages_fast(obj->userptr.ptr, num_pages,
@@ -667,7 +708,7 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
if (pinned < num_pages) {
if (pinned < 0) {
ret = pinned;
- pinned = 0;
+ goto err;
} else {
/* Spawn a worker so that we can acquire the
* user pages without holding our mutex. Access
@@ -688,44 +729,47 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
* that error back to this function through
* obj->userptr.work = ERR_PTR.
*/
+
ret = -EAGAIN;
- if (obj->userptr.work == NULL &&
- obj->userptr.workers < I915_GEM_USERPTR_MAX_WORKERS) {
+ if (obj->userptr.workers < I915_GEM_USERPTR_MAX_WORKERS) {
struct get_pages_work *work;
work = kmalloc(sizeof(*work), GFP_KERNEL);
- if (work != NULL) {
- obj->userptr.work = &work->work;
- obj->userptr.workers++;
+ if (work == NULL) {
+ ret = -ENOMEM;
+ goto err;
+ }
+ obj->userptr.work = &work->work;
+ obj->userptr.workers++;
- work->obj = obj;
- drm_gem_object_reference(&obj->base);
+ work->obj = obj;
+ drm_gem_object_reference(&obj->base);
- work->task = current;
- get_task_struct(work->task);
+ work->task = current;
+ get_task_struct(work->task);
- INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
- schedule_work(&work->work);
- } else
- ret = -ENOMEM;
- } else {
- if (IS_ERR(obj->userptr.work)) {
- ret = PTR_ERR(obj->userptr.work);
- obj->userptr.work = NULL;
- }
+ INIT_WORK(&work->work,
+ __i915_gem_userptr_get_pages_worker);
+ schedule_work(&work->work);
}
+ goto err_active;
}
} else {
ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
- if (ret == 0) {
- obj->userptr.work = NULL;
- pinned = 0;
- }
+ if (ret)
+ goto err;
}
- release_pages(pvec, pinned, 0);
+out:
drm_free_large(pvec);
return ret;
+
+err:
+ /* No pages here, no need for the mmu-notifier to wake us */
+ __i915_gem_userptr_set_active(obj, false);
+err_active:
+ release_pages(pvec, pinned, 0);
+ goto out;
}
static void
@@ -734,6 +778,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj)
struct sg_page_iter sg_iter;
BUG_ON(obj->userptr.work != NULL);
+ __i915_gem_userptr_set_active(obj, false);
if (obj->madv != I915_MADV_WILLNEED)
obj->dirty = 0;
--
2.5.0
* Re: [Intel-gfx] [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings
2015-08-10 8:51 ` [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings Chris Wilson
@ 2015-09-09 13:56 ` Tvrtko Ursulin
2015-09-09 15:03 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-09-09 13:56 UTC (permalink / raw)
To: Chris Wilson, intel-gfx; +Cc: stable
Hi,
On 08/10/2015 09:51 AM, Chris Wilson wrote:
> Michał Winiarski found a really evil way to trigger a struct_mutex
> deadlock with userptr. He found that if he allocated a userptr bo and
> then GTT mmaped another bo, or even itself, at the same address as the
> userptr using MAP_FIXED, he could then cause a deadlock any time we then
> had to invalidate the GTT mmappings (so at will). Tvrtko then found that by
> repeatedly allocating GTT mmappings he could alias with an old userptr
> mmap and also trigger the deadlock.
>
> To counteract the deadlock, we make the observation that we only need
> to take the struct_mutex if the object has any pages to revoke, and that
> before userspace can alias with the userptr address space, it must have
> invalidated the userptr->pages. Thus if we can check for those pages
> outside of the struct_mutex, we can avoid the deadlock. To do so we
> introduce a separate flag for userptr objects that we can inspect from
> the mmu-notifier underneath its spinlock.
>
> The patch makes one eye-catching change. That is the removal of serial=0
> after detecting a to-be-freed object inside the invalidate walker. I
> felt setting serial=0 was a questionable pessimisation: it denies us the
> chance to reuse the current iterator for the next loop (before it is
> freed) and being explicit makes the reader question the validity of the
> locking (since the object-free race could occur elsewhere). The
> serialisation of the iterator is through the spinlock; if the object is
> freed before the next loop then the notifier.serial will be incremented
> and we start the walk from the beginning as we detect the invalid cache.
>
> To try and tame the error paths and interactions with the userptr->active
> flag, we have to do a fair amount of rearranging of get_pages_userptr().
>
> v2: Grammar fixes
> v3: Reorder set-active so that it is only set when obj->pages is set
> (and so needs cancellation). Only the order of setting obj->pages and
> the active-flag is crucial. Calling gup after invalidate-range begin
> means the userptr sees the new set of backing storage (and so will not
> need to invalidate its new pages), but we have to be careful not to set
> the active-flag prior to successfully establishing obj->pages.
> v4: Take the active-flag early so we know in the mmu-notifier when we
> have to cancel a pending gup-worker.
>
> Reported-by: Michał Winiarski <michal.winiarski@intel.com>
> Testcase: igt/gem_userptr_blits/map-fixed*
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: stable@vger.kernel.org
> ---
> drivers/gpu/drm/i915/i915_gem_userptr.c | 111 ++++++++++++++++++++++----------
> 1 file changed, 78 insertions(+), 33 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index 800a5394aa1e..e21f885db87b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -59,6 +59,7 @@ struct i915_mmu_object {
> struct interval_tree_node it;
> struct list_head link;
> struct drm_i915_gem_object *obj;
> + bool active;
> bool is_linear;
> };
>
> @@ -114,7 +115,8 @@ restart:
>
> obj = mo->obj;
>
> - if (!kref_get_unless_zero(&obj->base.refcount))
> + if (!mo->active ||
> + !kref_get_unless_zero(&obj->base.refcount))
> continue;
>
> spin_unlock(&mn->lock);
> @@ -151,7 +153,8 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> else
> it = interval_tree_iter_first(&mn->objects, start, end);
> if (it != NULL) {
> - obj = container_of(it, struct i915_mmu_object, it)->obj;
> + struct i915_mmu_object *mo =
> + container_of(it, struct i915_mmu_object, it);
>
> /* The mmu_object is released late when destroying the
> * GEM object so it is entirely possible to gain a
> @@ -160,11 +163,9 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
> * the struct_mutex - and consequently use it after it
> * is freed and then double free it.
> */
> - if (!kref_get_unless_zero(&obj->base.refcount)) {
> - spin_unlock(&mn->lock);
> - serial = 0;
> - continue;
> - }
> + if (mo->active &&
> + kref_get_unless_zero(&mo->obj->base.refcount))
> + obj = mo->obj;
>
> serial = mn->serial;
> }
> @@ -566,6 +567,30 @@ __i915_gem_userptr_set_pages(struct drm_i915_gem_object *obj,
> }
>
> static void
> +__i915_gem_userptr_set_active(struct drm_i915_gem_object *obj,
> + bool value)
> +{
> + /* During mm_invalidate_range we need to cancel any userptr that
> + * overlaps the range being invalidated. Doing so requires the
> + * struct_mutex, and that risks recursion. In order to cause
> + * recursion, the user must alias the userptr address space with
> + * a GTT mmapping (possible with a MAP_FIXED) - then when we have
> + * to invalidate that mmaping, mm_invalidate_range is called with
> + * the userptr address *and* the struct_mutex held. To prevent that
> + * we set a flag under the i915_mmu_notifier spinlock to indicate
> + * whether this object is valid.
> + */
> +#if defined(CONFIG_MMU_NOTIFIER)
> + if (obj->userptr.mmu_object == NULL)
> + return;
> +
> + spin_lock(&obj->userptr.mmu_object->mn->lock);
> + obj->userptr.mmu_object->active = value;
> + spin_unlock(&obj->userptr.mmu_object->mn->lock);
> +#endif
> +}
> +
> +static void
> __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> {
> struct get_pages_work *work = container_of(_work, typeof(*work), work);
> @@ -613,6 +638,8 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> }
> }
> obj->userptr.work = ERR_PTR(ret);
> + if (ret)
> + __i915_gem_userptr_set_active(obj, false);
> }
>
> obj->userptr.workers--;
> @@ -649,6 +676,18 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> * to the vma (discard or cloning) which should prevent the more
> * egregious cases from causing harm.
> */
> + if (IS_ERR(obj->userptr.work)) {
> + /* active flag will have been dropped already by the worker */
> + ret = PTR_ERR(obj->userptr.work);
> + obj->userptr.work = NULL;
> + return ret;
> + }
> + if (obj->userptr.work)
> + /* active flag should still be held for the pending work */
> + return -EAGAIN;
> +
> + /* Let the mmu-notifier know that we have begun and need cancellation */
> + __i915_gem_userptr_set_active(obj, true);
>
> pvec = NULL;
> pinned = 0;
> @@ -657,8 +696,10 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
> if (pvec == NULL) {
> pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
> - if (pvec == NULL)
> - return -ENOMEM;
> + if (pvec == NULL) {
> + ret = -ENOMEM;
> + goto err;
> + }
> }
>
> pinned = __get_user_pages_fast(obj->userptr.ptr, num_pages,
> @@ -667,7 +708,7 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> if (pinned < num_pages) {
> if (pinned < 0) {
> ret = pinned;
> - pinned = 0;
> + goto err;
> } else {
> /* Spawn a worker so that we can acquire the
> * user pages without holding our mutex. Access
> @@ -688,44 +729,47 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
> * that error back to this function through
> * obj->userptr.work = ERR_PTR.
> */
> +
> ret = -EAGAIN;
> - if (obj->userptr.work == NULL &&
> - obj->userptr.workers < I915_GEM_USERPTR_MAX_WORKERS) {
> + if (obj->userptr.workers < I915_GEM_USERPTR_MAX_WORKERS) {
> struct get_pages_work *work;
>
> work = kmalloc(sizeof(*work), GFP_KERNEL);
> - if (work != NULL) {
> - obj->userptr.work = &work->work;
> - obj->userptr.workers++;
> + if (work == NULL) {
> + ret = -ENOMEM;
> + goto err;
> + }
> + obj->userptr.work = &work->work;
> + obj->userptr.workers++;
>
> - work->obj = obj;
> - drm_gem_object_reference(&obj->base);
> + work->obj = obj;
> + drm_gem_object_reference(&obj->base);
>
> - work->task = current;
> - get_task_struct(work->task);
> + work->task = current;
> + get_task_struct(work->task);
>
> - INIT_WORK(&work->work, __i915_gem_userptr_get_pages_worker);
> - schedule_work(&work->work);
> - } else
> - ret = -ENOMEM;
> - } else {
> - if (IS_ERR(obj->userptr.work)) {
> - ret = PTR_ERR(obj->userptr.work);
> - obj->userptr.work = NULL;
> - }
> + INIT_WORK(&work->work,
> + __i915_gem_userptr_get_pages_worker);
> + schedule_work(&work->work);
> }
> + goto err_active;
> }
> } else {
> ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
> - if (ret == 0) {
> - obj->userptr.work = NULL;
> - pinned = 0;
> - }
> + if (ret)
> + goto err;
> }
>
> - release_pages(pvec, pinned, 0);
> +out:
> drm_free_large(pvec);
> return ret;
> +
> +err:
> + /* No pages here, no need for the mmu-notifier to wake us */
> + __i915_gem_userptr_set_active(obj, false);
> +err_active:
> + release_pages(pvec, pinned, 0);
> + goto out;
> }
I don't like the goto dance. Would something like the below be clearer?
========================================================================
static int
i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
{
	const int num_pages = obj->base.size >> PAGE_SHIFT;
	struct page **pvec;
	int pinned, ret;

	/* If userspace should engineer that these pages are replaced in
	 * the vma between us binding this page into the GTT and completion
	 * of rendering... Their loss. If they change the mapping of their
	 * pages they need to create a new bo to point to the new vma.
	 *
	 * However, that still leaves open the possibility of the vma
	 * being copied upon fork. Which falls under the same userspace
	 * synchronisation issue as a regular bo, except that this time
	 * the process may not be expecting that a particular piece of
	 * memory is tied to the GPU.
	 *
	 * Fortunately, we can hook into the mmu_notifier in order to
	 * discard the page references prior to anything nasty happening
	 * to the vma (discard or cloning) which should prevent the more
	 * egregious cases from causing harm.
	 */
	if (IS_ERR(obj->userptr.work)) {
		/* active flag will have been dropped already by the worker */
		ret = PTR_ERR(obj->userptr.work);
		obj->userptr.work = NULL;
		return ret;
	}
	if (obj->userptr.work)
		/* active flag should still be held for the pending work */
		return -EAGAIN;

	/* Let the mmu-notifier know that we have begun and need cancellation */
	__i915_gem_userptr_set_active(obj, true);

	pvec = NULL;
	pinned = 0;
	if (obj->userptr.mm->mm == current->mm) {
		pvec = kmalloc(num_pages*sizeof(struct page *),
			       GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
		if (pvec == NULL) {
			pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
			if (pvec == NULL) {
				ret = -ENOMEM;
				goto err_pvec;
			}
		}

		pinned = __get_user_pages_fast(obj->userptr.ptr, num_pages,
					       !obj->userptr.read_only, pvec);
	}

	if (pinned == num_pages) {
		ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
		if (ret)
			release_pages(pvec, pinned, 0);
		goto out_gup;
	} else if (pinned < 0) {
		ret = pinned;
		goto out_gup;
	}

	/* Spawn a worker so that we can acquire the
	 * user pages without holding our mutex. Access
	 * to the user pages requires mmap_sem, and we have
	 * a strict lock ordering of mmap_sem, struct_mutex -
	 * we already hold struct_mutex here and so cannot
	 * call gup without encountering a lock inversion.
	 *
	 * Userspace will keep on repeating the operation
	 * (thanks to EAGAIN) until either we hit the fast
	 * path or the worker completes. If the worker is
	 * cancelled or superseded, the task is still run
	 * but the results ignored. (This leads to
	 * complications that we may have a stray object
	 * refcount that we need to be wary of when
	 * checking for existing objects during creation.)
	 * If the worker encounters an error, it reports
	 * that error back to this function through
	 * obj->userptr.work = ERR_PTR.
	 */
	if (pvec) {
		release_pages(pvec, pinned, 0);
		drm_free_large(pvec);
	}

	ret = -EAGAIN;
	if (obj->userptr.workers < I915_GEM_USERPTR_MAX_WORKERS) {
		struct get_pages_work *work;

		work = kmalloc(sizeof(*work), GFP_KERNEL);
		if (work == NULL) {
			ret = -ENOMEM;
			goto err_pvec;
		}
		obj->userptr.work = &work->work;
		obj->userptr.workers++;

		work->obj = obj;
		drm_gem_object_reference(&obj->base);

		work->task = current;
		get_task_struct(work->task);

		INIT_WORK(&work->work,
			  __i915_gem_userptr_get_pages_worker);
		schedule_work(&work->work);
	}

	return ret;

out_gup:
	drm_free_large(pvec);
err_pvec:
	/* All done, no need for the mmu-notifier to wake us */
	__i915_gem_userptr_set_active(obj, false);

	return ret;
}
======================================================================
I think it is correct but please double-check me. Maybe it needs more comments
for the main three options (gup ok, gup fail, gup incomplete), but I think it
is worth it, if for nothing else than to avoid the goto up-down jumps.
Regards,
Tvrtko
* Re: [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings
2015-09-09 13:56 ` [Intel-gfx] " Tvrtko Ursulin
@ 2015-09-09 15:03 ` Chris Wilson
2015-09-10 9:44 ` [Intel-gfx] " Tvrtko Ursulin
0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2015-09-09 15:03 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx, stable
On Wed, Sep 09, 2015 at 02:56:16PM +0100, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 08/10/2015 09:51 AM, Chris Wilson wrote:
> > +out:
> > drm_free_large(pvec);
> > return ret;
> > +
> > +err:
> > + /* No pages here, no need for the mmu-notifier to wake us */
> > + __i915_gem_userptr_set_active(obj, false);
> > +err_active:
> > + release_pages(pvec, pinned, 0);
> > + goto out;
> > }
>
> I don't like the goto dance. Would something like the below be clearer?
We can condense it if we use a bool active and then feed everything
through the single exit path:
	active = false;
	if (pinned < 0)
		ret = pinned, pinned = 0;
	else if (pinned < num_pages)
		ret = __i915_gem_userptr_get_pages_queue(obj, &active);
	else
		ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
	if (ret) {
		__i915_gem_userptr_set_active(obj, active);
		release_pages(pvec, pinned, 0);
	}
	drm_free_large(pvec);
	return ret;
Not happy with _queue. I guess i915_gem_userptr_get_pages_via_worker()
is better. Or i915_gem_userptr_get_pages_deferred().
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [Intel-gfx] [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings
2015-09-09 15:03 ` Chris Wilson
@ 2015-09-10 9:44 ` Tvrtko Ursulin
2015-09-10 9:51 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-09-10 9:44 UTC (permalink / raw)
To: Chris Wilson, intel-gfx, stable
On 09/09/2015 04:03 PM, Chris Wilson wrote:
> On Wed, Sep 09, 2015 at 02:56:16PM +0100, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 08/10/2015 09:51 AM, Chris Wilson wrote:
>>> +out:
>>> drm_free_large(pvec);
>>> return ret;
>>> +
>>> +err:
>>> + /* No pages here, no need for the mmu-notifier to wake us */
>>> + __i915_gem_userptr_set_active(obj, false);
>>> +err_active:
>>> + release_pages(pvec, pinned, 0);
>>> + goto out;
>>> }
>>
>> I don't like the goto dance. Would something like the below be clearer?
>
> We can condense it if we use a bool active and then feed everything
> through the single exit path:
>
> active = false;
> if (pinned < 0)
> ret = pinned, pinned = 0;
> else if (pinned < num_pages)
> ret = __i915_gem_userptr_get_pages_queue(obj, &active);
> else
> ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
> if (ret) {
> __i915_gem_userptr_set_active(obj, active);
> release_pages(pvec, pinned, 0);
> }
> drm_free_large(pvec);
> return ret;
>
> Not happy with _queue. I guess i915_gem_userptr_get_pages_via_worker()
> is better. Or i915_gem_userptr_get_pages_deferred().
Looks much better at a glance. If release_pages with pinned = 0 is OK.
For the queue/via_worker/deferred maybe _schedule_get_pages_worker?
Tvrtko
* Re: [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings
2015-09-10 9:44 ` [Intel-gfx] " Tvrtko Ursulin
@ 2015-09-10 9:51 ` Chris Wilson
0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2015-09-10 9:51 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx, stable
On Thu, Sep 10, 2015 at 10:44:14AM +0100, Tvrtko Ursulin wrote:
>
> On 09/09/2015 04:03 PM, Chris Wilson wrote:
> >On Wed, Sep 09, 2015 at 02:56:16PM +0100, Tvrtko Ursulin wrote:
> >>
> >>Hi,
> >>
> >>On 08/10/2015 09:51 AM, Chris Wilson wrote:
> >>>+out:
> >>> drm_free_large(pvec);
> >>> return ret;
> >>>+
> >>>+err:
> >>>+ /* No pages here, no need for the mmu-notifier to wake us */
> >>>+ __i915_gem_userptr_set_active(obj, false);
> >>>+err_active:
> >>>+ release_pages(pvec, pinned, 0);
> >>>+ goto out;
> >>> }
> >>
> >>I don't like the goto dance. Would something like the below be clearer?
> >
> >We can condense it if we use a bool active and then feed everything
> >through the single exit path:
> >
> >	active = false;
> >	if (pinned < 0)
> >		ret = pinned, pinned = 0;
> >	else if (pinned < num_pages)
> >		ret = __i915_gem_userptr_get_pages_queue(obj, &active);
> >	else
> >		ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
> >	if (ret) {
> >		__i915_gem_userptr_set_active(obj, active);
> >		release_pages(pvec, pinned, 0);
> >	}
> >	drm_free_large(pvec);
> >	return ret;
> >
> >Not happy with _queue. I guess i915_gem_userptr_get_pages_via_worker()
> >is better. Or i915_gem_userptr_get_pages_deferred().
>
> Looks much better on a glance. If release_pages with pinned = 0 is OK.
It does a loop over for(int i = 0; i < pinned; i++); so it is fine as it
was.
> For the queue/via_worker/deferred maybe _schedule_get_pages_worker?
I want to keep the i915_gem_object_get_pages prefix (haven't yet broken
that pattern, so no reason to start now), but _schedule seems
reasonable.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range
2015-08-10 8:51 [PATCH v3 1/3] drm/i915: Only update the current userptr worker Chris Wilson
2015-08-10 8:51 ` [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings Chris Wilson
@ 2015-08-10 8:51 ` Chris Wilson
2015-09-09 14:45 ` Tvrtko Ursulin
2015-09-09 10:39 ` [PATCH v3 1/3] drm/i915: Only update the current userptr worker Tvrtko Ursulin
2 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2015-08-10 8:51 UTC (permalink / raw)
To: intel-gfx
Whilst discussing possible ways to trigger an invalidate_range on a
userptr with an aliased GGTT mmapping (and so cause a struct_mutex
deadlock), the conclusion is that we can, and we must, prevent any
possible deadlock by avoiding taking the mutex at all during
invalidate_range. This has numerous advantages, all of which stem from
avoiding the sleeping function inside an unknown context. In
particular, it simplifies the invalidate_range because we no longer
have to juggle the spinlock/mutex and can just hold the spinlock
for the entire walk. To compensate, we have to make get_pages a bit more
complicated in order to serialise with a pending cancel_userptr worker.
As we hold the struct_mutex, we have no choice but to return EAGAIN and
hope that the worker is then flushed before we retry after reacquiring
the struct_mutex.
The important caveat is that the invalidate_range itself is no longer
synchronous. There exists a small but definite period in time in which
the old PTEs' pages remain accessible via the GPU. Note however that the
physical pages themselves are not invalidated by the mmu_notifier, just
the CPU view of the address space. The impact should be limited to a
delay in pages being flushed, rather than a possibility of writing to
the wrong pages. The only race condition that this worsens is remapping
a userptr active on the GPU where fresh work may still reference the
old pages due to struct_mutex contention. Given that userspace is racing
with the GPU, it is fair to say that the results are undefined.
v2: Only queue (and importantly only take one refcnt) the worker once.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
drivers/gpu/drm/i915/i915_gem_userptr.c | 146 +++++++++++++-------------------
1 file changed, 60 insertions(+), 86 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index e21f885db87b..17344dac807e 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -50,7 +50,6 @@ struct i915_mmu_notifier {
struct mmu_notifier mn;
struct rb_root objects;
struct list_head linear;
- unsigned long serial;
bool has_linear;
};
@@ -59,14 +58,16 @@ struct i915_mmu_object {
struct interval_tree_node it;
struct list_head link;
struct drm_i915_gem_object *obj;
+ struct work_struct work;
bool active;
bool is_linear;
};
-static unsigned long cancel_userptr(struct drm_i915_gem_object *obj)
+static void __cancel_userptr__worker(struct work_struct *work)
{
+ struct i915_mmu_object *mo = container_of(work, typeof(*mo), work);
+ struct drm_i915_gem_object *obj = mo->obj;
struct drm_device *dev = obj->base.dev;
- unsigned long end;
mutex_lock(&dev->struct_mutex);
/* Cancel any active worker and force us to re-evaluate gup */
@@ -89,46 +90,28 @@ static unsigned long cancel_userptr(struct drm_i915_gem_object *obj)
dev_priv->mm.interruptible = was_interruptible;
}
- end = obj->userptr.ptr + obj->base.size;
-
drm_gem_object_unreference(&obj->base);
mutex_unlock(&dev->struct_mutex);
-
- return end;
}
-static void *invalidate_range__linear(struct i915_mmu_notifier *mn,
- struct mm_struct *mm,
- unsigned long start,
- unsigned long end)
+static unsigned long cancel_userptr(struct i915_mmu_object *mo)
{
- struct i915_mmu_object *mo;
- unsigned long serial;
-
-restart:
- serial = mn->serial;
- list_for_each_entry(mo, &mn->linear, link) {
- struct drm_i915_gem_object *obj;
-
- if (mo->it.last < start || mo->it.start > end)
- continue;
-
- obj = mo->obj;
-
- if (!mo->active ||
- !kref_get_unless_zero(&obj->base.refcount))
- continue;
-
- spin_unlock(&mn->lock);
-
- cancel_userptr(obj);
-
- spin_lock(&mn->lock);
- if (serial != mn->serial)
- goto restart;
+ unsigned long end = mo->obj->userptr.ptr + mo->obj->base.size;
+
+ /* The mmu_object is released late when destroying the
+ * GEM object so it is entirely possible to gain a
+ * reference on an object in the process of being freed
+ * since our serialisation is via the spinlock and not
+ * the struct_mutex - and consequently use it after it
+ * is freed and then double free it.
+ */
+ if (mo->active && kref_get_unless_zero(&mo->obj->base.refcount)) {
+ schedule_work(&mo->work);
+ /* only schedule one work packet to avoid the refleak */
+ mo->active = false;
}
- return NULL;
+ return end;
}
static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
@@ -136,45 +119,32 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
unsigned long start,
unsigned long end)
{
- struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn);
- struct interval_tree_node *it = NULL;
- unsigned long next = start;
- unsigned long serial = 0;
-
- end--; /* interval ranges are inclusive, but invalidate range is exclusive */
- while (next < end) {
- struct drm_i915_gem_object *obj = NULL;
-
- spin_lock(&mn->lock);
- if (mn->has_linear)
- it = invalidate_range__linear(mn, mm, start, end);
- else if (serial == mn->serial)
- it = interval_tree_iter_next(it, next, end);
- else
- it = interval_tree_iter_first(&mn->objects, start, end);
- if (it != NULL) {
- struct i915_mmu_object *mo =
- container_of(it, struct i915_mmu_object, it);
-
- /* The mmu_object is released late when destroying the
- * GEM object so it is entirely possible to gain a
- * reference on an object in the process of being freed
- * since our serialisation is via the spinlock and not
- * the struct_mutex - and consequently use it after it
- * is freed and then double free it.
- */
- if (mo->active &&
- kref_get_unless_zero(&mo->obj->base.refcount))
- obj = mo->obj;
+ struct i915_mmu_notifier *mn =
+ container_of(_mn, struct i915_mmu_notifier, mn);
+ struct i915_mmu_object *mo;
+
+ /* interval ranges are inclusive, but invalidate range is exclusive */
+ end--;
+
+ spin_lock(&mn->lock);
+ if (mn->has_linear) {
+ list_for_each_entry(mo, &mn->linear, link) {
+ if (mo->it.last < start || mo->it.start > end)
+ continue;
- serial = mn->serial;
+ cancel_userptr(mo);
}
- spin_unlock(&mn->lock);
- if (obj == NULL)
- return;
+ } else {
+ struct interval_tree_node *it;
- next = cancel_userptr(obj);
+ it = interval_tree_iter_first(&mn->objects, start, end);
+ while (it) {
+ mo = container_of(it, struct i915_mmu_object, it);
+ start = cancel_userptr(mo);
+ it = interval_tree_iter_next(it, start, end);
+ }
}
+ spin_unlock(&mn->lock);
}
static const struct mmu_notifier_ops i915_gem_userptr_notifier = {
@@ -194,7 +164,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
spin_lock_init(&mn->lock);
mn->mn.ops = &i915_gem_userptr_notifier;
mn->objects = RB_ROOT;
- mn->serial = 1;
INIT_LIST_HEAD(&mn->linear);
mn->has_linear = false;
@@ -208,12 +177,6 @@ i915_mmu_notifier_create(struct mm_struct *mm)
return mn;
}
-static void __i915_mmu_notifier_update_serial(struct i915_mmu_notifier *mn)
-{
- if (++mn->serial == 0)
- mn->serial = 1;
-}
-
static int
i915_mmu_notifier_add(struct drm_device *dev,
struct i915_mmu_notifier *mn,
@@ -260,10 +223,9 @@ i915_mmu_notifier_add(struct drm_device *dev,
} else
interval_tree_insert(&mo->it, &mn->objects);
- if (ret == 0) {
+ if (ret == 0)
list_add(&mo->link, &mn->linear);
- __i915_mmu_notifier_update_serial(mn);
- }
+
spin_unlock(&mn->lock);
mutex_unlock(&dev->struct_mutex);
@@ -291,7 +253,6 @@ i915_mmu_notifier_del(struct i915_mmu_notifier *mn,
mn->has_linear = i915_mmu_notifier_has_linear(mn);
else
interval_tree_remove(&mo->it, &mn->objects);
- __i915_mmu_notifier_update_serial(mn);
spin_unlock(&mn->lock);
}
@@ -358,6 +319,7 @@ i915_gem_userptr_init__mmu_notifier(struct drm_i915_gem_object *obj,
mo->it.start = obj->userptr.ptr;
mo->it.last = mo->it.start + obj->base.size - 1;
mo->obj = obj;
+ INIT_WORK(&mo->work, __cancel_userptr__worker);
ret = i915_mmu_notifier_add(obj->base.dev, mn, mo);
if (ret) {
@@ -566,10 +528,12 @@ __i915_gem_userptr_set_pages(struct drm_i915_gem_object *obj,
return ret;
}
-static void
+static int
__i915_gem_userptr_set_active(struct drm_i915_gem_object *obj,
bool value)
{
+ int ret = 0;
+
/* During mm_invalidate_range we need to cancel any userptr that
* overlaps the range being invalidated. Doing so requires the
* struct_mutex, and that risks recursion. In order to cause
@@ -582,12 +546,20 @@ __i915_gem_userptr_set_active(struct drm_i915_gem_object *obj,
*/
#if defined(CONFIG_MMU_NOTIFIER)
if (obj->userptr.mmu_object == NULL)
- return;
+ return 0;
spin_lock(&obj->userptr.mmu_object->mn->lock);
- obj->userptr.mmu_object->active = value;
+ /* In order to serialise get_pages with an outstanding
+ * cancel_userptr, we must drop the struct_mutex and try again.
+ */
+ if (!value || !work_pending(&obj->userptr.mmu_object->work))
+ obj->userptr.mmu_object->active = value;
+ else
+ ret = -EAGAIN;
spin_unlock(&obj->userptr.mmu_object->mn->lock);
#endif
+
+ return ret;
}
static void
@@ -687,7 +659,9 @@ i915_gem_userptr_get_pages(struct drm_i915_gem_object *obj)
return -EAGAIN;
/* Let the mmu-notifier know that we have begun and need cancellation */
- __i915_gem_userptr_set_active(obj, true);
+ ret = __i915_gem_userptr_set_active(obj, true);
+ if (ret)
+ return ret;
pvec = NULL;
pinned = 0;
--
2.5.0
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
* Re: [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range
2015-08-10 8:51 ` [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range Chris Wilson
@ 2015-09-09 14:45 ` Tvrtko Ursulin
2015-09-09 15:08 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-09-09 14:45 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 08/10/2015 09:51 AM, Chris Wilson wrote:
> Whilst discussing possible ways to trigger an invalidate_range on a
> userptr with an aliased GGTT mmapping (and so cause a struct_mutex
> deadlock), the conclusion is that we can, and we must, prevent any
> possible deadlock by avoiding taking the mutex at all during
> invalidate_range. This has numerous advantages, all of which stem from
> avoiding the sleeping function from inside the unknown context. In
> particular, it simplifies the invalidate_range because we no longer
> have to juggle the spinlock/mutex and can just hold the spinlock
> for the entire walk. To compensate, we have to make get_pages a bit more
> complicated in order to serialise with a pending cancel_userptr worker.
> As we hold the struct_mutex, we have no choice but to return EAGAIN and
> hope that the worker is then flushed before we retry after reacquiring
> the struct_mutex.
>
> The important caveat is that the invalidate_range itself is no longer
> synchronous. There exists a small but definite period of time in which
> the old PTEs' pages remain accessible via the GPU. Note however that the
> physical pages themselves are not invalidated by the mmu_notifier, just
> the CPU view of the address space. The impact should be limited to a
> delay in pages being flushed, rather than a possibility of writing to
> the wrong pages. The only race condition that this worsens is remapping
> a userptr active on the GPU where fresh work may still reference the
> old pages due to struct_mutex contention. Given that userspace is racing
> with the GPU, it is fair to say that the results are undefined.
>
> v2: Only queue (and importantly only take one refcnt) the worker once.
This one I looked at at the time of previous posting and it looked fine,
minus one wrong line of thinking of mine. On a brief look it still looks
good, so:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
I assume Michał has run all these through the relevant test cases?
Slightly related, I now worry about the WARN_ONs in
__cancel_userptr__worker since they look to be triggerable by malicious
userspace which is not good.
Also my proposed error handling for the previous patch is slightly wrong
because I misremembered what mo->active stands for.
Regards,
Tvrtko
* Re: [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range
2015-09-09 14:45 ` Tvrtko Ursulin
@ 2015-09-09 15:08 ` Chris Wilson
2015-09-09 15:20 ` Tvrtko Ursulin
0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2015-09-09 15:08 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
On Wed, Sep 09, 2015 at 03:45:40PM +0100, Tvrtko Ursulin wrote:
> On 08/10/2015 09:51 AM, Chris Wilson wrote:
> >[snip: commit message quoted above]
>
> This one I looked at at the time of previous posting and it looked
> fine, minus one wrong line of thinking of mine. On a brief look it
> still looks good, so:
>
> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> I assume Michał has run all these through the relevant test cases?
>
> Slightly related, I now worry about the WARN_ONs in
> __cancel_userptr__worker since they look to be triggerable by
> malicious userspace which is not good.
They could always be I thought, if you could somehow pin the userptr
into a hardware register and then unmap the vma. That is a scary thought
and one I would like a WARN for. That should be the only way, and I shudder
at the prospect of working out who to send the SIGBUS to.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range
2015-09-09 15:08 ` Chris Wilson
@ 2015-09-09 15:20 ` Tvrtko Ursulin
2015-09-09 15:42 ` Chris Wilson
0 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-09-09 15:20 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 09/09/2015 04:08 PM, Chris Wilson wrote:
> On Wed, Sep 09, 2015 at 03:45:40PM +0100, Tvrtko Ursulin wrote:
>> On 08/10/2015 09:51 AM, Chris Wilson wrote:
>>> [snip: commit message quoted above]
>>
>> This one I looked at at the time of previous posting and it looked
>> fine, minus one wrong line of thinking of mine. On a brief look it
>> still looks good, so:
>>
>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> I assume Michał has run all these through the relevant test cases?
>>
>> Slightly related, I now worry about the WARN_ONs in
>> __cancel_userptr__worker since they look to be triggerable by
>> malicious userspace which is not good.
>
> They could always be I thought, if you could somehow pin the userptr
> into a hardware register and then unmap the vma. That is a scary thought
> and one I would like a WARN for. That should be the only way, and I shudder
> at the prospect of working out who to send the SIGBUS to.
Is it not enough to submit work to the GPU and while it is running
engineer a lot of signals and munmap?
Regards,
Tvrtko
* Re: [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range
2015-09-09 15:20 ` Tvrtko Ursulin
@ 2015-09-09 15:42 ` Chris Wilson
2015-09-10 9:50 ` Tvrtko Ursulin
0 siblings, 1 reply; 14+ messages in thread
From: Chris Wilson @ 2015-09-09 15:42 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
On Wed, Sep 09, 2015 at 04:20:08PM +0100, Tvrtko Ursulin wrote:
>
> On 09/09/2015 04:08 PM, Chris Wilson wrote:
> >On Wed, Sep 09, 2015 at 03:45:40PM +0100, Tvrtko Ursulin wrote:
> >>On 08/10/2015 09:51 AM, Chris Wilson wrote:
> >>>[snip: commit message quoted above]
> >>
> >>This one I looked at at the time of previous posting and it looked
> >>fine, minus one wrong line of thinking of mine. On a brief look it
> >>still looks good, so:
> >>
> >>Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>I assume Michał has run all these through the relevant test cases?
> >>
> >>Slightly related, I now worry about the WARN_ONs in
> >>__cancel_userptr__worker since they look to be triggerable by
> >>malicious userspace which is not good.
> >
> >They could always be I thought, if you could somehow pin the userptr
> >into a hardware register and then unmap the vma. That is a scary thought
> >and one I would like a WARN for. That should be the only way, and I shudder
> >at the prospect of working out who to send the SIGBUS to.
>
> Is it not enough to submit work to the GPU and while it is running
> engineer a lot of signals and munmap?
No, we block signals inside the worker, which should reduce it down to
EINVAL/EBUSY or EIO from unbind (and a subsequent WARN from put).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
* Re: [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range
2015-09-09 15:42 ` Chris Wilson
@ 2015-09-10 9:50 ` Tvrtko Ursulin
0 siblings, 0 replies; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-09-10 9:50 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 09/09/2015 04:42 PM, Chris Wilson wrote:
> On Wed, Sep 09, 2015 at 04:20:08PM +0100, Tvrtko Ursulin wrote:
>>
>> On 09/09/2015 04:08 PM, Chris Wilson wrote:
>>> On Wed, Sep 09, 2015 at 03:45:40PM +0100, Tvrtko Ursulin wrote:
>>>> On 08/10/2015 09:51 AM, Chris Wilson wrote:
>>>>> [snip: commit message quoted above]
>>>>
>>>> This one I looked at at the time of previous posting and it looked
>>>> fine, minus one wrong line of thinking of mine. On a brief look it
>>>> still looks good, so:
>>>>
>>>> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> I assume Michał has run all these through the relevant test cases?
>>>>
>>>> Slightly related, I now worry about the WARN_ONs in
>>>> __cancel_userptr__worker since they look to be triggerable by
>>>> malicious userspace which is not good.
>>>
>>> They could always be I thought, if you could somehow pin the userptr
>>> into a hardware register and then unmap the vma. That is a scary thought
>>> and one I would like a WARN for. That should be the only way, and I shudder
>>> at the prospect of working out who to send the SIGBUS to.
>>
>> Is it not enough to submit work to the GPU and while it is running
>> engineer a lot of signals and munmap?
>
> No, we block signals inside the worker, which should reduce it down to
> EINVAL/EBUSY or EIO from unbind (and a subsequent WARN from put).
Yeah two lines above was obviously too far for me to spot the
interruptible business...
Regards,
Tvrtko
* Re: [PATCH v3 1/3] drm/i915: Only update the current userptr worker
2015-08-10 8:51 [PATCH v3 1/3] drm/i915: Only update the current userptr worker Chris Wilson
2015-08-10 8:51 ` [PATCH v3 2/3] drm/i915: Fix userptr deadlock with aliased GTT mmappings Chris Wilson
2015-08-10 8:51 ` [PATCH v3 3/3] drm/i915: Use a task to cancel the userptr on invalidate_range Chris Wilson
@ 2015-09-09 10:39 ` Tvrtko Ursulin
2015-09-09 10:44 ` Chris Wilson
2 siblings, 1 reply; 14+ messages in thread
From: Tvrtko Ursulin @ 2015-09-09 10:39 UTC (permalink / raw)
To: Chris Wilson, intel-gfx
On 08/10/2015 09:51 AM, Chris Wilson wrote:
> The userptr worker allows for a slight race condition where upon there
> may two or more threads calling get_user_pages for the same object. When
> we have the array of pages, then we serialise the update of the object.
> However, the worker should only overwrite the obj->userptr.work pointer
> if and only if it is the active one. Currently we clear it for a
> secondary worker with the effect that we may rarely force a second
> lookup.
v2 changelog?
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
> drivers/gpu/drm/i915/i915_gem_userptr.c | 32 ++++++++++++++++----------------
> 1 file changed, 16 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index d11901d590ac..800a5394aa1e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -571,25 +571,25 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> struct get_pages_work *work = container_of(_work, typeof(*work), work);
> struct drm_i915_gem_object *obj = work->obj;
> struct drm_device *dev = obj->base.dev;
> - const int num_pages = obj->base.size >> PAGE_SHIFT;
> + const int npages = obj->base.size >> PAGE_SHIFT;
> struct page **pvec;
> int pinned, ret;
>
> ret = -ENOMEM;
> pinned = 0;
>
> - pvec = kmalloc(num_pages*sizeof(struct page *),
> + pvec = kmalloc(npages*sizeof(struct page *),
> GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
> if (pvec == NULL)
> - pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
> + pvec = drm_malloc_ab(npages, sizeof(struct page *));
> if (pvec != NULL) {
> struct mm_struct *mm = obj->userptr.mm->mm;
>
> down_read(&mm->mmap_sem);
> - while (pinned < num_pages) {
> + while (pinned < npages) {
> ret = get_user_pages(work->task, mm,
> obj->userptr.ptr + pinned * PAGE_SIZE,
> - num_pages - pinned,
> + npages - pinned,
If you hadn't done this renaming you could have gotten away without a v2
changelog request... :)
> !obj->userptr.read_only, 0,
> pvec + pinned, NULL);
> if (ret < 0)
> @@ -601,20 +601,20 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> }
>
> mutex_lock(&dev->struct_mutex);
> - if (obj->userptr.work != &work->work) {
> - ret = 0;
> - } else if (pinned == num_pages) {
> - ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
> - if (ret == 0) {
> - list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list);
> - obj->get_page.sg = obj->pages->sgl;
> - obj->get_page.last = 0;
> -
> - pinned = 0;
> + if (obj->userptr.work == &work->work) {
> + if (pinned == npages) {
> + ret = __i915_gem_userptr_set_pages(obj, pvec, npages);
> + if (ret == 0) {
> + list_add_tail(&obj->global_list,
> + &to_i915(dev)->mm.unbound_list);
> + obj->get_page.sg = obj->pages->sgl;
> + obj->get_page.last = 0;
Wouldn't obj->get_page init fit better into
__i915_gem_userptr_set_pages? Although that code is not from this patch.
How come it is OK not to initialize them in the non-worker case?
With the v2 changelog, or dropped rename:
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Regards,
Tvrtko
* Re: [PATCH v3 1/3] drm/i915: Only update the current userptr worker
2015-09-09 10:39 ` [PATCH v3 1/3] drm/i915: Only update the current userptr worker Tvrtko Ursulin
@ 2015-09-09 10:44 ` Chris Wilson
0 siblings, 0 replies; 14+ messages in thread
From: Chris Wilson @ 2015-09-09 10:44 UTC (permalink / raw)
To: Tvrtko Ursulin; +Cc: intel-gfx
On Wed, Sep 09, 2015 at 11:39:01AM +0100, Tvrtko Ursulin wrote:
>
> On 08/10/2015 09:51 AM, Chris Wilson wrote:
> >The userptr worker allows for a slight race condition where upon there
> >may two or more threads calling get_user_pages for the same object. When
> >we have the array of pages, then we serialise the update of the object.
> >However, the worker should only overwrite the obj->userptr.work pointer
> >if and only if it is the active one. Currently we clear it for a
> >secondary worker with the effect that we may rarely force a second
> >lookup.
>
> v2 changelog?
>
> >Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >---
> > drivers/gpu/drm/i915/i915_gem_userptr.c | 32 ++++++++++++++++----------------
> > 1 file changed, 16 insertions(+), 16 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> >index d11901d590ac..800a5394aa1e 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> >@@ -571,25 +571,25 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> > struct get_pages_work *work = container_of(_work, typeof(*work), work);
> > struct drm_i915_gem_object *obj = work->obj;
> > struct drm_device *dev = obj->base.dev;
> >- const int num_pages = obj->base.size >> PAGE_SHIFT;
> >+ const int npages = obj->base.size >> PAGE_SHIFT;
> > struct page **pvec;
> > int pinned, ret;
> >
> > ret = -ENOMEM;
> > pinned = 0;
> >
> >- pvec = kmalloc(num_pages*sizeof(struct page *),
> >+ pvec = kmalloc(npages*sizeof(struct page *),
> > GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
> > if (pvec == NULL)
> >- pvec = drm_malloc_ab(num_pages, sizeof(struct page *));
> >+ pvec = drm_malloc_ab(npages, sizeof(struct page *));
> > if (pvec != NULL) {
> > struct mm_struct *mm = obj->userptr.mm->mm;
> >
> > down_read(&mm->mmap_sem);
> >- while (pinned < num_pages) {
> >+ while (pinned < npages) {
> > ret = get_user_pages(work->task, mm,
> > obj->userptr.ptr + pinned * PAGE_SIZE,
> >- num_pages - pinned,
> >+ npages - pinned,
>
> If you hadn't done this renaming you could have gotten away without
> a v2 changelog request... :)
v2: rebase for some recent changes, rename to fit in 80 cols.
> > !obj->userptr.read_only, 0,
> > pvec + pinned, NULL);
> > if (ret < 0)
> >@@ -601,20 +601,20 @@ __i915_gem_userptr_get_pages_worker(struct work_struct *_work)
> > }
> >
> > mutex_lock(&dev->struct_mutex);
> >- if (obj->userptr.work != &work->work) {
> >- ret = 0;
> >- } else if (pinned == num_pages) {
> >- ret = __i915_gem_userptr_set_pages(obj, pvec, num_pages);
> >- if (ret == 0) {
> >- list_add_tail(&obj->global_list, &to_i915(dev)->mm.unbound_list);
> >- obj->get_page.sg = obj->pages->sgl;
> >- obj->get_page.last = 0;
> >-
> >- pinned = 0;
> >+ if (obj->userptr.work == &work->work) {
> >+ if (pinned == npages) {
> >+ ret = __i915_gem_userptr_set_pages(obj, pvec, npages);
> >+ if (ret == 0) {
> >+ list_add_tail(&obj->global_list,
> >+ &to_i915(dev)->mm.unbound_list);
> >+ obj->get_page.sg = obj->pages->sgl;
> >+ obj->get_page.last = 0;
>
> Wouldn't obj->get_page init fit better into
> __i915_gem_userptr_set_pages? Although that code is not from this
> patch. How come it is OK not to initialize them in the non-worker
> case?
It's done for us, the worker is the special case. I wanted to write the
set_pages initialiser differently so I could avoid this code, but I did
not prevail.
-Chris
--
Chris Wilson, Intel Open Source Technology Centre