stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo()
@ 2024-09-10 13:11 Matthew Auld
  2024-09-10 13:11 ` [PATCH 2/4] drm/xe/client: add missing bo locking " Matthew Auld
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Matthew Auld @ 2024-09-10 13:11 UTC (permalink / raw)
  To: intel-xe
  Cc: Himal Prasad Ghimiray, Tejas Upadhyay, Thomas Hellström,
	stable

There is a real deadlock as well as sleeping in atomic() bug in here, if
the bo put happens to be the last ref, since bo destruction wants to
grab the same spinlock and sleeping locks.  Fix that by dropping the ref
using xe_bo_put_deferred(), and moving the final commit outside of the
lock. Dropping the lock around the put is tricky since the bo can go
out of scope and delete itself from the list, making it difficult to
navigate to the next list entry.

Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/xe/xe_drm_client.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index e64f4b645e2e..badfa045ead8 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -196,6 +196,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
 	struct xe_drm_client *client;
 	struct drm_gem_object *obj;
 	struct xe_bo *bo;
+	LLIST_HEAD(deferred);
 	unsigned int id;
 	u32 mem_type;
 
@@ -215,11 +216,14 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
 	list_for_each_entry(bo, &client->bos_list, client_link) {
 		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
 			continue;
+
 		bo_meminfo(bo, stats);
-		xe_bo_put(bo);
+		xe_bo_put_deferred(bo, &deferred);
 	}
 	spin_unlock(&client->bos_lock);
 
+	xe_bo_put_commit(&deferred);
+
 	for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) {
 		if (!xe_mem_type_to_name[mem_type])
 			continue;
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/4] drm/xe/client: add missing bo locking in show_meminfo()
  2024-09-10 13:11 [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo() Matthew Auld
@ 2024-09-10 13:11 ` Matthew Auld
  2024-09-10 14:16   ` Matthew Brost
  2024-09-11  5:39   ` Upadhyay, Tejas
  2024-09-10 13:55 ` [PATCH 1/4] drm/xe/client: fix deadlock " Matthew Brost
  2024-09-11  5:19 ` Upadhyay, Tejas
  2 siblings, 2 replies; 8+ messages in thread
From: Matthew Auld @ 2024-09-10 13:11 UTC (permalink / raw)
  To: intel-xe
  Cc: Himal Prasad Ghimiray, Tejas Upadhyay, Thomas Hellström,
	stable

bo_meminfo() wants to inspect bo state like tt and the ttm resource,
however this state can change at any point leading to stuff like NPD and
UAF, if the bo lock is not held. Grab the bo lock when calling
bo_meminfo(), ensuring we drop any spinlocks first. In the case of
object_idr we now also need to hold a ref.

Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: <stable@vger.kernel.org> # v6.8+
---
 drivers/gpu/drm/xe/xe_drm_client.c | 37 +++++++++++++++++++++++++++---
 1 file changed, 34 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
index badfa045ead8..3cca741c500c 100644
--- a/drivers/gpu/drm/xe/xe_drm_client.c
+++ b/drivers/gpu/drm/xe/xe_drm_client.c
@@ -10,6 +10,7 @@
 #include <linux/slab.h>
 #include <linux/types.h>
 
+#include "xe_assert.h"
 #include "xe_bo.h"
 #include "xe_bo_types.h"
 #include "xe_device_types.h"
@@ -151,10 +152,13 @@ void xe_drm_client_add_bo(struct xe_drm_client *client,
  */
 void xe_drm_client_remove_bo(struct xe_bo *bo)
 {
+	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
 	struct xe_drm_client *client = bo->client;
 
+	xe_assert(xe, !kref_read(&bo->ttm.base.refcount));
+
 	spin_lock(&client->bos_lock);
-	list_del(&bo->client_link);
+	list_del_init(&bo->client_link);
 	spin_unlock(&client->bos_lock);
 
 	xe_drm_client_put(client);
@@ -207,7 +211,20 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
 	idr_for_each_entry(&file->object_idr, obj, id) {
 		struct xe_bo *bo = gem_to_xe_bo(obj);
 
-		bo_meminfo(bo, stats);
+		if (dma_resv_trylock(bo->ttm.base.resv)) {
+			bo_meminfo(bo, stats);
+			xe_bo_unlock(bo);
+		} else {
+			xe_bo_get(bo);
+			spin_unlock(&file->table_lock);
+
+			xe_bo_lock(bo, false);
+			bo_meminfo(bo, stats);
+			xe_bo_unlock(bo);
+
+			xe_bo_put(bo);
+			spin_lock(&file->table_lock);
+		}
 	}
 	spin_unlock(&file->table_lock);
 
@@ -217,7 +234,21 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
 		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
 			continue;
 
-		bo_meminfo(bo, stats);
+		if (dma_resv_trylock(bo->ttm.base.resv)) {
+			bo_meminfo(bo, stats);
+			xe_bo_unlock(bo);
+		} else {
+			spin_unlock(&client->bos_lock);
+
+			xe_bo_lock(bo, false);
+			bo_meminfo(bo, stats);
+			xe_bo_unlock(bo);
+
+			spin_lock(&client->bos_lock);
+			/* The bo ref will prevent this bo from being removed from the list */
+			xe_assert(xef->xe, !list_empty(&bo->client_link));
+		}
+
 		xe_bo_put_deferred(bo, &deferred);
 	}
 	spin_unlock(&client->bos_lock);
-- 
2.46.0


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo()
  2024-09-10 13:11 [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo() Matthew Auld
  2024-09-10 13:11 ` [PATCH 2/4] drm/xe/client: add missing bo locking " Matthew Auld
@ 2024-09-10 13:55 ` Matthew Brost
  2024-09-11  5:19 ` Upadhyay, Tejas
  2 siblings, 0 replies; 8+ messages in thread
From: Matthew Brost @ 2024-09-10 13:55 UTC (permalink / raw)
  To: Matthew Auld
  Cc: intel-xe, Himal Prasad Ghimiray, Tejas Upadhyay,
	Thomas Hellström, stable

On Tue, Sep 10, 2024 at 02:11:46PM +0100, Matthew Auld wrote:
> There is a real deadlock as well as sleeping in atomic() bug in here, if
> the bo put happens to be the last ref, since bo destruction wants to
> grab the same spinlock and sleeping locks.  Fix that by dropping the ref
> using xe_bo_put_deferred(), and moving the final commit outside of the
> lock. Dropping the lock around the put is tricky since the bo can go
> out of scope and delete itself from the list, making it difficult to
> navigate to the next list entry.
> 
> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>

Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> ---
>  drivers/gpu/drm/xe/xe_drm_client.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index e64f4b645e2e..badfa045ead8 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -196,6 +196,7 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
>  	struct xe_drm_client *client;
>  	struct drm_gem_object *obj;
>  	struct xe_bo *bo;
> +	LLIST_HEAD(deferred);
>  	unsigned int id;
>  	u32 mem_type;
>  
> @@ -215,11 +216,14 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
>  	list_for_each_entry(bo, &client->bos_list, client_link) {
>  		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
>  			continue;
> +
>  		bo_meminfo(bo, stats);
> -		xe_bo_put(bo);
> +		xe_bo_put_deferred(bo, &deferred);
>  	}
>  	spin_unlock(&client->bos_lock);
>  
> +	xe_bo_put_commit(&deferred);
> +
>  	for (mem_type = XE_PL_SYSTEM; mem_type < TTM_NUM_MEM_TYPES; ++mem_type) {
>  		if (!xe_mem_type_to_name[mem_type])
>  			continue;
> -- 
> 2.46.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/4] drm/xe/client: add missing bo locking in show_meminfo()
  2024-09-10 13:11 ` [PATCH 2/4] drm/xe/client: add missing bo locking " Matthew Auld
@ 2024-09-10 14:16   ` Matthew Brost
  2024-09-11  5:39   ` Upadhyay, Tejas
  1 sibling, 0 replies; 8+ messages in thread
From: Matthew Brost @ 2024-09-10 14:16 UTC (permalink / raw)
  To: Matthew Auld
  Cc: intel-xe, Himal Prasad Ghimiray, Tejas Upadhyay,
	Thomas Hellström, stable

On Tue, Sep 10, 2024 at 02:11:47PM +0100, Matthew Auld wrote:
> bo_meminfo() wants to inspect bo state like tt and the ttm resource,
> however this state can change at any point leading to stuff like NPD and
> UAF, if the bo lock is not held. Grab the bo lock when calling
> bo_meminfo(), ensuring we drop any spinlocks first. In the case of
> object_idr we now also need to hold a ref.
> 

Path LGTM, one suggestion.

> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> ---
>  drivers/gpu/drm/xe/xe_drm_client.c | 37 +++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
> index badfa045ead8..3cca741c500c 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -10,6 +10,7 @@
>  #include <linux/slab.h>
>  #include <linux/types.h>
>  
> +#include "xe_assert.h"
>  #include "xe_bo.h"
>  #include "xe_bo_types.h"
>  #include "xe_device_types.h"
> @@ -151,10 +152,13 @@ void xe_drm_client_add_bo(struct xe_drm_client *client,
>   */
>  void xe_drm_client_remove_bo(struct xe_bo *bo)
>  {
> +	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
>  	struct xe_drm_client *client = bo->client;
>  
> +	xe_assert(xe, !kref_read(&bo->ttm.base.refcount));
> +
>  	spin_lock(&client->bos_lock);
> -	list_del(&bo->client_link);
> +	list_del_init(&bo->client_link);
>  	spin_unlock(&client->bos_lock);
>  
>  	xe_drm_client_put(client);
> @@ -207,7 +211,20 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
>  	idr_for_each_entry(&file->object_idr, obj, id) {
>  		struct xe_bo *bo = gem_to_xe_bo(obj);
>  
> -		bo_meminfo(bo, stats);
> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
> +			bo_meminfo(bo, stats);

Add a xe_bo_assert_held to bo_meminfo.

With that exrta assert:
Reviewed-by: Matthew Brost <matthew.brost@intel.com>

> +			xe_bo_unlock(bo);
> +		} else {
> +			xe_bo_get(bo);
> +			spin_unlock(&file->table_lock);
> +
> +			xe_bo_lock(bo, false);
> +			bo_meminfo(bo, stats);
> +			xe_bo_unlock(bo);
> +
> +			xe_bo_put(bo);
> +			spin_lock(&file->table_lock);
> +		}
>  	}
>  	spin_unlock(&file->table_lock);
>  
> @@ -217,7 +234,21 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
>  		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
>  			continue;
>  
> -		bo_meminfo(bo, stats);
> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
> +			bo_meminfo(bo, stats);
> +			xe_bo_unlock(bo);
> +		} else {
> +			spin_unlock(&client->bos_lock);
> +
> +			xe_bo_lock(bo, false);
> +			bo_meminfo(bo, stats);
> +			xe_bo_unlock(bo);
> +
> +			spin_lock(&client->bos_lock);
> +			/* The bo ref will prevent this bo from being removed from the list */
> +			xe_assert(xef->xe, !list_empty(&bo->client_link));
> +		}
> +
>  		xe_bo_put_deferred(bo, &deferred);
>  	}
>  	spin_unlock(&client->bos_lock);
> -- 
> 2.46.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo()
  2024-09-10 13:11 [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo() Matthew Auld
  2024-09-10 13:11 ` [PATCH 2/4] drm/xe/client: add missing bo locking " Matthew Auld
  2024-09-10 13:55 ` [PATCH 1/4] drm/xe/client: fix deadlock " Matthew Brost
@ 2024-09-11  5:19 ` Upadhyay, Tejas
  2 siblings, 0 replies; 8+ messages in thread
From: Upadhyay, Tejas @ 2024-09-11  5:19 UTC (permalink / raw)
  To: Auld, Matthew, intel-xe@lists.freedesktop.org
  Cc: Ghimiray, Himal Prasad, Thomas Hellström,
	stable@vger.kernel.org



> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Tuesday, September 10, 2024 6:42 PM
> To: intel-xe@lists.freedesktop.org
> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Upadhyay,
> Tejas <tejas.upadhyay@intel.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>; stable@vger.kernel.org
> Subject: [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo()
> 
> There is a real deadlock as well as sleeping in atomic() bug in here, if the bo
> put happens to be the last ref, since bo destruction wants to grab the same
> spinlock and sleeping locks.  Fix that by dropping the ref using
> xe_bo_put_deferred(), and moving the final commit outside of the lock.
> Dropping the lock around the put is tricky since the bo can go out of scope
> and delete itself from the list, making it difficult to navigate to the next list
> entry.
> 
> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2727
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> ---
>  drivers/gpu/drm/xe/xe_drm_client.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c
> b/drivers/gpu/drm/xe/xe_drm_client.c
> index e64f4b645e2e..badfa045ead8 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -196,6 +196,7 @@ static void show_meminfo(struct drm_printer *p,
> struct drm_file *file)
>  	struct xe_drm_client *client;
>  	struct drm_gem_object *obj;
>  	struct xe_bo *bo;
> +	LLIST_HEAD(deferred);
>  	unsigned int id;
>  	u32 mem_type;
> 
> @@ -215,11 +216,14 @@ static void show_meminfo(struct drm_printer *p,
> struct drm_file *file)
>  	list_for_each_entry(bo, &client->bos_list, client_link) {
>  		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
>  			continue;
> +
>  		bo_meminfo(bo, stats);
> -		xe_bo_put(bo);
> +		xe_bo_put_deferred(bo, &deferred);
>  	}
>  	spin_unlock(&client->bos_lock);
> 
> +	xe_bo_put_commit(&deferred);
> +

Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>

Thanks,
Tejas
>  	for (mem_type = XE_PL_SYSTEM; mem_type <
> TTM_NUM_MEM_TYPES; ++mem_type) {
>  		if (!xe_mem_type_to_name[mem_type])
>  			continue;
> --
> 2.46.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 2/4] drm/xe/client: add missing bo locking in show_meminfo()
  2024-09-10 13:11 ` [PATCH 2/4] drm/xe/client: add missing bo locking " Matthew Auld
  2024-09-10 14:16   ` Matthew Brost
@ 2024-09-11  5:39   ` Upadhyay, Tejas
  2024-09-11  8:35     ` Matthew Auld
  1 sibling, 1 reply; 8+ messages in thread
From: Upadhyay, Tejas @ 2024-09-11  5:39 UTC (permalink / raw)
  To: Auld, Matthew, intel-xe@lists.freedesktop.org
  Cc: Ghimiray, Himal Prasad, Thomas Hellström,
	stable@vger.kernel.org



> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Tuesday, September 10, 2024 6:42 PM
> To: intel-xe@lists.freedesktop.org
> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Upadhyay,
> Tejas <tejas.upadhyay@intel.com>; Thomas Hellström
> <thomas.hellstrom@linux.intel.com>; stable@vger.kernel.org
> Subject: [PATCH 2/4] drm/xe/client: add missing bo locking in
> show_meminfo()
> 
> bo_meminfo() wants to inspect bo state like tt and the ttm resource, however
> this state can change at any point leading to stuff like NPD and UAF, if the bo
> lock is not held. Grab the bo lock when calling bo_meminfo(), ensuring we
> drop any spinlocks first. In the case of object_idr we now also need to hold a
> ref.
> 
> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> Cc: <stable@vger.kernel.org> # v6.8+
> ---
>  drivers/gpu/drm/xe/xe_drm_client.c | 37 +++++++++++++++++++++++++++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c
> b/drivers/gpu/drm/xe/xe_drm_client.c
> index badfa045ead8..3cca741c500c 100644
> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> @@ -10,6 +10,7 @@
>  #include <linux/slab.h>
>  #include <linux/types.h>
> 
> +#include "xe_assert.h"
>  #include "xe_bo.h"
>  #include "xe_bo_types.h"
>  #include "xe_device_types.h"
> @@ -151,10 +152,13 @@ void xe_drm_client_add_bo(struct xe_drm_client
> *client,
>   */
>  void xe_drm_client_remove_bo(struct xe_bo *bo)  {
> +	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
>  	struct xe_drm_client *client = bo->client;
> 
> +	xe_assert(xe, !kref_read(&bo->ttm.base.refcount));
> +
>  	spin_lock(&client->bos_lock);
> -	list_del(&bo->client_link);
> +	list_del_init(&bo->client_link);
>  	spin_unlock(&client->bos_lock);
> 
>  	xe_drm_client_put(client);
> @@ -207,7 +211,20 @@ static void show_meminfo(struct drm_printer *p,
> struct drm_file *file)
>  	idr_for_each_entry(&file->object_idr, obj, id) {
>  		struct xe_bo *bo = gem_to_xe_bo(obj);
> 
> -		bo_meminfo(bo, stats);
> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
> +			bo_meminfo(bo, stats);
> +			xe_bo_unlock(bo);
> +		} else {
> +			xe_bo_get(bo);
> +			spin_unlock(&file->table_lock);
> +
> +			xe_bo_lock(bo, false);
> +			bo_meminfo(bo, stats);
> +			xe_bo_unlock(bo);
> +
> +			xe_bo_put(bo);
> +			spin_lock(&file->table_lock);
> +		}
>  	}
>  	spin_unlock(&file->table_lock);
> 
> @@ -217,7 +234,21 @@ static void show_meminfo(struct drm_printer *p,
> struct drm_file *file)
>  		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
>  			continue;
> 

While we have ref to BO, why would it need lock here, can you please explain if I am missing something. I though BO cant be deleted till will hold ref?

Thanks,
Tejas
> -		bo_meminfo(bo, stats);
> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
> +			bo_meminfo(bo, stats);
> +			xe_bo_unlock(bo);
> +		} else {
> +			spin_unlock(&client->bos_lock);
> +
> +			xe_bo_lock(bo, false);
> +			bo_meminfo(bo, stats);
> +			xe_bo_unlock(bo);
> +
> +			spin_lock(&client->bos_lock);
> +			/* The bo ref will prevent this bo from being removed
> from the list */
> +			xe_assert(xef->xe, !list_empty(&bo->client_link));
> +		}
> +
>  		xe_bo_put_deferred(bo, &deferred);
>  	}
>  	spin_unlock(&client->bos_lock);
> --
> 2.46.0


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 2/4] drm/xe/client: add missing bo locking in show_meminfo()
  2024-09-11  5:39   ` Upadhyay, Tejas
@ 2024-09-11  8:35     ` Matthew Auld
  2024-09-11  9:40       ` Upadhyay, Tejas
  0 siblings, 1 reply; 8+ messages in thread
From: Matthew Auld @ 2024-09-11  8:35 UTC (permalink / raw)
  To: Upadhyay, Tejas, intel-xe@lists.freedesktop.org
  Cc: Ghimiray, Himal Prasad, Thomas Hellström,
	stable@vger.kernel.org

Hi,

On 11/09/2024 06:39, Upadhyay, Tejas wrote:
> 
> 
>> -----Original Message-----
>> From: Auld, Matthew <matthew.auld@intel.com>
>> Sent: Tuesday, September 10, 2024 6:42 PM
>> To: intel-xe@lists.freedesktop.org
>> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Upadhyay,
>> Tejas <tejas.upadhyay@intel.com>; Thomas Hellström
>> <thomas.hellstrom@linux.intel.com>; stable@vger.kernel.org
>> Subject: [PATCH 2/4] drm/xe/client: add missing bo locking in
>> show_meminfo()
>>
>> bo_meminfo() wants to inspect bo state like tt and the ttm resource, however
>> this state can change at any point leading to stuff like NPD and UAF, if the bo
>> lock is not held. Grab the bo lock when calling bo_meminfo(), ensuring we
>> drop any spinlocks first. In the case of object_idr we now also need to hold a
>> ref.
>>
>> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
>> Cc: <stable@vger.kernel.org> # v6.8+
>> ---
>>   drivers/gpu/drm/xe/xe_drm_client.c | 37 +++++++++++++++++++++++++++---
>>   1 file changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c
>> b/drivers/gpu/drm/xe/xe_drm_client.c
>> index badfa045ead8..3cca741c500c 100644
>> --- a/drivers/gpu/drm/xe/xe_drm_client.c
>> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
>> @@ -10,6 +10,7 @@
>>   #include <linux/slab.h>
>>   #include <linux/types.h>
>>
>> +#include "xe_assert.h"
>>   #include "xe_bo.h"
>>   #include "xe_bo_types.h"
>>   #include "xe_device_types.h"
>> @@ -151,10 +152,13 @@ void xe_drm_client_add_bo(struct xe_drm_client
>> *client,
>>    */
>>   void xe_drm_client_remove_bo(struct xe_bo *bo)  {
>> +	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
>>   	struct xe_drm_client *client = bo->client;
>>
>> +	xe_assert(xe, !kref_read(&bo->ttm.base.refcount));
>> +
>>   	spin_lock(&client->bos_lock);
>> -	list_del(&bo->client_link);
>> +	list_del_init(&bo->client_link);
>>   	spin_unlock(&client->bos_lock);
>>
>>   	xe_drm_client_put(client);
>> @@ -207,7 +211,20 @@ static void show_meminfo(struct drm_printer *p,
>> struct drm_file *file)
>>   	idr_for_each_entry(&file->object_idr, obj, id) {
>>   		struct xe_bo *bo = gem_to_xe_bo(obj);
>>
>> -		bo_meminfo(bo, stats);
>> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +		} else {
>> +			xe_bo_get(bo);
>> +			spin_unlock(&file->table_lock);
>> +
>> +			xe_bo_lock(bo, false);
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +
>> +			xe_bo_put(bo);
>> +			spin_lock(&file->table_lock);
>> +		}
>>   	}
>>   	spin_unlock(&file->table_lock);
>>
>> @@ -217,7 +234,21 @@ static void show_meminfo(struct drm_printer *p,
>> struct drm_file *file)
>>   		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
>>   			continue;
>>
> 
> While we have ref to BO, why would it need lock here, can you please explain if I am missing something. I though BO cant be deleted till will hold ref?

The ref is just about protecting the lifetime of the bo, however the 
internal bo state in particular the ttm stuff, is all protected by 
holding the dma-resv bo lock.

For example the bo can be moved/evicted around at will and the object 
state changes with it, but that should be done only when also holding 
the bo lock. If we are holding the bo lock here then the object state 
should be stable, making it safe to inspect stuff like bo->ttm.ttm and 
bo->ttm.resource. As an example, if you look at ttm_bo_move_null() and 
imagine xe_bo_has_pages() racing with that, then NPD or UAF is possible.

> 
> Thanks,
> Tejas
>> -		bo_meminfo(bo, stats);
>> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +		} else {
>> +			spin_unlock(&client->bos_lock);
>> +
>> +			xe_bo_lock(bo, false);
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +
>> +			spin_lock(&client->bos_lock);
>> +			/* The bo ref will prevent this bo from being removed
>> from the list */
>> +			xe_assert(xef->xe, !list_empty(&bo->client_link));
>> +		}
>> +
>>   		xe_bo_put_deferred(bo, &deferred);
>>   	}
>>   	spin_unlock(&client->bos_lock);
>> --
>> 2.46.0
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH 2/4] drm/xe/client: add missing bo locking in show_meminfo()
  2024-09-11  8:35     ` Matthew Auld
@ 2024-09-11  9:40       ` Upadhyay, Tejas
  0 siblings, 0 replies; 8+ messages in thread
From: Upadhyay, Tejas @ 2024-09-11  9:40 UTC (permalink / raw)
  To: Auld, Matthew, intel-xe@lists.freedesktop.org
  Cc: Ghimiray, Himal Prasad, Thomas Hellström,
	stable@vger.kernel.org



> -----Original Message-----
> From: Auld, Matthew <matthew.auld@intel.com>
> Sent: Wednesday, September 11, 2024 2:05 PM
> To: Upadhyay, Tejas <tejas.upadhyay@intel.com>; intel-
> xe@lists.freedesktop.org
> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Thomas
> Hellström <thomas.hellstrom@linux.intel.com>; stable@vger.kernel.org
> Subject: Re: [PATCH 2/4] drm/xe/client: add missing bo locking in
> show_meminfo()
> 
> Hi,
> 
> On 11/09/2024 06:39, Upadhyay, Tejas wrote:
> >
> >
> >> -----Original Message-----
> >> From: Auld, Matthew <matthew.auld@intel.com>
> >> Sent: Tuesday, September 10, 2024 6:42 PM
> >> To: intel-xe@lists.freedesktop.org
> >> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>;
> >> Upadhyay, Tejas <tejas.upadhyay@intel.com>; Thomas Hellström
> >> <thomas.hellstrom@linux.intel.com>; stable@vger.kernel.org
> >> Subject: [PATCH 2/4] drm/xe/client: add missing bo locking in
> >> show_meminfo()
> >>
> >> bo_meminfo() wants to inspect bo state like tt and the ttm resource,
> >> however this state can change at any point leading to stuff like NPD
> >> and UAF, if the bo lock is not held. Grab the bo lock when calling
> >> bo_meminfo(), ensuring we drop any spinlocks first. In the case of
> >> object_idr we now also need to hold a ref.
> >>
> >> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats
> >> printing")
> >> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> >> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
> >> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
> >> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
> >> Cc: <stable@vger.kernel.org> # v6.8+
> >> ---
> >>   drivers/gpu/drm/xe/xe_drm_client.c | 37
> +++++++++++++++++++++++++++---
> >>   1 file changed, 34 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c
> >> b/drivers/gpu/drm/xe/xe_drm_client.c
> >> index badfa045ead8..3cca741c500c 100644
> >> --- a/drivers/gpu/drm/xe/xe_drm_client.c
> >> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
> >> @@ -10,6 +10,7 @@
> >>   #include <linux/slab.h>
> >>   #include <linux/types.h>
> >>
> >> +#include "xe_assert.h"
> >>   #include "xe_bo.h"
> >>   #include "xe_bo_types.h"
> >>   #include "xe_device_types.h"
> >> @@ -151,10 +152,13 @@ void xe_drm_client_add_bo(struct
> xe_drm_client
> >> *client,
> >>    */
> >>   void xe_drm_client_remove_bo(struct xe_bo *bo)  {
> >> +	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
> >>   	struct xe_drm_client *client = bo->client;
> >>
> >> +	xe_assert(xe, !kref_read(&bo->ttm.base.refcount));
> >> +
> >>   	spin_lock(&client->bos_lock);
> >> -	list_del(&bo->client_link);
> >> +	list_del_init(&bo->client_link);
> >>   	spin_unlock(&client->bos_lock);
> >>
> >>   	xe_drm_client_put(client);
> >> @@ -207,7 +211,20 @@ static void show_meminfo(struct drm_printer *p,
> >> struct drm_file *file)
> >>   	idr_for_each_entry(&file->object_idr, obj, id) {
> >>   		struct xe_bo *bo = gem_to_xe_bo(obj);
> >>
> >> -		bo_meminfo(bo, stats);
> >> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
> >> +			bo_meminfo(bo, stats);
> >> +			xe_bo_unlock(bo);
> >> +		} else {
> >> +			xe_bo_get(bo);
> >> +			spin_unlock(&file->table_lock);
> >> +
> >> +			xe_bo_lock(bo, false);
> >> +			bo_meminfo(bo, stats);
> >> +			xe_bo_unlock(bo);
> >> +
> >> +			xe_bo_put(bo);
> >> +			spin_lock(&file->table_lock);
> >> +		}
> >>   	}
> >>   	spin_unlock(&file->table_lock);
> >>
> >> @@ -217,7 +234,21 @@ static void show_meminfo(struct drm_printer *p,
> >> struct drm_file *file)
> >>   		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
> >>   			continue;
> >>
> >
> > While we have ref to BO, why would it need lock here, can you please
> explain if I am missing something. I though BO cant be deleted till will hold
> ref?
> 
> The ref is just about protecting the lifetime of the bo, however the internal bo
> state in particular the ttm stuff, is all protected by holding the dma-resv bo
> lock.
> 
> For example the bo can be moved/evicted around at will and the object state
> changes with it, but that should be done only when also holding the bo lock.
> If we are holding the bo lock here then the object state should be stable,
> making it safe to inspect stuff like bo->ttm.ttm and
> bo->ttm.resource. As an example, if you look at ttm_bo_move_null() and
> imagine xe_bo_has_pages() racing with that, then NPD or UAF is possible.
> 

Ok, meaning ref will protect deletion of bo while in used, but internal state of bo's fields can still change with holding ref. Got it,
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>

Thanks,
Tejas
> >
> > Thanks,
> > Tejas
> >> -		bo_meminfo(bo, stats);
> >> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
> >> +			bo_meminfo(bo, stats);
> >> +			xe_bo_unlock(bo);
> >> +		} else {
> >> +			spin_unlock(&client->bos_lock);
> >> +
> >> +			xe_bo_lock(bo, false);
> >> +			bo_meminfo(bo, stats);
> >> +			xe_bo_unlock(bo);
> >> +
> >> +			spin_lock(&client->bos_lock);
> >> +			/* The bo ref will prevent this bo from being removed
> >> from the list */
> >> +			xe_assert(xef->xe, !list_empty(&bo->client_link));
> >> +		}
> >> +
> >>   		xe_bo_put_deferred(bo, &deferred);
> >>   	}
> >>   	spin_unlock(&client->bos_lock);
> >> --
> >> 2.46.0
> >

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-09-11  9:41 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-10 13:11 [PATCH 1/4] drm/xe/client: fix deadlock in show_meminfo() Matthew Auld
2024-09-10 13:11 ` [PATCH 2/4] drm/xe/client: add missing bo locking " Matthew Auld
2024-09-10 14:16   ` Matthew Brost
2024-09-11  5:39   ` Upadhyay, Tejas
2024-09-11  8:35     ` Matthew Auld
2024-09-11  9:40       ` Upadhyay, Tejas
2024-09-10 13:55 ` [PATCH 1/4] drm/xe/client: fix deadlock " Matthew Brost
2024-09-11  5:19 ` Upadhyay, Tejas

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).