Intel-XE Archive on lore.kernel.org
From: Matthew Auld <matthew.auld@intel.com>
To: "Upadhyay, Tejas" <tejas.upadhyay@intel.com>,
	"intel-xe@lists.freedesktop.org" <intel-xe@lists.freedesktop.org>
Cc: "Ghimiray, Himal Prasad" <himal.prasad.ghimiray@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"stable@vger.kernel.org" <stable@vger.kernel.org>
Subject: Re: [PATCH 2/4] drm/xe/client: add missing bo locking in show_meminfo()
Date: Wed, 11 Sep 2024 09:35:22 +0100
Message-ID: <fea806cc-ed5e-4fd0-adf5-87d98ea5b99d@intel.com>
In-Reply-To: <SJ1PR11MB620426C360C56AE7A55D6DA7819B2@SJ1PR11MB6204.namprd11.prod.outlook.com>

Hi,

On 11/09/2024 06:39, Upadhyay, Tejas wrote:
> 
> 
>> -----Original Message-----
>> From: Auld, Matthew <matthew.auld@intel.com>
>> Sent: Tuesday, September 10, 2024 6:42 PM
>> To: intel-xe@lists.freedesktop.org
>> Cc: Ghimiray, Himal Prasad <himal.prasad.ghimiray@intel.com>; Upadhyay,
>> Tejas <tejas.upadhyay@intel.com>; Thomas Hellström
>> <thomas.hellstrom@linux.intel.com>; stable@vger.kernel.org
>> Subject: [PATCH 2/4] drm/xe/client: add missing bo locking in
>> show_meminfo()
>>
>> bo_meminfo() wants to inspect bo state like tt and the ttm resource, however
>> this state can change at any point leading to stuff like NPD and UAF, if the bo
>> lock is not held. Grab the bo lock when calling bo_meminfo(), ensuring we
>> drop any spinlocks first. In the case of object_idr we now also need to hold a
>> ref.
>>
>> Fixes: 0845233388f8 ("drm/xe: Implement fdinfo memory stats printing")
>> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
>> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
>> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com>
>> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
>> Cc: <stable@vger.kernel.org> # v6.8+
>> ---
>>   drivers/gpu/drm/xe/xe_drm_client.c | 37 +++++++++++++++++++++++++++---
>>   1 file changed, 34 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_drm_client.c b/drivers/gpu/drm/xe/xe_drm_client.c
>> index badfa045ead8..3cca741c500c 100644
>> --- a/drivers/gpu/drm/xe/xe_drm_client.c
>> +++ b/drivers/gpu/drm/xe/xe_drm_client.c
>> @@ -10,6 +10,7 @@
>>   #include <linux/slab.h>
>>   #include <linux/types.h>
>>
>> +#include "xe_assert.h"
>>   #include "xe_bo.h"
>>   #include "xe_bo_types.h"
>>   #include "xe_device_types.h"
>> @@ -151,10 +152,13 @@ void xe_drm_client_add_bo(struct xe_drm_client *client,
>>    */
>>   void xe_drm_client_remove_bo(struct xe_bo *bo)
>>   {
>> +	struct xe_device *xe = ttm_to_xe_device(bo->ttm.bdev);
>>   	struct xe_drm_client *client = bo->client;
>>
>> +	xe_assert(xe, !kref_read(&bo->ttm.base.refcount));
>> +
>>   	spin_lock(&client->bos_lock);
>> -	list_del(&bo->client_link);
>> +	list_del_init(&bo->client_link);
>>   	spin_unlock(&client->bos_lock);
>>
>>   	xe_drm_client_put(client);
>> @@ -207,7 +211,20 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
>>   	idr_for_each_entry(&file->object_idr, obj, id) {
>>   		struct xe_bo *bo = gem_to_xe_bo(obj);
>>
>> -		bo_meminfo(bo, stats);
>> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +		} else {
>> +			xe_bo_get(bo);
>> +			spin_unlock(&file->table_lock);
>> +
>> +			xe_bo_lock(bo, false);
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +
>> +			xe_bo_put(bo);
>> +			spin_lock(&file->table_lock);
>> +		}
>>   	}
>>   	spin_unlock(&file->table_lock);
>>
>> @@ -217,7 +234,21 @@ static void show_meminfo(struct drm_printer *p, struct drm_file *file)
>>   		if (!kref_get_unless_zero(&bo->ttm.base.refcount))
>>   			continue;
>>
> 
> While we hold a ref to the BO, why would it need the lock here? Can you please explain if I am missing something? I thought the BO can't be deleted while we hold a ref.

The ref only protects the lifetime of the bo. The internal bo state,
in particular the ttm state, is protected by holding the dma-resv bo
lock.

For example, the bo can be moved or evicted at any time, and the object
state changes with it, but that may only be done while holding the bo
lock. If we hold the bo lock here, the object state is stable, which
makes it safe to inspect things like bo->ttm.ttm and bo->ttm.resource.
As an example, look at ttm_bo_move_null() and imagine xe_bo_has_pages()
racing with it: an NPD or UAF is then possible.
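
To illustrate the distinction, here is a minimal userspace sketch in C.
It is not xe or TTM code: struct fake_bo, its helpers, and the placement
values are all hypothetical, with a pthread mutex standing in for the
dma-resv lock and an atomic counter standing in for the kref. The reader
takes a reference to keep the object alive, then takes the lock before it
dereferences the resource pointer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer object; not the real struct xe_bo. */
struct fake_bo {
	atomic_int refcount;	/* lifetime only, like the GEM object kref */
	pthread_mutex_t resv;	/* stands in for the dma-resv bo lock */
	const int *resource;	/* mutable state, like bo->ttm.resource */
};

/* Hypothetical placement values, used only by this sketch. */
static const int fake_sysmem = 0;
static const int fake_vram = 1;

static void fake_bo_init(struct fake_bo *bo)
{
	atomic_init(&bo->refcount, 1);
	pthread_mutex_init(&bo->resv, NULL);
	bo->resource = &fake_vram;
}

/* Same idea as kref_get_unless_zero(): fail if the object is already dying. */
static bool fake_bo_get_unless_zero(struct fake_bo *bo)
{
	int old = atomic_load(&bo->refcount);

	while (old > 0) {
		/* On failure, 'old' is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&bo->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void fake_bo_put(struct fake_bo *bo)
{
	int old = atomic_fetch_sub(&bo->refcount, 1);

	assert(old > 0);
}

/* A "move": the resource pointer is transiently NULL, as during an eviction. */
static void fake_bo_move(struct fake_bo *bo, const int *new_res)
{
	pthread_mutex_lock(&bo->resv);
	bo->resource = NULL;
	bo->resource = new_res;
	pthread_mutex_unlock(&bo->resv);
}

/* Reader: the ref keeps the bo alive, the lock keeps bo->resource stable. */
static int fake_bo_mem_type(struct fake_bo *bo)
{
	int type = -1;

	if (!fake_bo_get_unless_zero(bo))
		return type;

	pthread_mutex_lock(&bo->resv);
	if (bo->resource)
		type = *bo->resource;
	pthread_mutex_unlock(&bo->resv);

	fake_bo_put(bo);
	return type;
}
```

Without the mutex in fake_bo_mem_type(), a concurrent fake_bo_move()
could leave the reader dereferencing a NULL or stale pointer even though
the reference is held.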

> 
> Thanks,
> Tejas
>> -		bo_meminfo(bo, stats);
>> +		if (dma_resv_trylock(bo->ttm.base.resv)) {
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +		} else {
>> +			spin_unlock(&client->bos_lock);
>> +
>> +			xe_bo_lock(bo, false);
>> +			bo_meminfo(bo, stats);
>> +			xe_bo_unlock(bo);
>> +
>> +			spin_lock(&client->bos_lock);
>> +			/* The bo ref will prevent this bo from being removed from the list */
>> +			xe_assert(xef->xe, !list_empty(&bo->client_link));
>> +		}
>> +
>>   		xe_bo_put_deferred(bo, &deferred);
>>   	}
>>   	spin_unlock(&client->bos_lock);
>> --
>> 2.46.0
> 
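
The locking pattern used in both hunks of the quoted patch (try the bo
lock first, and only on contention drop the spinlock before blocking on
it) can be sketched in userspace C roughly as follows. All names here
are hypothetical, and a pthread mutex stands in for the spinlock, so
this only illustrates the ordering. The sketch also assumes the caller
keeps every item alive and on the list during the walk, which the patch
ensures by holding a bo reference.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical object; 'pages' stands in for state guarded by the bo lock. */
struct item {
	pthread_mutex_t lock;	/* sleeping lock, like the dma-resv lock */
	size_t pages;
	struct item *next;
};

struct item_list {
	pthread_mutex_t list_lock;	/* stands in for the non-sleeping spinlock */
	struct item *head;
};

/* Sum 'pages' without ever blocking on an item lock while list_lock is held. */
static size_t total_pages(struct item_list *list)
{
	size_t total = 0;

	pthread_mutex_lock(&list->list_lock);
	for (struct item *it = list->head; it; it = it->next) {
		if (pthread_mutex_trylock(&it->lock) == 0) {
			/* Fast path: uncontended, no need to drop list_lock. */
			total += it->pages;
			pthread_mutex_unlock(&it->lock);
		} else {
			/*
			 * Slow path: drop list_lock before sleeping on the item
			 * lock, then retake it to continue the walk.
			 */
			pthread_mutex_unlock(&list->list_lock);

			pthread_mutex_lock(&it->lock);
			total += it->pages;
			pthread_mutex_unlock(&it->lock);

			pthread_mutex_lock(&list->list_lock);
		}
	}
	pthread_mutex_unlock(&list->list_lock);

	return total;
}
```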
