dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fix AMDGPU VRAM zeroing
@ 2025-05-27 19:43 Natalie Vock
  2025-05-27 19:43 ` [PATCH 1/2] drm/buddy: Add public helper to dirty blocks Natalie Vock
  2025-05-27 19:43 ` [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation Natalie Vock
  0 siblings, 2 replies; 10+ messages in thread
From: Natalie Vock @ 2025-05-27 19:43 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Arunpravin Paneer Selvam, Natalie Vock

Hi all,

I've stumbled upon this while investigating why AMDGPU seems to fail at
providing cleared VRAM allocations despite being explicitly asked to
with AMDGPU_GEM_CREATE_VRAM_CLEARED[1].

After some code inspection, I believe the problem is actually much worse
than not providing cleared VRAM. AMDGPU fails to track dirty vs. cleared
allocations properly in general, and still considers initially-cleared
memory that has since been overwritten by applications as cleared. In
consequence, it will skip wiping the memory after the application frees
it, leaking the contents to arbitrary other applications.

With the new drm_buddy helper, there is some cleanup potential as
drm_buddy.c defines an identical helper as a static function. However,
to keep the patch as minimal as possible for stable backporting, I'll
submit the cleanup as a follow-up patch instead.

Thanks,
Natalie

[1] https://gitlab.freedesktop.org/drm/amd/-/issues/3812

Natalie Vock (2):
  drm/buddy: Add public helper to dirty blocks
  drm/amdgpu: Dirty cleared blocks on allocation

 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 +++++++
 include/drm/drm_buddy.h                      | 6 ++++++
 2 files changed, 13 insertions(+)

-- 
2.49.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] drm/buddy: Add public helper to dirty blocks
  2025-05-27 19:43 [PATCH 0/2] Fix AMDGPU VRAM zeroing Natalie Vock
@ 2025-05-27 19:43 ` Natalie Vock
  2025-05-29  7:05   ` Paneer Selvam, Arunpravin
  2025-05-27 19:43 ` [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation Natalie Vock
  1 sibling, 1 reply; 10+ messages in thread
From: Natalie Vock @ 2025-05-27 19:43 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Arunpravin Paneer Selvam, Natalie Vock, stable

Cleared blocks that are handed out to users after allocation cannot be
presumed to remain cleared. Thus, allocators using drm_buddy need to
dirty all blocks on the allocation success path. Provide a helper for
them to use.

Fixes: 96950929eb232 ("drm/buddy: Implement tracking clear page feature")
Cc: stable@vger.kernel.org

Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
 include/drm/drm_buddy.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
index 9689a7c5dd36..48628ff1c24f 100644
--- a/include/drm/drm_buddy.h
+++ b/include/drm/drm_buddy.h
@@ -142,6 +142,12 @@ drm_buddy_block_size(struct drm_buddy *mm,
 	return mm->chunk_size << drm_buddy_block_order(block);
 }
 
+static inline void
+drm_buddy_block_set_dirty(struct drm_buddy_block *block)
+{
+	block->header &= ~DRM_BUDDY_HEADER_CLEAR;
+}
+
 int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size);
 
 void drm_buddy_fini(struct drm_buddy *mm);
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation
  2025-05-27 19:43 [PATCH 0/2] Fix AMDGPU VRAM zeroing Natalie Vock
  2025-05-27 19:43 ` [PATCH 1/2] drm/buddy: Add public helper to dirty blocks Natalie Vock
@ 2025-05-27 19:43 ` Natalie Vock
  2025-05-28  7:07   ` Christian König
  1 sibling, 1 reply; 10+ messages in thread
From: Natalie Vock @ 2025-05-27 19:43 UTC (permalink / raw)
  To: amd-gfx, dri-devel
  Cc: Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann,
	Arunpravin Paneer Selvam, Natalie Vock, stable

If we hand out cleared blocks to users, they are expected to write
at least some non-zero values somewhere. If we keep the CLEAR bit set on
the block, amdgpu_fill_buffer will assume there is nothing to do and
incorrectly skip clearing the block. Ultimately, the (still dirty) block
will be reused as if it were cleared, without any wiping of the memory
contents.

Most severely, this means that any buffer allocated with
AMDGPU_GEM_CREATE_VRAM_CLEARED | AMDGPU_GEM_CREATE_WIPE_ON_RELEASE
(which is the case for **all userspace buffers**) are neither
guaranteed to contain cleared VRAM, nor are they being wiped on
release, potentially leaking application memory to arbitrary other
applications.

Fixes: a68c7eaa7a8ff ("drm/amdgpu: Enable clear page functionality")
Cc: stable@vger.kernel.org

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3812

Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 2d7f82e98df9..cecc67d0f0b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -591,6 +591,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
 	list_for_each_entry(block, &vres->blocks, link) {
 		unsigned long start;
 
+		/*
+		 * Allocated blocks may be dirtied as soon as we return.
+		 * Mark all blocks as dirty here, otherwise we might
+		 * incorrectly assume the memory is still zeroed.
+		 */
+		drm_buddy_block_set_dirty(block);
+
 		start = amdgpu_vram_mgr_block_start(block) +
 			amdgpu_vram_mgr_block_size(block);
 		start >>= PAGE_SHIFT;
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation
  2025-05-27 19:43 ` [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation Natalie Vock
@ 2025-05-28  7:07   ` Christian König
  2025-05-28  9:29     ` Natalie Vock
  0 siblings, 1 reply; 10+ messages in thread
From: Christian König @ 2025-05-28  7:07 UTC (permalink / raw)
  To: Natalie Vock, amd-gfx, dri-devel
  Cc: Alex Deucher, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Arunpravin Paneer Selvam,
	stable

On 5/27/25 21:43, Natalie Vock wrote:
> If we hand out cleared blocks to users, they are expected to write
> at least some non-zero values somewhere. If we keep the CLEAR bit set on
> the block, amdgpu_fill_buffer will assume there is nothing to do and
> incorrectly skip clearing the block. Ultimately, the (still dirty) block
> will be reused as if it were cleared, without any wiping of the memory
> contents.
> 
> Most severely, this means that any buffer allocated with
> AMDGPU_GEM_CREATE_VRAM_CLEARED | AMDGPU_GEM_CREATE_WIPE_ON_RELEASE
> (which is the case for **all userspace buffers**) are neither
> guaranteed to contain cleared VRAM, nor are they being wiped on
> release, potentially leaking application memory to arbitrary other
> applications.
> 
> Fixes: a68c7eaa7a8ff ("drm/amdgpu: Enable clear page functionality")
> Cc: stable@vger.kernel.org
> 
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
> 
> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 2d7f82e98df9..cecc67d0f0b8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -591,6 +591,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>  	list_for_each_entry(block, &vres->blocks, link) {
>  		unsigned long start;
>  
> +		/*
> +		 * Allocated blocks may be dirtied as soon as we return.
> +		 * Mark all blocks as dirty here, otherwise we might
> +		 * incorrectly assume the memory is still zeroed.
> +		 */
> +		drm_buddy_block_set_dirty(block);

Exactly that makes no sense.

We need the information if it's dirty or not later while clearing the blocks. Otherwise we will clear all blocks and completely loose the advantage of the clear tracking.

So we should set them dirty as soon as we are done with the clearing.

But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still set it as cleared.

Regards,
Christian.


> +
>  		start = amdgpu_vram_mgr_block_start(block) +
>  			amdgpu_vram_mgr_block_size(block);
>  		start >>= PAGE_SHIFT;


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation
  2025-05-28  7:07   ` Christian König
@ 2025-05-28  9:29     ` Natalie Vock
  2025-05-28 10:52       ` Christian König
  2025-05-28 12:14       ` Paneer Selvam, Arunpravin
  0 siblings, 2 replies; 10+ messages in thread
From: Natalie Vock @ 2025-05-28  9:29 UTC (permalink / raw)
  To: Christian König, amd-gfx, dri-devel
  Cc: Alex Deucher, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Arunpravin Paneer Selvam,
	stable

Hi,

On 5/28/25 09:07, Christian König wrote:
> On 5/27/25 21:43, Natalie Vock wrote:
>> If we hand out cleared blocks to users, they are expected to write
>> at least some non-zero values somewhere. If we keep the CLEAR bit set on
>> the block, amdgpu_fill_buffer will assume there is nothing to do and
>> incorrectly skip clearing the block. Ultimately, the (still dirty) block
>> will be reused as if it were cleared, without any wiping of the memory
>> contents.
>>
>> Most severely, this means that any buffer allocated with
>> AMDGPU_GEM_CREATE_VRAM_CLEARED | AMDGPU_GEM_CREATE_WIPE_ON_RELEASE
>> (which is the case for **all userspace buffers**) are neither
>> guaranteed to contain cleared VRAM, nor are they being wiped on
>> release, potentially leaking application memory to arbitrary other
>> applications.
>>
>> Fixes: a68c7eaa7a8ff ("drm/amdgpu: Enable clear page functionality")
>> Cc: stable@vger.kernel.org
>>
>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
>>
>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 +++++++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> index 2d7f82e98df9..cecc67d0f0b8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>> @@ -591,6 +591,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>   	list_for_each_entry(block, &vres->blocks, link) {
>>   		unsigned long start;
>>   
>> +		/*
>> +		 * Allocated blocks may be dirtied as soon as we return.
>> +		 * Mark all blocks as dirty here, otherwise we might
>> +		 * incorrectly assume the memory is still zeroed.
>> +		 */
>> +		drm_buddy_block_set_dirty(block);
> 
> Exactly that makes no sense.
> 
> We need the information if it's dirty or not later while clearing the blocks. Otherwise we will clear all blocks and completely loose the advantage of the clear tracking.

Right, I missed that separate clear on allocation. I was put a bit 
off-track by assuming DRM_BUDDY_ALLOCATE_CLEARED would guarantee cleared 
pages, when in reality it's more like a preference.

> 
> So we should set them dirty as soon as we are done with the clearing.
> 
> But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still set it as cleared.

Yes precisely - "some reason" being the aforementioned clear flags. We 
do always call amdgpu_clear_buffer on release, but that function will 
perform the same checks as the clear on allocation does - that means, if 
a block is marked clear then it will skip emitting any actual clears.

If we don't mark the blocks as dirty after allocating, then the 
amdgpu_clear_buffer call on release will skip actually performing the 
clear like it did during allocation - this is obviously really broken.

After calling amdgpu_clear_buffer, we call amdgpu_vram_mgr_set_cleared 
which causes the drm_buddy blocks to be marked as "cleared" when freed. 
This part is correct in itself, but obviously breaks if 
amdgpu_clear_buffer didn't actually clear the buffer. That's how the 
dirty blocks end up in the buddy allocator as cleared ones.

I'm testing a v2 that sets the dirty flags after the initial clear, I'll 
send it once I confirmed it works.

Thanks,
Natalie

> 
> Regards,
> Christian.
> 
> 
>> +
>>   		start = amdgpu_vram_mgr_block_start(block) +
>>   			amdgpu_vram_mgr_block_size(block);
>>   		start >>= PAGE_SHIFT;
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation
  2025-05-28  9:29     ` Natalie Vock
@ 2025-05-28 10:52       ` Christian König
  2025-05-28 12:14       ` Paneer Selvam, Arunpravin
  1 sibling, 0 replies; 10+ messages in thread
From: Christian König @ 2025-05-28 10:52 UTC (permalink / raw)
  To: Natalie Vock, amd-gfx, dri-devel
  Cc: Alex Deucher, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, Arunpravin Paneer Selvam,
	stable

On 5/28/25 11:29, Natalie Vock wrote:
> Hi,
> 
> On 5/28/25 09:07, Christian König wrote:
>> On 5/27/25 21:43, Natalie Vock wrote:
>>> If we hand out cleared blocks to users, they are expected to write
>>> at least some non-zero values somewhere. If we keep the CLEAR bit set on
>>> the block, amdgpu_fill_buffer will assume there is nothing to do and
>>> incorrectly skip clearing the block. Ultimately, the (still dirty) block
>>> will be reused as if it were cleared, without any wiping of the memory
>>> contents.
>>>
>>> Most severely, this means that any buffer allocated with
>>> AMDGPU_GEM_CREATE_VRAM_CLEARED | AMDGPU_GEM_CREATE_WIPE_ON_RELEASE
>>> (which is the case for **all userspace buffers**) are neither
>>> guaranteed to contain cleared VRAM, nor are they being wiped on
>>> release, potentially leaking application memory to arbitrary other
>>> applications.
>>>
>>> Fixes: a68c7eaa7a8ff ("drm/amdgpu: Enable clear page functionality")
>>> Cc: stable@vger.kernel.org
>>>
>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
>>>
>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> index 2d7f82e98df9..cecc67d0f0b8 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> @@ -591,6 +591,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man,
>>>       list_for_each_entry(block, &vres->blocks, link) {
>>>           unsigned long start;
>>>   +        /*
>>> +         * Allocated blocks may be dirtied as soon as we return.
>>> +         * Mark all blocks as dirty here, otherwise we might
>>> +         * incorrectly assume the memory is still zeroed.
>>> +         */
>>> +        drm_buddy_block_set_dirty(block);
>>
>> Exactly that makes no sense.
>>
>> We need the information if it's dirty or not later while clearing the blocks. Otherwise we will clear all blocks and completely loose the advantage of the clear tracking.
> 
> Right, I missed that separate clear on allocation. I was put a bit off-track by assuming DRM_BUDDY_ALLOCATE_CLEARED would guarantee cleared pages, when in reality it's more like a preference.
> 
>>
>> So we should set them dirty as soon as we are done with the clearing.
>>
>> But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still set it as cleared.
> 
> Yes precisely - "some reason" being the aforementioned clear flags. We do always call amdgpu_clear_buffer on release, but that function will perform the same checks as the clear on allocation does - that means, if a block is marked clear then it will skip emitting any actual clears.
> 
> If we don't mark the blocks as dirty after allocating, then the amdgpu_clear_buffer call on release will skip actually performing the clear like it did during allocation - this is obviously really broken.
> 
> After calling amdgpu_clear_buffer, we call amdgpu_vram_mgr_set_cleared which causes the drm_buddy blocks to be marked as "cleared" when freed. This part is correct in itself, but obviously breaks if amdgpu_clear_buffer didn't actually clear the buffer. That's how the dirty blocks end up in the buddy allocator as cleared ones.

IIRC I've pointed out exactly that during internal discussions as well and suggested that amdgpu_vram_mgr_set_cleared() is renamed to amdgpu_vram_mgr_set_state() and given a boolean flag to indicate the state.

So that the clear on allocation will clear the dirty and set the state dirty and the clear on release will clear everything and set the clean state.

Sounds like this was never implemented like it was originally planned.


> I'm testing a v2 that sets the dirty flags after the initial clear, I'll send it once I confirmed it works

Thanks a lot for looking into that!

Regards,
Christian.

.
> 
> Thanks,
> Natalie
> 
>>
>> Regards,
>> Christian.
>>
>>
>>> +
>>>           start = amdgpu_vram_mgr_block_start(block) +
>>>               amdgpu_vram_mgr_block_size(block);
>>>           start >>= PAGE_SHIFT;
>>
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation
  2025-05-28  9:29     ` Natalie Vock
  2025-05-28 10:52       ` Christian König
@ 2025-05-28 12:14       ` Paneer Selvam, Arunpravin
  2025-05-28 12:39         ` Michel Dänzer
  1 sibling, 1 reply; 10+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-28 12:14 UTC (permalink / raw)
  To: Natalie Vock, Christian König, amd-gfx, dri-devel
  Cc: Alex Deucher, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, stable



On 5/28/2025 2:59 PM, Natalie Vock wrote:
> Hi,
>
> On 5/28/25 09:07, Christian König wrote:
>> On 5/27/25 21:43, Natalie Vock wrote:
>>> If we hand out cleared blocks to users, they are expected to write
>>> at least some non-zero values somewhere. If we keep the CLEAR bit 
>>> set on
>>> the block, amdgpu_fill_buffer will assume there is nothing to do and
>>> incorrectly skip clearing the block. Ultimately, the (still dirty) 
>>> block
>>> will be reused as if it were cleared, without any wiping of the memory
>>> contents.
>>>
>>> Most severely, this means that any buffer allocated with
>>> AMDGPU_GEM_CREATE_VRAM_CLEARED | AMDGPU_GEM_CREATE_WIPE_ON_RELEASE
>>> (which is the case for **all userspace buffers**) are neither
>>> guaranteed to contain cleared VRAM, nor are they being wiped on
>>> release, potentially leaking application memory to arbitrary other
>>> applications.
>>>
>>> Fixes: a68c7eaa7a8ff ("drm/amdgpu: Enable clear page functionality")
>>> Cc: stable@vger.kernel.org
>>>
>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3812
>>>
>>> Signed-off-by: Natalie Vock <natalie.vock@gmx.de>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 7 +++++++
>>>   1 file changed, 7 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> index 2d7f82e98df9..cecc67d0f0b8 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
>>> @@ -591,6 +591,13 @@ static int amdgpu_vram_mgr_new(struct 
>>> ttm_resource_manager *man,
>>>       list_for_each_entry(block, &vres->blocks, link) {
>>>           unsigned long start;
>>>   +        /*
>>> +         * Allocated blocks may be dirtied as soon as we return.
>>> +         * Mark all blocks as dirty here, otherwise we might
>>> +         * incorrectly assume the memory is still zeroed.
>>> +         */
>>> +        drm_buddy_block_set_dirty(block);
>>
>> Exactly that makes no sense.
>>
>> We need the information if it's dirty or not later while clearing the 
>> blocks. Otherwise we will clear all blocks and completely loose the 
>> advantage of the clear tracking.
>
> Right, I missed that separate clear on allocation. I was put a bit 
> off-track by assuming DRM_BUDDY_ALLOCATE_CLEARED would guarantee 
> cleared pages, when in reality it's more like a preference.
>
>>
>> So we should set them dirty as soon as we are done with the clearing.
>>
>> But the problem rather seems to be that we sometimes don't clear the 
>> buffers on release for some reason, but still set it as cleared.
>
> Yes precisely - "some reason" being the aforementioned clear flags. We 
> do always call amdgpu_clear_buffer on release, but that function will 
> perform the same checks as the clear on allocation does - that means, 
> if a block is marked clear then it will skip emitting any actual clears.

On buffer release 
[https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c#L1318], 
we call amdgpu_fill_buffer() and not amdgpu_clear_buffer() (in 
amdgpu_bo_release_notify() function), so the buffers are expected to be 
cleared without fail.

When the user space doesn't set the 
AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE flag and having only 
AMDGPU_GEM_CREATE_VRAM_CLEARED, we don't call this amdgpu_fill_buffer() 
and amdgpu_vram_mgr_set_cleared(), and that's kind of makes sense.
I think the problem here is, when we don't clear the buffer during BO 
release, but the flag remains as cleared and that's why these blocks are 
skipped during clear on allocation (in amdgpu_bo_create() function).

Therefore, if the release path clear is skipped for any reasons (for 
example, in case of AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE not set), we 
should set all buffer to dirty. Somehow, that is missed.

Thanks,
Arun.
>
> If we don't mark the blocks as dirty after allocating, then the 
> amdgpu_clear_buffer call on release will skip actually performing the 
> clear like it did during allocation - this is obviously really broken.
>
> After calling amdgpu_clear_buffer, we call amdgpu_vram_mgr_set_cleared 
> which causes the drm_buddy blocks to be marked as "cleared" when 
> freed. This part is correct in itself, but obviously breaks if 
> amdgpu_clear_buffer didn't actually clear the buffer. That's how the 
> dirty blocks end up in the buddy allocator as cleared ones.
>
> I'm testing a v2 that sets the dirty flags after the initial clear, 
> I'll send it once I confirmed it works.
>
> Thanks,
> Natalie
>
>>
>> Regards,
>> Christian.
>>
>>
>>> +
>>>           start = amdgpu_vram_mgr_block_start(block) +
>>>               amdgpu_vram_mgr_block_size(block);
>>>           start >>= PAGE_SHIFT;
>>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation
  2025-05-28 12:14       ` Paneer Selvam, Arunpravin
@ 2025-05-28 12:39         ` Michel Dänzer
  2025-05-28 12:59           ` Christian König
  0 siblings, 1 reply; 10+ messages in thread
From: Michel Dänzer @ 2025-05-28 12:39 UTC (permalink / raw)
  To: Paneer Selvam, Arunpravin, Natalie Vock, Christian König,
	amd-gfx, dri-devel
  Cc: Alex Deucher, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, stable

On 2025-05-28 14:14, Paneer Selvam, Arunpravin wrote:
> On 5/28/2025 2:59 PM, Natalie Vock wrote:
>> On 5/28/25 09:07, Christian König wrote:
>>>
>>> But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still set it as cleared.
>>
>> Yes precisely - "some reason" being the aforementioned clear flags. We do always call amdgpu_clear_buffer on release, but that function will perform the same checks as the clear on allocation does - that means, if a block is marked clear then it will skip emitting any actual clears.
> 
> On buffer release [https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c#L1318], we call amdgpu_fill_buffer() and not amdgpu_clear_buffer() (in amdgpu_bo_release_notify() function), so the buffers are expected to be cleared without fail.
> 
> When the user space doesn't set the AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE flag and having only AMDGPU_GEM_CREATE_VRAM_CLEARED, we don't call this amdgpu_fill_buffer() and amdgpu_vram_mgr_set_cleared(), and that's kind of makes sense.
> I think the problem here is, when we don't clear the buffer during BO release, but the flag remains as cleared and that's why these blocks are skipped during clear on allocation (in amdgpu_bo_create() function).
> 
> Therefore, if the release path clear is skipped for any reasons (for example, in case of AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE not set), we should set all buffer to dirty. Somehow, that is missed.
BTW, I asked this before, but didn't get an answer:

Now that VRAM is always cleared before handing it out to user space, does AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE really need to do anything anymore? How can user space access the contents of a destroyed BO?


-- 
Earthling Michel Dänzer       \        GNOME / Xwayland / Mesa developer
https://redhat.com             \               Libre software enthusiast

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation
  2025-05-28 12:39         ` Michel Dänzer
@ 2025-05-28 12:59           ` Christian König
  0 siblings, 0 replies; 10+ messages in thread
From: Christian König @ 2025-05-28 12:59 UTC (permalink / raw)
  To: Michel Dänzer, Paneer Selvam, Arunpravin, Natalie Vock,
	amd-gfx, dri-devel
  Cc: Alex Deucher, David Airlie, Simona Vetter, Maarten Lankhorst,
	Maxime Ripard, Thomas Zimmermann, stable

On 5/28/25 14:39, Michel Dänzer wrote:
> On 2025-05-28 14:14, Paneer Selvam, Arunpravin wrote:
>> On 5/28/2025 2:59 PM, Natalie Vock wrote:
>>> On 5/28/25 09:07, Christian König wrote:
>>>>
>>>> But the problem rather seems to be that we sometimes don't clear the buffers on release for some reason, but still set it as cleared.
>>>
>>> Yes precisely - "some reason" being the aforementioned clear flags. We do always call amdgpu_clear_buffer on release, but that function will perform the same checks as the clear on allocation does - that means, if a block is marked clear then it will skip emitting any actual clears.
>>
>> On buffer release [https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c#L1318], we call amdgpu_fill_buffer() and not amdgpu_clear_buffer() (in amdgpu_bo_release_notify() function), so the buffers are expected to be cleared without fail.
>>
>> When the user space doesn't set the AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE flag and having only AMDGPU_GEM_CREATE_VRAM_CLEARED, we don't call this amdgpu_fill_buffer() and amdgpu_vram_mgr_set_cleared(), and that's kind of makes sense.
>> I think the problem here is, when we don't clear the buffer during BO release, but the flag remains as cleared and that's why these blocks are skipped during clear on allocation (in amdgpu_bo_create() function).
>>
>> Therefore, if the release path clear is skipped for any reasons (for example, in case of AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE not set), we should set all buffer to dirty. Somehow, that is missed.
> BTW, I asked this before, but didn't get an answer:
> 
> Now that VRAM is always cleared before handing it out to user space, does AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE really need to do anything anymore? How can user space access the contents of a destroyed BO?

The flag is now used to communicate to the backend that we want to wipe on release. E.g. we still don't do that for kernel allocations or during suspend.

Regards,
Christian. 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] drm/buddy: Add public helper to dirty blocks
  2025-05-27 19:43 ` [PATCH 1/2] drm/buddy: Add public helper to dirty blocks Natalie Vock
@ 2025-05-29  7:05   ` Paneer Selvam, Arunpravin
  0 siblings, 0 replies; 10+ messages in thread
From: Paneer Selvam, Arunpravin @ 2025-05-29  7:05 UTC (permalink / raw)
  To: Natalie Vock, amd-gfx, dri-devel
  Cc: Christian König, Alex Deucher, David Airlie, Simona Vetter,
	Maarten Lankhorst, Maxime Ripard, Thomas Zimmermann, stable

[-- Attachment #1: Type: text/plain, Size: 1298 bytes --]



On 5/28/2025 1:13 AM, Natalie Vock wrote:
> Cleared blocks that are handed out to users after allocation cannot be
> presumed to remain cleared. Thus, allocators using drm_buddy need to
> dirty all blocks on the allocation success path. Provide a helper for
> them to use.
>
> Fixes: 96950929eb232 ("drm/buddy: Implement tracking clear page feature")
> Cc:stable@vger.kernel.org
>
> Signed-off-by: Natalie Vock<natalie.vock@gmx.de>
> ---
>   include/drm/drm_buddy.h | 6 ++++++
>   1 file changed, 6 insertions(+)
>
> diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h
> index 9689a7c5dd36..48628ff1c24f 100644
> --- a/include/drm/drm_buddy.h
> +++ b/include/drm/drm_buddy.h
> @@ -142,6 +142,12 @@ drm_buddy_block_size(struct drm_buddy *mm,
>   	return mm->chunk_size << drm_buddy_block_order(block);
>   }
>   
> +static inline void
> +drm_buddy_block_set_dirty(struct drm_buddy_block *block)
> +{
> +	block->header &= ~DRM_BUDDY_HEADER_CLEAR;

The users should not modify this DRM_BUDDY_HEADER_CLEAR flag directly; 
instead, please use the DRM_BUDDY_CLEARED flag in the driver.

The buddy manager will take care of this DRM_BUDDY_HEADER_CLEAR flag.

> +}
> +
>   int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size);
>   
>   void drm_buddy_fini(struct drm_buddy *mm);

[-- Attachment #2: Type: text/html, Size: 2062 bytes --]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2025-05-29  7:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-27 19:43 [PATCH 0/2] Fix AMDGPU VRAM zeroing Natalie Vock
2025-05-27 19:43 ` [PATCH 1/2] drm/buddy: Add public helper to dirty blocks Natalie Vock
2025-05-29  7:05   ` Paneer Selvam, Arunpravin
2025-05-27 19:43 ` [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation Natalie Vock
2025-05-28  7:07   ` Christian König
2025-05-28  9:29     ` Natalie Vock
2025-05-28 10:52       ` Christian König
2025-05-28 12:14       ` Paneer Selvam, Arunpravin
2025-05-28 12:39         ` Michel Dänzer
2025-05-28 12:59           ` Christian König

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).