* [PATCH] drm/amdgpu: Fix an oops with GTT | VRAM allocation
@ 2022-12-10 9:24 Luben Tuikov
2022-12-12 17:48 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Luben Tuikov
0 siblings, 1 reply; 11+ messages in thread
From: Luben Tuikov @ 2022-12-10 9:24 UTC (permalink / raw)
To: AMD Graphics; +Cc: Alex Deucher, Luben Tuikov, Christian König
Fix a kernel oops in amdgpu_bo_validate_size() when we allow allocation with
GTT | VRAM domains set. The problem is that we try to dereference a non-existing
TTM resource manager of the wanted type, GTT. In this allocation both GTT and
VRAM are set. The call takes place in amdgpu_ttm_reserve_tmr() at line 1716.
Dec 10 01:12:41 localhost.localdomain kernel: RIP: 0010:amdgpu_bo_create+0x8c/0x4b0 [amdgpu]
Dec 10 01:12:41 localhost.localdomain kernel: Code: c7 44 24 34 00 00 00 00 a8 30 0f 84 e6 01 00 00 49 63 f5 49 c1 e4 0c 48 89 34 24 a8 02 0f 84 ad 01 00 00 48 8b 85 d0 55 00 00 <4c> 3b 60 10 0f 83 b5 01 00 00 81 7b 0c 87 02 00 00 0f 86 61 03 00
Dec 10 01:12:41 localhost.localdomain kernel: RSP: 0018:ffffc3b580ba7980 EFLAGS: 00010202
Dec 10 01:12:41 localhost.localdomain kernel: RAX: 0000000000000000 RBX: ffffc3b580ba7a00 RCX: 0000000000000001
Dec 10 01:12:41 localhost.localdomain kernel: RDX: ffff9fa481586200 RSI: ffffc3b580ba7a00 RDI: ffff9fa481580000
Dec 10 01:12:41 localhost.localdomain kernel: RBP: ffff9fa481580000 R08: ffff9fa481586210 R09: 0000000000000000
Dec 10 01:12:41 localhost.localdomain kernel: R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000010000
Dec 10 01:12:41 localhost.localdomain kernel: R13: 0000000000000001 R14: ffff9fa481586210 R15: ffff9fa481586210
Dec 10 01:12:41 localhost.localdomain kernel: FS: 00007fc2505fbb40(0000) GS:ffff9fab4ed00000(0000) knlGS:0000000000000000
Dec 10 01:12:41 localhost.localdomain kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Dec 10 01:12:41 localhost.localdomain kernel: CR2: 0000000000000010 CR3: 0000000128934000 CR4: 00000000003506e0
Dec 10 01:12:41 localhost.localdomain kernel: Call Trace:
Dec 10 01:12:41 localhost.localdomain kernel: <TASK>
Dec 10 01:12:41 localhost.localdomain kernel: amdgpu_bo_create_reserved+0x15d/0x1b0 [amdgpu]
Dec 10 01:12:41 localhost.localdomain kernel: amdgpu_bo_create_kernel_at+0x54/0x1c0 [amdgpu]
Dec 10 01:12:41 localhost.localdomain kernel: amdgpu_ttm_init+0x1ad/0x470 [amdgpu]
...
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
/*
* If GTT is part of requested domains the check must succeed to
- * allow fall back to GTT
+ * allow fall back to GTT.
+ *
+ * Note that allocations can request from either domain. For
+ * this reason, check either in non-exclusive way, and if
+ * neither satisfies, fail the validation.
*/
if (domain & AMDGPU_GEM_DOMAIN_GTT) {
man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
-
- if (size < man->size)
+ if (man && size < man->size)
return true;
- else
- goto fail;
}
if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
-
- if (size < man->size)
+ if (man && size < man->size)
return true;
- else
- goto fail;
}
-
/* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */
- return true;
-fail:
DRM_DEBUG("BO size %lu > total memory in domain: %llu\n", size,
man->size);
return false;
base-commit: 3c4ee2dc869cba283b0c667708090aefbc09aacf
prerequisite-patch-id: 3d9ce4e1252cf76ced92d755740a8df4f073d440
prerequisite-patch-id: c37f8050f6b285983085f62cd65f99fce034a0fb
prerequisite-patch-id: eff248bd978d8510bab4c51b960b71dd6a542138
prerequisite-patch-id: 539ef7082989c2fe194803c5b8041b931009397c
--
2.39.0.rc2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-10 9:24 [PATCH] drm/amdgpu: Fix an oops with GTT | VRAM allocation Luben Tuikov
@ 2022-12-12 17:48 ` Luben Tuikov
2022-12-12 19:19 ` Christian König
0 siblings, 1 reply; 11+ messages in thread
From: Luben Tuikov @ 2022-12-12 17:48 UTC (permalink / raw)
To: AMD Graphics; +Cc: Alex Deucher, Luben Tuikov, Christian König
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
requested memory exists, and to allow for non-exclusive domain allocations, as
there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
AMDGPU_GEM_DOMAIN_GTT.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
/*
* If GTT is part of requested domains the check must succeed to
- * allow fall back to GTT
+ * allow fall back to GTT.
+ *
+ * Note that allocations can request from either domain. For
+ * this reason, check either in non-exclusive way, and if
+ * neither satisfies, fail the validation.
*/
if (domain & AMDGPU_GEM_DOMAIN_GTT) {
man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
-
- if (size < man->size)
+ if (man && size < man->size)
return true;
- else
- goto fail;
}
if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
-
- if (size < man->size)
+ if (man && size < man->size)
return true;
- else
- goto fail;
}
-
/* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */
- return true;
-fail:
DRM_DEBUG("BO size %lu > total memory in domain: %llu\n", size,
man->size);
return false;
base-commit: 20e03e7f6e8efd42168db6d3fe044b804e0ede8f
--
2.39.0.rc2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-12 17:48 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Luben Tuikov
@ 2022-12-12 19:19 ` Christian König
2022-12-12 23:44 ` Luben Tuikov
0 siblings, 1 reply; 11+ messages in thread
From: Christian König @ 2022-12-12 19:19 UTC (permalink / raw)
To: Luben Tuikov, AMD Graphics; +Cc: Alex Deucher
Am 12.12.22 um 18:48 schrieb Luben Tuikov:
> Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
> requested memory exists, and to allow for non-exclusive domain allocations, as
> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
> AMDGPU_GEM_DOMAIN_GTT.
>
> Cc: Alex Deucher <Alexander.Deucher@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
> 1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
>
> /*
> * If GTT is part of requested domains the check must succeed to
> - * allow fall back to GTT
> + * allow fall back to GTT.
> + *
> + * Note that allocations can request from either domain. For
> + * this reason, check either in non-exclusive way, and if
> + * neither satisfies, fail the validation.
That's not correct, the original logic was completely intentional.
If both VRAM and GTT are specified it's valid if the size fits only into
GTT.
Regards,
Christian.
> */
> if (domain & AMDGPU_GEM_DOMAIN_GTT) {
> man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
> -
> - if (size < man->size)
> + if (man && size < man->size)
> return true;
> - else
> - goto fail;
> }
>
> if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
> man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
> -
> - if (size < man->size)
> + if (man && size < man->size)
> return true;
> - else
> - goto fail;
> }
>
> -
> /* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */
> - return true;
>
> -fail:
> DRM_DEBUG("BO size %lu > total memory in domain: %llu\n", size,
> man->size);
> return false;
>
> base-commit: 20e03e7f6e8efd42168db6d3fe044b804e0ede8f
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-12 19:19 ` Christian König
@ 2022-12-12 23:44 ` Luben Tuikov
2022-12-13 4:13 ` Luben Tuikov
2022-12-13 7:00 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Christian König
0 siblings, 2 replies; 11+ messages in thread
From: Luben Tuikov @ 2022-12-12 23:44 UTC (permalink / raw)
To: Christian König, AMD Graphics; +Cc: Alex Deucher
On 2022-12-12 14:19, Christian König wrote:
> Am 12.12.22 um 18:48 schrieb Luben Tuikov:
>> Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
>> requested memory exists, and to allow for non-exclusive domain allocations, as
>> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
>> AMDGPU_GEM_DOMAIN_GTT.
>>
>> Cc: Alex Deucher <Alexander.Deucher@amd.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
>> ---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
>>
>> /*
>> * If GTT is part of requested domains the check must succeed to
>> - * allow fall back to GTT
>> + * allow fall back to GTT.
>> + *
>> + * Note that allocations can request from either domain. For
>> + * this reason, check either in non-exclusive way, and if
>> + * neither satisfies, fail the validation.
>
> That's not correct, the original logic was completely intentional.
>
> If both VRAM and GTT are specified it's valid if the size fits only into
> GTT.
Given that this patch fixes a kernel oops, should this patch then fail the validation,
i.e. return false?
This would then fail, in amdgpu_ttm_reserve_tmr():
ret = amdgpu_bo_create_kernel_at(adev,
adev->gmc.real_vram_size - adev->mman.discovery_tmr_size,
adev->mman.discovery_tmr_size,
AMDGPU_GEM_DOMAIN_VRAM |
AMDGPU_GEM_DOMAIN_GTT,
&adev->mman.discovery_memory,
NULL);
Regards,
Luben
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-12 23:44 ` Luben Tuikov
@ 2022-12-13 4:13 ` Luben Tuikov
2022-12-13 4:16 ` [PATCH v3] drm/amdgpu: Fix size validation for non-exclusive domains (v3) Luben Tuikov
2022-12-13 7:00 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Christian König
1 sibling, 1 reply; 11+ messages in thread
From: Luben Tuikov @ 2022-12-13 4:13 UTC (permalink / raw)
To: Christian König, AMD Graphics; +Cc: Alex Deucher
On 2022-12-12 18:44, Luben Tuikov wrote:
> On 2022-12-12 14:19, Christian König wrote:
>> Am 12.12.22 um 18:48 schrieb Luben Tuikov:
>>> Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
>>> requested memory exists, and to allow for non-exclusive domain allocations, as
>>> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
>>> AMDGPU_GEM_DOMAIN_GTT.
>>>
>>> Cc: Alex Deucher <Alexander.Deucher@amd.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
>>>
>>> /*
>>> * If GTT is part of requested domains the check must succeed to
>>> - * allow fall back to GTT
>>> + * allow fall back to GTT.
>>> + *
>>> + * Note that allocations can request from either domain. For
>>> + * this reason, check either in non-exclusive way, and if
>>> + * neither satisfies, fail the validation.
>>
>> That's not correct, the original logic was completely intentional.
>>
>> If both VRAM and GTT are specified it's valid if the size fits only into
>> GTT.
>
> Given that this patch fixes a kernel oops, should this patch then fail the validation,
> i.e. return false?
>
> This would then fail, in amdgpu_ttm_reserve_tmr():
>
> ret = amdgpu_bo_create_kernel_at(adev,
> adev->gmc.real_vram_size - adev->mman.discovery_tmr_size,
> adev->mman.discovery_tmr_size,
> AMDGPU_GEM_DOMAIN_VRAM |
> AMDGPU_GEM_DOMAIN_GTT,
> &adev->mman.discovery_memory,
> NULL);
>
I've rewritten the patch to be minimal and just check that the memory manager
is not NULL. The patch follows this email.
Regards,
Luben
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v3] drm/amdgpu: Fix size validation for non-exclusive domains (v3)
2022-12-13 4:13 ` Luben Tuikov
@ 2022-12-13 4:16 ` Luben Tuikov
0 siblings, 0 replies; 11+ messages in thread
From: Luben Tuikov @ 2022-12-13 4:16 UTC (permalink / raw)
To: AMD Graphics; +Cc: Alex Deucher, Luben Tuikov, Christian König
Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
requested memory exists, else we get a kernel oops when dereferencing "man".
v2: Make the patch standalone, i.e. not dependent on local patches.
v3: Preserve old behaviour and just check that the manager pointer is not
NULL.
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index fd3ab4b5e5bb1f..43b01f8287ea52 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -448,12 +448,12 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
/*
* If GTT is part of requested domains the check must succeed to
- * allow fall back to GTT
+ * allow fall back to GTT.
*/
if (domain & AMDGPU_GEM_DOMAIN_GTT) {
man = ttm_manager_type(&adev->mman.bdev, TTM_PL_TT);
- if (size < man->size)
+ if (man && size < man->size)
return true;
else
goto fail;
@@ -462,13 +462,12 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
if (domain & AMDGPU_GEM_DOMAIN_VRAM) {
man = ttm_manager_type(&adev->mman.bdev, TTM_PL_VRAM);
- if (size < man->size)
+ if (man && size < man->size)
return true;
else
goto fail;
}
-
/* TODO add more domains checks, such as AMDGPU_GEM_DOMAIN_CPU */
return true;
base-commit: 814a56e6601ce866638d4ef005f099119bee39e8
--
2.39.0.rc2
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-12 23:44 ` Luben Tuikov
2022-12-13 4:13 ` Luben Tuikov
@ 2022-12-13 7:00 ` Christian König
2022-12-13 9:52 ` Luben Tuikov
2022-12-13 11:40 ` Lazar, Lijo
1 sibling, 2 replies; 11+ messages in thread
From: Christian König @ 2022-12-13 7:00 UTC (permalink / raw)
To: Luben Tuikov, AMD Graphics; +Cc: Alex Deucher
Am 13.12.22 um 00:44 schrieb Luben Tuikov:
> On 2022-12-12 14:19, Christian König wrote:
>> Am 12.12.22 um 18:48 schrieb Luben Tuikov:
>>> Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
>>> requested memory exists, and to allow for non-exclusive domain allocations, as
>>> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
>>> AMDGPU_GEM_DOMAIN_GTT.
>>>
>>> Cc: Alex Deucher <Alexander.Deucher@amd.com>
>>> Cc: Christian König <christian.koenig@amd.com>
>>> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
>>>
>>> /*
>>> * If GTT is part of requested domains the check must succeed to
>>> - * allow fall back to GTT
>>> + * allow fall back to GTT.
>>> + *
>>> + * Note that allocations can request from either domain. For
>>> + * this reason, check either in non-exclusive way, and if
>>> + * neither satisfies, fail the validation.
>> That's not correct, the original logic was completely intentional.
>>
>> If both VRAM and GTT are specified it's valid if the size fits only into
>> GTT.
> Given that this patch fixes a kernel oops, should this patch then fail the validation,
> i.e. return false?
It should be sufficient if a BO fits into the GTT domain for size
validation. If we haven't initialized the GTT domain and end up here we
should probably just ignore it.
>
> This would then fail, in amdgpu_ttm_reserve_tmr():
>
> ret = amdgpu_bo_create_kernel_at(adev,
> adev->gmc.real_vram_size - adev->mman.discovery_tmr_size,
> adev->mman.discovery_tmr_size,
> AMDGPU_GEM_DOMAIN_VRAM |
> AMDGPU_GEM_DOMAIN_GTT,
As I said before using amdgpu_bo_create_kernel_at() with VRAM|GTT
doesn't make any sense at all. We should probably drop the domain
parameter altogether.
Regards,
Christian.
> &adev->mman.discovery_memory,
> NULL);
>
> Regards,
> Luben
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-13 7:00 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Christian König
@ 2022-12-13 9:52 ` Luben Tuikov
2022-12-13 11:40 ` Lazar, Lijo
1 sibling, 0 replies; 11+ messages in thread
From: Luben Tuikov @ 2022-12-13 9:52 UTC (permalink / raw)
To: Christian König, AMD Graphics; +Cc: Alex Deucher
On 2022-12-13 02:00, Christian König wrote:
> Am 13.12.22 um 00:44 schrieb Luben Tuikov:
>> On 2022-12-12 14:19, Christian König wrote:
>>> Am 12.12.22 um 18:48 schrieb Luben Tuikov:
>>>> Fix amdgpu_bo_validate_size() to check whether the TTM domain manager for the
>>>> requested memory exists, and to allow for non-exclusive domain allocations, as
>>>> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
>>>> AMDGPU_GEM_DOMAIN_GTT.
>>>>
>>>> Cc: Alex Deucher <Alexander.Deucher@amd.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
>>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct amdgpu_device *adev,
>>>>
>>>> /*
>>>> * If GTT is part of requested domains the check must succeed to
>>>> - * allow fall back to GTT
>>>> + * allow fall back to GTT.
>>>> + *
>>>> + * Note that allocations can request from either domain. For
>>>> + * this reason, check either in non-exclusive way, and if
>>>> + * neither satisfies, fail the validation.
>>> That's not correct, the original logic was completely intentional.
>>>
>>> If both VRAM and GTT are specified it's valid if the size fits only into
>>> GTT.
>> Given that this patch fixes a kernel oops, should this patch then fail the validation,
>> i.e. return false?
>
> It should be sufficient if a BO fits into the GTT domain for size
> validation. If we haven't initialized the GTT domain and end up here we
> should probably just ignore it.
By "should probably just ignore it" do you mean "return true;" or "return false;"?
>
>>
>> This would then fail, in amdgpu_ttm_reserve_tmr():
>>
>> ret = amdgpu_bo_create_kernel_at(adev,
>> adev->gmc.real_vram_size - adev->mman.discovery_tmr_size,
>> adev->mman.discovery_tmr_size,
>> AMDGPU_GEM_DOMAIN_VRAM |
>> AMDGPU_GEM_DOMAIN_GTT,
>
> As I said before using amdgpu_bo_create_kernel_at() with VRAM|GTT
> doesn't make any sense at all. We should probably drop the domain
> parameter altogether.
I agree. We can do this in another patch.
Regards,
Luben
>
> Regards,
> Christian.
>
>> &adev->mman.discovery_memory,
>> NULL);
>>
>> Regards,
>> Luben
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-13 7:00 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Christian König
2022-12-13 9:52 ` Luben Tuikov
@ 2022-12-13 11:40 ` Lazar, Lijo
2022-12-13 11:52 ` Christian König
1 sibling, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2022-12-13 11:40 UTC (permalink / raw)
To: amd-gfx
On 12/13/2022 12:30 PM, Christian König wrote:
> Am 13.12.22 um 00:44 schrieb Luben Tuikov:
>> On 2022-12-12 14:19, Christian König wrote:
>>> Am 12.12.22 um 18:48 schrieb Luben Tuikov:
>>>> Fix amdgpu_bo_validate_size() to check whether the TTM domain
>>>> manager for the
>>>> requested memory exists, and to allow for non-exclusive domain
>>>> allocations, as
>>>> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
>>>> AMDGPU_GEM_DOMAIN_GTT.
>>>>
>>>> Cc: Alex Deucher <Alexander.Deucher@amd.com>
>>>> Cc: Christian König <christian.koenig@amd.com>
>>>> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19 +++++++------------
>>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct
>>>> amdgpu_device *adev,
>>>> /*
>>>> * If GTT is part of requested domains the check must succeed to
>>>> - * allow fall back to GTT
>>>> + * allow fall back to GTT.
>>>> + *
>>>> + * Note that allocations can request from either domain. For
>>>> + * this reason, check either in non-exclusive way, and if
>>>> + * neither satisfies, fail the validation.
>>> That's not correct, the original logic was completely intentional.
>>>
>>> If both VRAM and GTT are specified it's valid if the size fits only into
>>> GTT.
>> Given that this patch fixes a kernel oops, should this patch then fail
>> the validation,
>> i.e. return false?
>
> It should be sufficient if a BO fits into the GTT domain for size
> validation. If we haven't initialized the GTT domain and end up here we
> should probably just ignore it.
>
>>
>> This would then fail, in amdgpu_ttm_reserve_tmr():
>>
>> ret = amdgpu_bo_create_kernel_at(adev,
>> adev->gmc.real_vram_size - adev->mman.discovery_tmr_size,
>> adev->mman.discovery_tmr_size,
>> AMDGPU_GEM_DOMAIN_VRAM |
>> AMDGPU_GEM_DOMAIN_GTT,
>
> As I said before using amdgpu_bo_create_kernel_at() with VRAM|GTT
> doesn't make any sense at all. We should probably drop the domain
> parameter altogether.
>
What is the alternative planned to prevent usage of VRAM at fixed offsets?
BTW, AMDGPU_GEM_DOMAIN_GTT for above doesn't make any sense. Discovery
region is always in VRAM domain.
Thanks,
Lijo
> Regards,
> Christian.
>
>> &adev->mman.discovery_memory,
>> NULL);
>>
>> Regards,
>> Luben
>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-13 11:40 ` Lazar, Lijo
@ 2022-12-13 11:52 ` Christian König
2022-12-13 12:41 ` Lazar, Lijo
0 siblings, 1 reply; 11+ messages in thread
From: Christian König @ 2022-12-13 11:52 UTC (permalink / raw)
To: Lazar, Lijo, amd-gfx
Am 13.12.22 um 12:40 schrieb Lazar, Lijo:
> On 12/13/2022 12:30 PM, Christian König wrote:
>> Am 13.12.22 um 00:44 schrieb Luben Tuikov:
>>> On 2022-12-12 14:19, Christian König wrote:
>>>> Am 12.12.22 um 18:48 schrieb Luben Tuikov:
>>>>> Fix amdgpu_bo_validate_size() to check whether the TTM domain
>>>>> manager for the
>>>>> requested memory exists, and to allow for non-exclusive domain
>>>>> allocations, as
>>>>> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
>>>>> AMDGPU_GEM_DOMAIN_GTT.
>>>>>
>>>>> Cc: Alex Deucher <Alexander.Deucher@amd.com>
>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
>>>>> ---
>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19
>>>>> +++++++------------
>>>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct
>>>>> amdgpu_device *adev,
>>>>> /*
>>>>> * If GTT is part of requested domains the check must
>>>>> succeed to
>>>>> - * allow fall back to GTT
>>>>> + * allow fall back to GTT.
>>>>> + *
>>>>> + * Note that allocations can request from either domain. For
>>>>> + * this reason, check either in non-exclusive way, and if
>>>>> + * neither satisfies, fail the validation.
>>>> That's not correct, the original logic was completely intentional.
>>>>
>>>> If both VRAM and GTT are specified it's valid if the size fits only
>>>> into
>>>> GTT.
>>> Given that this patch fixes a kernel oops, should this patch then
>>> fail the validation,
>>> i.e. return false?
>>
>> It should be sufficient if a BO fits into the GTT domain for size
>> validation. If we haven't initialized the GTT domain and end up here
>> we should probably just ignore it.
>>
>>>
>>> This would then fail, in amdgpu_ttm_reserve_tmr():
>>>
>>> ret = amdgpu_bo_create_kernel_at(adev,
>>> adev->gmc.real_vram_size -
>>> adev->mman.discovery_tmr_size,
>>> adev->mman.discovery_tmr_size,
>>> AMDGPU_GEM_DOMAIN_VRAM |
>>> AMDGPU_GEM_DOMAIN_GTT,
>>
>> As I said before using amdgpu_bo_create_kernel_at() with VRAM|GTT
>> doesn't make any sense at all. We should probably drop the domain
>> parameter altogether.
>>
>
> What is the alternative planned to prevent usage of VRAM at fixed
> offsets?
>
> BTW, AMDGPU_GEM_DOMAIN_GTT for above doesn't make any sense. Discovery
> region is always in VRAM domain.
Well that was my point, reserved regions are always in VRAM.
We probably don't need to ability to reserve in any other domain so we
can drop the domain parameter here and just always assume that we need VRAM.
Regards,
Christian.
>
> Thanks,
> Lijo
>
>> Regards,
>> Christian.
>>
>>> &adev->mman.discovery_memory,
>>> NULL);
>>>
>>> Regards,
>>> Luben
>>>
>>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains
2022-12-13 11:52 ` Christian König
@ 2022-12-13 12:41 ` Lazar, Lijo
0 siblings, 0 replies; 11+ messages in thread
From: Lazar, Lijo @ 2022-12-13 12:41 UTC (permalink / raw)
To: Christian König, amd-gfx
On 12/13/2022 5:22 PM, Christian König wrote:
> Am 13.12.22 um 12:40 schrieb Lazar, Lijo:
>> On 12/13/2022 12:30 PM, Christian König wrote:
>>> Am 13.12.22 um 00:44 schrieb Luben Tuikov:
>>>> On 2022-12-12 14:19, Christian König wrote:
>>>>> Am 12.12.22 um 18:48 schrieb Luben Tuikov:
>>>>>> Fix amdgpu_bo_validate_size() to check whether the TTM domain
>>>>>> manager for the
>>>>>> requested memory exists, and to allow for non-exclusive domain
>>>>>> allocations, as
>>>>>> there would be if the domain is a mask, e.g. AMDGPU_GEM_DOMAIN_VRAM |
>>>>>> AMDGPU_GEM_DOMAIN_GTT.
>>>>>>
>>>>>> Cc: Alex Deucher <Alexander.Deucher@amd.com>
>>>>>> Cc: Christian König <christian.koenig@amd.com>
>>>>>> Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 19
>>>>>> +++++++------------
>>>>>> 1 file changed, 7 insertions(+), 12 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> index fd3ab4b5e5bb1f..e0f103f0ec2178 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
>>>>>> @@ -448,31 +448,26 @@ static bool amdgpu_bo_validate_size(struct
>>>>>> amdgpu_device *adev,
>>>>>> /*
>>>>>> * If GTT is part of requested domains the check must
>>>>>> succeed to
>>>>>> - * allow fall back to GTT
>>>>>> + * allow fall back to GTT.
>>>>>> + *
>>>>>> + * Note that allocations can request from either domain. For
>>>>>> + * this reason, check either in non-exclusive way, and if
>>>>>> + * neither satisfies, fail the validation.
>>>>> That's not correct, the original logic was completely intentional.
>>>>>
>>>>> If both VRAM and GTT are specified it's valid if the size fits only
>>>>> into
>>>>> GTT.
>>>> Given that this patch fixes a kernel oops, should this patch then
>>>> fail the validation,
>>>> i.e. return false?
>>>
>>> It should be sufficient if a BO fits into the GTT domain for size
>>> validation. If we haven't initialized the GTT domain and end up here
>>> we should probably just ignore it.
>>>
>>>>
>>>> This would then fail, in amdgpu_ttm_reserve_tmr():
>>>>
>>>> ret = amdgpu_bo_create_kernel_at(adev,
>>>> adev->gmc.real_vram_size -
>>>> adev->mman.discovery_tmr_size,
>>>> adev->mman.discovery_tmr_size,
>>>> AMDGPU_GEM_DOMAIN_VRAM |
>>>> AMDGPU_GEM_DOMAIN_GTT,
>>>
>>> As I said before using amdgpu_bo_create_kernel_at() with VRAM|GTT
>>> doesn't make any sense at all. We should probably drop the domain
>>> parameter altogether.
>>>
>>
>> What is the alternative planned to prevent usage of VRAM at fixed
>> offsets?
>>
>> BTW, AMDGPU_GEM_DOMAIN_GTT for above doesn't make any sense. Discovery
>> region is always in VRAM domain.
>
>
> Well that was my point, reserved regions are always in VRAM.
>
> We probably don't need to ability to reserve in any other domain so we
> can drop the domain parameter here and just always assume that we need
> VRAM.
>
Got it. Thanks!
Thanks,
Lijo
> Regards,
> Christian.
>
>>
>> Thanks,
>> Lijo
>>
>>> Regards,
>>> Christian.
>>>
>>>> &adev->mman.discovery_memory,
>>>> NULL);
>>>>
>>>> Regards,
>>>> Luben
>>>>
>>>
>
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2022-12-13 12:41 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-10 9:24 [PATCH] drm/amdgpu: Fix an oops with GTT | VRAM allocation Luben Tuikov
2022-12-12 17:48 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Luben Tuikov
2022-12-12 19:19 ` Christian König
2022-12-12 23:44 ` Luben Tuikov
2022-12-13 4:13 ` Luben Tuikov
2022-12-13 4:16 ` [PATCH v3] drm/amdgpu: Fix size validation for non-exclusive domains (v3) Luben Tuikov
2022-12-13 7:00 ` [PATCH v2] drm/amdgpu: Fix size validation for non-exclusive domains Christian König
2022-12-13 9:52 ` Luben Tuikov
2022-12-13 11:40 ` Lazar, Lijo
2022-12-13 11:52 ` Christian König
2022-12-13 12:41 ` Lazar, Lijo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox