* [PATCH] drm/xe/xe_guc: Expand guc_g2g_alloc error hanlding for future change
@ 2025-10-15 15:10 Jonathan Cavitt
2025-10-15 15:34 ` Matt Roper
0 siblings, 1 reply; 3+ messages in thread
From: Jonathan Cavitt @ 2025-10-15 15:10 UTC (permalink / raw)
To: intel-xe
Cc: saurabhg.gupta, alex.zuo, jonathan.cavitt, michal.wajdeczko,
matthew.d.roper
On today's driver, xe_device_get_gt(xe, 0); can never return NULL.
Hardware-wise there's always at least one tile, and every tilie has a
primary GT. If something went wrong during init of tile or GT and we
couldn't create/initialize the structures, then we already aborted the
device probe immediately and we'll never get further on to places in the
code that would be chasing a NULL pointer.
However, there's currently ongoing work to allow the primary GT to be
disabled via configfs for debugging purposes. Once that lands, it will
be possible for this query to return a NULL pointer. Expand the error
handling code for guc_g2g_alloc to manage this in the future.
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
---
drivers/gpu/drm/xe/xe_guc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
index d94490979adc..fbe66722e852 100644
--- a/drivers/gpu/drm/xe/xe_guc.c
+++ b/drivers/gpu/drm/xe/xe_guc.c
@@ -467,9 +467,13 @@ static int guc_g2g_alloc(struct xe_guc *guc)
if (gt->info.id != 0) {
struct xe_gt *root_gt = xe_device_get_gt(xe, 0);
- struct xe_guc *root_guc = &root_gt->uc.guc;
+ struct xe_guc *root_guc;
struct xe_bo *bo;
+ if (!root_gt)
+ return -ENODEV;
+
+ root_guc = &root_gt->uc.guc;
bo = xe_bo_get(root_guc->g2g.bo);
if (!bo)
return -ENODEV;
--
2.43.0
^ permalink raw reply related [flat|nested] 3+ messages in thread* Re: [PATCH] drm/xe/xe_guc: Expand guc_g2g_alloc error hanlding for future change
2025-10-15 15:10 [PATCH] drm/xe/xe_guc: Expand guc_g2g_alloc error hanlding for future change Jonathan Cavitt
@ 2025-10-15 15:34 ` Matt Roper
2025-10-15 16:07 ` Daniele Ceraolo Spurio
0 siblings, 1 reply; 3+ messages in thread
From: Matt Roper @ 2025-10-15 15:34 UTC (permalink / raw)
To: Jonathan Cavitt; +Cc: intel-xe, saurabhg.gupta, alex.zuo, michal.wajdeczko
On Wed, Oct 15, 2025 at 03:10:54PM +0000, Jonathan Cavitt wrote:
> On today's driver, xe_device_get_gt(xe, 0); can never return NULL.
> Hardware-wise there's always at least one tile, and every tilie has a
> primary GT. If something went wrong during init of tile or GT and we
> couldn't create/initialize the structures, then we already aborted the
> device probe immediately and we'll never get further on to places in the
> code that would be chasing a NULL pointer.
>
> However, there's currently ongoing work to allow the primary GT to be
> disabled via configfs for debugging purposes. Once that lands, it will
> be possible for this query to return a NULL pointer. Expand the error
> handling code for guc_g2g_alloc to manage this in the future.
Rather than blindly adding checks to any low-level code that calls
xe_device_get_gt() (and then raising errors that we hope will do
something reasonable when propagated up the call stack), I think we
should be putting more thought into what these functions are and how
they get called. Functions like guc_g2g_alloc() are something that get
called deep within the callstack of GuC-to-GuC communication. We should
be thinking about what the purpose/use of this code is. If GuC-to-GuC
always requires communication with the first tile's primary GT, then G2G
itself should just be disabled at the higher level so that we never call
down to this part of the code (e.g., in xe_guc_g2g_wanted that already
has other "is this possible?" checks). Or if there's other feasible use
cases (e.g., a media GuC on one tile talking to a media GuC on another
tile), then we should be changing the code to remove the hardcoded use
of "gt0" so that those use cases work.
Matt
>
> Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> Suggested-by: Matt Roper <matthew.d.roper@intel.com>
> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
> ---
> drivers/gpu/drm/xe/xe_guc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
> index d94490979adc..fbe66722e852 100644
> --- a/drivers/gpu/drm/xe/xe_guc.c
> +++ b/drivers/gpu/drm/xe/xe_guc.c
> @@ -467,9 +467,13 @@ static int guc_g2g_alloc(struct xe_guc *guc)
>
> if (gt->info.id != 0) {
> struct xe_gt *root_gt = xe_device_get_gt(xe, 0);
> - struct xe_guc *root_guc = &root_gt->uc.guc;
> + struct xe_guc *root_guc;
> struct xe_bo *bo;
>
> + if (!root_gt)
> + return -ENODEV;
> +
> + root_guc = &root_gt->uc.guc;
> bo = xe_bo_get(root_guc->g2g.bo);
> if (!bo)
> return -ENODEV;
> --
> 2.43.0
>
--
Matt Roper
Graphics Software Engineer
Linux GPU Platform Enablement
Intel Corporation
^ permalink raw reply [flat|nested] 3+ messages in thread* Re: [PATCH] drm/xe/xe_guc: Expand guc_g2g_alloc error hanlding for future change
2025-10-15 15:34 ` Matt Roper
@ 2025-10-15 16:07 ` Daniele Ceraolo Spurio
0 siblings, 0 replies; 3+ messages in thread
From: Daniele Ceraolo Spurio @ 2025-10-15 16:07 UTC (permalink / raw)
To: Matt Roper, Jonathan Cavitt
Cc: intel-xe, saurabhg.gupta, alex.zuo, michal.wajdeczko
On 10/15/2025 8:34 AM, Matt Roper wrote:
> On Wed, Oct 15, 2025 at 03:10:54PM +0000, Jonathan Cavitt wrote:
>> On today's driver, xe_device_get_gt(xe, 0); can never return NULL.
>> Hardware-wise there's always at least one tile, and every tilie has a
>> primary GT. If something went wrong during init of tile or GT and we
>> couldn't create/initialize the structures, then we already aborted the
>> device probe immediately and we'll never get further on to places in the
>> code that would be chasing a NULL pointer.
>>
>> However, there's currently ongoing work to allow the primary GT to be
>> disabled via configfs for debugging purposes. Once that lands, it will
>> be possible for this query to return a NULL pointer. Expand the error
>> handling code for guc_g2g_alloc to manage this in the future.
> Rather than blindly adding checks to any low-level code that calls
> xe_device_get_gt() (and then raising errors that we hope will do
> something reasonable when propagated up the call stack), I think we
> should be putting more thought into what these functions are and how
> they get called. Functions like guc_g2g_alloc() are something that get
> called deep within the callstack of GuC-to-GuC communication. We should
> be thinking about what the purpose/use of this code is. If GuC-to-GuC
> always requires communication with the first tile's primary GT, then G2G
> itself should just be disabled at the higher level so that we never call
> down to this part of the code (e.g., in xe_guc_g2g_wanted that already
> has other "is this possible?" checks). Or if there's other feasible use
> cases (e.g., a media GuC on one tile talking to a media GuC on another
> tile), then we should be changing the code to remove the hardcoded use
> of "gt0" so that those use cases work.
>
>
> Matt
The use of gt0 is purely for convenience here and the code could be
reworked to not use it.
A single object is allocated for all G2G buffers and to make things
simple the object is owned by gt0, so the logic (simplified) is as follows:
if (gt_id == 0) {
allocate buffer
} else {
fetch gt0
get gtg buffer from gt0
}
We could change the ownership to the first found GuC instead, or move
the object outside of the GuC structure entirely and into xe_device.
Daniele
>
>> Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Suggested-by: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> ---
>> drivers/gpu/drm/xe/xe_guc.c | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index d94490979adc..fbe66722e852 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -467,9 +467,13 @@ static int guc_g2g_alloc(struct xe_guc *guc)
>>
>> if (gt->info.id != 0) {
>> struct xe_gt *root_gt = xe_device_get_gt(xe, 0);
>> - struct xe_guc *root_guc = &root_gt->uc.guc;
>> + struct xe_guc *root_guc;
>> struct xe_bo *bo;
>>
>> + if (!root_gt)
>> + return -ENODEV;
>> +
>> + root_guc = &root_gt->uc.guc;
>> bo = xe_bo_get(root_guc->g2g.bo);
>> if (!bo)
>> return -ENODEV;
>> --
>> 2.43.0
>>
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2025-10-15 16:07 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-15 15:10 [PATCH] drm/xe/xe_guc: Expand guc_g2g_alloc error hanlding for future change Jonathan Cavitt
2025-10-15 15:34 ` Matt Roper
2025-10-15 16:07 ` Daniele Ceraolo Spurio
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox