Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
To: Matt Roper <matthew.d.roper@intel.com>,
	Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <saurabhg.gupta@intel.com>,
	<alex.zuo@intel.com>, <michal.wajdeczko@intel.com>
Subject: Re: [PATCH] drm/xe/xe_guc: Expand guc_g2g_alloc error hanlding for future change
Date: Wed, 15 Oct 2025 09:07:43 -0700	[thread overview]
Message-ID: <bd3b5e8a-0961-4bf3-a79d-fa426627ed25@intel.com> (raw)
In-Reply-To: <20251015153440.GK1207432@mdroper-desk1.amr.corp.intel.com>



On 10/15/2025 8:34 AM, Matt Roper wrote:
> On Wed, Oct 15, 2025 at 03:10:54PM +0000, Jonathan Cavitt wrote:
>> On today's driver, xe_device_get_gt(xe, 0); can never return NULL.
>> Hardware-wise there's always at least one tile, and every tilie has a
>> primary GT.  If something went wrong during init of tile or GT and we
>> couldn't create/initialize the structures, then we already aborted the
>> device probe immediately and we'll never get further on to places in the
>> code that would be chasing a NULL pointer.
>>
>> However, there's currently ongoing work to allow the primary GT to be
>> disabled via configfs for debugging purposes.  Once that lands, it will
>> be possible for this query to return a NULL pointer.  Expand the error
>> handling code for guc_g2g_alloc to manage this in the future.
> Rather than blindly adding checks to any low-level code that calls
> xe_device_get_gt() (and then raising errors that we hope will do
> something reasonable when propagated up the call stack), I think we
> should be putting more thought into what these functions are and how
> they get called.  Functions like guc_g2g_alloc() are something that get
> called deep within the callstack of GuC-to-GuC communication.  We should
> be thinking about what the purpose/use of this code is.  If GuC-to-GuC
> always requires communication with the first tile's primary GT, then G2G
> itself should just be disabled at the higher level so that we never call
> down to this part of the code (e.g., in xe_guc_g2g_wanted that already
> has other "is this possible?" checks).  Or if there's other feasible use
> cases (e.g., a media GuC on one tile talking to a media GuC on another
> tile), then we should be changing the code to remove the hardcoded use
> of "gt0" so that those use cases work.
>
>
> Matt

The use of gt0 is purely for convenience here and the code could be 
reworked to not use it.
A single object is allocated for all G2G buffers and to make things 
simple the object is owned by gt0, so the logic (simplified) is as follows:

if (gt_id == 0) {
     allocate buffer
} else {
     fetch gt0
     get gtg buffer from gt0
}

We could change the ownership to the first found GuC instead, or move 
the object outside of the GuC structure entirely and into xe_device.

Daniele

>
>> Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
>> Suggested-by: Matt Roper <matthew.d.roper@intel.com>
>> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
>> ---
>>   drivers/gpu/drm/xe/xe_guc.c | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/xe/xe_guc.c b/drivers/gpu/drm/xe/xe_guc.c
>> index d94490979adc..fbe66722e852 100644
>> --- a/drivers/gpu/drm/xe/xe_guc.c
>> +++ b/drivers/gpu/drm/xe/xe_guc.c
>> @@ -467,9 +467,13 @@ static int guc_g2g_alloc(struct xe_guc *guc)
>>   
>>   	if (gt->info.id != 0) {
>>   		struct xe_gt *root_gt = xe_device_get_gt(xe, 0);
>> -		struct xe_guc *root_guc = &root_gt->uc.guc;
>> +		struct xe_guc *root_guc;
>>   		struct xe_bo *bo;
>>   
>> +		if (!root_gt)
>> +			return -ENODEV;
>> +
>> +		root_guc = &root_gt->uc.guc;
>>   		bo = xe_bo_get(root_guc->g2g.bo);
>>   		if (!bo)
>>   			return -ENODEV;
>> -- 
>> 2.43.0
>>


      reply	other threads:[~2025-10-15 16:07 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-15 15:10 [PATCH] drm/xe/xe_guc: Expand guc_g2g_alloc error hanlding for future change Jonathan Cavitt
2025-10-15 15:34 ` Matt Roper
2025-10-15 16:07   ` Daniele Ceraolo Spurio [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bd3b5e8a-0961-4bf3-a79d-fa426627ed25@intel.com \
    --to=daniele.ceraolospurio@intel.com \
    --cc=alex.zuo@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=jonathan.cavitt@intel.com \
    --cc=matthew.d.roper@intel.com \
    --cc=michal.wajdeczko@intel.com \
    --cc=saurabhg.gupta@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox