Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: "Maíra Canal" <mcanal@igalia.com>
To: phasta@kernel.org, "Matthew Brost" <matthew.brost@intel.com>,
	"Danilo Krummrich" <dakr@kernel.org>,
	"Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Tvrtko Ursulin" <tvrtko.ursulin@igalia.com>,
	"Simona Vetter" <simona@ffwll.ch>,
	"David Airlie" <airlied@gmail.com>,
	"Melissa Wen" <mwen@igalia.com>,
	"Lucas Stach" <l.stach@pengutronix.de>,
	"Russell King" <linux+etnaviv@armlinux.org.uk>,
	"Christian Gmeiner" <christian.gmeiner@gmail.com>,
	"Lucas De Marchi" <lucas.demarchi@intel.com>,
	"Thomas Hellström" <thomas.hellstrom@linux.intel.com>,
	"Rodrigo Vivi" <rodrigo.vivi@intel.com>,
	"Boris Brezillon" <boris.brezillon@collabora.com>,
	"Rob Herring" <robh@kernel.org>,
	"Steven Price" <steven.price@arm.com>,
	"Liviu Dudau" <liviu.dudau@arm.com>
Cc: kernel-dev@igalia.com, dri-devel@lists.freedesktop.org,
	etnaviv@lists.freedesktop.org, intel-xe@lists.freedesktop.org
Subject: Re: [PATCH v2 6/8] drm/etnaviv: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Date: Mon, 2 Jun 2025 08:36:24 -0300	[thread overview]
Message-ID: <177ea4e7-b05d-4ee0-8b5e-e3dfd67491a0@igalia.com> (raw)
In-Reply-To: <964e59ba1539083ef29b06d3c78f5e2e9b138ab8.camel@mailbox.org>

Hi Philipp,

On 02/06/25 04:28, Philipp Stanner wrote:
> On Fri, 2025-05-30 at 11:01 -0300, Maíra Canal wrote:

[...]

>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> index
>> 7146069a98492f5fab2a49d96e2054f649e1fe3d..46f5391e84a12232b247886cf13
>> 11f8e09f42f04 100644
>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c
>> @@ -40,11 +40,11 @@ static enum drm_gpu_sched_stat
>> etnaviv_sched_timedout_job(struct drm_sched_job
>>   	int change;
>>   
>>   	/*
>> -	 * If the GPU managed to complete this jobs fence, the
>> timout is
>> -	 * spurious. Bail out.
>> +	 * If the GPU managed to complete this jobs fence, the
>> timeout has
>> +	 * fired before free-job worker. The timeout is spurious, so
>> bail out.
>>   	 */
>>   	if (dma_fence_is_signaled(submit->out_fence))
>> -		goto out_no_timeout;
>> +		return DRM_GPU_SCHED_STAT_NO_HANG;
>>   
>>   	/*
>>   	 * If the GPU is still making forward progress on the front-
>> end (which
>> @@ -70,7 +70,7 @@ static enum drm_gpu_sched_stat
>> etnaviv_sched_timedout_job(struct drm_sched_job
>>   		gpu->hangcheck_dma_addr = dma_addr;
>>   		gpu->hangcheck_primid = primid;
>>   		gpu->hangcheck_fence = gpu->completed_fence;
>> -		goto out_no_timeout;
>> +		return DRM_GPU_SCHED_STAT_NO_HANG;
>>   	}
>>   
>>   	/* block scheduler */
>> @@ -86,10 +86,7 @@ static enum drm_gpu_sched_stat
>> etnaviv_sched_timedout_job(struct drm_sched_job
>>   	drm_sched_resubmit_jobs(&gpu->sched);
>>   
>>   	drm_sched_start(&gpu->sched, 0);
>> -	return DRM_GPU_SCHED_STAT_RESET;
>>   
>> -out_no_timeout:
>> -	list_add(&sched_job->list, &sched_job->sched->pending_list);
> 
> Here you actually remove the manipulation of the scheduler internals,
> but you didn't in v3d. Just to point that out.
> 
> 
> And BTW I'm just seeing that the pending_list gets manipulated here
> with the scheduler's workqueues running and no locks being hold.
> 
> Oh man :(
> 
> That is most certainly a bug, and I recommend that the etnaviv
> maintainers at least add the appropriate lock here and backport that
> since it can race any time.
> 
> 
> But thx for working on that, Maíra. Good that we can remove the stuff
> this way.
> 
> Thinking about it, this patch even fixes a bug. So could contain a
> Fixes: tag. But I'm not sure if it's worth it to mark the entire series
> for Stable. Opinions?

I believe the best way to fix this bug would be doing something similar
to what I did to v3d: send a temporary fix adding the locks, which will
be backported to stable. I'll send a fix to Etnaviv today.

Thanks for the review, Phillip!

Best Regards,
- Maíra

> 
> 
> P.
> 
> 
>>   	return DRM_GPU_SCHED_STAT_RESET;
>>   }
>>   
>>
> 


  reply	other threads:[~2025-06-02 11:36 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-05-30 14:01 [PATCH v2 0/8] drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
2025-05-30 14:01 ` [PATCH v2 1/8] drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET Maíra Canal
2025-05-30 14:01 ` [PATCH v2 2/8] drm/sched: Allow drivers to skip the reset and keep on running Maíra Canal
2025-06-02  7:06   ` Philipp Stanner
2025-05-30 14:01 ` [PATCH v2 3/8] drm/sched: Reduce scheduler's timeout for timeout tests Maíra Canal
2025-06-02  8:54   ` Philipp Stanner
2025-06-02  9:06   ` Tvrtko Ursulin
2025-05-30 14:01 ` [PATCH v2 4/8] drm/sched: Add new test for DRM_GPU_SCHED_STAT_NO_HANG Maíra Canal
2025-06-02  9:34   ` Tvrtko Ursulin
2025-05-30 14:01 ` [PATCH v2 5/8] drm/v3d: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset Maíra Canal
2025-06-02  7:13   ` Philipp Stanner
2025-06-02 11:27     ` Maíra Canal
2025-05-30 14:01 ` [PATCH v2 6/8] drm/etnaviv: " Maíra Canal
2025-06-02  7:28   ` Philipp Stanner
2025-06-02 11:36     ` Maíra Canal [this message]
2025-06-02 11:59       ` Philipp Stanner
2025-05-30 14:01 ` [PATCH v2 7/8] drm/xe: " Maíra Canal
2025-06-02  7:47   ` Philipp Stanner
2025-05-30 14:01 ` [PATCH v2 8/8] drm/panfrost: " Maíra Canal
2025-05-30 14:08 ` ✗ CI.Patch_applied: failure for drm/sched: Allow drivers to skip the reset with DRM_GPU_SCHED_STAT_NO_HANG Patchwork

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=177ea4e7-b05d-4ee0-8b5e-e3dfd67491a0@igalia.com \
    --to=mcanal@igalia.com \
    --cc=airlied@gmail.com \
    --cc=boris.brezillon@collabora.com \
    --cc=christian.gmeiner@gmail.com \
    --cc=ckoenig.leichtzumerken@gmail.com \
    --cc=dakr@kernel.org \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=etnaviv@lists.freedesktop.org \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=kernel-dev@igalia.com \
    --cc=l.stach@pengutronix.de \
    --cc=linux+etnaviv@armlinux.org.uk \
    --cc=liviu.dudau@arm.com \
    --cc=lucas.demarchi@intel.com \
    --cc=matthew.brost@intel.com \
    --cc=mwen@igalia.com \
    --cc=phasta@kernel.org \
    --cc=robh@kernel.org \
    --cc=rodrigo.vivi@intel.com \
    --cc=simona@ffwll.ch \
    --cc=steven.price@arm.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=tvrtko.ursulin@igalia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox