AMD-GFX Archive on lore.kernel.org
From: "Khatri, Sunil" <sukhatri@amd.com>
To: "Christian König" <christian.koenig@amd.com>,
	"Sunil Khatri" <sunil.khatri@amd.com>,
	"Alex Deucher" <alexander.deucher@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu/userq: make sure only one reset work thread runs at a time
Date: Tue, 12 May 2026 15:01:24 +0530	[thread overview]
Message-ID: <3b683519-d634-47a5-a004-eaa9a7aa587c@amd.com> (raw)
In-Reply-To: <d16f309d-27fd-4f50-8e56-3839501a1196@amd.com>


On 12-05-2026 01:52 pm, Christian König wrote:
> On 5/12/26 09:04, Sunil Khatri wrote:
>> CPU0:   hang_detect_work → directly calls reset_work()
>> CPU1:   evict_all → queues reset_work (via workqueue)
>>
>> There is a possibility of two reset threads running at the same time.
>> To avoid that, add a per-queue-manager flag that prevents duplicate runs.
> Clear NAK, that doesn't make sense.
>
> All reset work must run on a single-threaded reset queue, so only one work item at a time can run.
>
> If multiple reset sources trigger at the same time (which is quite common) then the ones handled by a reset are canceled as soon as the reset is completed.

Got it. As we discussed, the two instances running concurrently are probably
caused by something else. I have shared another patch for an open bug we found.
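
To illustrate the point above, that serializing all reset work on one single-threaded queue already rules out concurrent resets, here is a minimal user-space sketch in C with POSIX threads. It is a model, not amdgpu code: struct reset_queue, queue_reset(), reset_worker() and run_scenario() are hypothetical names, a counter stands in for queued work items, and clearing every pending request after a reset is a simplification of "the ones handled by a reset are canceled as soon as the reset is completed". In the kernel the serializing role would be played by an ordered workqueue (for example one created with alloc_ordered_workqueue()).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: a counter stands in for queued reset work items. */
struct reset_queue {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int pending;     /* reset requests waiting to run */
	bool stop;       /* no further requests will be queued */
	int resets;      /* resets actually performed */
	int cancelled;   /* requests dropped because a reset already covered them */
	int running;     /* resets executing right now */
	int max_running; /* highest concurrency observed */
};

/* Any reset source (hang detection, failed eviction, ...) calls this. */
static void queue_reset(struct reset_queue *q)
{
	pthread_mutex_lock(&q->lock);
	q->pending++;
	pthread_cond_signal(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/* The single worker thread: requests are handled strictly one at a time. */
static void *reset_worker(void *arg)
{
	struct reset_queue *q = arg;

	pthread_mutex_lock(&q->lock);
	for (;;) {
		while (q->pending == 0 && !q->stop)
			pthread_cond_wait(&q->cond, &q->lock);
		if (q->pending == 0)
			break; /* stop requested and nothing left to do */

		q->pending--;
		q->running++;
		if (q->running > q->max_running)
			q->max_running = q->running;
		assert(q->running == 1); /* one worker, so never two resets at once */
		pthread_mutex_unlock(&q->lock);

		/* ... the actual GPU recovery would happen here ... */

		pthread_mutex_lock(&q->lock);
		q->running--;
		q->resets++;
		/* Simplification: the finished reset covers everything still pending. */
		q->cancelled += q->pending;
		q->pending = 0;
	}
	pthread_mutex_unlock(&q->lock);
	return NULL;
}

/* Queue `requests` resets before the worker starts, then let it drain them. */
static void run_scenario(struct reset_queue *q, int requests)
{
	pthread_t worker;

	*q = (struct reset_queue){ .pending = 0 };
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);

	for (int i = 0; i < requests; i++)
		queue_reset(q);
	q->stop = true; /* worker not started yet, so no lock is needed */

	pthread_create(&worker, NULL, reset_worker, q);
	pthread_join(worker, NULL);

	pthread_cond_destroy(&q->cond);
	pthread_mutex_destroy(&q->lock);
}
```

With three requests queued before the worker starts, run_scenario() performs one reset and cancels the other two, and max_running never exceeds 1. Build with -pthread.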

Regards
Sunil Khatri

>
> Regards,
> Christian.
>
>> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 16 ++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h |  1 +
>>   2 files changed, 17 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> index 0a1fc45f5b4e..1440f51b667f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c
>> @@ -109,6 +109,19 @@ static void amdgpu_userq_mgr_reset_work(struct work_struct *work)
>>   	if (!amdgpu_gpu_recovery)
>>   		return;
>>   
>> +	/*
>> +	 * Prevent concurrent/duplicate reset executions. Both hang_detect_work
>> +	 * (direct call) and evict_all (via schedule+flush_work) can invoke this
>> +	 * function simultaneously. Use an atomic test-and-set so only the first
>> +	 * caller proceeds; the second exits early.
>> +	 *
>> +	 * Note: amdgpu_in_reset() cannot be used here because in_gpu_reset is
>> +	 * only set deep inside amdgpu_device_gpu_recover(), well after we've
>> +	 * already entered this function.
>> +	 */
>> +	if (atomic_cmpxchg(&uq_mgr->reset_in_progress, 0, 1) != 0)
>> +		return;
>> +
>>   	/*
>>   	 * Iterate through all queue types to detect and reset problematic queues
>>   	 * Process each queue type in the defined order
>> @@ -145,6 +158,8 @@ static void amdgpu_userq_mgr_reset_work(struct work_struct *work)
>>   
>>   		amdgpu_device_gpu_recover(adev, NULL, &reset_context);
>>   	}
>> +
>> +	atomic_set(&uq_mgr->reset_in_progress, 0);
>>   }
>>   
>>   static void amdgpu_userq_hang_detect_work(struct work_struct *work)
>> @@ -1304,6 +1319,7 @@ int amdgpu_userq_mgr_init(struct amdgpu_userq_mgr *userq_mgr, struct drm_file *f
>>   
>>   	INIT_DELAYED_WORK(&userq_mgr->resume_work, amdgpu_userq_restore_worker);
>>   	INIT_WORK(&userq_mgr->reset_work, amdgpu_userq_mgr_reset_work);
>> +	atomic_set(&userq_mgr->reset_in_progress, 0);
>>   	return 0;
>>   }
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
>> index 49b33e2d6932..2748ecc0f6c9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h
>> @@ -129,6 +129,7 @@ struct amdgpu_userq_mgr {
>>   	 * Reset work which is used when eviction fails.
>>   	 */
>>   	struct work_struct		reset_work;
>> +	atomic_t			reset_in_progress;
>>   	atomic_t                        userq_count[AMDGPU_RING_TYPE_MAX];
>>   };
>>   
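
For reference, the guard the patch adds can be modeled in user space with C11 <stdatomic.h>. Note the API difference: the kernel's atomic_cmpxchg(v, old, new) returns the value it found, so the patch tests for != 0, whereas atomic_compare_exchange_strong() returns true only when the swap actually happened. struct guard, reset_try_begin() and reset_end() are hypothetical names, and the assert() in reset_end() is an extra debug check that the patch does not have. The model also makes one property visible: a caller that loses the race simply returns, so its reset request is dropped rather than deferred. Per the review above, the guard is unnecessary when all reset work already runs on one single-threaded queue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for uq_mgr->reset_in_progress. */
struct guard {
	atomic_int reset_in_progress;
};

/*
 * Equivalent of "if (atomic_cmpxchg(&v, 0, 1) != 0) return;": only the
 * caller that observes 0 and swaps in 1 is allowed to proceed.
 */
static bool reset_try_begin(struct guard *g)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&g->reset_in_progress, &expected, 1);
}

/* Equivalent of atomic_set(&v, 0) at the end of the reset work. */
static void reset_end(struct guard *g)
{
	int prev = atomic_exchange(&g->reset_in_progress, 0);

	assert(prev == 1); /* debug check: must pair with a successful begin */
	(void)prev;
}
```

A second reset_try_begin() made before reset_end() returns false, and the flag can be taken again once reset_end() has run.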


Thread overview: 3+ messages
2026-05-12  7:04 [PATCH] drm/amdgpu/userq: make sure only one reset work thread runs at a time Sunil Khatri
2026-05-12  8:22 ` Christian König
2026-05-12  9:31   ` Khatri, Sunil [this message]
