Intel-XE Archive on lore.kernel.org
 help / color / mirror / Atom feed
From: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
To: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Cc: <intel-xe@lists.freedesktop.org>, <matthew.brost@intel.com>,
	<stuart.summers@intel.com>
Subject: Re: [PATCH v2 3/9] drm/xe/multi_queue: Store primary LRC and position info in LRC
Date: Tue, 5 May 2026 11:51:33 -0700	[thread overview]
Message-ID: <afo8Nb87noZPWK9D@soc-5CG1426VCC.clients.intel.com> (raw)
In-Reply-To: <afo6ru4u83dDRJFC@nvishwa1-desk>

On Tue, May 05, 2026 at 11:45:02AM -0700, Niranjana Vishwanathapura wrote:
>On Tue, May 05, 2026 at 11:35:23AM -0700, Umesh Nerlige Ramappa wrote:
>>On Mon, May 04, 2026 at 08:46:10PM -0700, Niranjana Vishwanathapura wrote:
>>>On Fri, May 01, 2026 at 05:53:36PM -0700, Umesh Nerlige Ramappa wrote:
>>>>Given an LRC belonging to the secondary queue, in order to check if its
>>>>context group is active, we need to check the LRC of the primary queue.
>>>>In addition to that we want to compare the secondary queue position to
>>>>CSMQDEBUG register to check if the queue itself is active.
>>>>
>>>>To do so, store primary LRC and position information in the LRC as well
>>>>as take a reference to the primary LRC from each LRC in the queue group.
>>>>
>>>>A note on references involved:
>>>>
>>>>- In general the Queue takes a ref on its LRC.
>>>>- In addition, for multi-queue,
>>>>a. Primary Queue takes a ref for each Secondary LRC.
>>>>b. Each Secondary Queue takes a ref to the Primary Queue
>>>>
>>>>In the current patch, each LRC in the queue group is storing a pointer
>>>>to Primary LRC. There is a small window of time in the primary queue
>>>>free path where the primary LRC may be freed before the secondary LRC.
>>>>
>>>>__xe_exec_queue_fini(q); // frees|puts primary q LRCs
>>>>...
>>>>window where secondary Q LRC is pointing to invalid primary LRC
>>>>...
>>>>__xe_exec_queue_free(q); // frees|puts secondary q LRCs in multi-Q case
>>>>
>>>>In this window the reference in Secondary LRC is invalid. While there
>>>>may be nothing accessing the secondary LRCs reference, to be safe, this
>>>>patch is taking a reference to Primary LRC from the secondary LRC.
>>>>
>>>>Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
>>>>---
>>>>v2:
>>>>- Store primary LRC instead of primary queue (Niranjana)
>>>>- Drop the valid flag and check if primary_lrc is NULL (Niranjana)
>>>>- Document/Revisit references (Matt/Umesh)
>>>>---
>>>>drivers/gpu/drm/xe/xe_exec_queue.c | 23 ++++++++++++++++++++---
>>>>drivers/gpu/drm/xe/xe_lrc.h        |  5 +++++
>>>>drivers/gpu/drm/xe/xe_lrc_types.h  |  8 ++++++++
>>>>3 files changed, 33 insertions(+), 3 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>index b287d0e0e60a..e34601d28520 100644
>>>>--- a/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>+++ b/drivers/gpu/drm/xe/xe_exec_queue.c
>>>>@@ -129,8 +129,14 @@ static void xe_exec_queue_group_cleanup(struct xe_exec_queue *q)
>>>>		return;
>>>>
>>>>	/* Primary queue cleanup */
>>>>-	xa_for_each(&group->xa, idx, lrc)
>>>>+	xa_for_each(&group->xa, idx, lrc) {
>>>>+		/* drop secondary lrc ref to primary lrc */
>>>>+		xe_lrc_put(lrc->multi_queue.primary_lrc);
>>>>+		/* drop primary queue ref to secondary lrc */
>>>>		xe_lrc_put(lrc);
>>>>+	}
>>>>+	/* drop primary lrc ref to itself */
>>>>+	xe_lrc_put(q->lrc[0]);
>>>>
>>>>	xa_destroy(&group->xa);
>>>>	mutex_destroy(&group->list_lock);
>>>>@@ -275,8 +281,15 @@ static void xe_exec_queue_set_lrc(struct xe_exec_queue *q, struct xe_lrc *lrc, u
>>>>{
>>>>	xe_assert(gt_to_xe(q->gt), idx < q->width);
>>>>
>>>>-	scoped_guard(spinlock, &q->lrc_lookup_lock)
>>>>+	scoped_guard(spinlock, &q->lrc_lookup_lock) {
>>>>		q->lrc[idx] = lrc;
>>>>+		if (xe_exec_queue_is_multi_queue(q)) {
>>>>+			struct xe_lrc *primary_lrc = q->multi_queue.group->primary->lrc[0];
>>>>+
>>>>+			lrc->multi_queue.pos = q->multi_queue.pos;
>>>
>>>I think q->multi_queue.pos is not yet set for secondary queues at this point.
>>>It is set later in the xe_exec_queue_group_add() call.
>>
>>Hmm, I recall seeing that a while ago and that's why I leaned toward 
>>having a lrc->q (back reference to the queue that the LRC belongs 
>>to, but that's prohibited). Anyways, I will try to move this logic 
>>outsize to a late point then.
>>
>>>
>>>>+			lrc->multi_queue.primary_lrc = xe_lrc_get(primary_lrc);
>>>
>>>I think we don't need to get/put the primary_lrc reference.
>>>Each queue holds a reference to its LRC. The secondary queues holds a reference
>>>to the primary queue. So, essentially, the secondary LRCs are holding a reference
>>>to primary lrc. So, I think, we don't need to hold reference again.
>>
>>I would look at the queue and LRC as different objects in memory.
>>
>>In taking a reference here, I am addressing only one specific corner 
>>case:
>>
>>- Consider all secondary queues are freed
>>- Since primary queue holds a reference to all secondary LRCs, all   
>>secondary LRCs are still in memory.
>>- When all references to primary queue are released, we end up in   
>>xe_exec_queue_fini(primary_q).
>>
>>In this function the LRC for primary_q is freed first, and then the 
>>secondary_q lrcs are freed, so there is a small window of time where 
>>the reference from secondary LRC to primary LRC is invalid. IMO, 
>>that small windows of time is enough for strange issues.
>>
>
>By that time, all secondary queues have been deregistered and destroyed.
>So, there shouldn't be any secondary LRC trying to access the primary
>LRC. May be we can set primary_lrc to NULL while releaseing LRC reference
>which should be sane enough and we really don't need primary_lrc reference
>taking.

ok, that's right, I will drop this reference and update the commit 
message.

Thanks,
Umesh
>
>Niranjana
>
>>Umesh
>>>
>>>Niranjana
>>>
>>>>+		}
>>>>+	}
>>>>}
>>>>
>>>>/**
>>>>@@ -388,8 +401,12 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q, u32 exec_queue_flags)
>>>>
>>>>			xe_exec_queue_set_lrc(q, lrc, i);
>>>>
>>>>-			if (__lrc)
>>>>+			if (__lrc) {
>>>>+				if (xe_exec_queue_is_multi_queue(q))
>>>>+					xe_lrc_put(__lrc->multi_queue.primary_lrc);
>>>>+
>>>>				xe_lrc_put(__lrc);
>>>>+			}
>>>>			__lrc = lrc;
>>>>
>>>>		} while (marker != xe_vf_migration_fixups_complete_count(q->gt));
>>>>diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h
>>>>index 97aef0327fc8..3d0bf4a7bfa0 100644
>>>>--- a/drivers/gpu/drm/xe/xe_lrc.h
>>>>+++ b/drivers/gpu/drm/xe/xe_lrc.h
>>>>@@ -91,6 +91,11 @@ static inline size_t xe_lrc_ring_size(void)
>>>>	return SZ_16K;
>>>>}
>>>>
>>>>+static inline bool xe_lrc_is_multi_queue(struct xe_lrc *lrc)
>>>>+{
>>>>+	return lrc->multi_queue.primary_lrc;
>>>>+}
>>>>+
>>>>size_t xe_gt_lrc_hang_replay_size(struct xe_gt *gt, enum xe_engine_class class);
>>>>size_t xe_gt_lrc_size(struct xe_gt *gt, enum xe_engine_class class);
>>>>u32 xe_lrc_pphwsp_offset(struct xe_lrc *lrc);
>>>>diff --git a/drivers/gpu/drm/xe/xe_lrc_types.h b/drivers/gpu/drm/xe/xe_lrc_types.h
>>>>index 5a718f759ed6..0a5c13ec2ad7 100644
>>>>--- a/drivers/gpu/drm/xe/xe_lrc_types.h
>>>>+++ b/drivers/gpu/drm/xe/xe_lrc_types.h
>>>>@@ -63,6 +63,14 @@ struct xe_lrc {
>>>>
>>>>	/** @ctx_timestamp: readout value of CTX_TIMESTAMP on last update */
>>>>	u64 ctx_timestamp;
>>>>+
>>>>+	/** @multi_queue: Multi queue LRC related information */
>>>>+	struct {
>>>>+		/** @multi_queue.primary_lrc: Primary lrc of this multi-queue group*/
>>>>+		struct xe_lrc *primary_lrc;
>>>>+		/** @multi_queue.pos: Position of LRC within the multi-queue group */
>>>>+		u8 pos;
>>>>+	} multi_queue;
>>>>};
>>>>
>>>>struct xe_lrc_snapshot;
>>>>-- 
>>>>2.43.0
>>>>

  reply	other threads:[~2026-05-05 18:51 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-02  0:53 [PATCH v2 0/9] Support run ticks for multi-queue use case Umesh Nerlige Ramappa
2026-05-02  0:53 ` [PATCH v2 1/9] drm/xe/lrc: Use 64 bit ctx timestamp in the LRC snapshot Umesh Nerlige Ramappa
2026-05-04 23:51   ` Niranjana Vishwanathapura
2026-05-02  0:53 ` [PATCH v2 2/9] drm/xe: Add timestamp_ms to " Umesh Nerlige Ramappa
2026-05-04 23:59   ` Niranjana Vishwanathapura
2026-05-05 18:03     ` Umesh Nerlige Ramappa
2026-05-02  0:53 ` [PATCH v2 3/9] drm/xe/multi_queue: Store primary LRC and position info in LRC Umesh Nerlige Ramappa
2026-05-05  3:46   ` Niranjana Vishwanathapura
2026-05-05 18:35     ` Umesh Nerlige Ramappa
2026-05-05 18:45       ` Niranjana Vishwanathapura
2026-05-05 18:51         ` Umesh Nerlige Ramappa [this message]
2026-05-02  0:53 ` [PATCH v2 4/9] drm/xe/multi_queue: Add helpers to access CS QUEUE TIMESTAMP from lrc Umesh Nerlige Ramappa
2026-05-05  4:00   ` Niranjana Vishwanathapura
2026-05-02  0:53 ` [PATCH v2 5/9] drm/xe/lrc: Refactor out engine id to hwe conversion Umesh Nerlige Ramappa
2026-05-05  4:16   ` Niranjana Vishwanathapura
2026-05-02  0:53 ` [PATCH v2 6/9] drm/xe/multi_queue: Capture queue run times for active queues Umesh Nerlige Ramappa
2026-05-05  4:12   ` Niranjana Vishwanathapura
2026-05-05 19:02     ` Umesh Nerlige Ramappa
2026-05-02  0:53 ` [PATCH v2 7/9] drm/xe/multi_queue: Add trace event for the multi queue timestamp Umesh Nerlige Ramappa
2026-05-05  4:19   ` Niranjana Vishwanathapura
2026-05-02  0:53 ` [PATCH v2 8/9] drm/xe/multi_queue: Use QUEUE_TIMESTAMP as job timestamp for multi-queue Umesh Nerlige Ramappa
2026-05-05  4:20   ` Niranjana Vishwanathapura
2026-05-02  0:53 ` [PATCH v2 9/9] drm/xe/multi_queue: Whitelist QUEUE_TIMESTAMP register Umesh Nerlige Ramappa
2026-05-05  4:25   ` Niranjana Vishwanathapura
2026-05-05 17:58     ` Umesh Nerlige Ramappa
2026-05-05 18:34       ` Niranjana Vishwanathapura
2026-05-05 19:06         ` Umesh Nerlige Ramappa

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afo8Nb87noZPWK9D@soc-5CG1426VCC.clients.intel.com \
    --to=umesh.nerlige.ramappa@intel.com \
    --cc=intel-xe@lists.freedesktop.org \
    --cc=matthew.brost@intel.com \
    --cc=niranjana.vishwanathapura@intel.com \
    --cc=stuart.summers@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox